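I am predicting a disease and want the highest possible sensitivity for the predictions on my validation and test sets.
Which stopping metric can I use to optimize sensitivity on the validation set?
I have about 400 observations. The response variable is binary (0/1), and I have 40 predictors.
My current setup uses AUC as the stopping metric.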
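library(h2o)
h2o.init()

df <- as.h2o(df)
# If disease is stored as numeric 0/1, convert it to a factor so H2O fits a classifier
df$disease <- as.factor(df$disease)

split <- h2o.splitFrame(data = df, ratios = c(0.6, 0.2))  # split 60/20/20%
train <- h2o.assign(split[[1]], "train.hex")  # 60%
valid <- h2o.assign(split[[2]], "valid.hex")  # 20%
test  <- h2o.assign(split[[3]], "test.hex")   # 20%

x <- setdiff(names(df), "disease")
y <- "disease"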
gbm <- h2o.gbm(
  x = x,
  y = y,
  training_frame = train,
  validation_frame = valid,
  ntrees = 10000,
  learn_rate = 0.01,
  # Stopping parameters
  stopping_rounds = 5,
  stopping_tolerance = 1e-4,
  stopping_metric = "AUC",
  sample_rate = 0.8,
  col_sample_rate = 0.8,
  seed = 1234,
  nfolds = 50,
  score_tree_interval = 10
)
h2o.auc(h2o.performance(gbm, valid = TRUE))
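
To be explicit about the quantity I want to maximize: sensitivity is TP / (TP + FN) at a chosen probability cutoff. Below is a minimal base-R sketch of that calculation. `sensitivity_at` is a hypothetical helper of my own (not part of h2o), and the labels and probabilities are made-up toy values, not my disease data.

```r
# Hypothetical helper: sensitivity (true positive rate) at a probability cutoff.
sensitivity_at <- function(actual, prob, threshold = 0.5) {
  predicted <- as.integer(prob >= threshold)
  tp <- sum(predicted == 1 & actual == 1)  # true positives
  fn <- sum(predicted == 0 & actual == 1)  # false negatives
  tp / (tp + fn)
}

# Toy values for illustration only (not from the disease data).
actual <- c(1, 1, 1, 0, 0, 1)
prob   <- c(0.9, 0.4, 0.7, 0.2, 0.6, 0.3)

sensitivity_at(actual, prob, threshold = 0.5)   # 2 of the 4 positives are caught
sensitivity_at(actual, prob, threshold = 0.25)  # a lower cutoff catches more positives
```

Because the value depends on the cutoff, sensitivity is a property of the model plus a threshold, whereas AUC summarizes performance over all thresholds.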