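Here I restate, in pseudocode, what I gathered from @Yuanning's answer and @cbeleites's comment. It may help people like me.
To measure the performance of a fixed model, we only need training and test sets: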
function measure_performance(model, full_test_set, k_performance):
    subset_list <- divide full_test_set into k_performance subsets
    performances <- empty array
    for each sub_set in subset_list:
        test_set <- sub_set
        training_set <- the rest of the full_test_set
        model <- train model with training_set
        performance <- test model with test_set
        append performance to performances
    end for each
    return mean of the values in performances
end function
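A minimal Python sketch of the function above, using only the standard library. The names `k_fold_indices`, `train`, `score`, `train_mean`, and `neg_mse` are my own placeholders, not anything from the answers cited; the toy "model" is just the mean of the training targets, chosen so the example runs without extra packages.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def measure_performance(train, score, data, k_performance, seed=0):
    """Mean held-out score of one fixed modelling procedure under k-fold CV.

    train(rows) -> fitted model; score(model, rows) -> float.
    Both callables are hypothetical placeholders supplied by the caller.
    """
    folds = k_fold_indices(len(data), k_performance, seed)
    performances = []
    for fold in folds:
        held_out = set(fold)
        test_set = [data[i] for i in fold]
        training_set = [row for i, row in enumerate(data) if i not in held_out]
        model = train(training_set)  # refit on everything except this fold
        performances.append(score(model, test_set))
    return mean(performances)


# Toy usage: the "model" is the mean of the training targets,
# scored by negative mean squared error (higher is better).
def train_mean(rows):
    return mean(y for _, y in rows)


def neg_mse(model, rows):
    return -mean((y - model) ** 2 for _, y in rows)


data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
cv_score = measure_performance(train_mean, neg_mse, data, k_performance=5)
print(cv_score)
```

Passing `train` and `score` as callables keeps the cross-validation loop independent of any particular model, which mirrors how the pseudocode treats `model` as a black box.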
But if we need to do model selection, we should do this:
function select_model(data, k_select, k_performance):
    subset_list <- divide data into k_select subsets
    performances <- empty array
    for each sub_set in subset_list:
        validation_set <- assume that this sub_set is validation set
        test_set <- one other random sub_set (Question: How to select test_set)
        training_set <- assume remaining as training set
        model <- get a model with the help of training_set and validation_set
        performance <- measure_performance(model, test_set, k_performance)
        append (model, performance) to performances
    end for each
    return model with the best performance (for this, performances will be scanned)
end function
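A rough Python sketch of the selection procedure above, again standard library only. Two things here are my assumptions rather than part of the pseudocode: the test fold is taken to be the fold right after the validation fold (cyclically), since how to pick it is left open, and "get a model with the help of training_set and validation_set" is read as "fit each candidate on the training set and keep the one that scores best on the validation set". The candidates `train_constant` and `train_line` and the helper `split_folds` are hypothetical. It needs `k_select >= 3` so that some data is left for training.

```python
import random
from statistics import mean


def split_folds(rows, k, seed=0):
    """Shuffle the rows and deal them into k folds."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return [rows[i::k] for i in range(k)]


def measure_performance(train, score, rows, k):
    """Plain k-fold cross-validation of one training procedure on `rows`."""
    folds = split_folds(rows, k)
    scores = []
    for i, test_set in enumerate(folds):
        training_set = [r for j, f in enumerate(folds) if j != i for r in f]
        scores.append(score(train(training_set), test_set))
    return mean(scores)


def select_model(candidates, score, data, k_select, k_performance):
    """candidates: dict mapping a name to a train(rows) -> model function."""
    folds = split_folds(data, k_select)
    performances = []
    for i, validation_set in enumerate(folds):
        t = (i + 1) % k_select  # ASSUMPTION: test fold is the next fold
        test_set = folds[t]
        training_set = [r for j, f in enumerate(folds)
                        if j not in (i, t) for r in f]

        def validation_score(name):
            return score(candidates[name](training_set), validation_set)

        # Pick the candidate that does best on the validation fold ...
        best_name = max(candidates, key=validation_score)
        # ... then estimate its performance by k-fold CV inside the test fold.
        performance = measure_performance(
            candidates[best_name], score, test_set, k_performance)
        performances.append((performance, best_name))
    return max(performances)  # scan for the best (performance, name) pair


def train_constant(rows):
    m = mean(y for _, y in rows)
    return lambda x: m


def train_line(rows):
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sxx
    return lambda x: my + slope * (x - mx)


def neg_mse(model, rows):
    return -mean((y - model(x)) ** 2 for x, y in rows)


data = [(float(x), 2.0 * x + 1.0) for x in range(60)]
candidates = {"constant": train_constant, "line": train_line}
best_performance, best_name = select_model(
    candidates, neg_mse, data, k_select=5, k_performance=3)
print(best_name, best_performance)
```

Other ways of choosing the test fold (for example a random other fold, as the pseudocode suggests) would only change the line that sets `t`.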