I know how to use bootstrap resampling to find a confidence interval for the in-sample error or R-squared:
# Bootstrap 95% CI for R-Squared
library(boot)
# function to obtain R-Squared from the data
rsq <- function(formula, data, indices) {
  d <- data[indices, ] # allows boot to select sample
  fit <- lm(formula, data = d)
  return(summary(fit)$r.squared) # full element name, no partial matching
}
# bootstrapping with 1000 replications
results <- boot(data=mtcars, statistic=rsq,
                R=1000, formula=mpg~wt+disp)
# view results
results
plot(results)
# get 95% confidence interval
boot.ci(results, type="bca")
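But what if I want to estimate the out-of-sample error (somewhat like cross-validation)? Can I fit a model to each bootstrap sample, use that model to predict every other bootstrap sample, and then average the RMSE of those predictions?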
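
To make the proposed procedure concrete, here is a minimal sketch of it in base R, reusing the mtcars model from above. It is only an illustration under assumptions: the names `B`, `idx`, `fits`, and `rmse` are hypothetical, and `B` is kept small because the number of predictions grows as B squared.

```r
# Sketch only: B, idx, fits, and rmse are illustrative names, not part of any package
set.seed(1)                       # make the resampling reproducible
B <- 50                           # number of bootstrap samples (assumed, kept small)
n <- nrow(mtcars)

# draw B bootstrap index vectors; idx is an n x B matrix, one sample per column
idx <- replicate(B, sample(n, replace = TRUE))

# fit one model per bootstrap sample
fits <- lapply(seq_len(B), function(b) {
  lm(mpg ~ wt + disp, data = mtcars[idx[, b], ])
})

# rmse[i, j]: model fitted on sample i, evaluated on sample j
rmse <- matrix(NA_real_, B, B)
for (i in seq_len(B)) {
  for (j in seq_len(B)) {
    if (i == j) next              # skip the sample the model was trained on
    test <- mtcars[idx[, j], ]
    pred <- predict(fits[[i]], newdata = test)
    rmse[i, j] <- sqrt(mean((test$mpg - pred)^2))
  }
}

mean(rmse, na.rm = TRUE)          # average RMSE over all train/test pairs
```

Note that two bootstrap samples drawn from the same data generally share many rows, so the "test" sample in this sketch is not disjoint from the sample the model was fitted on.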