The example below borrows from the forecastxgb author's blog. Tree-based models fundamentally cannot extrapolate, but surely there must be ways to combine the strengths of tree models (capturing interactions) with the trend-extrapolation ability of linear models. Can anyone suggest some ideas?

I have seen some Kaggle solutions where people suggest using linear-model predictions as a feature for the tree model. That can improve the predictions, but how does it improve extrapolation?

Another idea is to use xgboost to predict the residuals of a linear model, which helps the forecast a lot.

Does such a method actually exist?
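Here is my reading of that feature-stacking idea as a minimal, self-contained sketch (the variable names and data are my own illustration, not from any particular Kaggle solution). As far as I can tell, the trees still only ever split on feature values seen in training, so the stacked feature plateaus out of range, which is exactly what I don't understand:

library(xgboost)

set.seed(1)
x_tr <- 1:100 + rnorm(100)
y_tr <- 3 + 0.3 * x_tr + rnorm(100)

# fit the linear trend and use its fitted values as an extra feature
trend <- lm(y_tr ~ x_tr)
train_mat <- cbind(x = x_tr, lm_fit = fitted(trend))

mod_stack <- xgboost(data = train_mat, label = y_tr,
                     params = list(objective = "reg:linear", max_depth = 2),
                     nrounds = 40, verbose = 0)

# out-of-range inputs: lm_fit keeps growing linearly, but the trees
# treat any lm_fit above the largest training value identically,
# so the boosted predictions still flatten out
x_new <- 101:120
new_mat <- cbind(x = x_new,
                 lm_fit = predict(trend, newdata = data.frame(x_tr = x_new)))
pred_stack <- predict(mod_stack, newdata = new_mat)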
library(xgboost) # extreme gradient boosting

set.seed(134) # for reproducibility

# simulated linear trend: training data covers x in 1..100,
# the extrapolation set covers x in 101..120
x <- 1:100 + rnorm(100)
y <- 3 + 0.3 * x + rnorm(100)
extrap <- data.frame(x = 101:120 + rnorm(20))

xg_params <- list(objective = "reg:linear", max_depth = 2)

# 10-fold cross-validation to choose the number of boosting rounds
mod_cv <- xgb.cv(label = y, params = xg_params, data = as.matrix(x),
                 nrounds = 40, nfold = 10)

# choose the nrounds with the best cross-validated root mean square error
best_nrounds <- which.min(mod_cv$evaluation_log$test_rmse_mean)

mod_xg <- xgboost(label = y, params = xg_params, data = as.matrix(x),
                  nrounds = best_nrounds)
# helper to draw the training data on a common set of axes
p <- function(title){
  plot(x, y, xlim = c(0, 150), ylim = c(0, 50), pch = 19, cex = 0.6,
       main = title, xlab = "", ylab = "", font.main = 1)
  grid()
}
predshape <- 1
p("Extreme gradient boosting")
points(extrap$x, predict(mod_xg, newdata = as.matrix(extrap)), col = "darkgreen", pch = predshape)
mod_lm <- lm(y ~ x)
p("Linear regression")
points(extrap$x, predict(mod_lm, newdata = extrap), col = "red", pch = predshape)
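Continuing the script above, here is a sketch of the residual idea from my question (the object names are my own): fit the linear trend, boost on its residuals, and add the two predictions back together. The trees only have to model the detrended residuals, so the linear part carries the extrapolation:

# boost on the residuals of the linear model fitted above
resid_y <- residuals(mod_lm)
mod_resid <- xgboost(label = resid_y, params = xg_params,
                     data = as.matrix(x), nrounds = best_nrounds)

# hybrid prediction = linear trend + boosted residual correction
hybrid_pred <- predict(mod_lm, newdata = extrap) +
  predict(mod_resid, newdata = as.matrix(extrap))

p("Hybrid: linear trend + xgboost on residuals")
points(extrap$x, hybrid_pred, col = "blue", pch = predshape)

Is this the right way to think about combining the two, or is there a more principled approach?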