The example below borrows from the forecastxgb author's blog. Tree-based models fundamentally cannot extrapolate, but surely there must be ways to combine the strengths of tree models (capturing interactions) with the trend-extrapolation ability of linear models. Can anyone suggest some ideas?

I have seen some Kaggle solutions where people suggest using linear-model predictions as a feature for the tree model. That can improve the predictions, but how does it improve extrapolation?

Another idea is to use xgboost to predict the residuals of a linear model, which helps the forecast a lot.

Does such a method actually exist?
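Here is my reading of that feature-stacking idea as a minimal, self-contained sketch (the variable names and data are my own illustration, not from any particular Kaggle solution). As far as I can tell, the trees still only ever split on feature values seen in training, so the stacked feature plateaus out of range, which is exactly what I don't understand:

library(xgboost)

set.seed(1)
x_tr <- 1:100 + rnorm(100)
y_tr <- 3 + 0.3 * x_tr + rnorm(100)

# fit the linear trend and use its fitted values as an extra feature
trend <- lm(y_tr ~ x_tr)
train_mat <- cbind(x = x_tr, lm_fit = fitted(trend))

mod_stack <- xgboost(data = train_mat, label = y_tr,
                     params = list(objective = "reg:linear", max_depth = 2),
                     nrounds = 40, verbose = 0)

# out-of-range inputs: lm_fit keeps growing linearly, but the trees
# treat any lm_fit above the largest training value identically,
# so the boosted predictions still flatten out
x_new <- 101:120
new_mat <- cbind(x = x_new,
                 lm_fit = predict(trend, newdata = data.frame(x_tr = x_new)))
pred_stack <- predict(mod_stack, newdata = new_mat)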
library(xgboost) # extreme gradient boosting

set.seed(134) # for reproducibility

# simulated linear trend: training data covers x in 1..100,
# the extrapolation set covers x in 101..120
x <- 1:100 + rnorm(100)
y <- 3 + 0.3 * x + rnorm(100)
extrap <- data.frame(x = 101:120 + rnorm(20))

xg_params <- list(objective = "reg:linear", max_depth = 2)

# 10-fold cross-validation to choose the number of boosting rounds
mod_cv <- xgb.cv(label = y, params = xg_params, data = as.matrix(x),
                 nrounds = 40, nfold = 10)

# choose the nrounds with the best cross-validated root mean square error
best_nrounds <- which.min(mod_cv$evaluation_log$test_rmse_mean)

mod_xg <- xgboost(label = y, params = xg_params, data = as.matrix(x),
                  nrounds = best_nrounds)
# helper to draw the training data on a common set of axes
p <- function(title){
  plot(x, y, xlim = c(0, 150), ylim = c(0, 50), pch = 19, cex = 0.6,
       main = title, xlab = "", ylab = "", font.main = 1)
  grid()
}
predshape <- 1
p("Extreme gradient boosting")
points(extrap$x, predict(mod_xg, newdata = as.matrix(extrap)), col = "darkgreen", pch = predshape)
mod_lm <- lm(y ~ x)
p("Linear regression")
points(extrap$x, predict(mod_lm, newdata = extrap), col = "red", pch = predshape)
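Continuing the script above, here is a sketch of the residual idea from my question (the object names are my own): fit the linear trend, boost on its residuals, and add the two predictions back together. The trees only have to model the detrended residuals, so the linear part carries the extrapolation:

# boost on the residuals of the linear model fitted above
resid_y <- residuals(mod_lm)
mod_resid <- xgboost(label = resid_y, params = xg_params,
                     data = as.matrix(x), nrounds = best_nrounds)

# hybrid prediction = linear trend + boosted residual correction
hybrid_pred <- predict(mod_lm, newdata = extrap) +
  predict(mod_resid, newdata = as.matrix(extrap))

p("Hybrid: linear trend + xgboost on residuals")
points(extrap$x, hybrid_pred, col = "blue", pch = predshape)

Is this the right way to think about combining the two, or is there a more principled approach?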