xgboost
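I implemented a custom objective and a custom metric for regression. To check that I did this correctly, I started with the quadratic (squared-error) loss. The implementation seems to work, but I cannot reproduce the results of the standard "reg:squarederror" objective.
Question:
Is my current approach correct, in particular the implementation of the first- and second-order gradients? If it is, what could cause the difference?
The gradient and Hessian are defined as: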
grad <- 2*(preds-labels)
hess <- rep(2, length(labels))
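As a sanity check on these two derivatives, they can be compared against central finite differences of the loss. The following is only a minimal sketch in base R: it assumes the loss is (pred - label)^2 per observation, and the helper names and toy values are made up for illustration, not taken from the model below.

```r
# Per-observation squared-error loss (assumed form: (pred - label)^2)
sq_loss <- function(pred, label) (pred - label)^2

# Analytic derivatives as defined above
analytic_grad <- function(preds, labels) 2 * (preds - labels)
analytic_hess <- function(preds, labels) rep(2, length(labels))

# Central finite differences with respect to the prediction
numeric_grad <- function(preds, labels, eps = 1e-2) {
  (sq_loss(preds + eps, labels) - sq_loss(preds - eps, labels)) / (2 * eps)
}
numeric_hess <- function(preds, labels, eps = 1e-2) {
  (sq_loss(preds + eps, labels) - 2 * sq_loss(preds, labels) +
     sq_loss(preds - eps, labels)) / eps^2
}

# Toy values, made up for illustration only
preds  <- c(0.5, 120, 480.25, 1000)
labels <- c(475, 100, 500, 750)

grad_ok <- isTRUE(all.equal(analytic_grad(preds, labels),
                            numeric_grad(preds, labels), tolerance = 1e-6))
hess_ok <- isTRUE(all.equal(analytic_hess(preds, labels),
                            numeric_hess(preds, labels), tolerance = 1e-4))
print(c(grad_ok = grad_ok, hess_ok = hess_ok))
```

If both checks pass, the derivatives are consistent with the loss as written, so any remaining discrepancy would have to come from somewhere other than the calculus.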
Minimal example (in R):
library(ISLR)
library(xgboost)
library(tidyverse)
library(Metrics)
# Data
df = ISLR::Hitters %>% select(Salary,AtBat,Hits,HmRun,Runs,RBI,Walks,Years,CAtBat,CHits,CHmRun,CRuns,CRBI,CWalks,PutOuts,Assists,Errors)
df = df[complete.cases(df),]
train = df[1:150,]
test = df[151:nrow(df),]
# XGBoost Matrix
dtrain <- xgb.DMatrix(data=as.matrix(train[,-1]),label=as.matrix(train[,1]))
dtest <- xgb.DMatrix(data=as.matrix(test[,-1]),label=as.matrix(test[,1]))
watchlist <- list(eval = dtest)
# Custom objective function (squared error)
myobjective <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  grad <- 2*(preds-labels)          # first derivative of (preds - labels)^2
  hess <- rep(2, length(labels))    # second derivative is constant
  return(list(grad = grad, hess = hess))
}
# Custom Metric
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  u = (preds-labels)^2
  err <- sqrt((sum(u) / length(u)))  # root mean squared error
  return(list(metric = "MyError", value = err))
}
# Model Parameter
set.seed(2020)  # `set.seed` is not an xgboost parameter; seed R's RNG directly
param1 <- list(booster = 'gbtree'
, learning_rate = 0.1
, objective = myobjective
, eval_metric = evalerror)
# Train Model
xgb1 <- xgb.train(params = param1
, data = dtrain
, nrounds = 500
, watchlist = watchlist
, maximize = FALSE
, early_stopping_rounds = 5)
# Predict
pred1 = predict(xgb1, dtest)
mae1 = mae(test$Salary, pred1)
## XGB Model with standard loss/metric
# Model Parameter
set.seed(2020)  # `set.seed` is not an xgboost parameter; seed R's RNG directly
param2 <- list(booster = 'gbtree'
, learning_rate = 0.1
, objective = "reg:squarederror")
# Train Model
xgb2 <- xgb.train(params = param2
, data = dtrain
, nrounds = 500
, watchlist = watchlist
, maximize = FALSE
, early_stopping_rounds = 5)
# Predict
pred2 = predict(xgb2, dtest)
mae2 = mae(test$Salary, pred2)
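Results:
The custom objective gives a slightly better result (MAE=199.6) than the standard objective (MAE=203.3). During boosting, the RMSE also tends to be lower with the custom objective.
For the custom objective, the RMSE is: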
[1] eval-MyError:599.490030
[2] eval-MyError:560.677996
[3] eval-MyError:527.867686
[4] eval-MyError:498.216760
[5] eval-MyError:472.167415
...
For the standard objective, the RMSE is:
[1] eval-rmse:598.144775
[2] eval-rmse:562.479431
[3] eval-rmse:529.981079
[4] eval-rmse:501.730103
[5] eval-rmse:479.081329
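One possible contributor to the gap, offered as a sketch rather than a confirmed diagnosis: XGBoost's built-in squared-error objective is based on the loss 1/2 * (pred - label)^2, so its gradient is pred - label and its Hessian is 1, half the values used in my custom objective. Under the usual leaf-weight formula w = -G / (H + lambda), a uniform factor of 2 does not cancel when lambda > 0 (the default is lambda = 1), and the Hessian sum that is compared against min_child_weight doubles as well. The helper `leaf_weight` and the toy values below are hypothetical and only illustrate the arithmetic for a single leaf.

```r
# Optimal leaf weight for summed gradient G and summed Hessian H
# with L2 penalty lambda: w = -G / (H + lambda)
leaf_weight <- function(grad, hess, lambda = 1) {
  -sum(grad) / (sum(hess) + lambda)
}

# Toy predictions and labels for the rows in one leaf (made up for illustration)
preds  <- c(0.5, 0.5, 0.5)
labels <- c(400, 500, 600)

# Scaling used in the custom objective: loss (pred - label)^2
w_custom <- leaf_weight(2 * (preds - labels), rep(2, length(labels)))
# Scaling of the 1/2 * (pred - label)^2 form
w_half <- leaf_weight(preds - labels, rep(1, length(labels)))
print(c(custom = w_custom, half = w_half))

# Without the L2 penalty the factor of 2 cancels and the weights coincide
w_custom_0 <- leaf_weight(2 * (preds - labels), rep(2, length(labels)), lambda = 0)
w_half_0   <- leaf_weight(preds - labels, rep(1, length(labels)), lambda = 0)
print(isTRUE(all.equal(w_custom_0, w_half_0)))
```

If this is indeed the cause, a custom objective returning grad = preds - labels and hess = rep(1, length(labels)) would be expected to track "reg:squarederror" more closely. I have not verified that on the data above.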