I want to use the following asymmetric, cost-sensitive custom log-loss objective with XGBoost. It weights the positive-class term of the log loss five times more heavily, so that false negatives are penalized more than false positives.
I have worked out the gradient and Hessian of this loss function.
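Written out (this is what the code below implements), with $p = \sigma(\hat{y})$ the predicted probability for raw score $\hat{y}$:

$$L(y, \hat{y}) = -\bigl(5\,y\log p + (1 - y)\log(1 - p)\bigr)$$

$$\frac{\partial L}{\partial \hat{y}} = 4py + p - 5y, \qquad \frac{\partial^2 L}{\partial \hat{y}^2} = (4y + 1)\,p\,(1 - p)$$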
My code:
import numpy as np
import xgboost as xgb

def logistic_obj(y_hat, dtrain):
    # Gradient and Hessian of the weighted log loss with respect to the
    # raw score y_hat, where p = sigmoid(y_hat).
    y = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-y_hat))
    grad = 4 * p * y + p - 5 * y
    hess = (4 * y + 1) * (p * (1.0 - p))
    return grad, hess
def err_rate(y_hat, dtrain):
    # Custom eval metric: the same weighted log loss, averaged over rows.
    y = dtrain.get_label()
    y_hat = np.clip(y_hat, 10e-7, 1 - 10e-7)
    loss_fn = y * np.log(y_hat)
    loss_fp = (1.0 - y) * np.log(1.0 - y_hat)
    return 'error', np.sum(-(5 * loss_fn + loss_fp)) / len(y)
xgb_pars = {'eta': 0.2, 'objective': 'binary:logistic',
            'max_depth': 6, 'tree_method': 'hist', 'seed': 42}

# d_trn and d_val are xgb.DMatrix objects built earlier from my data.
model_trn = xgb.train(xgb_pars, d_trn, 10,
                      evals=[(d_trn, 'trn'), (d_val, 'vld')],
                      obj=logistic_obj, feval=err_rate)
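As a sanity check on the derivation (separate from the training script above), here is a quick finite-difference comparison of the analytic gradient and Hessian against the weighted log loss on random scores. This is a self-contained sketch; all the names in it are mine:

import numpy as np

# Weighted log loss as a function of the raw score z (the same formula
# err_rate uses, but applied to sigmoid(z) rather than to y_hat directly).
def weighted_logloss(z, y):
    p = 1.0 / (1.0 + np.exp(-z))
    return -(5.0 * y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

rng = np.random.default_rng(42)
z = rng.normal(size=1000)                        # raw margin scores
y = rng.integers(0, 2, size=1000).astype(float)  # binary labels

eps = 1e-5
p = 1.0 / (1.0 + np.exp(-z))

# Central differences for the first and second derivatives w.r.t. z.
num_grad = (weighted_logloss(z + eps, y) - weighted_logloss(z - eps, y)) / (2 * eps)
num_hess = (weighted_logloss(z + eps, y) - 2 * weighted_logloss(z, y)
            + weighted_logloss(z - eps, y)) / eps**2

ana_grad = 4 * p * y + p - 5 * y           # grad from logistic_obj
ana_hess = (4 * y + 1) * (p * (1.0 - p))   # hess from logistic_obj

print(np.abs(num_grad - ana_grad).max())   # both maxima should be near zero,
print(np.abs(num_hess - ana_hess).max())   # up to finite-difference noise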
Running this in verbose mode prints the output below. The two right-hand columns are the error computed by my own function, passed as feval. I'm not sure why XGBoost still also prints an error computed by its own objective (the two left-hand columns), but the real problem is that it apparently isn't using my update rule: its own error keeps decreasing, while my custom error starts increasing after five rounds. If I comment out the 'objective' entry, it apparently falls back to RMSE, which makes things even worse.
[0] trn-error:0.065108 vld-error:0.056749 trn-error:0.782048 vld-error:0.755389
[1] trn-error:0.064876 vld-error:0.056645 trn-error:0.727871 vld-error:0.695685
[2] trn-error:0.064487 vld-error:0.05651 trn-error:0.699920 vld-error:0.662203
[3] trn-error:0.064573 vld-error:0.056553 trn-error:0.691798 vld-error:0.64864
[4] trn-error:0.064484 vld-error:0.056514 trn-error:0.698498 vld-error:0.649974
[5] trn-error:0.064483 vld-error:0.056514 trn-error:0.716450 vld-error:0.662659
[6] trn-error:0.064470 vld-error:0.056507 trn-error:0.742848 vld-error:0.683847
[7] trn-error:0.064466 vld-error:0.056506 trn-error:0.775665 vld-error:0.71153
[8] trn-error:0.064435 vld-error:0.056497 trn-error:0.813440 vld-error:0.744165
[9] trn-error:0.064164 vld-error:0.056393 trn-error:0.854973 vld-error:0.780628
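One debugging step that might narrow this down (a sketch; it only repeats calls already used above): print the range of y_hat inside err_rate. Probabilities have to lie in [0, 1]; if the printed range falls outside that, feval is being handed raw margin scores and the clip/log computation is operating on the wrong scale:

def err_rate(y_hat, dtrain):
    y = dtrain.get_label()
    # Debug: check whether feval receives probabilities or raw margins.
    print('y_hat range: [%f, %f]' % (y_hat.min(), y_hat.max()))
    y_hat = np.clip(y_hat, 10e-7, 1 - 10e-7)
    loss_fn = y * np.log(y_hat)
    loss_fp = (1.0 - y) * np.log(1.0 - y_hat)
    return 'error', np.sum(-(5 * loss_fn + loss_fp)) / len(y)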