Data mining - Which loss function is best for XGB regression on a highly skewed dataset? - 吾爱随笔录

data-mining regression xgboost loss-function

2022-03-08 16:50:12

Which loss function is best when using XGB regression on a highly skewed dataset?

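The data is very highly skewed. I used XGBoost with the linear regression objective, with the target transformed to log space. It performed better than the gamma objective. Are there any other suggestions?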

2 Answers

You could try weighting the loss of each data point, so that a few data points do not dominate the loss.
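As a minimal sketch of this idea, the snippet below builds per-sample weights and shows how they change each point's share of a squared-error loss. The weighting scheme (`inverse_magnitude_weights`, with its `power` and `eps` parameters) is a hypothetical illustration, not something the answer specifies. In XGBoost, weights like these can be passed through the `sample_weight` argument of `XGBRegressor.fit` or the `weight` argument of `xgboost.DMatrix`.

```python
def inverse_magnitude_weights(targets, power=1.0, eps=1.0):
    """Hypothetical scheme: weight each sample by 1 / (|y| + eps) ** power,
    then rescale so the weights average to 1."""
    raw = [1.0 / (abs(y) + eps) ** power for y in targets]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


def weighted_squared_errors(targets, preds, weights):
    """Per-sample contribution w_i * (y_i - p_i) ** 2 to the weighted loss."""
    return [w * (y - p) ** 2 for y, p, w in zip(targets, preds, weights)]


# Made-up skewed targets: four small values and one very large one.
targets = [1.0, 2.0, 3.0, 4.0, 500.0]
preds = [1.5, 2.5, 2.0, 5.0, 400.0]

uniform = weighted_squared_errors(targets, preds, [1.0] * len(targets))
weights = inverse_magnitude_weights(targets)
reweighted = weighted_squared_errors(targets, preds, weights)

# Fraction of the total loss that comes from the largest target.
share_uniform = uniform[-1] / sum(uniform)
share_reweighted = reweighted[-1] / sum(reweighted)
print(f"share of loss from the largest target: "
      f"{share_uniform:.3f} -> {share_reweighted:.3f}")
```

The weights shrink the largest target's share of the loss without removing it. How aggressive the down-weighting should be (here controlled by the assumed `power` parameter) would need to be tuned on validation data.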

Since others may run into the same problem, here is what I found:

I tried two options for XGB regression, each with a different objective function:

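1. The linear regression objective ("reg:linear" or "reg:squarederror"), with the target transformed to log space.
2. The gamma objective ("reg:gamma"), which is useful for skewed targets that follow a gamma distribution, such as insurance claim severity. In this case I did not transform the target to log space.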
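To see why both options suit skewed targets, the sketch below compares a plain squared error with a squared error in log space and with the unit gamma deviance (a standard measure of fit for gamma models, used here as a stand-in for the gamma objective). It assumes strictly positive targets and a plain `log` transform; if the data contains zeros, `math.log1p` and `math.expm1` would be one alternative. The numbers are made up for illustration.

```python
import math


def raw_squared_error(y, pred):
    """Squared error on the original scale."""
    return (pred - y) ** 2


def log_space_squared_error(y, pred):
    """Squared error after moving target and prediction to log space."""
    return (math.log(pred) - math.log(y)) ** 2


def gamma_deviance(y, pred):
    """Unit gamma deviance; requires y > 0 and pred > 0."""
    return 2.0 * ((y - pred) / pred - math.log(y / pred))


# Two targets three orders of magnitude apart, each over-predicted by 2x.
small, large = 10.0, 10_000.0
for name, loss in [("raw squared error", raw_squared_error),
                   ("log-space squared error", log_space_squared_error),
                   ("gamma deviance", gamma_deviance)]:
    print(f"{name}: {loss(small, 2 * small):.4g} vs {loss(large, 2 * large):.4g}")

# Option 1 in practice: train on log targets, then map predictions back.
targets = [3.0, 12.0, 250.0, 9000.0]
log_targets = [math.log(y) for y in targets]    # what the model is fitted on
recovered = [math.exp(z) for z in log_targets]  # back-transform of predictions
```

Both the log-space error and the gamma deviance depend only on the ratio between prediction and target, so a large target does not dominate just because of its scale. One caveat with option 1: a model trained in log space and back-transformed with `exp` tends to under-predict the mean on the original scale, which matters if the evaluation metric is computed there.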

In my case, option 1 performed about 15-20% better than option 2. However, depending on the nature of the data, either one might outperform the other.

Another possible objective function is "reg:squaredlogerror".
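For reference, the XGBoost documentation defines this objective as 1/2 * [log(pred + 1) - log(label + 1)]^2 and requires all labels to be greater than -1. The sketch below writes out that loss with its first and second derivatives with respect to the prediction, the two quantities a gradient-boosting objective needs, and checks them against finite differences. The function names and the sample values are illustrative assumptions, not XGBoost API.

```python
import math


def squared_log_error(y, pred):
    """0.5 * (log(pred + 1) - log(y + 1)) ** 2; needs y > -1 and pred > -1."""
    return 0.5 * (math.log1p(pred) - math.log1p(y)) ** 2


def gradient(y, pred):
    """First derivative of the loss with respect to pred."""
    return (math.log1p(pred) - math.log1p(y)) / (pred + 1.0)


def hessian(y, pred):
    """Second derivative of the loss with respect to pred."""
    return (1.0 - math.log1p(pred) + math.log1p(y)) / (pred + 1.0) ** 2


# Check the analytic derivatives against central finite differences.
y, pred, h = 100.0, 40.0, 1e-5
numeric_grad = (squared_log_error(y, pred + h)
                - squared_log_error(y, pred - h)) / (2 * h)
numeric_hess = (gradient(y, pred + h) - gradient(y, pred - h)) / (2 * h)

print(f"gradient: analytic {gradient(y, pred):.6g}, numeric {numeric_grad:.6g}")
print(f"hessian:  analytic {hessian(y, pred):.6g}, numeric {numeric_hess:.6g}")
```

Note that the second derivative turns negative once log(pred + 1) exceeds log(y + 1) by more than 1, that is, when the prediction overshoots the target by a large factor, so a hand-written version of this objective would need to handle that case.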
