
XGBoost: Quantifying feature importance

data-mining python xgboost predictor-importance

2021-10-11 13:41:56

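I need to quantify the importance of the features in my model. However, when I do this with XGBoost, I get completely different results depending on whether I use the variable importance plot or the feature importances.

For example, model.feature_importances_ and xgb.plot_importance(model) give me values that do not line up. Presumably the importance plot is built from the feature importances, but the NumPy array feature_importances_ does not correspond directly to the indexes returned by the plot_importance function.

Here is what the plot looks like:

But this is the output of model.feature_importances_, which gives entirely different values: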

array([ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.00568182,  0.        ,  0.        ,  0.        ,
        0.13636364,  0.        ,  0.        ,  0.        ,  0.01136364,
        0.        ,  0.        ,  0.        ,  0.        ,  0.07386363,
        0.03409091,  0.        ,  0.00568182,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.00568182,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.00568182,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.01704546,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.05681818,  0.15909091,  0.0625    ,  0.        ,
        0.        ,  0.        ,  0.10227273,  0.        ,  0.07386363,
        0.01704546,  0.05113636,  0.00568182,  0.        ,  0.        ,
        0.02272727,  0.        ,  0.01136364,  0.        ,  0.        ,
        0.11363637,  0.        ,  0.01704546,  0.01136364,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ], dtype=float32)

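If I just try to get feature 81 (model.feature_importances_[81]), I get 0.051136363. Yet model.feature_importances_.argmax() returns 72.

So these values do not correspond to each other, and I am not sure what to do.

Does anyone know why these values are inconsistent?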

1 Answer

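In xgboost 0.7.post3:

1. XGBRegressor.feature_importances_ returns weights that sum to 1.
2. XGBRegressor.get_booster().get_score(importance_type='weight') returns the number of times each feature occurs in splits. If you divide these counts by their sum, you get item 1, except that features with 0 importance are excluded.
3. xgboost.plot_importance(XGBRegressor.get_booster()) plots the values of item 2: the number of occurrences in splits.
4. XGBRegressor.get_booster().get_fscore() is the same as XGBRegressor.get_booster().get_score(importance_type='weight').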
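The relationship between the normalized array and the raw split counts can be sketched without XGBoost at all. The snippet below is a minimal illustration, assuming the default feature names f0, f1, ... that XGBoost assigns when no names are supplied. The split_counts dictionary is made-up data standing in for the output of get_score(importance_type='weight'), and to_dense_normalized is a hypothetical helper, not part of the library.

```python
def to_dense_normalized(scores, n_features):
    """Expand a sparse {'f<index>': score} dict into a full-length list summing to 1.

    Assumes default feature names of the form 'f0', 'f1', ...; features that
    are missing from `scores` (never used in a split) get importance 0.
    """
    dense = [0.0] * n_features
    for name, value in scores.items():
        dense[int(name[1:])] = float(value)  # 'f72' -> position 72
    total = sum(dense)
    if total == 0:
        return dense  # no splits at all: keep the zeros
    return [v / total for v in dense]


# Made-up split counts standing in for get_score(importance_type='weight').
split_counts = {"f72": 28, "f10": 24, "f81": 9}
importances = to_dense_normalized(split_counts, n_features=103)

# Position of the largest normalized importance, analogous to argmax().
top_index = max(range(len(importances)), key=importances.__getitem__)

print(importances[81])  # share of all splits that used feature 81
print(top_index)
```

The dictionary is indexed by feature name and skips unused features, while the array is indexed by column position and keeps the zeros, which is why the two views look misaligned when read side by side.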

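The get_score method also returns other importance scores. Check the importance_type argument.

In xgboost 0.81, XGBRegressor.feature_importances_ now returns gain by default, i.e. the equivalent of get_score(importance_type='gain'). See importance_type in XGBRegressor.

So, for importance scores, it is better to stick to the get_score function with an explicit importance_type argument.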

Also, check this question for an explanation of the importance_type values: "weight", "gain", and "cover".
