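If you use per-observation explanations, you can average (or otherwise aggregate) the feature importances over the samples that belong to each dealer.
For example, using shap to generate the per-observation explanations: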
import pandas as pd
from sklearn.linear_model import LogisticRegression
import shap
data = [['Alex',10,13,1,0],['Bob',11,14,12,0],['Clarke',13,15,13,1],['Bob',12,15,1,1]]
df = pd.DataFrame(data, columns=["dealer","x","y","z","loss"])
lr = LogisticRegression()
lr.fit(df[['x', 'y', 'z']], df['loss'])
# Whatever explainer you prefer:
explainer = shap.explainers.Permutation(lr.predict_proba, df[['x', 'y', 'z']])
shap_values = explainer(df[['x', 'y', 'z']])
# get just the explanations for the positive class
shap_values = shap_values[...,1]
shap_df = pd.DataFrame(abs(shap_values.values))
shap_df.columns = ['x_shap', 'y_shap', 'z_shap']
shap_df['dealer'] = df['dealer']
shap_df.groupby('dealer').mean()
This produces:
| dealer | x_shap | y_shap | z_shap |
|---|---|---|---|
| Alex | 0.260427 | 0.140054 | 0.075176 |
| Bob | 0.106593 | 0.069035 | 0.091146 |
| Clarke | 0.268328 | 0.083706 | 0.085807 |
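
The mean is only one way to aggregate. As a rough sketch of the "or otherwise aggregate" part, the snippet below starts from a small matrix of signed per-observation SHAP values and derives per-dealer mean absolute importance, each feature's share of that importance, and the top feature per dealer. The numbers in `signed_shap` are made up for illustration (they are not output from the model above), and the names `signed_shap`, `shares`, and `top_feature` are my own. Absolute values are taken first because signed contributions can cancel out when averaged.

```python
import numpy as np
import pandas as pd

# Hypothetical signed SHAP values, one row per observation (x, y, z).
# Illustrative numbers only, not produced by a real explainer.
signed_shap = np.array([
    [ 0.20, -0.10,  0.05],   # Alex
    [-0.10,  0.05,  0.20],   # Bob
    [ 0.30,  0.10, -0.05],   # Clarke
    [ 0.05, -0.05,  0.10],   # Bob
])
dealers = ["Alex", "Bob", "Clarke", "Bob"]

# Take absolute values so positive and negative contributions do not cancel.
abs_shap = pd.DataFrame(np.abs(signed_shap), columns=["x_shap", "y_shap", "z_shap"])
abs_shap["dealer"] = dealers

# Mean absolute SHAP value per dealer (same aggregation as in the answer).
mean_abs = abs_shap.groupby("dealer").mean()

# Each feature's share of a dealer's total importance (every row sums to 1).
shares = mean_abs.div(mean_abs.sum(axis=1), axis=0)

# The feature with the largest mean absolute SHAP value for each dealer.
top_feature = mean_abs.idxmax(axis=1)

print(mean_abs)
print(shares)
print(top_feature)
```

Swapping `.mean()` for `.median()` or `.max()` in the `groupby` step gives other aggregates. Which one is appropriate depends on whether you care about typical or worst-case influence for a dealer.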