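I have a logistic regression model that predicts customer churn (0 vs. 1). I was asked to use the model to score a set of historical non-churners, remove anyone flagged as a churner, and then increase one variable while holding the rest constant to see how the predictions change. Interestingly, once the first batch of predicted churners is removed, the model predicts zero churn. After the initial cohort is removed, increasing this variable appears to have no effect, even though it has the largest feature importance in the model. Is this just working as expected?
Here is the sample code being used: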
r1_pred = logisticRegr_balanced.predict(dfa)
dfa['Churn Prediction'] = r1_pred
print("Non-Churners: "+str(len(dfa[dfa['Churn Prediction']==0])))
print("Churners: "+str(len(dfa[dfa['Churn Prediction']==1])))
print("Percent Churn: " +str((len(dfa[dfa['Churn Prediction']==1]))/len(dfa['Churn Prediction'])))
The result is:
Non-Churners: 70611
Churners: 19609
Percent Churn: 0.21734648636665926
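I then create a new DataFrame containing only the survivors and increase the "Customer Life" variable (the CustomerLife column) by 30 days.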
dfa_30 = dfa[dfa['Churn Prediction']==0]
dfa_30 = dfa_30.drop('Churn Prediction', axis=1)
dfa_30.CustomerLife = dfa_30.CustomerLife + 30
r30_pred = logisticRegr_balanced.predict(dfa_30)
dfa_30['Churn Prediction'] = r30_pred
print("Non-Churners: "+str(len(dfa_30[dfa_30['Churn Prediction']==0])))
print("Churners: "+str(len(dfa_30[dfa_30['Churn Prediction']==1])))
print("Percent Churn: " +str((len(dfa_30[dfa_30['Churn Prediction']==1]))/len(dfa_30['Churn Prediction'])))
The result is:
Non-Churners: 70611
Churners: 0
Percent Churn: 0.0
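Is the model no longer able to predict churn because the first prediction already split everyone into two classes, so that everyone remaining is a "permanent" survivor?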
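One way to reason about this is with a small hand-rolled logistic model. The sketch below is not the fitted model from the post: the intercept, the coefficients, and the five customer rows are all hypothetical, and it assumes the coefficient on CustomerLife is negative (longer tenure means lower churn risk). Under that assumption, adding 30 days can only lower each survivor's churn probability, so nobody who was already below the 0.5 cutoff can cross it.

```python
import math

def churn_probability(intercept, coefs, row):
    """Logistic model: sigmoid of the linear score."""
    score = intercept + sum(c * x for c, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-score))

def predict(intercept, coefs, row, threshold=0.5):
    """Hard 0/1 label: 1 (churn) only when the probability exceeds the threshold."""
    return int(churn_probability(intercept, coefs, row) > threshold)

# Hypothetical parameters (assumptions, not taken from the post).
# Feature order: [CustomerLife in days, one other feature].
intercept = 1.0
coefs = [-0.01, 0.8]  # negative weight on CustomerLife is the key assumption

# Hypothetical customers.
customers = [[30, 1.0], [90, 0.5], [200, 0.0], [400, 1.0], [50, 0.0]]

# Round 1: keep only the customers predicted not to churn.
survivors = [row for row in customers if predict(intercept, coefs, row) == 0]

# Round 2: add 30 days to CustomerLife, leave the other feature unchanged.
aged = [[row[0] + 30, row[1]] for row in survivors]

before = [churn_probability(intercept, coefs, row) for row in survivors]
after = [churn_probability(intercept, coefs, row) for row in aged]
new_churners = sum(predict(intercept, coefs, row) for row in aged)

print("Survivors:", len(survivors))
print("Churners after +30 days:", new_churners)
for b, a in zip(before, after):
    print(f"p(churn) before={b:.3f} after={a:.3f}")
```

If logisticRegr_balanced is a scikit-learn LogisticRegression, the same check on the real model would be to look at the sign of the CustomerLife entry in `logisticRegr_balanced.coef_` and to compare `predict_proba(...)[:, 1]` before and after the increase, rather than only the hard labels from `predict`. If that coefficient turns out to be positive, this explanation does not apply.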