Data mining - How to evaluate the performance of a new feature in a model?

I am working on a binary classification problem where I have 4712 records; label 1 accounts for 1554 records and label 0 for 3558 records.

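When I tried multiple models using 6, 7, and 8 features, I saw the following results. With the newly added 7th feature (or the 7th and 8th together), I see an AUC improvement in only some of the models (scikit-learn LR and XGBoost).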

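I have also read online articles saying that AUC and F1 score are not strictly proper scoring rules. We can use the log-loss metric, but as far as I understand it only applies to logistic regression. We can't use log-loss for XGBoost, RF, or SVM, right? So is there a common metric I can use to compare them? Am I missing something here?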

Does this mean the new feature is helping us improve performance, even though it decreases performance in the other models?

Please note that I split the data into train and test sets and ran 10-fold CV on the train data.

So, how do I know whether this newly added 7th feature is really helping to improve model performance?
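To make the comparison concrete, here is a minimal sketch of scoring each model on the same 10 folds with and without the extra features. It assumes scikit-learn's `cross_val_score` with `scoring="neg_log_loss"`, which works for any classifier that exposes `predict_proba`, not only logistic regression. The data is a synthetic stand-in from `make_classification`, and the choice of which columns count as the "7th" and "8th" features is purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the real data (assumption): 8 columns, where the
# last two play the role of the newly added 7th and 8th features.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           n_redundant=0, weights=[0.7, 0.3], random_state=0)

feature_sets = {
    "6 features": X[:, :6],
    "7 features": X[:, :7],
    "8 features": X[:, :8],
}
models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

# A fixed random_state gives identical folds for every model and feature set,
# so per-fold scores can be compared pairwise.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for model_name, model in models.items():
    for set_name, X_sub in feature_sets.items():
        scores = cross_val_score(model, X_sub, y, cv=cv, scoring="neg_log_loss")
        results[(model_name, set_name)] = -scores  # per-fold log-loss, lower is better
        print(f"{model_name:>2} | {set_name}: "
              f"log-loss = {-scores.mean():.4f} +/- {scores.std():.4f}")
```

Because the folds are shared, the per-fold differences between the 6-feature and 7-feature runs of the same model can be inspected directly, which is usually more informative than comparing two averaged numbers.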

Update based on the answer

from statsmodels.stats.contingency_tables import mcnemar
# define contingency table
# here I added the confusion matrices of the two models together, cell by cell
# (i.e. TP of model 1 is added to TP of model 2, and so on)
table = [[808, 138],
         [52, 416]]
# calculate mcnemar test
result = mcnemar(table, exact=True)
# summarize the finding
print('statistic=%.3f, p-value=%.3f' % (result.statistic, result.pvalue))
# interpret the p-value
alpha = 0.05
if result.pvalue > alpha:
    print('Same proportions of errors (fail to reject H0)')
else:
    print('Different proportions of errors (reject H0)')
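For reference, McNemar's test is defined on a paired table that cross-tabulates, for each test instance, whether each of the two models got it right, rather than on confusion matrices summed together. The sketch below shows one way such a table could be built. The helper name `paired_table` and the toy labels are hypothetical, chosen only for illustration.

```python
def paired_table(y_true, pred_a, pred_b):
    """Build the 2x2 table for McNemar's test.

    Rows: model A correct (0) / wrong (1).
    Columns: model B correct (0) / wrong (1).
    """
    table = [[0, 0], [0, 0]]
    for truth, a, b in zip(y_true, pred_a, pred_b):
        row = 0 if a == truth else 1
        col = 0 if b == truth else 1
        table[row][col] += 1
    return table


# Toy labels and predictions (hypothetical), all on the same test instances.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
pred_a = [1, 0, 0, 1, 0, 1, 1, 0]  # e.g. model trained on 6 features
pred_b = [1, 0, 1, 1, 1, 1, 0, 0]  # e.g. model trained on 7 features

table = paired_table(y_true, pred_a, pred_b)
print(table)  # [[4, 2], [1, 1]]
```

The off-diagonal cells (instances where exactly one model is correct) are what the test compares. The resulting `table` can be passed to `mcnemar(table, exact=True)` in the same way as in the snippet above.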