数据挖掘 - 如何根据精度和拟合设置机器学习算法的优先级以进行四元分类 - 吾爱随笔录

澳大利亚的降雨分类

在此背景下，将使用sklearn分类算法，即：

逻辑回归分类（参数）决策树分类（非参数）随机森林分类（非参数） K-最近邻（KNN）分类（非参数）

训练测试拆分为 80-20

准确度分数：

Logistic Regression Test Score     0.854062
Logistic Regression Train Score    0.853797

Decision Tree Test Score           0.795838
Decision Tree Train Score          1.000000

Random Forest Test Score           0.858269
Random Forest Train Score          0.999978

K-Nearest Neighbour Test Score     0.817180
K-Nearest Neighbour Train Score    0.831138

`Null accuracy score: 0.7815`

逻辑回归在没有overfitting. 但是如果看准确率，随机森林的准确率会更好。

如何检查underfitting。

混淆矩阵

中华民国 AUC :

ROC AUC For LR : 0.8742

ROC AUC For DT : 0.7072

ROC AUC For RF : 0.8883

ROC AUC For KNN : 0.7928

分类指标

Classification accuracy : 0.8583
Classification error : 0.1417
Precision : 0.9586
Recall : 0.8726
True Positive Rate : 0.8726
False Positive Rate : 0.2286
Specificity : 0.7714

逻辑回归

             precision    recall  f1-score   support

          No       0.88      0.95      0.91     17650
         Yes       0.74      0.52      0.61      4935

    accuracy                           0.85     22585
   macro avg       0.81      0.73      0.76     22585
weighted avg       0.85      0.85      0.84     22585

决策树

           precision    recall  f1-score   support

          No       0.87      0.86      0.87     17650
         Yes       0.53      0.55      0.54      4935

    accuracy                           0.80     22585
   macro avg       0.70      0.71      0.70     22585
weighted avg       0.80      0.80      0.80     22585

随机森林分类

             precision    recall  f1-score   support

          No       0.87      0.96      0.91     17650
         Yes       0.77      0.50      0.61      4935

    accuracy                           0.86     22585
   macro avg       0.82      0.73      0.76     22585
weighted avg       0.85      0.86      0.85     22585

K-最近邻（KNN）分类

              precision    recall  f1-score   support

          No       0.84      0.95      0.89     17650
         Yes       0.66      0.34      0.45      4935

    accuracy                           0.82     22585
   macro avg       0.75      0.64      0.67     22585
weighted avg       0.80      0.82      0.79     22585

应仅对 Logistic 或对所有四个进行评估。