Data Mining - How to calculate the range of F1 scores from a confusion matrix for 3 classes A, B, C - 吾爱随笔录


data-mining, machine-learning, confusion-matrix, mathematics

2022-02-16 23:42:06

Is there any supporting function for calculating the average F1 score range?

3 Answers

from sklearn.metrics import f1_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# F1 is computed per class, then averaged with each class weighted by its support
f1_score(y_true, y_pred, average='weighted')  # ≈ 0.267

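From the documentation (for average='weighted'):

Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters "macro" to account for label imbalance; it can result in an F-score that is not between precision and recall.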

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html
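To make the weighting concrete, here is a small sketch reusing the toy labels above: it asks for one F1 score per class with average=None, then takes the support-weighted mean with NumPy, which should reproduce average='weighted'. Using np.bincount to count the supports is my own choice (it works here because the labels are small non-negative integers), not something from the original answer.

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# One F1 score per class, in sorted label order (0, 1, 2)
per_class = f1_score(y_true, y_pred, average=None)

# Support = number of true instances of each class
support = np.bincount(y_true)

# Support-weighted mean of the per-class scores ("weighted" average)
weighted = np.average(per_class, weights=support)

# Unweighted mean of the per-class scores ("macro" average)
macro = per_class.mean()

print(per_class)
print(weighted, macro)
```

Here all three classes have the same support, so the weighted and macro averages coincide; with imbalanced classes they diverge.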

One function in Scikit-learn that you can use is classification_report (docs).

Here is an example:

from sklearn.metrics import classification_report

y_true = ["A", "B", "C", "A", "A", "B", "A", "A", "C", "B", "A", "A", "B", "A", "C", "C"]
y_pred = ["A", "B", "C", "A", "B", "C", "C", "B", "C", "B", "A", "A", "B", "C", "C", "C"]

report = classification_report(y_true=y_true, y_pred=y_pred)
print(report)

>>               precision    recall  f1-score   support
>> 
>>            A       1.00      0.50      0.67         8
>>            B       0.60      0.75      0.67         4
>>            C       0.57      1.00      0.73         4
>> 
>>    micro avg       0.69      0.69      0.69        16
>>    macro avg       0.72      0.75      0.69        16
>> weighted avg       0.79      0.69      0.68        16

From this, you can extract the F1 score for each class. This is useful because it shows in more detail where your model is underperforming.

You can also look at the micro, macro, and weighted averages.
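If you need the numbers rather than the printed table, classification_report also accepts output_dict=True and returns a nested dictionary keyed by class label and by average name. A minimal sketch, reusing the example labels above; it deliberately avoids the "micro avg" key, because recent scikit-learn versions report an "accuracy" entry instead of a "micro avg" row for multiclass input.

```python
from sklearn.metrics import classification_report

y_true = ["A", "B", "C", "A", "A", "B", "A", "A", "C", "B", "A", "A", "B", "A", "C", "C"]
y_pred = ["A", "B", "C", "A", "B", "C", "C", "B", "C", "B", "A", "A", "B", "C", "C", "C"]

# Nested dict: one entry per class plus the averaged rows
report = classification_report(y_true, y_pred, output_dict=True)

# Per-class F1 scores
per_class_f1 = {label: report[label]["f1-score"] for label in ["A", "B", "C"]}

# Averaged F1 scores
macro_f1 = report["macro avg"]["f1-score"]
weighted_f1 = report["weighted avg"]["f1-score"]

for label, score in per_class_f1.items():
    print(f"{label}: {score:.2f}")
print(f"macro: {macro_f1:.2f}, weighted: {weighted_f1:.2f}")
```

The values correspond to the f1-score column of the printed report (0.67, 0.67, 0.73, and 0.69 / 0.68 for the macro and weighted averages).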

See this GitHub Gist.

For convenience, I've copied the code from that link here (comments omitted):

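import numpy as np

def get_f1_score(confusion_matrix, i):
    # Assumes rows are true labels and columns are predicted labels
    # (the layout returned by sklearn.metrics.confusion_matrix)
    confusion_matrix = np.asarray(confusion_matrix)
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for j in range(len(confusion_matrix)):
        if (i == j):
            # Diagonal cell: instances of class i predicted as class i
            TP += confusion_matrix[i, j]
            # Drop row i and column i; the remaining cells are true negatives
            tmp = np.delete(confusion_matrix, i, 0)
            tmp = np.delete(tmp, j, 1)
            TN += np.sum(tmp)
        else:
            # Row i, column j: class i wrongly predicted as class j
            FN += confusion_matrix[i, j]
            # Row j, column i: class j wrongly predicted as class i
            FP += confusion_matrix[j, i]

    # Avoid division by zero when a class is never present or never predicted
    recall = TP / (TP + FN) if (TP + FN) > 0 else 0.0
    precision = TP / (TP + FP) if (TP + FP) > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall
    f1_score = 2 * precision * recall / (precision + recall)

    return f1_score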

To compute the F1 score for the first class label, call it like this: get_f1_score(confusion_matrix, 0).

You can then average the F1 scores of all classes to get the macro-F1.
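The same per-class quantities can also be read off the whole matrix at once: TP is the diagonal, FP is each column sum minus the diagonal, and FN is each row sum minus the diagonal (again assuming rows are true labels and columns are predictions, as in sklearn.metrics.confusion_matrix). The sketch below uses a hypothetical helper, f1_from_confusion_matrix, and the equivalent form F1 = 2TP / (2TP + FP + FN); it then takes the plain mean (macro-F1) and the support-weighted mean, which can be checked against f1_score.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

def f1_from_confusion_matrix(cm):
    """Hypothetical helper: per-class F1 from a square confusion matrix.

    Assumes rows = true labels and columns = predicted labels.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class i, actually another class
    fn = cm.sum(axis=1) - tp  # actually class i, predicted as another class
    denom = 2 * tp + fp + fn
    # F1 is set to 0 for a class absent from both y_true and y_pred
    return np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)

labels = ["A", "B", "C"]
y_true = ["A", "B", "C", "A", "A", "B", "A", "A", "C", "B", "A", "A", "B", "A", "C", "C"]
y_pred = ["A", "B", "C", "A", "B", "C", "C", "B", "C", "B", "A", "A", "B", "C", "C", "C"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

per_class = f1_from_confusion_matrix(cm)
macro_f1 = per_class.mean()                      # plain mean over classes
support = cm.sum(axis=1)                         # true instances per class
weighted_f1 = np.average(per_class, weights=support)

print(dict(zip(labels, per_class.round(2))))
print(round(macro_f1, 2), round(weighted_f1, 2))
```

Computing all classes in one vectorized pass avoids looping over the matrix once per class, and the 2TP / (2TP + FP + FN) form sidesteps dividing by a precision or recall of zero.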

By the way, this website computes F1, accuracy, and several other metrics from a 2x2 confusion matrix, which makes the calculation easy.
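For the binary case, the arithmetic such a calculator performs is short enough to do by hand. A sketch, assuming scikit-learn's binary layout [[TN, FP], [FN, TP]] (rows are true labels, columns are predictions); the function name binary_metrics and the example counts are made up for illustration.

```python
def binary_metrics(cm):
    """Hypothetical helper: metrics from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts only: TN=50, FP=10, FN=5, TP=35
metrics = binary_metrics([[50, 10], [5, 35]])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```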
