After running my code I get values for accuracy, precision and recall, and I would like to determine the values of FP, FN, TP and TN from these metrics. I tried to work them out from the formula for each metric, but I couldn't manage it. Is there a way to do this?
Confusion matrix - determining the values of FP, FN, TP and TN
You can!
The trick is that you actually know two other key variables: the number of positive and negative examples (P and N). You can then use these to solve algebraically for the confusion matrix:
$$\text{recall} = \frac{TP}{P} \;\Rightarrow\; TP = \text{recall}\cdot P, \qquad FN = P - TP$$
- or simply: $FN = P\,(1 - \text{recall})$
$$\text{precision} = \frac{TP}{TP + FP} \;\Rightarrow\; FP = \frac{TP}{\text{precision}} - TP, \qquad TN = N - FP$$
- or simply: $FP = TP\,\dfrac{1 - \text{precision}}{\text{precision}}$
You should really modify your code to produce the confusion matrix itself. But supposing that's not possible for some reason...
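For a quick numerical check, here is a minimal Python sketch of that algebra; the counts P and N and the two metric values are made up purely for illustration:

# Back out the confusion matrix from precision, recall, P and N
# (illustrative values only)
P, N = 50, 100                  # actual positives / negatives
precision, recall = 0.75, 0.6

TP = recall * P                 # recall = TP / P
FN = P - TP                     # or simply: P * (1 - recall)
FP = TP / precision - TP        # precision = TP / (TP + FP)
TN = N - FP

print(TP, FN, FP, TN)           # 30.0 20.0 10.0 90.0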
A little linear algebra helps here. @n1k31t4 is right that, given only accuracy, precision and recall, you cannot expect to reproduce the confusion matrix: you have three equations in four unknowns, and these can be expressed as linear equations (in the unknowns; see below), so there are certainly infinitely many solutions (made finitely many by the non-negativity requirements, and in odd cases made few or even unique by the integrality requirements).
If you happen to also know the total number of samples (or perhaps some other confusion-matrix measure), you can recover everything. You don't need both P and N as @BenjiAlbert uses (although that does produce more pleasing formulas, IMO). Below I do it by putting everything in terms of TP, but there are certainly several routes to the answer.
From $\text{precision} = \frac{TP}{TP + FP}$, we get
$$FP = TP\,\frac{1 - \text{precision}}{\text{precision}}.$$
Similarly, from $\text{recall} = \frac{TP}{TP + FN}$ we get
$$FN = TP\,\frac{1 - \text{recall}}{\text{recall}}.$$
Finally, with $n$ the total number of samples,
$$\text{accuracy} = \frac{TP + TN}{n} \;\Rightarrow\; n\,(1 - \text{accuracy}) = FP + FN = TP\left(\frac{1}{\text{precision}} + \frac{1}{\text{recall}} - 2\right),$$
so
$$TP = \frac{n\,(1 - \text{accuracy})}{\frac{1}{\text{precision}} + \frac{1}{\text{recall}} - 2}.$$
Substituting this back into the formulas above yields the values of everything else.
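As a quick sanity check of this back-solve, here is a minimal Python sketch; the sample size n and the three metric values are illustrative (they match the toy example in the answer with code below):

# Back out the confusion matrix from accuracy, precision, recall and n
# (illustrative values only)
n = 8
accuracy, precision, recall = 0.625, 0.75, 0.6

TP = n * (1 - accuracy) / (1 / precision + 1 / recall - 2)
FP = TP * (1 - precision) / precision
FN = TP * (1 - recall) / recall
TN = accuracy * n - TP

# round() just cleans up floating-point noise
print(round(TP), round(FP), round(FN), round(TN))   # 3 1 2 2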
As others have pointed out, you could retrospectively compute those values if you know enough about the data.
If you literally only know the accuracy, precision and recall then it wouldn't be possible. It would be like someone telling you the answer is 0.79, and asking how it was computed... there are infinitely many ways.
I would suggest looking into the code where those metrics are computed and intercepting the raw predictions and labels. You could sum up the values of the confusion matrix (TP, FP, FN) during inference, then just use something like the sklearn.metrics.precision_recall_fscore_support function from scikit-learn.
Depending on what method you are using to get the metrics, there might even already be an argument to the function that will also return the full confusion matrix.
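For example, if you can intercept the raw labels and predictions, something like this minimal sketch (with toy data borrowed from the answer below) gives both the metrics and the full confusion matrix:

from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Toy labels and predictions intercepted from the evaluation step
y_true = [0, 1, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 0, 1, 1, 1]

# Metrics for the positive class
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary")
# The full confusion matrix, unpacked into its four cells
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(precision, recall)   # 0.75 0.6
print(tn, fp, fn, tp)      # 2 1 2 3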
# Calculating FP, FN, TP and TN from accuracy, precision and recall (plus the label counts)
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
y_true = [0, 1, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 0, 1, 1, 1]
num_values = len(y_true)
print("Number of observations: {}".format(num_values))
# Calculate Precision
# The precision is the ratio tp / (tp + fp)
precision = precision_score(y_true, y_pred)
print("Precision Score: {}".format(precision))
# Calculate Recall
# The recall is the ratio tp / (tp + fn)
recall = recall_score(y_true, y_pred)
print("Recall Score: {}".format(recall))
# Calculate Accuracy
# The accuracy is the ratio (TP + TN)/(TP + TN + FP + FN)
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy Score: {}".format(accuracy))
# Calculate the number of actual positives and negatives in the labels
num_pos = sum(y_true)
num_neg = num_values - num_pos
print("Number of actual positives: {0} \n"
      "Number of actual negatives: {1}".format(num_pos, num_neg))
# Calculate the False Negatives
# The recall is TP / (TP + FN) = TP / P, so FN = P * (1 - recall)
FN = num_pos * (1 - recall)
print("FN: {0}".format(FN))
# Calculate the True Positives
TP = num_pos - FN
print("TP: {0}".format(TP))
# Calculate the True Negatives
# The accuracy is (TP + TN) / num_values, so TN = accuracy * num_values - TP
TN = accuracy * num_values - TP
print("TN: {0}".format(TN))
# Calculate the False Positives
FP = num_neg - TN
print("FP: {0}".format(FP))
# Verify the results
sk_tn, sk_fp, sk_fn, sk_tp = confusion_matrix(y_true, y_pred).ravel()
print("Verify our results using Sklearn confusion matrix values\n"
"FN: {0}\n"
"TP: {1}\n"
"TN: {2}\n"
"FP: {3}".format(sk_fn, sk_tp, sk_tn, sk_fp,))
Output:
Number of observations: 8
Precision Score: 0.75
Recall Score: 0.6
Accuracy Score: 0.625
Number of actual positives: 5
Number of actual negatives: 3
FN: 2.0
TP: 3.0
TN: 2.0
FP: 1.0
Verify our results using Sklearn confusion matrix values
FN: 2
TP: 3
TN: 2
FP: 1