How do I calculate classification accuracy with a confusion matrix?

data-mining classification accuracy

2022-03-10 11:00:01

I have training and test data. How do I calculate the classification accuracy using a confusion matrix? Thanks.

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

Train:

1   sunny       hot     high    FALSE   no
2   sunny       hot     high    TRUE    no
3   overcast    hot     high    FALSE   yes
4   rainy       mild    high    FALSE   yes
5   rainy       cool    normal  FALSE   yes
6   rainy       cool    normal  TRUE    no
7   sunny       cool    normal  FALSE   yes
8   rainy       mild    normal  FALSE   yes
9   sunny       mild    normal  TRUE    yes
10  overcast    mild    high    TRUE    yes
11  overcast    hot     normal  FALSE   yes
12  rainy       mild    high    TRUE    no

Test:

overcast    cool    normal  TRUE    yes
sunny       mild    high    FALSE   no

Rules found:

(humidity,normal), (windy,FALSE) -> (play,yes) [Support=0.33 , Confidence=1.00 , Correctly Classify= 4, 8, 9, 12]
(outlook,overcast) -> (play,yes) [Support=0.25 , Confidence=1.00 , Correctly Classify= 2, 11]
(outlook,rainy), (windy,FALSE) -> (play,yes) [Support=0.25 , Confidence=1.00 , Correctly Classify= 3]
(outlook,sunny), (temperature,hot) -> (play,no) [Support=0.17 , Confidence=1.00 , Correctly Classify= 0, 1]
(outlook,sunny), (humidity,normal) -> (play,yes) [Support=0.17 , Confidence=1.00 , Correctly Classify= 10]
(outlook,rainy), (windy,TRUE) -> (play,no) [Support=0.17 , Confidence=1.00 , Correctly Classify= 5, 13]

2 Answers

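A confusion matrix is a cross-tabulation of predicted values against the true observed values, and (test) accuracy is the empirical rate of correct predictions. So in this case, you need to:

1. Predict the "play" attribute for your test set. (Currently you have no way to predict your second test case, so for the sake of argument, let's suppose your model predicts yes for the sunny example.)
2. Keep track of your predictions in the following way, which is called a confusion matrix. The top labels are the predicted values and the side labels are the observed values: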

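               Predicted
         +-----------------+
         ¦     ¦ yes ¦ no  ¦
Observed ¦ yes ¦  1  ¦  0  ¦
         ¦ no  ¦  1  ¦  0  ¦
         +-----------------+

Here the first 1 comes from your first test case, which is classified correctly (observed yes, predicted yes), and the second 1 comes from the second test case, which is misclassified (observed no, predicted yes).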

To compute the accuracy:

accuracy = (# correct predictions) / (# total predictions) = 1 / 2 = 0.50
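
The tally and the accuracy calculation above can be sketched in Python. This is only an illustrative sketch, not part of the original answer: the names `confusion_matrix` and `accuracy` are made up here, rows are taken to be the observed class and columns the predicted class, and the yes prediction for the sunny test case is the same for-the-sake-of-argument assumption made above.

```python
from collections import Counter

LABELS = ["yes", "no"]

def confusion_matrix(observed, predicted, labels=LABELS):
    """Count (observed, predicted) pairs; rows = observed, columns = predicted."""
    counts = Counter(zip(observed, predicted))
    return [[counts[(obs, pred)] for pred in labels] for obs in labels]

def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Observed "play" values of the two test cases from the question.
observed = ["yes", "no"]
# Predicted values; the second "yes" is assumed, since no rule covers that case.
predicted = ["yes", "yes"]

matrix = confusion_matrix(observed, predicted)
print(matrix)            # [[1, 0], [1, 0]]
print(accuracy(matrix))  # 0.5
```

The diagonal of the matrix holds the correct predictions, so the accuracy is the sum of the diagonal divided by the sum of all cells.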

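It is about classifying test objects: "In classification, let R be the set of generated rules and T the training data. The basic idea of the method is to choose a set of high-confidence rules in R to cover T. In classifying a test object, the first rule in the set of rules that matches the test object condition classifies it. This process ensures that only the highest ranked rules classify test objects."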

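Suppose the first test case is (overcast, cool, normal, TRUE). Go through the rules from top to bottom and check whether all conditions of a rule match. For example, the first rule tests the humidity and windy features. Humidity matches, but windy is TRUE rather than FALSE, so the rule does not match. Move on to the next rule, and so on. In this case, rule 2 matches the test case, so the play variable is classified as "yes". The second test case, (sunny, mild, high, FALSE), matches none of the listed rules; if it is then assigned yes, it is misclassified.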
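
The top-to-bottom matching described above could be sketched in Python roughly as follows. The rule list is copied from the question in the order it was posted. Treating that order as the ranking is an assumption, as are the names `ATTRIBUTES`, `RULES` and `classify` and the `default` argument for examples that no rule covers.

```python
# Attribute order used for every example tuple.
ATTRIBUTES = ("outlook", "temperature", "humidity", "windy")

# The six rules from the question, in the order listed:
# (conditions, predicted value of "play").
RULES = [
    ({"humidity": "normal", "windy": "FALSE"}, "yes"),
    ({"outlook": "overcast"}, "yes"),
    ({"outlook": "rainy", "windy": "FALSE"}, "yes"),
    ({"outlook": "sunny", "temperature": "hot"}, "no"),
    ({"outlook": "sunny", "humidity": "normal"}, "yes"),
    ({"outlook": "rainy", "windy": "TRUE"}, "no"),
]

def classify(example, rules=RULES, default=None):
    """Return (rule number, class) for the first rule whose conditions all match.

    If no rule matches, return (None, default). The default class is an
    assumption of this sketch; the question does not define one.
    """
    case = dict(zip(ATTRIBUTES, example))
    for number, (conditions, label) in enumerate(rules, start=1):
        if all(case[attr] == value for attr, value in conditions.items()):
            return number, label
    return None, default

print(classify(("overcast", "cool", "normal", "TRUE")))  # (2, 'yes')
print(classify(("sunny", "mild", "high", "FALSE")))      # (None, None)
```

The second call shows that none of the six rules covers the sunny test case, so a prediction for it only exists if a default class is supplied, for example `classify(example, default="yes")`.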

Thanks
