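Full disclosure:

Because Cross Validated gets relatively little traffic, I have semi-cross-posted this question there. As soon as I get an answer to either of the two questions, I will link that answer from the other one.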

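tl;dr

For multi-class classifiers, can you apply McNemar's test to determine whether two classifiers classify the same data significantly differently? Or is McNemar's test limited to 2-class problems?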

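Detailed question

I need to determine, pair by pair, whether the predictions of several classifiers differ significantly. I have found several sources stating that McNemar's test is well suited for this. Example sources:

Example 1

Example 2

However, I am not sure whether these sources assume binary classifiers.

Now I am wondering whether I can apply McNemar's test to my multi-class case. To illustrate, let me give an example. For that, let's generate some random data: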

>>> import numpy as np
>>>
>>> # number of categories
>>> k = 4
>>>
>>> # random data representing the ground truth in k categories
>>> ground_truth = np.random.randint(0, k, 1000)
>>> # random data representing predictions by two different classifiers
>>> preds1 = np.random.randint(0, k, 1000)
>>> preds2 = np.random.randint(0, k, 1000)

Now, given this data, can I apply McNemar's test like this?

>>> # boolean arrays coding whether each prediction matches the ground truth
>>> results1 = preds1 == ground_truth
>>> results2 = preds2 == ground_truth
>>>
>>> # 2x2 table: rows = classifier 1 (wrong, correct), columns = classifier 2 (wrong, correct)
>>> table = np.bincount(2 * results1 + results2, minlength=2 * 2).reshape(2, 2)
>>>
>>> print(table)
[[559 186]
 [186  69]]
>>> from statsmodels.stats.contingency_tables import mcnemar
>>> print(mcnemar(table))
pvalue      1.0
statistic   186.0
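
To make explicit what the table above feeds into the test, here is a minimal hand-rolled sketch of an exact (binomial) McNemar test. It is not the statsmodels implementation; the helper names discordant_counts and exact_mcnemar_pvalue are hypothetical and only for illustration. The point it shows is that, once predictions are reduced to correct/incorrect, only the two off-diagonal (discordant) counts enter the computation, regardless of how many classes the labels have.

```python
from math import comb


def discordant_counts(truth, preds_a, preds_b):
    """Count samples where exactly one of the two classifiers is correct.

    Returns (only_a_correct, only_b_correct). Labels may come from any
    number of classes; only correctness per sample is used.
    """
    only_a = only_b = 0
    for t, a, b in zip(truth, preds_a, preds_b):
        a_ok, b_ok = (a == t), (b == t)
        if a_ok and not b_ok:
            only_a += 1
        elif b_ok and not a_ok:
            only_b += 1
    return only_a, only_b


def exact_mcnemar_pvalue(only_a, only_b):
    """Two-sided exact binomial p-value on the discordant pairs.

    Under the null hypothesis, each discordant pair favours either
    classifier with probability 0.5.
    """
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant pairs, nothing to distinguish
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Discordant counts taken from the table printed above (186 and 186).
print(exact_mcnemar_pvalue(186, 186))
```

With the two discordant counts equal, as in the random data above, the p-value comes out at 1.0, which agrees with the statsmodels output shown. Whether collapsing a multi-class problem to correct/incorrect in this way is statistically appropriate is exactly what I am asking.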

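Is the test applied correctly like this? Or is McNemar's test limited to 2-class classifiers?

By the way, my classes are imbalanced, in case that is relevant.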

Can McNemar's test be applied to evaluate multi-class models?
