Sklearn SVM - How do I get a list of wrong predictions?

data mining  scikit-learn  SVM  multiclass classification

2021-10-09 03:40:12

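I'm not an expert user. I know I can get the confusion matrix, but I'd like to get a list of the rows that were classified incorrectly so that I can study them after classification.

On Stack Overflow I found the question "Can I get a list of wrong predictions in SVM score function in scikit-learn", but I'm not sure I understood everything.

Here is some example code.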

# importing necessary libraries
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
 
# loading the iris dataset
iris = datasets.load_iris()
 
# X -> features, y -> label
X = iris.data
y = iris.target
 
# dividing X, y into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)
 
# training a linear SVM classifier
from sklearn.svm import SVC
svm_model_linear = SVC(kernel = 'linear', C = 1).fit(X_train, y_train)
svm_predictions = svm_model_linear.predict(X_test)
 
# model accuracy for X_test  
accuracy = svm_model_linear.score(X_test, y_test)
 
# creating a confusion matrix
cm = confusion_matrix(y_test, svm_predictions)

To loop over the rows and find the wrong ones, the suggested solution is:

predictions = clf.predict(inputs)
for input, prediction, label in zip(inputs, predictions, labels):
  if prediction != label:
    print(input, 'has been classified as ', prediction, 'and should be ', label)

I don't understand what "inputs"/"input" refers to. If I adapt this code to mine, like this:

for input, prediction, label in zip (X_test, svm_predictions, y_test):
  if prediction != label:
    print(input, 'has been classified as ', prediction, 'and should be ', label)

I get:

[6.  2.7 5.1 1.6] has been classified as  2 and should be  1

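Is row 6 the wrong row? What are the numbers after 6.? I'm asking because I'm using the same code on a dataset larger than this one, so I want to make sure I'm doing the right thing. I'm not posting the other dataset because unfortunately I can't, but the problem is that I get something like this: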

  (0, 253)  0.5339655767137572
  (0, 601)  0.27665553856928027
  (0, 1107) 0.7989633757962163 has been classified as  7 and should be  3
  (0, 885)  0.3034934766501018
  (0, 1295) 0.6432561790864061
  (0, 1871) 0.7029318585026516 has been classified as  7 and should be  6
  (0, 1020) 1.0 has been classified as  3 and should be  8

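When I count the lines of this last output, I get twice the size of the test set... so I'm not sure whether the list of wrong predictions I'm analysing is correct at all... I hope this is clear enough.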

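3 Answers

The following approach works for any kind of classification problem.

Use a list comprehension to find the indices of all wrong predictions: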

indices = [i for i in range(len(y_test)) if y_test[i] != y_pred[i]]

The wrong predictions are then:

wrong_predictions = test_dataframe.iloc[indices,:]

You can also add the indices as a new column of wrong_predictions, which is convenient :)
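
A minimal sketch of the approach above, using made-up toy data in place of a real test set. The names test_dataframe, y_test and y_pred follow the answer; the feature columns f1 and f2, the values, and the row labels are assumptions for illustration only. One detail worth noting: if y_test is a pandas Series whose index was shuffled by train_test_split, y_test[i] looks up by label rather than by position, so the sketch converts to NumPy arrays before comparing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real test set. The row labels
# (10, 42, 7, 99) mimic the shuffled index left behind by train_test_split.
test_dataframe = pd.DataFrame(
    {"f1": [5.1, 6.0, 6.3, 5.8], "f2": [3.5, 2.7, 3.3, 2.7]},
    index=[10, 42, 7, 99],
)
y_test = pd.Series([0, 1, 2, 2], index=test_dataframe.index)
y_pred = np.array([0, 2, 2, 1])  # stands in for the output of clf.predict(...)

# Compare position by position, ignoring the pandas index.
y_true = np.asarray(y_test)
indices = np.flatnonzero(y_true != y_pred)  # positions within the test set

# Select the misclassified rows and attach both labels for later study.
wrong_predictions = test_dataframe.iloc[indices, :].copy()
wrong_predictions["true_label"] = y_true[indices]
wrong_predictions["predicted_label"] = y_pred[indices]
print(wrong_predictions)
```

The resulting frame keeps the original row labels in its index, so each wrong prediction can be traced back to the source data.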

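Welcome to SE: Data Science.

[6. 2.7 5.1 1.6] is the feature vector of the input instance that was misclassified. It is one row of your input features X = iris.data.

The message means: your SVM used the input features [6. 2.7 5.1 1.6] to predict a label, and it predicted label=2. The ground truth is label=1.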
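
To see that the leading 6. is a feature value and not a row number, one option is to look the printed feature vector up in the original data. This is only a sketch: it assumes the iris data from the question, and the variable names misclassified_features and matches are made up for illustration.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Feature values copied from the printed message in the question.
misclassified_features = np.array([6.0, 2.7, 5.1, 1.6])

# Positions of the rows of X whose four features all match those values.
matches = np.flatnonzero(np.isclose(X, misclassified_features).all(axis=1))

for row in matches:
    print('Row', row, 'of X has features', X[row], 'and true label', y[row])
```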

If you want to print the index of each misclassified row, you can use

for row_index, (input, prediction, label) in enumerate(zip(X_test, svm_predictions, y_test)):
  if prediction != label:
    print('Row', row_index, 'has been classified as ', prediction, 'and should be ', label)
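
The second output in the question, with lines such as (0, 253) 0.53..., looks like the printed form of a SciPy sparse row, which takes one line per non-zero feature. If that is the case, a single misclassified sample can span several printed lines, and counting lines would overcount the errors. The sketch below assumes a sparse X_test and uses a tiny made-up matrix; it prints one line per misclassified sample by reporting the row index instead of the row itself.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical sparse test set with 3 samples and 4 features.
X_test = csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.8],
    [0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]))
y_test = np.array([3, 6, 8])
svm_predictions = np.array([7, 6, 3])  # stands in for clf.predict(X_test)

# Collect the positions of the misclassified samples.
wrong_rows = [
    i for i, (prediction, label) in enumerate(zip(svm_predictions, y_test))
    if prediction != label
]

# One line per misclassified sample, however many non-zero features it has.
for i in wrong_rows:
    print('Row', i, 'with', X_test[i].nnz, 'non-zero features',
          'has been classified as', svm_predictions[i],
          'and should be', y_test[i])

print(len(wrong_rows), 'wrong predictions out of', X_test.shape[0])
```

Counting len(wrong_rows) rather than printed lines gives the number of wrong predictions directly, and it can never exceed the size of the test set.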

Welcome.

In addition to what user12075 mentioned, you can also do the following:

import numpy as np

# keep track of each sample's row number in the original X
indices = np.arange(y.shape[0])
X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(
    X, y, indices, stratify=y, test_size=0.3, random_state=42)

Then,

for input, prediction, label in zip (indices[idx_test], svm_predictions, y_test):
  if prediction != label:
    print(input, 'has been classified as ', prediction, 'and should be ', label)
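
Put together with the iris data from the question, the index-tracking idea can be sketched end to end as follows. The split parameters are the ones from this answer and the classifier settings are the ones from the question; the name misclassified is made up for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split the row numbers together with X and y so they stay aligned.
indices = np.arange(y.shape[0])
X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(
    X, y, indices, stratify=y, test_size=0.3, random_state=42)

svm_model_linear = SVC(kernel='linear', C=1).fit(X_train, y_train)
svm_predictions = svm_model_linear.predict(X_test)

# idx_test[k] is the row of the original X that became X_test[k],
# so a boolean mask over the test set selects original row numbers.
misclassified = idx_test[svm_predictions != y_test]

for original_row in misclassified:
    print('Original row', original_row, X[original_row],
          'has true label', y[original_row])
```

Because idx_test holds positions in the original X, the printed row numbers refer to the full dataset and not to positions within the test set.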
