svm.LinearSVC: a larger max_iter does not always improve accuracy/precision/recall

data-mining scikit-learn svm supervised-learning model-selection

2021-10-14 18:42:29

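Background:

Supervised machine learning
Data shape
- 10+ features, a single target (1 or 0), 100,000+ samples (so there should be no oversampling issue)
80% training, 20% test

train_test_split(X_train, Y_train, test_size=0.2)
Training on the labeled data with svm.LinearSVC(max_iter = N ).fit( )
- No scaling applied yet (all feature values are roughly in the range 0-100, float64)
- Other parameters (e.g. C = ) use their default values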
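
For readers who want to reproduce the setup described above, here is a minimal sketch. The synthetic data from make_classification is a stand-in for the real dataset, and the fixed random_state values are an assumption added here: without them, train_test_split draws a different split on every call, so runs with different max_iter values would not be scored on the same test set.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real data (assumption): 10 features, binary target.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# 80% train / 20% test. Fixing random_state keeps the split identical across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# LinearSVC with default parameters apart from max_iter, as in the question.
clf = LinearSVC(max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# The three metrics reported in the question.
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
}
print(scores)
```
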

Results:

print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print("Precision:", metrics.precision_score(y_test, y_pred))
print("Recall:", metrics.recall_score(y_test, y_pred))

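Question:

I increased max_iter from 1,000 to 10,000 and then 100,000, but the three scores above do not show an increasing trend. The scores at 10,000 are worse overall than at 1,000 and 100,000 (lower accuracy and recall, although higher precision).

For example, max_iter = 100,000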

Accuracy: 0.9728548424200598
Precision: 0.9669730040206778
Recall: 0.9653096330275229

max_iter = 10,000

Accuracy: 0.9197914270378038
Precision: 0.9886761615689937
Recall: 0.8093463302752294

max_iter = 1,000

Accuracy: 0.9838969404186796
Precision: 0.964741810105497
Recall: 0.9962729357798165

What could be the reason?
Do I need to test different max_iter values and pick the best-performing one, for example with GridSearchCV( )?
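
The GridSearchCV idea mentioned in the question could be sketched roughly as follows. This is only an illustration on synthetic data; the grid values, cv=3, and the accuracy scoring are assumptions, not settings taken from the question.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real data (assumption).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Candidate iteration limits to compare (illustrative values).
param_grid = {"max_iter": [100, 1000, 10000]}

search = GridSearchCV(
    LinearSVC(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)

# Small max_iter values may stop before convergence; silence that warning here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X, y)

print("Best max_iter:", search.best_params_["max_iter"])
print("Mean CV accuracy per candidate:", search.cv_results_["mean_test_score"])
```

Because every candidate is scored with the same cross-validation folds, differences between max_iter values are not confounded by a changing train/test split.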

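1 Answer

When trying to find the best number of iterations, it is often very useful to visualize how increasing the iteration count affects accuracy (this lets you spot overfitting and see when you should stop fitting).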

# Import libraries used
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

# Create an empty list to store accuracies
acc = []

# Iterate along a logarithmically spaced range (1, 10, ..., 100000)
for i in np.logspace(0, 5, num = 6):
    # max_iter must be an integer, so convert the float produced by np.logspace
    n_iter = int(round(i))
    # Print out the number of iterations to use for the current loop
    print('Training model with iterations: ', n_iter)
    # Create a LinearSVC (the estimator from the question) with the number of iterations for the current loop
    svc = LinearSVC(max_iter = n_iter, class_weight='balanced')
    # Fit the algorithm to the data (X_train, Y_train are assumed to come from the earlier split)
    svc.fit(X_train, Y_train)
    # Append the current test accuracy (in percent) to the list
    acc.append(accuracy_score(Y_test, svc.predict(X_test)) * 100)

# Convert the accuracy list to a series
acc = pd.Series(acc, index = np.logspace(0,5, num = 6))
# Set the plot size
plt.figure(figsize = (15,10))
# Set the plot title
title = 'Graph to show the accuracy of the SVC model as number of iterations increases\nfinal accuracy: ' + str(acc.iloc[-1])
plt.title(title)
# Set the xlabel and ylabel
plt.xlabel('Number of iterations')
plt.ylabel('Accuracy score')
# Plot the graph
acc.plot.line()
plt.show()

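This produces a plot in which the number of iterations increases logarithmically (note that training a model at every step of the np.logspace range may take some time).

If accuracy is still increasing, keep following the trend; if it has plateaued, further iterations are probably not worth your time; and if it falls below its peak, you have overfit the training data.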
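
One way to apply the reading rule above is to record training and test accuracy side by side for each iteration limit and then look at where the test score peaks. The sketch below uses synthetic data and an illustrative list of iteration counts (both assumptions), and it suppresses the ConvergenceWarning that scikit-learn emits when a small max_iter stops the solver early.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real data (assumption).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

iteration_counts = [1, 10, 100, 1000]  # illustrative values
train_acc, test_acc = [], []

for n_iter in iteration_counts:
    model = LinearSVC(max_iter=n_iter, random_state=0)
    with warnings.catch_warnings():
        # Very small limits are expected to stop before convergence.
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        model.fit(X_train, y_train)
    train_acc.append(accuracy_score(y_train, model.predict(X_train)))
    test_acc.append(accuracy_score(y_test, model.predict(X_test)))

# Iteration limit with the highest held-out accuracy.
best_idx = int(np.argmax(test_acc))
best_iter = iteration_counts[best_idx]

for n_iter, tr, te in zip(iteration_counts, train_acc, test_acc):
    print(f"max_iter={n_iter:>5}  train={tr:.4f}  test={te:.4f}")
print("Best held-out accuracy at max_iter =", best_iter)
```

A training score that keeps rising while the test score falls is the pattern the answer describes as overfitting; if both curves flatten, larger limits add cost without benefit.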
