Logistic regression cannot converge without poor model performance

data-mining scikit-learn logistic-regression multiclass-classification convergence
2021-10-15 01:32:05

I have a multiclass classification logistic regression model. Using a very basic sklearn pipeline, I take the cleaned text description of an object and classify said object into a category.

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

logreg = Pipeline([('vect', CountVectorizer()),      # raw token counts
                   ('tfidf', TfidfTransformer()),    # tf-idf weighting
                   ('clf', LogisticRegression(n_jobs=1, C=cVal)),
                  ])

Initially, I started with a regularization strength of C = 1e5 and achieved 78% accuracy on my test set and nearly 100% accuracy on my training set (not sure if that is common or not). However, even though the model achieved reasonable accuracy, I was warned that the model did not converge and that I should increase the maximum number of iterations or scale the data.

ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)

Changing max_iter did nothing; however, modifying C allowed the model to converge, but at the cost of accuracy. Here are the results from testing different C values:

--------------------------------------------------------------------------------
C = 0.1
Model trained with accuracy 0.266403785488959 in 0.99mins
maxCoeff 7.64751682657047
aveProb 0.1409874146376454
[0.118305   0.08591412 0.09528015 ... 0.19066049 0.09083797 0.0999868 ]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
C = 1
Model trained with accuracy 0.6291798107255521 in 1.72mins
maxCoeff 16.413911220284994
aveProb 0.4221365866656076
[0.46077294 0.80758323 0.12618175 ... 0.91545935 0.79839096 0.13214606]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
(no converge)
C = 10
Model trained with accuracy 0.7720820189274448 in 1.9mins
maxCoeff 22.719712528228182
aveProb 0.7013386216302577
[0.92306384 0.97842762 0.71936027 ... 0.98604736 0.98845931 0.20129053]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
(no converge)
C = 100
Model trained with accuracy 0.7847003154574133 in 1.89mins
maxCoeff 40.572468674674916
aveProb 0.8278969567537955
[0.98949986 0.99777337 0.94394682 ... 0.99882797 0.99992239 0.28833321]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
(no converge)
C = 1000
Model trained with accuracy 0.7796529968454259 in 1.85mins
maxCoeff 72.19441171771533
aveProb 0.8845385182334065
[0.99817968 0.99980068 0.98481744 ... 0.9999964  0.99999998 0.36462353]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
(no converge)
C = 10000
Model trained with accuracy 0.7757097791798108 in 1.88mins
maxCoeff 121.56900229473293
aveProb 0.9351308553465546
[0.99994777 0.99999677 0.98521023 ... 0.99999987 1.         0.48251051]
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
(no converge)
C = 100000
Model trained with accuracy 0.7785488958990536 in 1.84mins
maxCoeff 160.02719692775156
aveProb 0.9520556562102963
[0.99999773 0.99999977 0.98558839 ... 0.99999983 1.         0.54044361]
--------------------------------------------------------------------------------

As you can see, the model training only converges at C values between 1e-3 and 1, but does not reach the accuracy seen with the higher, non-converging C values.
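For reference, a sweep like the one above could be driven by something along these lines. This is only a minimal sketch, since the question does not show the driver script; X_train, y_train, X_test, and y_test are assumed to hold the cleaned text descriptions and category labels:

import time
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

for c_val in [0.1, 1, 10, 100, 1000, 10000, 100000]:
    model = Pipeline([('vect', CountVectorizer()),
                      ('tfidf', TfidfTransformer()),
                      ('clf', LogisticRegression(n_jobs=1, C=c_val))])
    start = time.time()
    model.fit(X_train, y_train)                    # training data (assumed)
    probs = model.predict_proba(X_test)            # test data (assumed)
    print('-' * 80)
    print(f'C = {c_val}')
    print(f'Model trained with accuracy {model.score(X_test, y_test)} '
          f'in {(time.time() - start) / 60:.2f}mins')
    print(f'maxCoeff {np.abs(model.named_steps["clf"].coef_).max()}')
    print(f'aveProb {probs.max(axis=1).mean()}')   # mean top-class probability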

Update: here are the learning curves for C = 1 and C = 1e5. As I mentioned before, the training curve always seems to sit at or very near 1 (0.9999999) for the high C values where there is no convergence, but things look much more normal in the C = 1 case, where the optimization converges. That seems strange to me...

[Learning curve: C = 1, converged]

[Learning curve: C = 1e5, not converged]
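Learning curves like these can be produced with scikit-learn's learning_curve; the sketch below is one way to do it, with the five-point size grid and the plotting details being assumptions, and X_train, y_train as above:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import Pipeline

pipe = Pipeline([('vect', CountVectorizer()),
                 ('tfidf', TfidfTransformer()),
                 ('clf', LogisticRegression(n_jobs=1, C=1))])

# 5-fold cross-validated learning curve over growing training-set fractions
sizes, train_scores, val_scores = learning_curve(
    pipe, X_train, y_train, cv=5, train_sizes=np.linspace(0.1, 1.0, 5))

plt.plot(sizes, train_scores.mean(axis=1), label='training accuracy')
plt.plot(sizes, val_scores.mean(axis=1), label='cross-validation accuracy')
plt.xlabel('number of training examples')
plt.ylabel('accuracy')
plt.legend()
plt.show()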

Here are the results from testing different solvers:

--------------------------------------------------------------------------------
Solver = newton-cg
Model trained with accuracy 0.7810725552050474 in 6.23mins
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
Solver = lbfgs
Model trained with accuracy 0.7847003154574133 in 1.93mins
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Solver = liblinear
Model trained with accuracy 0.7779179810725552 in 0.27mins
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  "the coef_ did not converge", ConvergenceWarning)
Solver = sag
Model trained with accuracy 0.7818611987381704 in 0.47mins
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  "the coef_ did not converge", ConvergenceWarning)
Solver = saga
Model trained with accuracy 0.782018927444795 in 0.54mins
--------------------------------------------------------------------------------
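Again for reference, a solver comparison like the one above might look like the following sketch (C = 100 and the data variables are assumptions carried over from the earlier sweep):

import time
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

for solver in ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']:
    model = Pipeline([('vect', CountVectorizer()),
                      ('tfidf', TfidfTransformer()),
                      ('clf', LogisticRegression(C=100, solver=solver))])
    start = time.time()
    model.fit(X_train, y_train)        # training data (assumed)
    print('-' * 80)
    print(f'Solver = {solver}')
    print(f'Model trained with accuracy {model.score(X_test, y_test)} '
          f'in {(time.time() - start) / 60:.2f}mins')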

Is this common behavior? Based on this behavior, can anyone tell whether I am going about this the wrong way?

2 Answers

I have often seen LogisticRegression "fail to converge" yet be quite stable (meaning the coefficients don't change much between iterations).

Maybe there is some multicollinearity that leads to coefficients changing substantially without actually affecting many predictions/scores.

Another possibility (which seems to be the case here, thanks for testing things out) is that you have near-perfect separation on the training set. In unpenalized logistic regression, a linearly separable dataset has no best fit: the coefficients blow up toward infinity (to push the probabilities to 0 and 1). When you add regularization, it prevents those gigantic coefficients. So with large values of C, i.e. little regularization, you still get large coefficients and so convergence may be slow, but the partially-converged model may still be quite good on the test set; whereas with large regularization you get much smaller coefficients and worse performance on both the training and test sets.
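A tiny synthetic demonstration of this effect (all data below is made up for illustration): on a perfectly separable dataset, weak regularization (large C) inflates the coefficients far more than strong regularization does, while training accuracy barely changes.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# two well-separated Gaussian blobs -> linearly separable classes
X = np.r_[rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))]
y = np.r_[np.zeros(50), np.ones(50)]

for C in [0.01, 1, 1e5]:
    clf = LogisticRegression(C=C, max_iter=10000).fit(X, y)
    print(f'C = {C}: max |coef| = {np.abs(clf.coef_).max():.2f}, '
          f'train accuracy = {clf.score(X, y):.2f}')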

If you're worried about non-convergence, you can try increasing max_iter (even further), increasing tol, changing the solver, or scaling the features (although with tf-idf, I wouldn't think that would help).

I would look for the largest C that gives you good results, then try to get it to converge with more iterations and/or different solvers.

Thanks to @BenReiniger's suggestion, I reduced the inverse regularization strength from C = 1e5 to C = 1e2. This allowed the model to converge and to maximize accuracy on the test set (among the C values tested), with max_iter increased only from 100 -> 350 iterations.
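For completeness, a sketch of what that final configuration might look like in the pipeline from the question (the data variable names are assumptions, as before):

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

logreg = Pipeline([('vect', CountVectorizer()),
                   ('tfidf', TfidfTransformer()),
                   ('clf', LogisticRegression(n_jobs=1, C=1e2, max_iter=350)),
                  ])
logreg.fit(X_train, y_train)   # training data (assumed); now converges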

The learning curve below still shows very high (though not quite 1) training accuracy; however, my research seems to indicate this is not uncommon in high-dimensional logistic regression applications such as text-based classification (my use case).

"It is common to get perfect classification in training when you have a high-dimensional dataset. Such datasets are often encountered in text-based classification, bioinformatics, etc."

[Learning curve: C = 100, max_iter = 350, solver = lbfgs]