Implemented early stopping but ran into an error: SGDClassifier "not fitted" error in sklearn

data-mining scikit-learn regularization

2022-02-07 14:17:07

Below is a simple implementation of early stopping. I came across it in a book and wanted to try it out.

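# Implement SGD Classifier
from sklearn.base import clone
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import mean_squared_error

sgd_clf = SGDClassifier(random_state=42,
                        warm_start=True,  # each fit() call continues from the previous weights
                        max_iter=1,       # one epoch per fit() call (n_iter in older scikit-learn)
                        tol=None,
                        learning_rate='constant',
                        eta0=0.0005)

minimum_val_error = float('inf')
best_epoch = None
best_model = None

for epoch in range(1000):
    sgd_clf.fit(X_train_scaled, y_train)
    predictions = sgd_clf.predict(X_val_scaled)
    error = mean_squared_error(y_val, predictions)
    # Remember the epoch with the lowest validation error seen so far
    if error < minimum_val_error:
        minimum_val_error = error
        best_epoch = epoch
        best_model = clone(sgd_clf)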

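After running the snippet above, the best model and the best epoch are stored in the variables best_model and best_epoch. So, to test best_model, I ran the following statement.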

y_test_predictions = best_model.predict(X_test)

But then I got the error: This SGDClassifier instance is not fitted yet

Any hints on how to resolve this would be very helpful. Thanks.

1 Answer

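This is because clone only copies the estimator with the same parameters, not the attached data. It therefore returns a new estimator that has not been fitted on any data, so you cannot use it to make predictions.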
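The behaviour described above can be checked with a small sketch. The synthetic dataset from make_classification is an assumption made purely for illustration, and copy.deepcopy is shown only as a contrast to clone; it is not part of the original answer.

```python
from copy import deepcopy

from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import SGDClassifier

# Synthetic data, used only for illustration (not from the question).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

sgd_clf = SGDClassifier(random_state=42).fit(X, y)

# clone() copies only the hyperparameters, so the result is unfitted.
unfitted_copy = clone(sgd_clf)
try:
    unfitted_copy.predict(X)
    clone_is_fitted = True
except NotFittedError:
    clone_is_fitted = False

# deepcopy() also copies the learned attributes (coef_, intercept_).
fitted_copy = deepcopy(sgd_clf)
same_predictions = bool((fitted_copy.predict(X) == sgd_clf.predict(X)).all())

print(clone_is_fitted, same_predictions)
```

The cloned estimator has the same hyperparameters as sgd_clf but no coef_ attribute, which is why predict fails on it.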

Instead of clone, you can use pickle or joblib.

1. pickle

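import pickle
...

for epoch in range(1000):
    ...
    if error < minimum_val_error:
        minimum_val_error = error
        best_epoch = epoch
        # Serialize the fitted model, learned weights included
        best_model = pickle.dumps(sgd_clf)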

Later, if you want to use the stored model:

sgd_clf2 = pickle.loads(best_model)
y_test_predictions = sgd_clf2.predict(X_test)
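Putting the pieces together, here is a minimal end-to-end sketch of the early-stopping loop with pickle. The synthetic data, the train/validation split, the shortened loop of 50 epochs, and the names restored and restored_error are assumptions for illustration only; the question's own data is not shown.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42)

scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)

sgd_clf = SGDClassifier(random_state=42, warm_start=True, max_iter=1,
                        tol=None, learning_rate='constant', eta0=0.0005)

minimum_val_error = float('inf')
best_epoch = None
best_model = None

for epoch in range(50):  # fewer epochs than the question, to keep it quick
    sgd_clf.fit(X_train_scaled, y_train)  # one more epoch from the current weights
    error = mean_squared_error(y_val, sgd_clf.predict(X_val_scaled))
    if error < minimum_val_error:
        minimum_val_error = error
        best_epoch = epoch
        best_model = pickle.dumps(sgd_clf)  # snapshot of the fitted model

# Restore the best snapshot; it predicts without raising a "not fitted" error.
restored = pickle.loads(best_model)
restored_error = mean_squared_error(y_val, restored.predict(X_val_scaled))
print(best_epoch, minimum_val_error, restored_error)
```

Because the snapshot is taken at the moment the validation error improves, the restored model should reproduce the recorded minimum validation error, even though sgd_clf itself keeps training afterwards.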

2. joblib

You can also use joblib and store the model on disk.

import joblib  # sklearn.externals.joblib was removed from scikit-learn; use the standalone package
...

joblib.dump(sgd_clf, 'filename.joblib')

To use the stored model:

clf = joblib.load('filename.joblib')
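As a rough check that the joblib round trip keeps the fitted state, the sketch below dumps a fitted classifier to a temporary directory and loads it back. The synthetic data, the temporary path, and the file name best_model.joblib are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic data, used only for illustration (not from the question).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
sgd_clf = SGDClassifier(random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'best_model.joblib')  # hypothetical file name
    joblib.dump(sgd_clf, model_path)  # write the fitted model to disk
    clf = joblib.load(model_path)     # read it back as a new object

# The loaded model carries the learned weights, so predictions agree.
predictions_match = bool((clf.predict(X) == sgd_clf.predict(X)).all())
print(predictions_match)
```

In an early-stopping loop, the dump call would sit inside the `if error < minimum_val_error:` branch, so the file on disk always holds the best model seen so far.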
