How do I save the final model after training it on chunks?
df = pd.read_csv(, chunksize=10000)
for chunk in df:
    text_clf.fit(X_train, y_train)
filename = 'finalized_model.sav'
joblib.dump(text_clf, filename)
# load the model from disk
loaded_model = joblib.load(filename)
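Saving the model this way only gives me the model trained on the last chunk. How can I avoid this and train the whole model on every chunk?
Update: Most real-world datasets are huge and cannot be trained on in one go. How do I save the model after training on each chunk of data?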
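import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfTransformer, TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("an.csv", chunksize=6953)
for chunk in df:
    text = chunk['body']
    label = chunk['user_id']
    X_train, X_test, y_train, y_test = train_test_split(text, label, test_size=0.3)
    # A new pipeline is built and fitted from scratch for every chunk
    text_clf = Pipeline([('vect', TfidfVectorizer()),
                         ('tfidf', TfidfTransformer()),
                         ('clf', LinearSVC()),
                         ])
    text_clf.fit(X_train, y_train)

# save the model to disk
filename = 'finalized_model.sav'
joblib.dump(text_clf, filename)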
Will saving it this way give me a model trained on the whole dataset? I want the model to be trained on every chunk. Any help?
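
For context, a scikit-learn estimator's `fit` starts over on each call, which is why only the last chunk survives. One possible direction is incremental learning with `partial_fit`. The sketch below is only an illustration under several assumptions, not a drop-in replacement for the pipeline above. `LinearSVC` has no `partial_fit`, so `SGDClassifier(loss="hinge")` (a linear SVM trained by SGD) is swapped in. `TfidfVectorizer` needs statistics from the whole corpus, so the stateless `HashingVectorizer` is swapped in. The helper name `train_in_chunks` and its parameters are hypothetical, and the full set of labels is assumed to be known before training starts.

```python
import joblib
import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier


def train_in_chunks(chunks, classes, text_col="body", label_col="user_id"):
    """Fit one linear model incrementally over an iterable of DataFrames.

    Hypothetical helper: `chunks` can be the iterator returned by
    pd.read_csv(..., chunksize=...), `classes` is every label that can occur.
    """
    # HashingVectorizer is stateless: it needs no fit and maps every chunk
    # into the same feature space.
    vectorizer = HashingVectorizer(n_features=2**18)
    # loss="hinge" gives a linear SVM trained by SGD (stand-in for LinearSVC).
    clf = SGDClassifier(loss="hinge", random_state=0)
    for chunk in chunks:
        X = vectorizer.transform(chunk[text_col].astype(str))
        # partial_fit updates the existing weights instead of starting over.
        # The full class list is required on the first call.
        clf.partial_fit(X, chunk[label_col], classes=classes)
    return vectorizer, clf


# Possible usage with the file and columns from the question (not run here):
# all_users = pd.read_csv("an.csv", usecols=["user_id"])["user_id"].unique()
# chunks = pd.read_csv("an.csv", chunksize=6953)
# vectorizer, clf = train_in_chunks(chunks, classes=all_users)
# joblib.dump((vectorizer, clf), "finalized_model.sav")
```

With this approach the object saved after the loop has seen every chunk, and the vectorizer is saved alongside the classifier so that new text can be transformed the same way at prediction time.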