DataFrame:
id  review                                             name    label
1   it is a great product for turning lights on.       Ashley  1
2   plays music and have a good sound.                 Alex    1
3   I love it, lots of fun.                            Peter   0
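The goal is to classify the text: if the review is about the product's functionality (e.g. turning on lights, playing music), label=1; otherwise label=0.
I am running several sklearn models to see which one works best: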
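from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Naive Bayes
text_clf_nb = Pipeline([('tfidf', TfidfVectorizer()), ('clf', MultinomialNB())])
# Linear Support Vector Classifier
text_clf_lsvc = Pipeline([('tfidf', TfidfVectorizer()), ('clf', LinearSVC(loss='hinge', penalty='l2', max_iter=50))])
# SGDClassifier
text_clf_sgd = Pipeline([('tfidf', TfidfVectorizer()), ('clf', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3, random_state=42, max_iter=50, tol=None))])
# Random Forest
text_clf_rf = Pipeline([('tfidf', TfidfVectorizer()), ('clf', RandomForestClassifier())])
# Neural network (MLPClassifier)
text_clf_mlp = Pipeline([('tfidf', TfidfVectorizer()), ('clf', MLPClassifier())])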
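For context, one way to compare pipelines like these before any tuning is to score each with the same cross-validation split. The following is only a minimal sketch: the `reviews` and `labels` lists are hypothetical toy data standing in for the real DataFrame columns, only two of the pipelines are shown, and LinearSVC is left at its default settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data; replace with the real review and label columns.
reviews = [
    "it is a great product for turning lights on",
    "plays music and has a good sound",
    "turns the lights off when I ask",
    "sets timers and plays the radio",
    "I love it, lots of fun",
    "best gift ever",
    "my kids adore it",
    "so happy with this purchase",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Two of the candidate pipelines, keyed by a short name.
candidates = {
    "nb": Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())]),
    "lsvc": Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())]),
}

# Score every candidate with the same metric and number of folds.
scores = {}
for name, pipeline in candidates.items():
    fold_scores = cross_val_score(pipeline, reviews, labels, cv=2, scoring="roc_auc")
    scores[name] = fold_scores.mean()
    print(f"{name}: mean ROC AUC = {scores[name]:.3f}")
```

With so few samples the numbers are meaningless; the point is only that every pipeline is evaluated in the same way.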
Question: how do I tune the models with GridSearchCV? This is what I have so far:
from sklearn.model_selection import GridSearchCV
parameters = {'vect__ngram_range': [(1, 1), (1, 2)], 'tfidf__use_idf': (True, False), 'clf__alpha': (1e-2, 1e-3)}
gs_clf = GridSearchCV(text_clf_nb, param_grid=parameters, cv=2, scoring='roc_auc', n_jobs=-1)
gs_clf = gs_clf.fit((X_train, y_train))
Running gs_clf = gs_clf.fit((X_train, y_train)) raises the following error:
ValueError: Invalid parameter C for estimator Pipeline(memory=None,
steps=[('tfidf',
TfidfVectorizer(analyzer='word', binary=False,
decode_error='strict',
dtype=<class 'numpy.float64'>,
encoding='utf-8', input='content',
lowercase=True, max_df=1.0, max_features=None,
min_df=1, ngram_range=(1, 1), norm='l2',
preprocessor=None, smooth_idf=True,
stop_words=None, strip_accents=None,
sublinear_tf=False,
token_pattern='(?u)\\b\\w\\w+\\b',
tokenizer=None, use_idf=True,
vocabulary=None)),
('clf',
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True))],
verbose=False). Check the list of available parameters with `estimator.get_params().keys()`.
I would appreciate any suggestions. Thank you.
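A hedged sketch of what a working grid for the Naive Bayes pipeline might look like follows. Three things appear to matter. First, every key in the grid has to be `<step name>__<parameter>`, and the vectorizer step in the pipeline is named 'tfidf', so 'vect__ngram_range' would not be recognised. Second, the error mentions a parameter C that does not appear in the grid shown, so presumably a different grid was run; MultinomialNB has no C (it belongs to estimators such as LinearSVC), and a bare key without a step prefix is looked up on the Pipeline itself. Third, fit expects X and y as two separate arguments rather than one tuple. The `X_train` and `y_train` values below are hypothetical toy data, and n_jobs is omitted to keep the sketch simple.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data; replace with the real training split.
X_train = [
    "it is a great product for turning lights on",
    "plays music and has a good sound",
    "turns the lights off when I ask",
    "sets timers and plays the radio",
    "I love it, lots of fun",
    "best gift ever",
    "my kids adore it",
    "so happy with this purchase",
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]

text_clf_nb = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])

# The names GridSearchCV accepts are exactly these keys.
valid_params = set(text_clf_nb.get_params().keys())

# Each key is "<step name>__<parameter>", using the step names of this pipeline.
parameters = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__use_idf": [True, False],
    "clf__alpha": [1e-2, 1e-3],
}

# Report any grid key the pipeline would reject before starting the search.
unknown = set(parameters) - valid_params
print("unknown grid keys:", sorted(unknown))

gs_clf = GridSearchCV(text_clf_nb, param_grid=parameters, cv=2, scoring="roc_auc")
gs_clf.fit(X_train, y_train)  # X and y passed separately, not as a tuple

print(gs_clf.best_params_)
print(gs_clf.best_score_)
```

The other pipelines would need their own grids, since each classifier exposes different parameters: for example 'clf__C' would be valid for the LinearSVC pipeline but 'clf__alpha' would not.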