Invalid parameter C error in a pipeline

data-mining machine-learning classification logistic-regression multiclass-classification
2022-02-25 17:59:56

I am trying to build a classifier for my dataset, but I run into a problem when I use GridSearchCV together with a pipeline. Here is my code:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Imputer, StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

imp = Imputer()
scaler = StandardScaler()
clf = LogisticRegression(multi_class='multinomial')

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=42)

pipeline = make_pipeline(imp, scaler, clf)

param_grid = {'penalty':["l1","l2"], 'C':np.arange(0.001, 1, 0.01),
                  'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']}

search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)

As soon as I call fit on the search, I get the following error.

search.fit(Xtrain, ytrain)

ValueError: Invalid parameter C for estimator Pipeline(memory=None,
 steps=[('imputer', Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0)), ('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True)), ('logisticregression', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
      intercept_scaling=1, max_iter=100, multi_class='multinomial',
      n_jobs=1, penalty='l2', random_state=None, solver='liblinear',
      tol=0.0001, verbose=0, warm_start=False))]). Check the list of available parameters with `estimator.get_params().keys()`.

I don't see how my C parameter is invalid. Does anyone know what I am doing wrong?

2 Answers

I have already posted an answer to this here:

https://stackoverflow.com/questions/43366561/use-sklearns-gridsearchcv-with-a-pipeline-preprocessing-just-once/55401454#55401454

Suppose you have this pipeline:

classifier = Pipeline([
    ('vectorizer', CountVectorizer(max_features=100000, ngram_range=(1, 3))),
    ('clf', RandomForestClassifier(n_estimators=10, random_state=SEED, n_jobs=-1))])

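Then, when specifying parameters, you need to prefix each one with the name you gave the estimator step followed by two underscores, here "clf__". So the parameter grid would be: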

params={'clf__max_features':[0.3, 0.5, 0.7],
        'clf__min_samples_leaf':[1, 2, 3],
        'clf__max_depth':[None]
        }
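
Applied to the pipeline in the question, the same idea might look like the sketch below. It rests on a few assumptions that are not in the original post: synthetic data from make_classification stands in for X and y, SimpleImputer replaces Imputer (which was removed from recent scikit-learn releases), and only C is tuned, over a short illustrative list of values. make_pipeline names each step after its lowercased class name, so the prefix becomes "logisticregression__".

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic 3-class data stands in for the asker's X and y.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=42)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=42)

# Assumption: SimpleImputer is used in place of the older Imputer class.
pipeline = make_pipeline(SimpleImputer(), StandardScaler(),
                         LogisticRegression(max_iter=1000))

# make_pipeline derives each step name from the lowercased class name.
step_names = [name for name, _ in pipeline.steps]

# Prefix the parameter with "<step name>__" (two underscores).
# The C values are illustrative, not the grid from the question.
param_grid = {'logisticregression__C': [0.01, 0.1, 1.0]}

search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
search.fit(Xtrain, ytrain)
best_params = search.best_params_
```

Calling pipeline.get_params().keys(), as the error message suggests, lists every valid name, including 'logisticregression__C'.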

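The reason is that you are running the grid search on the pipeline, but sklearn.pipeline.Pipeline has no parameter named C, which is why the error message says Invalid parameter C for estimator Pipeline.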

Solution: run the grid search on clf itself, because sklearn.linear_model.LogisticRegression does accept the parameters penalty, C, and solver. Build your pipeline elsewhere.
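
One possible reading of this suggestion is to wrap only the classifier in GridSearchCV and use that search object as the last pipeline step, so the grid keys need no prefix. This is a minimal sketch under assumptions not in the original post: synthetic data, SimpleImputer in place of Imputer, and a short illustrative list of C values. Note that with this layout the imputer and scaler are fit once on all of the training data rather than separately inside each cross-validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic 3-class data stands in for the asker's training set.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=42)

# The search wraps the classifier only, so plain parameter names are valid.
clf_search = GridSearchCV(LogisticRegression(max_iter=1000),
                          param_grid={'C': [0.01, 0.1, 1.0]}, cv=5)

# Preprocessing runs first; the fitted search acts as the final estimator.
model = make_pipeline(SimpleImputer(), StandardScaler(), clf_search)
model.fit(X, y)

# The last pipeline step is the fitted GridSearchCV object.
best_C = model[-1].best_params_['C']
predictions = model.predict(X)
```

If the preprocessing should be refit inside every fold, searching over the whole pipeline with prefixed parameter names is the usual alternative.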