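I am building a pipeline to tune the hyperparameters of a gradient boosting classifier and, at the same time, find the best number of components for a PCA step. This is my current setup: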
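
    from sklearn.decomposition import PCA
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.pipeline import Pipeline

    pipe = Pipeline([
        ('reduce_dim', PCA()),
        ('classify', GradientBoostingClassifier())
    ])
    score = {'f1': 'f1', 'accuracy': 'accuracy'}
    N_FEATURES_OPTIONS = [12, 13, 14, 15]
    max_dep = [3, 4, 5, 6]
    n_est = [50, 80, 100, 120, 150]
    min_samp = [4, 5, 6, 10]
    param_grid = [
        {
            # Listing the step itself is why a PCA object shows up in best_params_
            'reduce_dim': [PCA()],
            'reduce_dim__n_components': N_FEATURES_OPTIONS,
            'classify__n_estimators': n_est,
            'classify__max_depth': max_dep,
            'classify__min_samples_split': min_samp
        }]
    reducer_labels = ['PCA']
    # Multi-metric scoring; the best pipeline is refit on all of X_train using accuracy
    grid_adc = GridSearchCV(pipe, cv=5, n_jobs=-1, param_grid=param_grid,
                            scoring=score, refit='accuracy')
    grid_adc.fit(X_train, y_train)
    grid_adc.best_params_
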
which outputs:

    {'classify__max_depth': 3,
     'classify__min_samples_split': 4,
     'classify__n_estimators': 50,
     'reduce_dim': PCA(copy=True, iterated_power='auto', n_components=12, random_state=None,
       svd_solver='auto', tol=0.0, whiten=False),
     'reduce_dim__n_components': 12}

Now I want to validate and score the model with cross-validation. If I run the following:

    cross_val_score(grid_adc, X_train, y_train, cv=5, n_jobs=-1)

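Does the PCA that was fitted inside my pipeline carry over into cross_val_score? If so, does cross_val_score transform the data with PCA each time it generates a new train/test split?
Or do I need to create a new PCA after the pipeline for use with cross_val_score?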
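
For reference, here is a minimal sketch of what I understand cross_val_score to do with an estimator like this. It is not the setup above: the synthetic data from make_classification, the names X_demo, y_demo, search and tuned_pipe, and the deliberately tiny grid are all assumptions chosen so the example runs quickly. As far as I know, cross_val_score clones the estimator it is given and fits the clone on each training split, so a PCA step inside the pipeline would be refit on that split's training part and only applied to its test part.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for X_train / y_train (assumption, not the real data).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=20, n_informative=8, random_state=0
)

pipe = Pipeline([
    ("reduce_dim", PCA()),
    ("classify", GradientBoostingClassifier(n_estimators=20, random_state=0)),
])

# Deliberately tiny grid (assumption) so the sketch finishes in seconds.
search = GridSearchCV(
    pipe,
    param_grid={"reduce_dim__n_components": [5, 10]},
    cv=3,
    scoring="accuracy",
)

# Option 1: nested cross-validation. Each outer split clones `search`,
# reruns the whole grid search (PCA included) on the outer training part,
# and scores the refit best pipeline on the outer test part.
nested_scores = cross_val_score(search, X_demo, y_demo, cv=3)

# The object passed in is only cloned, so it is still unfitted here.
search_was_fitted = hasattr(search, "best_estimator_")

# Option 2: run the search once, then cross-validate an unfitted copy of
# the best pipeline. PCA is still refit on every training split.
search.fit(X_demo, y_demo)
tuned_pipe = clone(search.best_estimator_)
fixed_scores = cross_val_score(tuned_pipe, X_demo, y_demo, cv=3)

print("nested CV accuracy per fold:", nested_scores)
print("fixed-parameter CV accuracy per fold:", fixed_scores)
```

In both options no separate PCA object has to be created outside the pipeline. Option 2 reuses the data that selected the hyperparameters, so its scores may be optimistic compared with the nested version.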