Data mining - ColumnTransformer performs worse than sklearn Pipeline - 吾爱随笔录

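I have a pipeline model (imbalanced, binary data) made up of two pipelines: preprocessing and the actual model. I now want to add a SimpleImputer to the preprocessing pipeline, and because I don't want to apply it to every column, I used ColumnTransformer. With ColumnTransformer, however, performance is much worse than with a plain sklearn Pipeline (AUC was around 0.93 before and is around 0.7 with ColumnTransformer). To check whether the imputation itself was the cause, I filled the NaN values before the pipeline (so SimpleImputer has nothing left to do), but performance is just as bad even with no NaN values in the data. The relevant part of my code is below. Does anyone know what is going on, or what I could change?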

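import xgboost as xgb
from imblearn.pipeline import Pipeline as pipeline_imb
from imblearn.under_sampling import RandomUnderSampler
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline as pipeline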


# Option 1: ColumnTransformer (performs a lot worse)
preproc = ColumnTransformer([
    ('imputer', SimpleImputer(strategy='mean'), ['col1', 'col2', 'col3']),
])


# Option 2: sklearn Pipeline (performs better)
preproc = pipeline([
    ('SimpleImputer', SimpleImputer(strategy='mean')),
])


# params is the dict of XGBoost hyperparameters (not shown in the question)
modelpipe = pipeline_imb([
    ('undersampling', RandomUnderSampler()),
    ('xgboost', xgb.XGBClassifier(**params, n_jobs=-1)),
])

model = pipeline([('preproc', preproc), ('modelpipe', modelpipe)])

So simply swapping the two preproc definitions produces this huge difference in performance. Why is that?
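One likely explanation, assuming the training data has more feature columns than col1, col2 and col3 (the question does not say): ColumnTransformer defaults to remainder='drop', so every column not listed in a transformer is discarded before the model sees it, whereas the plain Pipeline imputes and keeps all columns. The sketch below uses a made-up toy frame (col4 and col5 are hypothetical) to show the difference in output width; it is an illustration of the default behavior, not a confirmed diagnosis of the original data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Hypothetical toy data: three columns to impute plus two further features.
X = pd.DataFrame({
    "col1": [1.0, np.nan, 3.0],
    "col2": [4.0, 5.0, np.nan],
    "col3": [np.nan, 8.0, 9.0],
    "col4": [10.0, 11.0, 12.0],
    "col5": [0.0, 1.0, 0.0],
})
impute_cols = ["col1", "col2", "col3"]

# Default remainder="drop": only the listed columns survive the transform.
ct_drop = ColumnTransformer([
    ("imputer", SimpleImputer(strategy="mean"), impute_cols),
])
out_drop = ct_drop.fit_transform(X)

# remainder="passthrough": unlisted columns are kept unchanged and
# appended after the transformed ones.
ct_pass = ColumnTransformer(
    [("imputer", SimpleImputer(strategy="mean"), impute_cols)],
    remainder="passthrough",
)
out_pass = ct_pass.fit_transform(X)

print(out_drop.shape)  # 3 columns: col4 and col5 are gone
print(out_pass.shape)  # 5 columns: all features retained
```

If this is the cause, passing remainder='passthrough' to the ColumnTransformer in the question should bring the AUC back in line with the Pipeline version. Note that the output column order changes: the imputed columns come first, followed by the passed-through ones.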