Data mining - Using SMOTENC in a pipeline

I am trying to work out the proper way to build a pipeline for training a model that includes the SMOTENC algorithm:

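Given that the algorithm relies on k-nearest neighbors and Euclidean distance, should the data be normalized (each input vector scaled individually to unit norm) before SMOTENC is applied in the pipeline?
Can the algorithm handle missing values? If median- and percentile-based imputation and outlier removal are performed before SMOTENC rather than after it, won't that affect the imputed values and the percentiles?
Can SMOTENC be applied after one-hot encoding, with the resulting numeric binary columns declared as categorical features?
When the pipeline is run inside cross-validation, is the balancing applied only to the imbalanced training folds, or to the test folds as well?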
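To make the distinction behind the first question concrete, here is a minimal sketch contrasting per-sample normalization (`Normalizer`, which scales each row to unit norm) with per-feature standardization (`StandardScaler`). The toy values are made up for illustration and are not taken from my data.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Toy data (made-up values): two numeric features on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 6000.0]])

# Normalizer rescales each ROW (sample) to unit L2 norm.
X_rows = Normalizer(norm="l2").fit_transform(X)

# StandardScaler rescales each COLUMN (feature) to zero mean, unit variance.
X_cols = StandardScaler().fit_transform(X)

row_norms = np.linalg.norm(X_rows, axis=1)  # one norm per sample
col_means = X_cols.mean(axis=0)             # one mean per feature
col_stds = X_cols.std(axis=0)               # one std per feature

print(row_norms, col_means, col_stds)
```

As far as I can tell, row-wise normalization does not put the features on a common scale (the large feature still dominates every row), so for a distance-based neighbor search the per-feature scaling is probably the relevant one.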

This is what my pipeline currently looks like:

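from scipy.stats.mstats import winsorize  # assumed source of winsorize
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from imblearn.pipeline import Pipeline as Pipeline_imb
from imblearn.over_sampling import SMOTENC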

categorical_features_bool = [True, True, ……. False, False]
smt = SMOTENC(
    categorical_features=categorical_features_bool,
    random_state=RANDOM_STATE_GRID,
    k_neighbors=10,
    n_jobs=-1,
)

preprocess_pipeline = ColumnTransformer(
    transformers=[
        # Cap the top 2% of values in each listed column
        ('Winsorize', FunctionTransformer(winsorize, validate=False,
                                          kw_args={'limits': [0, 0.02], 'inplace': False, 'axis': 0}),
         ['feat_1', 'Feat_2']),

        # Median imputation plus missing-value indicator columns
        ('num_impute', SimpleImputer(strategy='median', add_indicator=True),
         ['feat_10', 'Feat_15']),
    ],
    remainder='passthrough',  # pass through features not listed
    n_jobs=-1,
    verbose=False,
)

Model = LogisticRegression()

model_pipeline = Pipeline_imb([
    ('preprocessing', preprocess_pipeline),
    ('smt', smt),
    ('Std', StandardScaler()),
    ('classifier', Model),
])