How to use SMOTENC inside a pipeline?

data-mining python scikit-learn imbalanced-learn smotenc
2022-02-21 01:29:10

I would be grateful if you could tell me how to use SMOTENC here. I wrote:

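import numpy as np
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# Column names of the numeric and categorical (including get_dummies) columns of X
num_indices1 = list(X.iloc[:, np.r_[0:94, 95, 97, 100:123]].columns.values)
cat_indices1 = list(X.iloc[:, np.r_[94, 96, 98, 99, 123:160]].columns.values)
print(len(num_indices1))
print(len(cat_indices1))

# MultiColumn is a custom column-selector transformer (not shown); rg is the final estimator
pipeline = Pipeline(steps=[
    ('feature_processing', FeatureUnion(transformer_list=[
        # categorical features
        ('categorical', MultiColumn(cat_indices1)),
        # numeric features
        ('numeric', Pipeline(steps=[
            ('select', MultiColumn(num_indices1)),
            ('scale', StandardScaler()),
        ])),
    ])),
    ('clf', rg),
])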

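So, as the index lists show, I have 5 categorical features. In fact, columns 123 to 159 (the slice 123:160) all belong to a single categorical feature with 37 possible values, which was one-hot encoded with get_dummies.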
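The column counts implied by these index ranges can be checked with a short NumPy sketch. It uses only the positions from the question and assumes X has 160 columns in total; np.r_ expands the slices, and the two lists should split the columns without overlap.

```python
import numpy as np

# Column positions from the question (X is assumed to have 160 columns)
num_pos = np.r_[0:94, 95, 97, 100:123]
cat_pos = np.r_[94, 96, 98, 99, 123:160]

print(len(num_pos))  # 94 + 2 + 23 = 119 numeric columns
print(len(cat_pos))  # 4 + 37 = 41 columns encoding 5 categorical features

# The two lists must not overlap and together must cover every column exactly once
assert set(num_pos).isdisjoint(cat_pos)
assert sorted(np.concatenate([num_pos, cat_pos]).tolist()) == list(range(160))
```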

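I think SMOTENC should be inserted just before the classifier step ('clf', rg), but I don't know how to define the categorical_features parameter of SMOTENC. Also, could you tell me where imblearn.pipeline should be used?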
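SMOTENC's categorical_features refers to whole columns, so if the 37 get_dummies columns were listed there, they would be handled as 37 independent binary features, and a synthetic row would not be guaranteed to have exactly one dummy set. A minimal pandas sketch of collapsing dummy columns back into one categorical column follows. It runs on a hypothetical toy frame: the names `num` and `color` and the `color_` prefix are illustrative, not taken from the question.

```python
import pandas as pd

# Hypothetical toy frame: one numeric column, one categorical column with 3 levels
df = pd.DataFrame({"num": [1.0, 2.0, 3.0, 4.0],
                   "color": ["red", "green", "blue", "green"]})

# get_dummies expands "color" into one 0/1 column per level
encoded = pd.get_dummies(df, columns=["color"])
dummy_cols = [c for c in encoded.columns if c.startswith("color_")]
print(len(dummy_cols))  # 3

# Collapse the dummies back: for each row, take the name of the column holding the 1
collapsed = encoded.drop(columns=dummy_cols)
collapsed["color"] = (encoded[dummy_cols].astype(int)
                      .idxmax(axis=1)
                      .str[len("color_"):])

# With one column per categorical feature, categorical_features is just their positions
cat_positions = [collapsed.columns.get_loc("color")]
print(cat_positions)
```

Applied to the question's data, this would leave 5 categorical columns, so categorical_features would contain exactly 5 positions.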

Thanks in advance.

1 Answer

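Two pipelines should be used: keep the scikit-learn Pipeline for preprocessing and the classifier, and wrap it in an imbalanced-learn pipeline (imblearn.pipeline.make_pipeline) whose first step is SMOTENC. A plain scikit-learn Pipeline cannot hold a sampler such as SMOTENC, because its intermediate steps must implement transform. SMOTENC's categorical_features takes the column positions of the categorical features in X. Note that in the code below the 37-valued feature is apparently kept as a single column (index 120) rather than as 37 get_dummies columns, so X has 123 columns and there are exactly 5 categorical positions: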

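import numpy as np
from imblearn.over_sampling import SMOTENC
from imblearn.pipeline import make_pipeline
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# X is assumed to have 123 columns: the 37-valued feature is one column (index 120),
# not 37 get_dummies columns
num_indices1 = list(X.iloc[:, np.r_[0:94, 95, 97, 100:120, 121:123]].columns.values)
cat_indices1 = list(X.iloc[:, np.r_[94, 96, 98, 99, 120]].columns.values)
print(len(num_indices1))
print(len(cat_indices1))
# Positions of the 5 categorical columns, as expected by SMOTENC's categorical_features
cat_indices = [94, 96, 98, 99, 120]

# Inner scikit-learn pipeline: preprocessing + classifier
# (MultiColumn is the asker's custom column selector; rg is the asker's estimator)
pipeline = Pipeline(steps=[
    ('feature_processing', FeatureUnion(transformer_list=[
        # categorical features
        ('categorical', MultiColumn(cat_indices1)),
        # numeric features
        ('numeric', Pipeline(steps=[
            ('select', MultiColumn(num_indices1)),
            ('scale', StandardScaler()),
        ])),
    ])),
    ('clf', rg),
])

# Outer imbalanced-learn pipeline: SMOTENC resamples the raw data during fit only,
# then hands the result to the inner pipeline
pipeline_with_resampling = make_pipeline(SMOTENC(categorical_features=cat_indices), pipeline)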