Data mining - Hyperparameter tuning XGBClassifier

I am working on a highly imbalanced dataset for a competition.

The training data has shape: (166573, 14)

train['outcome'].value_counts()

0    159730 
1      6843

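I am building the model with XGBClassifier, and the only parameter I set manually is scale_pos_weight: 23.34 (count of 0 values / count of 1 values).

It gives an AUC of about 82%.

I think I could get a higher score if I tuned all the other hyperparameters.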
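For reference, the 23.34 figure follows directly from the class counts printed above. A minimal sketch, with the counts copied in by hand (the variable names are only illustrative):

```python
# Class counts copied from the value_counts() output above.
n_negative = 159730  # rows with outcome == 0
n_positive = 6843    # rows with outcome == 1

# scale_pos_weight is set to the ratio of negative to positive rows.
scale_pos_weight = n_negative / n_positive
print(round(scale_pos_weight, 2))  # 23.34
```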

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
       n_jobs=1, nthread=None, objective='binary:logistic', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=23.4, seed=None,
       silent=True, subsample=1)

I tried GridSearchCV, but it takes a very long time to finish on my local machine, and I was not able to get any results back.

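from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# params is the parameter grid being searched (its contents are not shown here)
clf = XGBClassifier()
grid = GridSearchCV(clf,
                    params, n_jobs=-1,
                    scoring="roc_auc",
                    cv=3)

# Fit every parameter combination with 3-fold CV, then report the best AUC
grid.fit(X_train, y_train)
print("Best: %f using %s" % (grid.best_score_, grid.best_params_))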
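To give a sense of why the exhaustive search is so slow, here is a rough sketch of how the number of model fits grows. The grid below is purely hypothetical, since the actual `params` dictionary is not shown; only `cv_folds = 3` comes from the call above. It also shows how drawing a fixed number of random combinations would cap the cost.

```python
import itertools
import random

# Hypothetical grid: the real `params` is not shown, so these values are illustrative only.
param_grid = {
    "max_depth": [3, 5, 7],
    "min_child_weight": [1, 3, 5],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
    "learning_rate": [0.01, 0.05, 0.1],
}
cv_folds = 3  # matches cv=3 in the GridSearchCV call

# An exhaustive grid search trains one model per combination per fold.
all_combinations = list(itertools.product(*param_grid.values()))
grid_fits = len(all_combinations) * cv_folds
print(grid_fits)  # 3**5 combinations * 3 folds = 729

# Sampling a fixed budget of combinations keeps the number of fits bounded.
rng = random.Random(0)
n_candidates = 20  # assumed budget, chosen arbitrarily for illustration
sampled = [dict(zip(param_grid, combo))
           for combo in rng.sample(all_combinations, n_candidates)]
random_fits = len(sampled) * cv_folds
print(random_fits)  # 20 candidates * 3 folds = 60
```

scikit-learn's RandomizedSearchCV applies the same idea through its `n_iter` argument, although whether a random subset finds a good enough configuration on this dataset is something that would have to be checked empirically.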

Given the highly imbalanced dataset, which other parameters should I tune, and how should I run the search so that I actually get some results?