How to perform a grid search on an NLP CRF model

data-mining nlp machine-learning-model grid-search
2022-02-22 17:15:57

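I am trying to tune the hyperparameters of a sklearn_crfsuite.CRF model. When I run the code below, it does not raise any exception, but the fit apparently never completes. As a result, trying to get the best estimator from the grid search afterwards does not work.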

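%%time
import sklearn_crfsuite
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn_crfsuite import metrics

# X_train, Y_train and labels are defined earlier in the notebook

# define fixed parameters and parameters to search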
crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    max_iterations=100,
    all_possible_transitions=True
)
params_space = {
    "c1": [0, 0.05, 0.1, 0.25, 0.5, 1],
    "c2": [0, 0.05, 0.1, 0.25, 0.5, 1],
}

# use the same metric for evaluation
f1_scorer = make_scorer(metrics.flat_f1_score,
                        average='weighted', labels=labels)

# search
grid_search = GridSearchCV(estimator=crf,
                           param_grid=params_space,
                           cv=3,
                           n_jobs=-1, verbose=1, scoring=f1_scorer)

# grid_search.fit(X_train, Y_train)
# The call above throws an exception, which seems to be an open bug
# with scikit-learn 0.24.0 or later.
# GitHub link for the bug: https://github.com/TeamHG-Memex/sklearn-crfsuite/issues/60
try:
    grid_search.fit(X_train, Y_train)
except AttributeError as e:
    # ignore only the known 'keep_tempfiles' error; re-raise anything else
    if "'CRF' object has no attribute 'keep_tempfiles'" not in str(e):
        raise

Any help would be appreciated. How can I perform hyperparameter tuning here?

I followed this tutorial, but ended up stuck in the same situation.
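A likely reason for the "no exception, but no fit" symptom is that the try/except above swallows the error, so the search object never gets its result attributes. The sketch below illustrates this with a hypothetical stand-in class, ToySearch, which is not part of scikit-learn. It assumes that the real AttributeError is raised before any results are stored, which has not been verified against the library here.

```python
class ToySearch:
    """Hypothetical stand-in for a search object (not a scikit-learn class)."""

    def fit(self, X, y):
        # Assumption: the real failure happens before any results are stored.
        raise AttributeError("'CRF' object has no attribute 'keep_tempfiles'")


def fit_and_report(search, X, y):
    """Fit `search`, swallowing only the known error; return True if fitted."""
    try:
        search.fit(X, y)
    except AttributeError as e:
        if "'CRF' object has no attribute 'keep_tempfiles'" not in str(e):
            raise
    # A fitted GridSearchCV (with the default refit=True) exposes best_estimator_
    return hasattr(search, "best_estimator_")


fitted = fit_and_report(ToySearch(), [], [])
print("search was fitted:", fitted)
```

Running the same hasattr check on the real grid_search after the try/except would show whether the fit actually completed.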

1 Answer

I don't know how to solve this on scikit-learn 0.24 or later, but when I downgraded to version 0.23.2, the same code seemed to work fine.
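To check which side of that boundary an environment is on, something like the following sketch could be used. The helper names (parse_major_minor, is_affected, installed_sklearn_version) are made up for illustration, and the 0.24 threshold is taken from the question rather than verified independently.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_major_minor(version_string):
    """Return (major, minor) as ints from a string such as '0.23.2'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(version_string, first_bad=(0, 24)):
    """True if the version is at or above the first version reported as failing."""
    return parse_major_minor(version_string) >= first_bad


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if absent."""
    try:
        return version("scikit-learn")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_sklearn_version()
    if installed is None:
        print("scikit-learn is not installed")
    elif is_affected(installed):
        print(f"scikit-learn {installed}: the keep_tempfiles error may occur")
    else:
        print(f"scikit-learn {installed}: older than 0.24")
```

If the check reports an affected version, the downgrade described above corresponds to pip install "scikit-learn==0.23.2".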