How do I train a spaCy model for text classification?

data-mining machine-learning nlp spacy

2021-09-24 01:39:09

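What are the steps for training a spaCy model for text classification? (In my case it is binary classification.)

Please help me with the process and the approach.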

1 Answer

There are good tutorials online, for example:

https://www.kaggle.com/poonaml/text-classification-using-spacy

Basically, you have to:

Import your data into Python. Here POSITIVE is the variable to predict, and 0 and 1 are the two encoded classes.

# "Text1" and "Text2" stand in for your own document strings
TRAIN_DATA = [("Text1", {'cats': {'POSITIVE': 1}}),
              ("Text2", {'cats': {'POSITIVE': 0}})]
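
If your labels start out as plain 0/1 values, a small helper can build this structure for you. This is only a minimal sketch: `to_spacy_cats` and the sample texts are hypothetical names made up for illustration and are not part of spaCy.

```python
def to_spacy_cats(texts, labels, label_name="POSITIVE"):
    """Pair each text with a {'cats': {label_name: 0 or 1}} annotation dict."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    data = []
    for text, label in zip(texts, labels):
        if label not in (0, 1):
            raise ValueError(f"expected a 0/1 label, got {label!r}")
        data.append((text, {"cats": {label_name: int(label)}}))
    return data


# Hypothetical sample data, for illustration only.
texts = ["I loved this film", "Worst purchase ever"]
labels = [1, 0]
TRAIN_DATA = to_spacy_cats(texts, labels)
print(TRAIN_DATA[0])
```

Validating the labels up front means a stray value such as 2 or "yes" is caught before training starts.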

Initialize a textcat pipe in the spaCy pipeline object (nlp) and add the label to it.

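import spacy

# spaCy v2 API: create_pipe() returns the component object passed to add_pipe()
nlp = spacy.load('en_core_web_sm')
if 'textcat' not in nlp.pipe_names:
    textcat = nlp.create_pipe("textcat")
    nlp.add_pipe(textcat, last=True)
else:
    textcat = nlp.get_pipe("textcat")

# Register the label the classifier should predict
textcat.add_label('POSITIVE')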

Iterate over the training examples to optimize the model.

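import random
from spacy.util import minibatch, compounding

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']

n_iter = 1  # passes over the training data; increase for real training

# Only train the textcat pipe (spaCy v2 API)
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    print("Training model...")
    for i in range(n_iter):
        losses = {}
        random.shuffle(TRAIN_DATA)
        # Batch size starts at 4 and grows by a factor of 1.001 per batch, capped at 32
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            # Split (text, annotation) pairs into two parallel tuples
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer,
                       drop=0.2, losses=losses)
        print(i, losses)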
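
To see what the batching in the loop above does, here is a rough pure-Python sketch of the idea behind `minibatch` with a `compounding` size schedule. It is an illustrative re-implementation under my own assumptions, not spaCy's source code; `compounding_sizes`, `batches_of`, and `toy_data` are hypothetical names. The growth factor is set to 1.5 instead of 1.001 only so the growth is visible on a tiny dataset.

```python
from itertools import islice


def compounding_sizes(start, stop, compound):
    """Yield start, start*compound, start*compound**2, ... capped at stop."""
    current = float(start)
    while True:
        yield min(current, stop)
        current *= compound


def batches_of(items, sizes):
    """Split items into consecutive batches whose lengths come from sizes."""
    iterator = iter(items)
    for size in sizes:
        batch = list(islice(iterator, int(size)))
        if not batch:
            return
        yield batch


# Hypothetical toy data: 20 (text, annotation) pairs.
toy_data = [(f"text {i}", {"cats": {"POSITIVE": i % 2}}) for i in range(20)]

batch_lengths = [len(b) for b in batches_of(toy_data, compounding_sizes(4, 32, 1.5))]
print(batch_lengths)

# zip(*batch) separates the texts from the annotation dicts.
first_batch = next(batches_of(toy_data, compounding_sizes(4, 32, 1.5)))
texts, annotations = zip(*first_batch)
```

With a factor as small as 1.001 the batch size stays at 4 for a long time, so on a small dataset the schedule behaves almost like a fixed batch size.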
