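What are the steps for training a spaCy model for text classification? (In my case it is binary classification.)
Please help me with the process and the approach.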
There are several good tutorials online, for example:
https://www.kaggle.com/poonaml/text-classification-using-spacy
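Basically, you have to prepare the training data in the 'cats' format, add a textcat component to the pipeline, register your label, and then update only that component on minibatches: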
import spacy
from spacy.util import minibatch, compounding

# Note: this uses the spaCy v2.x training API.
# Text1 and Text2 are placeholders for your own text strings.
TRAIN_DATA = [(Text1, {'cats': {'POSITIVE': 1}}),
              (Text2, {'cats': {'POSITIVE': 0}})]

nlp = spacy.load('en_core_web_sm')

# Add the text classifier to the pipeline if it is not already there
if 'textcat' not in nlp.pipe_names:
    textcat = nlp.create_pipe("textcat")
    nlp.add_pipe(textcat, last=True)
else:
    textcat = nlp.get_pipe("textcat")

# Register the single label used for binary classification
textcat.add_label('POSITIVE')

other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']
n_iter = 1

# Only train the textcat pipe
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    print("Training model...")
    for i in range(n_iter):
        losses = {}
        # Batch size grows from 4 towards 32 as training proceeds
        batches = minibatch(TRAIN_DATA, size=compounding(4, 32, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer,
                       drop=0.2, losses=losses)
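
If your labels start out as plain (text, True/False) pairs, a small helper can put them into the 'cats' format used above and hold out a few examples for evaluation. This is only a minimal sketch in plain Python: the function names (to_textcat_format, split_train_dev, predict_label), the 0.5 threshold, and the sample sentences are my own assumptions, not part of spaCy.

```python
import random


def to_textcat_format(samples, label="POSITIVE"):
    """Convert (text, bool) pairs into (text, {'cats': {label: 0 or 1}}) tuples."""
    return [(text, {"cats": {label: int(bool(is_positive))}})
            for text, is_positive in samples]


def split_train_dev(data, dev_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction for evaluation."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


def predict_label(cats, label="POSITIVE", threshold=0.5):
    """Turn a score dictionary shaped like doc.cats into a binary decision.

    The 0.5 threshold is an assumption; tune it on your held-out data.
    """
    return cats.get(label, 0.0) >= threshold


# Hypothetical sample data, only for illustration
raw_samples = [
    ("I loved this film", True),
    ("Terrible service", False),
    ("Would buy again", True),
    ("Not worth the money", False),
    ("Absolutely fantastic", True),
]

TRAIN_DATA, DEV_DATA = split_train_dev(to_textcat_format(raw_samples))
```

After training, nlp(text).cats gives you a dictionary of label scores, so something like predict_label(nlp(text).cats) would turn the score into a yes/no answer, and you can compare that against DEV_DATA to see how well the model does.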