I have a lot of sentences (500k) that look like this:
"Penalty missed! Bad penalty by Felipe Brisola - Riga FC - shot with right foot is very close to the goal. Felipe Brisola should be disappointed."
"Penalty saved! Damir Kojasevic - Sutjeska Niksic - fails to capitalise on this great opportunity, shot with right foot saved in the centre of the goal."
"Penalty saved! Stefan Panic - Riga FC - fails to capitalise on this great opportunity, shot with right foot saved in the centre of the goal."
"Penalty saved! Georgie Kelly - Dundalk - fails to capitalise on this great opportunity, shot with right foot saved in the centre of the goal."
"Penalty missed! Still FC København 1, Crvena Zvezda 1. Marko Marin - Crvena Zvezda - hits the bar with a shot with right foot."
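As you can see, they do not follow a strictly mechanical template. After ending up with 1500 lines of PHP code (using regular expressions) that still gave inconsistent results, I decided to look at machine learning alternatives.
What I want to achieve is the following.
For example, for this sentence: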
"Penalty saved! Stefan Panic - Riga FC - fails to capitalise on this great opportunity, shot with right foot saved in the centre of the goal."
type => penalty
action => saved
reason => shot with right foot saved in the centre of the goal
person => Stefan Panic
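I stumbled upon spaCy, saw its "named entity recognition" feature, and thought that maybe I could use it for this purpose, especially since I have a large amount of training data.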
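As a rough illustration of what "training data" could mean here: NER models are generally trained on character-offset spans of the form (start, end, label), and spaCy accepts entity annotations in that shape. The sketch below only uses the Python standard library and does not call spaCy. The helper name `to_span_annotation` and the upper-cased label names are hypothetical, derived from the fields in the example above.

```python
import re
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def to_span_annotation(text: str, fields: Dict[str, str]) -> Tuple[str, Dict[str, List[Span]]]:
    """Find each field value in the text and return (text, {"entities": [...]}).

    Hypothetical helper: it takes the FIRST case-insensitive match of each
    value, which is enough for the sample sentence but not a general solution.
    """
    spans: List[Span] = []
    for label, value in fields.items():
        match = re.search(re.escape(value), text, flags=re.IGNORECASE)
        if match is None:
            continue  # value not present in the sentence; skip it
        spans.append((match.start(), match.end(), label.upper()))
    spans.sort()  # order spans by position in the sentence
    return text, {"entities": spans}


sentence = (
    "Penalty saved! Stefan Panic - Riga FC - fails to capitalise on this "
    "great opportunity, shot with right foot saved in the centre of the goal."
)
fields = {
    "type": "penalty",
    "action": "saved",
    "reason": "shot with right foot saved in the centre of the goal",
    "person": "Stefan Panic",
}

text, annotation = to_span_annotation(sentence, fields)
for start, end, label in annotation["entities"]:
    print(label, "=>", text[start:end])
```

Each printed line shows a label next to the exact slice of the sentence it covers, which is the kind of labelled example an NER model would need for every field to be extracted.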
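My question: is spaCy's named entity recognition suitable for this task? If not, what should I learn in order to solve it?
PS: I know a bit of Python, but nothing about ML.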