Data mining - NLP - Identifying marked words

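Please forgive me, as the title may not be very accurate.

I am trying to build a model that learns word representations from one text and can then predict the corresponding words in another text. An example will make this clearer:

Model training text: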

I lived in *Munich last summer. *Germany has a relaxing, slow summer lifestyle. One night, I got food poisoning and couldn't find !Tylenol to make the pain go away, they insisted I take !aspirin instead.

Model prediction:

['Munich','Germany','Tylenol','aspirin']

Evaluation text:

When I lived in Paris last year, France was experiencing a recession. The nightlife was too fun, I developed an addiction to Adderall and Ritalin.

Output:

['Paris','France','Adderall','Ritalin']

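My question is which NLP techniques would help in this situation. I don't even know what this kind of problem is called. Can you tell me what it is called?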

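One approach I can think of is to train an RNN with an Embedding layer to predict the positions of * and !, since * is used as the prefix for country names and ! as the prefix for drug names. My challenge is how to build the training data for such a model. Is this a viable approach?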
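On the training-data part of the question: one common way to frame this is as token-level sequence labeling, the setup usually used for named entity recognition, where the markers are stripped and each token gets a label instead. Below is a minimal sketch under that assumption. The names `MARKERS`, `TOKEN_RE` and `label_tokens`, and the labels `PLACE`, `DRUG` and `O`, are hypothetical choices for illustration, and the regex tokenizer is deliberately crude.

```python
import re

# Hypothetical marker-to-label mapping based on the convention in the question:
# '*' prefixes place names and '!' prefixes drug names. "O" means "not marked".
MARKERS = {"*": "PLACE", "!": "DRUG"}

# Crude tokenizer: an optionally marked word (with an optional apostrophe part),
# or a single punctuation character.
TOKEN_RE = re.compile(r"[*!]?\w+(?:'\w+)?|[^\w\s]")


def label_tokens(marked_text):
    """Return (token, label) pairs, with the markers stripped from the tokens."""
    pairs = []
    for raw in TOKEN_RE.findall(marked_text):
        if raw[0] in MARKERS and len(raw) > 1:
            # A marked word: drop the marker and keep its label.
            pairs.append((raw[1:], MARKERS[raw[0]]))
        else:
            pairs.append((raw, "O"))
    return pairs


TRAIN_TEXT = (
    "I lived in *Munich last summer. *Germany has a relaxing, slow summer "
    "lifestyle. One night, I got food poisoning and couldn't find !Tylenol "
    "to make the pain go away, they insisted I take !aspirin instead."
)

pairs = label_tokens(TRAIN_TEXT)
marked_words = [token for token, label in pairs if label != "O"]
print(marked_words)
```

The `(token, label)` pairs could then be fed to whatever sequence tagger is chosen (an RNN with an embedding layer would be one option), with the model predicting a label per token rather than the position of a marker character.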

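Are there any resources or materials I can refer to and draw inspiration from?

I would greatly appreciate any help or advice. Thank you very much in advance.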

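import spacy

# Load spaCy's large pretrained English pipeline (must be installed first).
nlp = spacy.load("en_core_web_lg")

doc = nlp(
    "When I lived in Paris last year, France was experiencing a recession. "
    "The nightlife was too fun, I developed an addiction to Adderall and Ritalin."
)

# Print each named entity the pipeline found, with its predicted label.
print([(e.text, e.label_) for e in doc.ents])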