I'm new to NER, and I've been trying to extract names with spaCy. Here is my code:
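```python
import spacy

# Load spaCy's small English pipeline (install it first with
# `python -m spacy download en_core_web_sm`)
spacy_nlp = spacy.load("en_core_web_sm")

text = "Your input text goes here."  # placeholder for the text to analyse
doc = spacy_nlp(text.strip())

# Create sets to hold the extracted entities, one per category
named_entities = set()
money_entities = set()
organization_entities = set()
location_entities = set()
time_indicator_entities = set()

# en_core_web_sm uses the OntoNotes label scheme (TIME, LOC, EVENT,
# WORK_OF_ART, ...). Labels such as TIM, GEO, ART, EVE and NAT belong to a
# different tagset and never appear in doc.ents, so they are replaced here.
for ent in doc.ents:
    entry = ent.lemma_.lower()
    # Remove the entity's surface text from the raw string (doc is unaffected)
    text = text.replace(ent.text, "")
    # Time indicator entities
    if ent.label_ in ["TIME", "DATE"]:
        time_indicator_entities.add(entry)
    # Monetary value entities
    elif ent.label_ == "MONEY":
        money_entities.add(entry)
    # Organization entities
    elif ent.label_ == "ORG":
        organization_entities.add(entry)
    # Geopolitical (GPE) and other geographical (LOC) entities
    elif ent.label_ in ["GPE", "LOC"]:
        location_entities.add(entry)
    # People, events and works of art (OntoNotes has no natural-phenomenon label)
    elif ent.label_ in ["PERSON", "EVENT", "WORK_OF_ART"]:
        named_entities.add(entry.title())
```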
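The model seems reasonably accurate for some kinds of names. However, it doesn't know how people's names vary around the world, i.e. it doesn't adapt to cultural differences. Is there a workaround to avoid this bias?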
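One workaround that is sometimes suggested is to complement the statistical model with spaCy's rule-based `entity_ruler` component, seeded with a list of names from the cultures the model handles poorly. The sketch below assumes spaCy v3; `add_name_patterns` is a hypothetical helper and the names are made-up examples. Rules only catch names that are in your list, so for broader coverage you would still need to fine-tune the model on annotated text containing such names.

```python
from typing import Iterable

import spacy
from spacy.language import Language


def add_name_patterns(nlp: Language, names: Iterable[str]) -> Language:
    """Hypothetical helper: tag every name in `names` as a PERSON entity.

    Uses spaCy v3's built-in "entity_ruler" component. If the pipeline has a
    statistical "ner" component, the ruler is placed before it so the model
    keeps the ruler's matches and only predicts entities for the other tokens.
    """
    if "ner" in nlp.pipe_names:
        ruler = nlp.add_pipe("entity_ruler", before="ner")
    else:
        ruler = nlp.add_pipe("entity_ruler")
    # A plain string pattern is matched as an exact token sequence (phrase pattern)
    ruler.add_patterns([{"label": "PERSON", "pattern": name} for name in names])
    return nlp


if __name__ == "__main__":
    # A blank English pipeline keeps the example self-contained (no model
    # download needed); the names are made-up examples.
    nlp = add_name_patterns(spacy.blank("en"), ["Nguyen Van An", "Oluwaseun Adeyemi"])
    doc = nlp("Yesterday Nguyen Van An met Oluwaseun Adeyemi in Lagos.")
    print([(ent.text, ent.label_) for ent in doc.ents])
    # prints [('Nguyen Van An', 'PERSON'), ('Oluwaseun Adeyemi', 'PERSON')]
```

With a full pipeline, e.g. `add_name_patterns(spacy.load("en_core_web_sm"), names)`, the ruler is inserted before `ner`, so the statistical model respects the ruler's spans and only labels the rest of the text.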