I need to rank human skills by their similarity to an input skill. For example, if I enter "Dutch", I want a list like this:
0.97 Dutch
0.86 Dutch lessons
0.55 Frisian
0.50 Flemish
0.27 German language
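I have a database of about 4,500 human skills (from "C programming" to "baking an almond cake"), 600 of which have been categorized manually. I can already find the corresponding article on BabelNet and pull its domains, categories, and related terms.
An example skill with data from BabelNet: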
name:"photography"
categories:
0:"Photography"
1:"French_inventions"
2:"Optics"
3:"1822_introductions"
manualCategory:"art & music"
domains:
ART_ARCHITECTURE_AND_ARCHAEOLOGY:1
compounds:
0:"digital_photography"
1:"landscape_photography"
2:"photographic_developing"
3:"motion_photography"
4:"nature_photography"
...
48:"photographic_plates"
otherForms:
0:"still_photography"
1:"photo"
2:"photos"
3:"photographed"
4:"photographers"
...
20:"Photographer"
Can you suggest an approach, or at least point me in the right direction?
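
To make the question concrete, here is a minimal sketch of a naive baseline over the fields shown above: a weighted Jaccard overlap of categories, domains, and related terms. It is only an illustration under assumptions, not a recommended method. The `Skill` class, the `related` field (compounds and otherForms merged into one set), and the weights are all hypothetical and not taken from BabelNet or from my data.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical record mirroring the BabelNet fields shown above."""
    name: str
    categories: set = field(default_factory=set)
    domains: set = field(default_factory=set)
    related: set = field(default_factory=set)  # assumed: compounds + otherForms


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(x: Skill, y: Skill, weights=(0.5, 0.2, 0.3)) -> float:
    """Weighted sum of per-field Jaccard scores; the weights are arbitrary placeholders."""
    w_cat, w_dom, w_rel = weights
    return (
        w_cat * jaccard(x.categories, y.categories)
        + w_dom * jaccard(x.domains, y.domains)
        + w_rel * jaccard(x.related, y.related)
    )


def rank(query: Skill, skills: list) -> list:
    """Return (score, name) pairs for every skill except the query, best first."""
    scored = [(similarity(query, s), s.name) for s in skills if s is not query]
    return sorted(scored, reverse=True)
```

A baseline like this only sees exact string overlap between feature sets, so it would not by itself place "Frisian" near "Dutch" unless the two records happen to share categories or domains. That gap is what I am hoping a better approach can close.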