Data mining - Ranking skills by similarity

I need to rank human skills by how similar they are to an input skill. So if I enter "Dutch", I want a list like this:

0.97 Dutch
0.86 Dutch lessons
0.55 Frisian
0.50 Flemish
0.27 German language

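I have a database of about 4,500 human skills (ranging from "C programming" to "baking almond cake"), 600 of which have been manually categorized. I can already find the corresponding entry on BabelNet for each skill and pull its domains, categories, and related terms.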

An example skill with data from BabelNet:

name:"photography"
categories:
  0:"Photography"
  1:"French_inventions"
  2:"Optics"
  3:"1822_introductions"
manualCategory:"art & music"
domains:
  ART_ARCHITECTURE_AND_ARCHAEOLOGY:1
compounds:
  0:"digital_photography"
  1:"landscape_photography"
  2:"photographic_developing"
  3:"motion_photography"
  4:"nature_photography"
  ...
  48:"photographic_plates"
otherForms:
  0:"still_photography"
  1:"photo"
  2:"photos"
  3:"photographed"
  4:"photographers"
  ...
  20:"Photographer"
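
To make the goal concrete, here is a naive baseline sketch of the kind of scoring I have in mind. It assumes each skill is a dict shaped like the record above, compares the `categories`, `compounds`, and `otherForms` lists with Jaccard set overlap, and combines them with weights. The weights, the helper names, and the toy records are all made up for illustration; `domains` and `manualCategory` are ignored here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical field weights: illustrative assumptions, not tuned values.
FIELD_WEIGHTS: Dict[str, float] = {
    "categories": 0.4,
    "compounds": 0.3,
    "otherForms": 0.3,
}


def normalize(term: str) -> str:
    """Lowercase a term and turn underscores into spaces."""
    return term.lower().replace("_", " ").strip()


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(x: dict, y: dict) -> float:
    """Weighted sum of per-field Jaccard overlaps between two skill records."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        xs = {normalize(t) for t in x.get(field, [])}
        ys = {normalize(t) for t in y.get(field, [])}
        score += weight * jaccard(xs, ys)
    return score


def rank(query: dict, skills: List[dict]) -> List[Tuple[float, str]]:
    """Return (score, name) pairs sorted from most to least similar."""
    scored = [(similarity(query, s), s["name"]) for s in skills]
    return sorted(scored, reverse=True)


# Toy records, invented for this example (not real BabelNet output).
query = {
    "name": "photography",
    "categories": ["Photography", "Optics"],
    "compounds": ["digital_photography"],
    "otherForms": ["photo"],
}
candidates = [
    {"name": "baking", "categories": ["Baking"], "compounds": [], "otherForms": ["bake"]},
    {
        "name": "digital photography",
        "categories": ["Photography"],
        "compounds": ["digital_photography"],
        "otherForms": ["photo", "photos"],
    },
]

for score, name in rank(query, candidates):
    print(f"{score:.2f} {name}")
```

This only rewards skills that literally share terms, so it would probably miss pairs like Dutch and Frisian unless their BabelNet categories happen to overlap.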

Can you suggest an approach, or at least point me in the right direction?