Data mining - How to find contextually related words and classify them into custom tags/labels?

Problem:

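Suppose I have a small dataset of words together with their tags/labels. The main task is to assign labels to other words (ones not in the dataset) based on their contextual relationship to the words already in it.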

For example, suppose my custom dataset contains:

              soap --> label__(cleaning_agent)
              pencil --> label__(stationary_item)
              mobile --> label__(electronics)
              washingmachine --> label__(electronics)
              and so on.

I want my program to predict the correct label for an unknown word, for example:

         washing powder --> label__(cleaning_agent)
         radio --> label__(electronics), etc.

Action:

The main problem now is finding the relationship between two words based on context, but I cannot decide which parameters to use to find it.

I tried a naive approach using the datamuse API and the fastText library.

    The naive approach is as follows:
        Step 1: find all the words related to the given word (call it W), e.g. pencil, using the datamuse API.
        Step 2: combine them into a single string (call it S), separated by spaces.
        Step 3: use the label name, W, and S as the training data for fastText.
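To make the three steps concrete, here is a minimal sketch of how they might be wired together. It assumes the Datamuse `/words` endpoint with its `ml` ("means like") and `max` query parameters, and the default `__label__` prefix that fastText's supervised mode expects. The function names and the `fetch` argument are hypothetical helpers introduced only for illustration, not part of either library.

```python
import json
import urllib.parse
import urllib.request


def fetch_related_words(word, limit=20):
    """Step 1: ask Datamuse for words with a similar meaning (the `ml` parameter)."""
    query = urllib.parse.urlencode({"ml": word, "max": limit})
    url = "https://api.datamuse.com/words?" + query
    with urllib.request.urlopen(url, timeout=10) as response:
        return [entry["word"] for entry in json.load(response)]


def to_fasttext_line(label, word, related_words):
    """Steps 2-3: join W and its related words (S) behind the fastText label marker."""
    context = " ".join([word] + list(related_words))
    return f"__label__{label} {context}"


def write_training_file(path, seed_labels, fetch=fetch_related_words):
    """Write one training line per seed word.

    `fetch` is injectable so the file can be built offline from cached words
    (an assumption made here to keep the sketch testable without network access).
    """
    with open(path, "w", encoding="utf-8") as handle:
        for word, label in seed_labels.items():
            handle.write(to_fasttext_line(label, word, fetch(word)) + "\n")
```

A file written this way could then be passed to `fasttext.train_supervised(input=path)`. With only one short line per seed word the training set stays very small, which may be part of why the results were unreliable.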

Note: fastText needs the label name, the word, and sentences (which can come from news articles, blogs, Wikipedia, etc.) as context for that word.

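Result: fastText did not give any reliable results. I am thinking of building some kind of neural network for this purpose, but I cannot decide what the input parameters for our data should be.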

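The main issue is custom word labeling. Our program should be able to assign unknown words (not in the training dataset) to their most probable class based on some score.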
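One way to picture "most probable class based on some score" is a nearest-centroid rule over word vectors. This is a swapped-in illustration, not something tried in this post: average the vectors of the seed words for each label, then score an unknown word by cosine similarity to each average. The 3-dimensional vectors below are invented toy values. In practice they would have to come from a pretrained embedding model, and all function names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_centroids(seed_labels, vectors):
    """Average the vectors of the seed words that share a label."""
    grouped = {}
    for word, label in seed_labels.items():
        grouped.setdefault(label, []).append(vectors[word])
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def score_labels(word, centroids, vectors):
    """Return (label, score) pairs sorted from most to least similar."""
    scores = {label: cosine(vectors[word], c) for label, c in centroids.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy vectors, made up for illustration only; real ones need a trained model.
TOY_VECTORS = {
    "soap": [0.9, 0.1, 0.0],
    "pencil": [0.0, 0.9, 0.1],
    "mobile": [0.1, 0.0, 0.9],
    "washingmachine": [0.4, 0.0, 0.8],
    "washing_powder": [0.8, 0.0, 0.2],
    "radio": [0.0, 0.1, 0.9],
}
SEED_LABELS = {
    "soap": "cleaning_agent",
    "pencil": "stationary_item",
    "mobile": "electronics",
    "washingmachine": "electronics",
}

centroids = label_centroids(SEED_LABELS, TOY_VECTORS)
best_label, best_score = score_labels("radio", centroids, TOY_VECTORS)[0]
```

The sorted list gives a score for every label, so a threshold on the top score could be used to reject words that do not fit any class well.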

Since I am new to NLP, I would like to know what I could do next.