Data mining - Which model is better suited to keyword-set classification? - 吾爱随笔录


data-mining deep-learning classification nlp text-mining text-classification

2022-02-17 18:46:00

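There is a similar, well-known task called text classification.

However, I am looking for a model whose input is a set of keywords, and the keywords in a set do not come from a single sentence.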

For example:

input ["apple", "pear", "water melon"] --> target class "fruit"
input ["tomato", "potato"] --> target class "vegetable"

Another example:

input ["apple", "Peking", "in summer"]  -->  target class "Chinese fruit"
input ["tomato", "New York", "in winter"]  -->  target class "American vegetable"
input ["apple", "Peking", "in winter"]  -->  target class "Chinese fruit"
input ["tomato", "Peking", "in winter"]  -->  target class "Chinese vegetable"

Thank you.

2 answers

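You can exploit word-vector similarity from an embedding model.

TL;DR: similar word vectors (e.g. fruits) cluster together in this high-dimensional vector space. For each possible class you have a class representative (a centroid), which is effectively a keyword itself (fruit, vegetable, etc. in your case). All you need to do is train, or find, a word-embedding model that is representative of your corpus.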
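A minimal sketch of the centroid idea above, assuming each class is represented by the vector of its label word and a keyword set is represented by the mean of its keywords' vectors (multi-word keywords such as "water melon" are averaged over their tokens). The 3-dimensional `TOY_EMBEDDINGS` table and the helper names (`mean_vector`, `cosine`, `embed_keyword`, `classify`) are hypothetical, made up for illustration; in practice the vectors would come from a pretrained or corpus-trained word-embedding model.

```python
import math

# Hypothetical 3-d toy embeddings, for illustration only; a real system would
# load vectors from a word-embedding model suited to your corpus.
TOY_EMBEDDINGS = {
    "fruit":     [1.0, 0.0, 0.0],
    "vegetable": [0.0, 1.0, 0.0],
    "apple":     [0.9, 0.1, 0.0],
    "pear":      [0.8, 0.2, 0.1],
    "water":     [0.3, 0.1, 0.6],
    "melon":     [0.9, 0.1, 0.1],
    "tomato":    [0.4, 0.7, 0.1],
    "potato":    [0.1, 0.9, 0.0],
}

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(components) / len(vectors) for components in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def embed_keyword(keyword, emb):
    """Average the vectors of the known tokens in a (possibly multi-word) keyword."""
    vecs = [emb[t] for t in keyword.lower().split() if t in emb]
    return mean_vector(vecs) if vecs else None

def classify(keywords, class_labels, emb=TOY_EMBEDDINGS):
    """Return the label whose vector (the class centroid) is most
    cosine-similar to the mean vector of the keyword set."""
    vecs = [v for v in (embed_keyword(k, emb) for k in keywords) if v is not None]
    if not vecs:
        return None  # none of the keywords is in the vocabulary
    query = mean_vector(vecs)
    return max(class_labels, key=lambda label: cosine(query, emb[label]))

print(classify(["apple", "pear", "water melon"], ["fruit", "vegetable"]))  # fruit
print(classify(["tomato", "potato"], ["fruit", "vegetable"]))              # vegetable
```

If you have labelled keyword sets, a sturdier centroid is the mean vector of each class's training sets rather than the single label word's vector; compound classes such as "Chinese fruit" would need one centroid per class in the same way.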

Add segment embeddings (an idea from BERT) to an ordinary text-classification model.

For example:

input ["apple", "Peking", "in summer"]  += segment emb [1,2,3,3,0]
input ["tomato", "New York", "in winter"] += segment emb [1,2,2,3,3]

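where 1, 2, 3 denote something like the type of data source each input keyword comes from (the trailing 0 in the first example appears to mark padding).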
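A minimal sketch of how the segment ids in the example could be produced and combined with token embeddings, assuming whitespace tokenization, a fixed length of 5, and 0 as the padding id. Here the segment id is simply the keyword's position in the list; this reproduces the example only because both lists are ordered item, place, time, so if the order varies you would map each keyword to its source type instead. The helper names (`build_segment_ids`, `make_table`, `embed`) are hypothetical, and the random lookup tables stand in for trainable embedding layers.

```python
import random

PAD_SEGMENT = 0      # assumption: 0 marks padding positions, as in the example
PAD_TOKEN = "[PAD]"  # hypothetical padding token

def build_segment_ids(keywords, max_len):
    """Split each keyword into whitespace tokens, tag every token with the
    1-based index of the keyword slot it came from, then truncate/pad to max_len."""
    tokens, segments = [], []
    for slot, keyword in enumerate(keywords, start=1):
        for token in keyword.split():
            tokens.append(token)
            segments.append(slot)
    tokens, segments = tokens[:max_len], segments[:max_len]
    pad = max_len - len(tokens)
    return tokens + [PAD_TOKEN] * pad, segments + [PAD_SEGMENT] * pad

def make_table(keys, dim, rng):
    """Randomly initialised lookup table; a real model would learn these
    (e.g. with torch.nn.Embedding)."""
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}

def embed(tokens, segments, token_table, segment_table):
    """BERT-style input: element-wise sum of token and segment embeddings
    (BERT also adds a position embedding, omitted here)."""
    return [[t + s for t, s in zip(token_table[tok], segment_table[seg])]
            for tok, seg in zip(tokens, segments)]

examples = [["apple", "Peking", "in summer"], ["tomato", "New York", "in winter"]]
rng = random.Random(0)
batch = [build_segment_ids(kws, max_len=5) for kws in examples]
vocab = {tok for tokens, _ in batch for tok in tokens}
token_table = make_table(sorted(vocab), dim=4, rng=rng)
segment_table = make_table(range(4), dim=4, rng=rng)  # ids 0 (pad) to 3

for tokens, segments in batch:
    print(tokens, segments)
    vectors = embed(tokens, segments, token_table, segment_table)  # 5 vectors of size 4
# ['apple', 'Peking', 'in', 'summer', '[PAD]'] [1, 2, 3, 3, 0]
# ['tomato', 'New', 'York', 'in', 'winter'] [1, 2, 2, 3, 3]
```

The summed vectors would then feed whatever text classifier you already use (CNN, RNN, Transformer), so the model can tell which slot each token belongs to even though the keywords do not form a sentence.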

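Another improvement: look at PCNN (piecewise convolutional neural network) or PCNN+ATT (PCNN with selective attention), both from relation extraction.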
