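I have a classification problem. I have clusters called "Experience", "Education", and "Ability". The labeled data has two columns (72,000+ entries covering all clusters) and looks like this: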
year of education Education
years education Education
years of educational Education
two years of education Education
years of education beyond Education
education four year Education
..........
of proven sales experience Ability
knowledge of and Ability
experience or education high Ability
assigned knowledge skills Ability
accountable for driving Ability
..........
administrative and leadership skills Experience
advanced negotiations skills Experience
must have keyboarding skills Experience
must have skills Experience
activities preferred skills Experience
of clinical skill Experience
Given a string, I need to determine from the trained model whether it belongs to Experience, Education, or Ability. Example strings:
string1 = "There is a requirement of four-year professional degrees"
string2 = "Able to drive the teams to higher levels"
string3 = "Must have programming experience in C, C++"
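When I test these strings, the model should classify each one into one of the clusters.
- What are the possible approaches to training my model?
- Regarding word2vec and doc2vec, would these models work here?
I could not find any relevant examples of training a model on words or phrases and then testing it on strings. Any ideas on how to do this?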
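
To make the setup concrete, here is a minimal sketch of the train-on-phrases, test-on-strings workflow using a bag-of-words multinomial Naive Bayes classifier with add-one smoothing. This is only one possible baseline, not word2vec or doc2vec, and not necessarily the best approach. The names `TRAIN`, `tokenize`, `train`, and `predict` are hypothetical, and the training rows are a small handful copied from the sample above rather than the full 72,000+ entries.

```python
import math
import re
from collections import Counter, defaultdict

# A few labeled phrases copied from the sample above (assumption: the real
# data set has the same two-column shape, just with 72,000+ rows).
TRAIN = [
    ("year of education", "Education"),
    ("years of education beyond", "Education"),
    ("education four year", "Education"),
    ("of proven sales experience", "Ability"),
    ("knowledge of and", "Ability"),
    ("accountable for driving", "Ability"),
    ("administrative and leadership skills", "Experience"),
    ("must have keyboarding skills", "Experience"),
    ("of clinical skill", "Experience"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def train(rows):
    """Count words per label; returns (word_counts, label_counts, vocab)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in rows:
        tokens = tokenize(text)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total_rows = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_rows)  # log prior
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothed log likelihood
                score += math.log(
                    (word_counts[label][tok] + 1) / (n_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)

string1 = "There is a requirement of four-year professional degrees"
string2 = "Able to drive the teams to higher levels"
string3 = "Must have programming experience in C, C++"

for s in (string1, string2, string3):
    print(s, "->", predict(model, s))
```

With only nine training rows, the predictions for the three test strings are not meaningful; the sketch is meant to show the shape of the pipeline (tokenize, count per label, score a new string). Because unseen words are skipped, a string such as `string2` that shares no vocabulary with the training phrases falls back to the label priors, which is one reason embedding-based features like word2vec or doc2vec might be worth comparing against this baseline.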