How do I use a pre-trained word2vec model?

Tags: tensorflow, word2vec, keras

2022-03-08 01:51:54

Where can I find a reliable word2vec model that was trained on English text?

I need a word2vec black box to which I can pass a sentence as an array, for example: ["London", "is", "the", "capital", "of", "Great", "Britain"]

and get back one vector per word: [some_vector_of_floats1, some_vector_of_floats2, some_vector_of_floats3, some_vector_of_floats4, some_vector_of_floats5, some_vector_of_floats6, some_vector_of_floats7]

2 Answers

In Python, you can use Gensim:

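from gensim.models import KeyedVectors

# Recent Gensim versions load pre-trained vectors through KeyedVectors;
# the older Word2Vec.load_word2vec_format entry point is deprecated.
model = KeyedVectors.load_word2vec_format('path-to-vectors.txt', binary=False)
# If your vector file is in binary format, change to binary=True.
sentence = ["London", "is", "the", "capital", "of", "Great", "Britain"]
# Raises KeyError if a word is missing from the model's vocabulary.
vectors = [model[w] for w in sentence]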

These vectors should give better performance than the pre-trained vectors obtained with word2vec.
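To make it clearer what `binary=False` expects, here is a minimal sketch of the plain-text layout, assuming the file follows the usual word2vec text convention: a header line with the word count and the dimension, then one line per word. The helper name `read_word2vec_text` and the sample data are made up for illustration; in practice Gensim's loader does this for you.

```python
import io

def read_word2vec_text(handle):
    """Parse word2vec text format: a 'count dim' header, then 'word v1 ... vN' lines."""
    header = handle.readline().split()
    dim = int(header[1])  # header[0] is the number of words
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim

# Hypothetical two-word file with 3-dimensional vectors (invented values).
sample = io.StringIO("2 3\nLondon 0.1 0.2 0.3\ncapital 0.4 0.5 0.6\n")
toy_vectors, toy_dim = read_word2vec_text(sample)
print(toy_dim, sorted(toy_vectors))
```

Real files hold far more words and far longer vectors, so the library loader is the better choice outside of a quick inspection like this.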

I modified the code slightly for my own purposes:

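import numpy as np

# `model` is the word-vector model loaded with Gensim as shown above.
sentence = ["London", "is", "the", "capital", "of", "Great", "Britain"]
vectors = []
for w in sentence:
    # Membership test on the model itself; no need for model.vocab.keys().
    if w in model:
        vectors.append(model[w])
    else:
        print("Word {} not in vocab".format(w))
        # Placeholder with the same length as a real vector, so the output stays rectangular.
        vectors.append(np.zeros(model.vector_size))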

You could also use try/except instead of the membership check; your call.
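The exception-based variant could look roughly like this. It is only a sketch: `embed_sentence` is a hypothetical helper, and a plain dict with invented 3-dimensional vectors stands in for the loaded model, on the assumption that the model raises `KeyError` for unknown words the way a dict does.

```python
def embed_sentence(lookup, sentence, dim):
    """Return one vector per token; unknown tokens get a zero vector of length `dim`."""
    vectors = []
    for word in sentence:
        try:
            vectors.append(lookup[word])
        except KeyError:
            print("Word {} not in vocab".format(word))
            vectors.append([0.0] * dim)
    return vectors

# Toy stand-in for a loaded model (made-up values, not real embeddings).
toy_model = {"London": [0.1, 0.2, 0.3], "capital": [0.4, 0.5, 0.6]}
result = embed_sentence(toy_model, ["London", "is", "capital"], dim=3)
```

With a real Gensim model you would pass the model as `lookup` and `model.vector_size` as `dim`.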
