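I am trying to find the similarity between sentences using a pre-trained sentence-transformers model, following the code here: https://www.sbert.net/docs/usage/paraphrase_mining.html
In my first attempt, I ran two nested for loops to compute the similarity of each sentence with every other sentence. Here is the code: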
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

# Single list of sentences
sentences = ['The cat sits outside',
             'A man is playing guitar',
             'The new movie is awesome',
             'Do you like pizza?']

# Compute embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

# Compute cosine similarities for each sentence with each other sentence
cosine_scores = util.pytorch_cos_sim(embeddings, embeddings)

# Collect every unique pair (i < j) together with its score
pairs = []
for i in range(len(cosine_scores) - 1):
    for j in range(i + 1, len(cosine_scores)):
        pairs.append({'index': [i, j], 'score': cosine_scores[i][j].item()})

# Sort scores in decreasing order
pairs = sorted(pairs, key=lambda x: x['score'], reverse=True)

print(len(pairs))
# Output: 6

for pair in pairs[0:10]:
    i, j = pair['index']
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences[i], sentences[j], pair['score']))
# Output:
# A man is playing guitar Do you like pizza? Score: 0.1080
# The new movie is awesome Do you like pizza? Score: 0.0829
# A man is playing guitar The new movie is awesome Score: 0.0652
# The cat sits outside Do you like pizza? Score: 0.0523
# The cat sits outside The new movie is awesome Score: -0.0270
# The cat sits outside A man is playing guitar Score: -0.0530
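For reference, util.pytorch_cos_sim(a, b) returns a matrix res with res[i][j] equal to the cosine similarity of a[i] and b[j], where cos(u, v) = u·v / (|u| |v|). A minimal pure-Python sketch of that computation follows; the toy 3-dimensional vectors are hypothetical stand-ins for real embeddings, so no model is loaded:

```python
import math
from typing import List

Vector = List[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_matrix(a: List[Vector], b: List[Vector]) -> List[List[float]]:
    """Matrix res with res[i][j] = cosine_similarity(a[i], b[j])."""
    return [[cosine_similarity(u, v) for v in b] for u in a]


# Hypothetical toy "embeddings"; real model outputs have many more dimensions.
toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
scores = cosine_matrix(toy, toy)
for row in scores:
    print(["{:.4f}".format(s) for s in row])
```

When both arguments are the same list, the matrix is symmetric with (approximately) 1.0 on the diagonal, so only the entries with j > i carry new information. Those are exactly the entries the nested loop above collects.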
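This works as expected: 4 sentences form 4 × 3 / 2 = 6 unique pairs, so there are 6 similarity scores. However, the documentation notes that this approach does not scale well because its complexity is quadratic in the number of sentences, and recommends the paraphrase_mining() method instead.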
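The pair count is "n choose 2", i.e. n(n - 1) / 2, which is why the work grows quadratically. The nested loop is equivalent to iterating over itertools.combinations(range(n), 2). The sketch below uses the scores printed in the output above, copied by hand into a dictionary that stands in for cosine_scores, so it runs without a model:

```python
from itertools import combinations

sentences = ['The cat sits outside',
             'A man is playing guitar',
             'The new movie is awesome',
             'Do you like pizza?']

# Scores copied from the output above, keyed by (i, j) with i < j.
# In the real code these values come from cosine_scores[i][j].
scores = {(0, 1): -0.0530, (0, 2): -0.0270, (0, 3): 0.0523,
          (1, 2): 0.0652, (1, 3): 0.1080, (2, 3): 0.0829}

n = len(sentences)
# combinations(range(n), 2) yields every (i, j) with i < j exactly once,
# the same pairs produced by the nested "for j in range(i + 1, n)" loop.
pairs = [{'index': [i, j], 'score': scores[(i, j)]}
         for i, j in combinations(range(n), 2)]
pairs.sort(key=lambda p: p['score'], reverse=True)

assert len(pairs) == n * (n - 1) // 2  # 4 * 3 / 2 = 6
for p in pairs:
    i, j = p['index']
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences[i], sentences[j], p['score']))
```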
But when I use paraphrase_mining(), I get only 5 pairs instead of 6. Why does this happen?
Here is the code I used with paraphrase_mining():
# Single list of sentences (model and util as in the first snippet)
sentences = ['The cat sits outside',
             'A man is playing guitar',
             'The new movie is awesome',
             'Do you like pizza?']

paraphrases = util.paraphrase_mining(model, sentences)
print(len(paraphrases))
# Output: 5

# Each entry unpacks as (score, i, j)
for k, (score, i, j) in enumerate(paraphrases):
    print(k)
    print("{} \t\t {} \t\t Score: {:.4f}".format(sentences[i], sentences[j], score))
    print()
# Output:
# 0
# A man is playing guitar Do you like pizza? Score: 0.1080
# 1
# The new movie is awesome Do you like pizza? Score: 0.0829
# 2
# A man is playing guitar The new movie is awesome Score: 0.0652
# 3
# The cat sits outside Do you like pizza? Score: 0.0523
# 4
# The cat sits outside The new movie is awesome Score: -0.0270
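One hypothesis, not verified against the library source: if paraphrase_mining() only keeps each sentence's top-k most similar sentences (the sentence itself included), and k is effectively capped at n - 1 for such a small list, then a pair that ranks last for both of its sentences is never collected. The sketch below is a simplified, hypothetical re-implementation of that strategy. The function name top_k_pairs and the k = n - 1 cap are my assumptions, not the library's code. It runs on the scores from the first output:

```python
from itertools import combinations

# Pairwise scores from the exhaustive output (i < j); self-similarity is 1.0.
scores = {(0, 1): -0.0530, (0, 2): -0.0270, (0, 3): 0.0523,
          (1, 2): 0.0652, (1, 3): 0.1080, (2, 3): 0.0829}
n = 4


def score(i: int, j: int) -> float:
    """Symmetric lookup; a sentence is maximally similar to itself."""
    if i == j:
        return 1.0
    return scores[(min(i, j), max(i, j))]


def top_k_pairs(n: int, k: int) -> set:
    """Hypothetical top-k mining: keep each sentence's k best matches
    (itself included), then drop self-pairs and duplicate (i, j)/(j, i)."""
    found = set()
    for i in range(n):
        ranked = sorted(range(n), key=lambda j: score(i, j), reverse=True)
        for j in ranked[:k]:
            if i != j:
                found.add((min(i, j), max(i, j)))
    return found


all_pairs = set(combinations(range(n), 2))
mined = top_k_pairs(n, k=n - 1)  # assumed cap: k = n - 1 = 3 for 4 sentences
print(len(all_pairs), len(mined))  # 6 5
print(all_pairs - mined)           # {(0, 1)}
```

The pair this sketch drops, (0, 1), is "The cat sits outside" / "A man is playing guitar": the lowest-scoring pair and the one missing from the paraphrase_mining() output. That agreement is suggestive, not proof; the function's top_k and max_pairs parameters would be the place to check.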
Is there a difference in how paraphrase_mining() works?