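Following the question on calculating the Levenshtein distance between two strings in Python, I can calculate the distance and similarity between two given strings (sentences).
Following the question on Levenshtein distance and text similarity in Python, I can also get the character-by-character matrix along with the distance between the two strings.
Is there a way to calculate the distance and similarity between the words of the two strings instead, and print a matrix indexed by the words of each string (sentence)?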
a = "This is a dog."
b = "This is a cat."
import numpy as np

def levenshtein(seq1, seq2):
    size_x = len(seq1) + 1
    size_y = len(seq2) + 1
    matrix = np.zeros((size_x, size_y))
    # First column and first row: distance from the empty prefix
    for x in range(size_x):
        matrix[x, 0] = x
    for y in range(size_y):
        matrix[0, y] = y

    for x in range(1, size_x):
        for y in range(1, size_y):
            if seq1[x-1] == seq2[y-1]:
                # Characters match: the diagonal step costs nothing
                matrix[x, y] = min(
                    matrix[x-1, y] + 1,
                    matrix[x-1, y-1],
                    matrix[x, y-1] + 1
                )
            else:
                # Characters differ: deletion, substitution, or insertion
                matrix[x, y] = min(
                    matrix[x-1, y] + 1,
                    matrix[x-1, y-1] + 1,
                    matrix[x, y-1] + 1
                )
    print(matrix)
    return matrix[size_x - 1, size_y - 1]
levenshtein(a, b)
Output
>> 3
Matrix
[[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14.]
[ 1. 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13.]
[ 2. 1. 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.]
[ 3. 2. 1. 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.]
[ 4. 3. 2. 1. 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
[ 5. 4. 3. 2. 1. 0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[ 6. 5. 4. 3. 2. 1. 0. 1. 2. 3. 4. 5. 6. 7. 8.]
[ 7. 6. 5. 4. 3. 2. 1. 0. 1. 2. 3. 4. 5. 6. 7.]
[ 8. 7. 6. 5. 4. 3. 2. 1. 0. 1. 2. 3. 4. 5. 6.]
[ 9. 8. 7. 6. 5. 4. 3. 2. 1. 0. 1. 2. 3. 4. 5.]
[10. 9. 8. 7. 6. 5. 4. 3. 2. 1. 0. 1. 2. 3. 4.]
[11. 10. 9. 8. 7. 6. 5. 4. 3. 2. 1. 1. 2. 3. 4.]
[12. 11. 10. 9. 8. 7. 6. 5. 4. 3. 2. 2. 2. 3. 4.]
[13. 12. 11. 10. 9. 8. 7. 6. 5. 4. 3. 3. 3. 3. 4.]
[14. 13. 12. 11. 10. 9. 8. 7. 6. 5. 4. 4. 4. 4. 3.]]
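The function above only returns the distance, not the similarity. As a rough sketch of how I get a similarity score from it: I assume the common normalization of dividing the distance by the length of the longer string and subtracting from 1. The names `levenshtein_distance` and `similarity` are my own helpers, not part of any library, and the distance here uses plain Python lists instead of NumPy.

```python
def levenshtein_distance(seq1, seq2):
    """Edit distance between two sequences, keeping only one previous row."""
    previous = list(range(len(seq2) + 1))
    for x, item1 in enumerate(seq1, start=1):
        current = [x]
        for y, item2 in enumerate(seq2, start=1):
            cost = 0 if item1 == item2 else 1
            current.append(min(
                previous[y] + 1,         # deletion
                current[y - 1] + 1,      # insertion
                previous[y - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def similarity(seq1, seq2):
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(seq1), len(seq2))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return 1.0 - levenshtein_distance(seq1, seq2) / longest


a = "This is a dog."
b = "This is a cat."
print(levenshtein_distance(a, b))
print(round(similarity(a, b), 3))
```

Other normalizations exist (for example dividing by the sum of both lengths), so the score depends on which one is chosen.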
Is it possible to calculate the Levenshtein distance at the word level?
Desired matrix
      This  is  a  cat
This
is
a
dog
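Here is a minimal sketch of what I have in mind, under a few assumptions. The desired matrix can be read two ways, so the sketch computes both: (1) a word-level edit distance where each word counts as one symbol, and (2) a table of character-level distances between every word of `a` and every word of `b`. The helpers `tokenize` and `levenshtein_matrix` are hypothetical names of my own, the tokenization (whitespace split, surrounding punctuation stripped) is an assumption, and plain lists are used instead of NumPy.

```python
def levenshtein_matrix(seq1, seq2):
    """Full DP matrix; works on strings or on lists of words alike."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    matrix = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        matrix[x][0] = x
    for y in range(cols):
        matrix[0][y] = y
    for x in range(1, rows):
        for y in range(1, cols):
            cost = 0 if seq1[x - 1] == seq2[y - 1] else 1
            matrix[x][y] = min(
                matrix[x - 1][y] + 1,         # deletion
                matrix[x][y - 1] + 1,         # insertion
                matrix[x - 1][y - 1] + cost,  # match or substitution
            )
    return matrix


def tokenize(sentence):
    """Assumed tokenization: split on whitespace, strip surrounding punctuation."""
    words = (word.strip(".,!?;:") for word in sentence.split())
    return [word for word in words if word]


a = "This is a dog."
b = "This is a cat."
words_a, words_b = tokenize(a), tokenize(b)

# Reading 1: each word is one symbol, so the DP runs over word lists.
word_distance = levenshtein_matrix(words_a, words_b)[-1][-1]
print("word-level distance:", word_distance)

# Reading 2: character-level distance for every (word of a, word of b) pair.
pairwise = [
    [levenshtein_matrix(wa, wb)[-1][-1] for wb in words_b]
    for wa in words_a
]

# Print the pairwise table with the words as row and column labels.
width = max(len(w) for w in words_a + words_b) + 2
print(" " * width + "".join(w.rjust(width) for w in words_b))
for wa, row in zip(words_a, pairwise):
    print(wa.ljust(width) + "".join(str(d).rjust(width) for d in row))
```

Because the DP only compares items with `==`, the same function serves both readings; only the input changes (a list of words versus a single word's characters).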