What is the difference between set() and word_tokenize()?

data-mining python nlp nltk
2022-03-01 00:29:13
from nltk.tokenize import sent_tokenize, word_tokenize

# Adjacent string literals inside parentheses are joined into one string
sentence = ('jainmiah I love you but you are not bothering about my request, '
            'please yaar consider me for the sake')

word_tok = word_tokenize(sentence)
print(word_tok)

set_all = set(word_tokenize(sentence))
print(set_all)

In practice, word_tokenize() and set(word_tokenize()) appear to return the same result. What is the difference?

1 Answer

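There are two differences between word_tokenize(sentence) and set(word_tokenize(sentence)):

word_tokenize

  • Returns a list (try print(type(word_tok)))
  • Returns every token, duplicates included

set

  • Returns a set (try print(type(set_all)))
  • Returns only the unique tokens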

Try this:

sentence = 'jainmiah jainmiah jainmiah I love you but you are not bothering about my request, please yaar consider me for the sake'

word_tok = word_tokenize(sentence)
print(word_tok)

set_all = set(word_tokenize(sentence))
print(set_all)
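
To make the difference easier to see than by comparing two printed collections, here is a minimal sketch that compares sizes and counts instead. The token list is written out by hand as a stand-in for what word_tokenize(sentence) should return for the sentence above (an assumption, so the snippet runs without NLTK or its tokenizer data). The names tokens, unique_tokens, ordered_unique and counts are just illustrative.

```python
from collections import Counter

# Hand-written stand-in for word_tokenize(sentence); assumed, not produced by NLTK here.
tokens = ['jainmiah', 'jainmiah', 'jainmiah', 'I', 'love', 'you', 'but', 'you',
          'are', 'not', 'bothering', 'about', 'my', 'request', ',', 'please',
          'yaar', 'consider', 'me', 'for', 'the', 'sake']

# A set keeps each distinct token once and does not promise any order.
unique_tokens = set(tokens)

print(type(tokens).__name__, len(tokens))                # list 22
print(type(unique_tokens).__name__, len(unique_tokens))  # set 19

# If unique tokens are needed in their original order, dict.fromkeys
# keeps the first occurrence of each key in insertion order.
ordered_unique = list(dict.fromkeys(tokens))
print(ordered_unique[:4])                                # ['jainmiah', 'I', 'love', 'you']

# Counter shows which tokens the set collapsed.
counts = Counter(tokens)
print(counts.most_common(2))                             # [('jainmiah', 3), ('you', 2)]
```

With the original sentence from the question, only 'you' is repeated, so the list and the set differ by a single element and in ordering. That is probably why the two outputs looked the same at a glance.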