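I have some text files containing movie reviews, and I need to determine whether each review is good or bad. I tried the following code, but it doesn't work: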
import nltk
with open("c:/users/user/desktop/datascience/moviesr/movies-1-32.txt", 'r') as m11:
    mov_rev = m11.read()
mov_review1=nltk.word_tokenize(mov_rev)
bon="crap aweful horrible terrible bad bland trite sucks unpleasant boring dull moronic dreadful disgusting distasteful flawed ordinary slow senseless unoriginal weak wacky uninteresting unpretentious "
bag_of_negative_words=nltk.word_tokenize(bon)
bop="Absorbing Big-Budget Brilliant Brutal Charismatic Charming Clever Comical Dazzling Dramatic Enjoyable Entertaining Excellent Exciting Expensive Fascinating Fast-Moving First-Rate Funny Highly-Charged Hilarious Imaginative Insightful Inspirational Intriguing Juvenile Lasting Legendary Pleasant Powerful Ripping Riveting Romantic Sad Satirical Sensitive Sentimental Surprising Suspenseful Tender Thought Provoking Tragic Uplifting Uproarious"
bop.lower()
bag_of_positive_words=nltk.word_tokenize(bop)
vec=[]
for i in bag_of_negative_words:
    if i in mov_review1:
        vec.append(1)
    else:
        for w in bag_of_positive_words:
            if w in moview_review1:
                vec.append(5)
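So I am trying to check whether the review contains positive or negative words. If it contains a negative word, the value 1 is appended to the vector vec; otherwise the value 5 is appended. But the output I get is an empty vector.
Please help. Also, please suggest other ways to solve this problem.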
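For reference, here is a minimal sketch of what the intended logic might look like, under a few assumptions. Three things in the code above look likely to cause trouble: `bop.lower()` returns a new string that is never assigned (Python strings are immutable), so the positive words stay capitalized; the review tokens are never lowercased either; and `moview_review1` is not the name defined earlier (`mov_review1`). The sketch assumes the goal is one 1 per negative word found and one 5 per positive word found. It swaps `nltk.word_tokenize` for a simple regex tokenizer so it runs without NLTK data, uses abbreviated word lists, and the names `tokenize`, `score_review`, `label_review` and the threshold of 3 are hypothetical choices, not part of the original code.

```python
import re

# Abbreviated word lists for illustration; the full lists from the question
# could be substituted here (lowercased).
NEGATIVE_WORDS = {"crap", "awful", "horrible", "terrible", "bad", "boring", "dull"}
POSITIVE_WORDS = {"brilliant", "charming", "clever", "enjoyable", "excellent", "funny"}


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A simple regex stand-in for nltk.word_tokenize; hyphenated words
    such as "first-rate" are kept as one token.
    """
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def score_review(text):
    """Return a list with a 1 for each distinct negative word found
    and a 5 for each distinct positive word found in the review."""
    tokens = set(tokenize(text))  # set membership is fast and ignores repeats
    vec = []
    vec.extend(1 for _ in tokens & NEGATIVE_WORDS)
    vec.extend(5 for _ in tokens & POSITIVE_WORDS)
    return vec


def label_review(text, threshold=3):
    """Label a review by the mean of its scores (hypothetical rule:
    mean >= threshold is "good", no matches is "unknown")."""
    vec = score_review(text)
    if not vec:
        return "unknown"
    return "good" if sum(vec) / len(vec) >= threshold else "bad"


if __name__ == "__main__":
    review = "A brilliant, funny film with a dull ending."
    print(score_review(review))
    print(label_review(review))
```

A word-list approach like this ignores negation ("not funny") and context, so it is only a rough baseline; a classifier trained on labeled reviews would be another way to approach the problem.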