
Sentiment analysis using Python

Data mining, Python, NLP, sentiment analysis

2022-02-24 10:57:30

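I have some text files containing movie reviews, and I need to determine whether each review is good or bad. I tried the following code, but it does not work: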

import nltk
with open("c:/users/user/desktop/datascience/moviesr/movies-1-32.txt", 'r') as m11:
    mov_rev = m11.read()
mov_review1=nltk.word_tokenize(mov_rev)
bon="crap aweful horrible terrible bad bland trite sucks unpleasant boring dull moronic dreadful disgusting distasteful flawed ordinary slow senseless unoriginal weak wacky uninteresting unpretentious "
bag_of_negative_words=nltk.word_tokenize(bon)
bop="Absorbing Big-Budget Brilliant Brutal Charismatic Charming Clever Comical Dazzling Dramatic Enjoyable Entertaining Excellent Exciting  Expensive Fascinating Fast-Moving First-Rate Funny Highly-Charged Hilarious Imaginative Insightful Inspirational Intriguing Juvenile Lasting Legendary Pleasant Powerful Ripping Riveting Romantic Sad  Satirical Sensitive  Sentimental Surprising Suspenseful Tender Thought Provoking Tragic Uplifting Uproarious"
bop.lower()
bag_of_positive_words=nltk.word_tokenize(bop)
vec=[]
for i in bag_of_negative_words:
    if i in mov_review1:
        vec.append(1)
    else:
        for w in bag_of_positive_words:
            if w in moview_review1:
                vec.append(5)

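So I am trying to check whether the review contains positive or negative words. If it contains a negative word, the value 1 is appended to the vector vec; otherwise the value 5 is appended. But the output I get is an empty vector.

Please help. Also, please suggest other ways to solve this problem.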

3 Answers

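Try the official "bad words" database published by Google at this link: Google's official bad words list. Also, here is a link for good words: Not an official list of good words.

For the code, I would do it like this: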

import difflib

# Read the review as a list of whitespace-separated words
with open('dir_to_your_text', 'r') as f:
    textArray = f.read().split()

# Bad words should be listed like this for the split function to work
# "*** ****** **** ****" (the stars stand in for the censored words)
with open('dir_to_your_bad_word_file', 'r') as f:
    badArray = f.read().split()
with open('dir_to_your_good_word_file', 'r') as f:
    goodArray = f.read().split()

# Then use the matching algorithm from difflib to compare every good and bad word
# with every word in the text, summing the similarity ratios
goodMatchingCounter = 0.0
badMatchingCounter = 0.0

for goodWord in goodArray:
    for word in textArray:
        goodMatchingCounter += difflib.SequenceMatcher(None, goodWord, word).ratio()

for badWord in badArray:
    for word in textArray:
        badMatchingCounter += difflib.SequenceMatcher(None, badWord, word).ratio()

# Turn the sums into an average similarity, expressed in percent
goodMatchingCounter *= 100 / (len(goodArray) * len(textArray))
badMatchingCounter *= 100 / (len(badArray) * len(textArray))

# Numbers must be converted with str() before concatenating them to a string
print('Show the good measurement of the text in %: ' + str(goodMatchingCounter))
print('Show the bad measurement of the text in %: ' + str(badMatchingCounter))
print('Show the hotness of the text: ' + str(len(textArray) * goodMatchingCounter))

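The code will be slow but accurate :) I have not run or tested it, so please do that for me and post the corrected code :) because I want to test it too :)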
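As a small illustration of the difflib matching idea above, the sketch below counts the words in a text that closely resemble an entry in a word list, so that a misspelling such as "aweful" still matches "awful". The function name fuzzy_count, the 0.85 threshold, and the sample words are assumptions made for this example; they are not part of the answer.

```python
import difflib

def fuzzy_count(text_words, lexicon, threshold=0.85):
    """Count text words whose best similarity to any lexicon word
    reaches the threshold (hypothetical helper, threshold is a guess)."""
    count = 0
    for word in text_words:
        best = max(
            difflib.SequenceMatcher(None, word, entry).ratio()
            for entry in lexicon
        )
        if best >= threshold:
            count += 1
    return count

# Made-up sample data for illustration only
sample_text = ["an", "aweful", "movie"]
sample_bad_words = ["awful", "boring"]

print(fuzzy_count(sample_text, sample_bad_words))
```

Counting only pairs above a threshold is one possible design; the answer's code instead averages the ratio over every pair of words, which also works but lets unrelated words contribute small nonzero similarities.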

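The following link contains a list of positive and negative sentiment words with polarity scores in the range [-5, 5]. Just try computing a score based on word matches, and you can get an overall score for the movie review.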

AFINN
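The word-matching score described in this answer could be sketched roughly as follows. This assumes the word list is available as tab-separated "word, score" lines; the helper names load_lexicon and review_score are hypothetical, and the four sample entries are placeholder values chosen for the example rather than values copied from the real list.

```python
import io

def load_lexicon(lines):
    """Parse 'word<TAB>score' lines into a dict (assumed file layout)."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        word, score = line.rsplit("\t", 1)
        lexicon[word] = int(score)
    return lexicon

def review_score(text, lexicon):
    """Sum the scores of all words in the text that appear in the lexicon."""
    words = [w.strip(".,!?;:\"'()") for w in text.lower().split()]
    return sum(lexicon.get(w, 0) for w in words)

# Placeholder entries for illustration; a real run would read the downloaded file
SAMPLE = "bad\t-3\nboring\t-3\nfunny\t4\nbrilliant\t4\n"
lexicon = load_lexicon(io.StringIO(SAMPLE))

print(review_score("A brilliant, funny film.", lexicon))
print(review_score("Boring and bad.", lexicon))
```

A positive total suggests a favorable review and a negative total an unfavorable one; dividing by the number of matched words would be one way to compare reviews of different lengths.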

Try

vec =[]

for word in bag_of_negative_words:
    if word in mov_review1:
        vec.append(1)

for word in bag_of_positive_words:
    if word in mov_review1:
        vec.append(5)
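A self-contained variant of the two loops above is sketched here. It also handles letter case: str.lower() returns a new string rather than changing the original, so a bare bop.lower() call leaves the capitalized positive words untouched and they cannot match lowercase review text. The shortened word lists, the sample review, and the plain whitespace tokenizer (used here in place of nltk.word_tokenize to keep the example dependency-free) are assumptions for illustration.

```python
# Shortened word lists for illustration; lower() result is kept this time
NEGATIVE = "crap awful horrible terrible bad bland boring dull".split()
POSITIVE = "Brilliant Charming Clever Enjoyable Excellent Funny".lower().split()

def score_words(review_text):
    """Return a list with 1 per negative word and 5 per positive word found."""
    # Lowercase the review and strip punctuation so comparisons line up
    tokens = {w.strip(".,!?;:\"'()") for w in review_text.lower().split()}
    vec = []
    for word in NEGATIVE:
        if word in tokens:
            vec.append(1)
    for word in POSITIVE:
        if word in tokens:
            vec.append(5)
    return vec

sample_review = "The plot was dull but the acting was Brilliant and funny."
print(score_words(sample_review))
```

Using a set for the review tokens keeps each membership check fast, which matters once the word lists grow.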
