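I want to get started with data mining by parsing text. It seems the best starting point is to extract n-grams from the text and try sentiment analysis on them. Take this sentence, for example: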
Muffins are fine, I wouldn't say I like them though.
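However, I'm wondering whether punctuation should be included. (I plan to start with 3-grams and refine from there, since I'm not sure 2-grams carry enough information to give accurate results.)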
Muffins are fine | are fine [,] | I wouldn't say | ...
Here, because a "," was found, the n-grams restart from the next word after the ",", instead of including the punctuation as usual:
Muffins are fine | are fine , | fine , I | , I wouldn't | ...
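To make the two options concrete, here is a minimal Python sketch of both. The function names (`tokenize`, `ngrams_with_punct`, `ngrams_split_on_punct`) and the regex tokenizer are my own assumptions for illustration, not anything from a particular library. One simplification to note: the splitting variant treats every punctuation mark as a hard boundary and never emits a gram that contains or spans punctuation, so it drops the partial "are fine [,]" gram instead of keeping it.

```python
import re

def tokenize(text):
    # Hypothetical tokenizer: words (keeping internal apostrophes such as
    # "wouldn't") or single punctuation characters.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

def ngrams(tokens, n=3):
    # All contiguous windows of length n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngrams_with_punct(text, n=3):
    # "As usual": punctuation marks are ordinary tokens inside the n-grams.
    return ngrams(tokenize(text), n)

def ngrams_split_on_punct(text, n=3):
    # Alternative: punctuation ends the current segment, and n-grams are
    # built separately inside each segment, so none crosses a boundary.
    grams, segment = [], []
    for tok in tokenize(text):
        if re.fullmatch(r"[^\w\s]", tok):
            grams.extend(ngrams(segment, n))
            segment = []
        else:
            segment.append(tok)
    grams.extend(ngrams(segment, n))  # flush the final segment
    return grams

sentence = "Muffins are fine, I wouldn't say I like them though."
print(ngrams_with_punct(sentence)[:4])
print(ngrams_split_on_punct(sentence)[:2])
```

Running both over the same corpus and comparing downstream classifier accuracy would be one way to decide empirically which variant to keep.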
Can anyone tell me whether this is a bad idea?