I have built a MultinomialNB classifier and am trying to use it to label some test texts:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import preprocessing
from sklearn.naive_bayes import MultinomialNB
tfv = TfidfVectorizer(strip_accents='unicode', analyzer='word', token_pattern=r'\w{1,}',
                      use_idf=True, smooth_idf=True, sublinear_tf=True)
# each entry of df['text'] is one long string of words
tfv.fit(df['text'])
lbl_enc = preprocessing.LabelEncoder()
# df['which_subject'] is one of the following 7 subjects: ['Educational', 'Political', 'Sports', 'Tech', 'Social', 'Religions', 'Economics']
y = lbl_enc.fit_transform(df['which_subject'])
xtrain_tfv = tfv.transform(df['text'])
# xtest_tfv has 7 samples
xtest_tfv = tfv.transform(test_df['text'])
clf = MultinomialNB()
clf.fit(xtrain_tfv, y)
y_test_preds = clf.predict_proba(xtest_tfv)
Now y_test_preds looks like this (one row per test sample, one column per class):
0.255328 0.118111 0.129958 0.123368 0.119301 0.131098 0.122836
0.122814 0.265444 0.117637 0.13531 0.116697 0.122812 0.119286
0.131485 0.114459 0.258224 0.122414 0.118132 0.134005 0.12128
0.125075 0.131948 0.122668 0.258655 0.116518 0.119995 0.12514
0.124356 0.116987 0.121706 0.119796 0.266172 0.127231 0.123751
0.132295 0.1192 0.13366 0.119445 0.123186 0.257318 0.114895
0.126779 0.118406 0.123723 0.127393 0.122539 0.117509 0.263652
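As you can see, every element is less than 0.5. Does this table tell me anything? Can I conclude that the classifier is unable to label the test texts?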
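To make the question concrete, here is a minimal sketch of how the matrix could be read, assuming (as in scikit-learn's `predict_proba`) that each row is a probability distribution over the 7 classes. The values are copied from the table above; the names `chance`, `pred_idx` and `top_prob` are only illustrative. With 7 classes the uniform baseline is 1/7 (about 0.143), not 0.5, so the comparison below is against that baseline.

```python
import numpy as np

# Probabilities copied from the table above
# (rows: test samples, columns: encoded classes).
y_test_preds = np.array([
    [0.255328, 0.118111, 0.129958, 0.123368, 0.119301, 0.131098, 0.122836],
    [0.122814, 0.265444, 0.117637, 0.13531,  0.116697, 0.122812, 0.119286],
    [0.131485, 0.114459, 0.258224, 0.122414, 0.118132, 0.134005, 0.12128],
    [0.125075, 0.131948, 0.122668, 0.258655, 0.116518, 0.119995, 0.12514],
    [0.124356, 0.116987, 0.121706, 0.119796, 0.266172, 0.127231, 0.123751],
    [0.132295, 0.1192,   0.13366,  0.119445, 0.123186, 0.257318, 0.114895],
    [0.126779, 0.118406, 0.123723, 0.127393, 0.122539, 0.117509, 0.263652],
])

n_classes = y_test_preds.shape[1]
chance = 1.0 / n_classes              # uniform baseline for 7 classes

row_sums = y_test_preds.sum(axis=1)   # each row should sum to roughly 1
pred_idx = y_test_preds.argmax(axis=1)  # most probable class index per sample
top_prob = y_test_preds.max(axis=1)     # probability assigned to that class

for i, (idx, p) in enumerate(zip(pred_idx, top_prob)):
    print(f"sample {i}: class index {idx}, p={p:.3f}, chance={chance:.3f}")

# In the original script, the indices could be mapped back to subject names
# with lbl_enc.inverse_transform(pred_idx); clf.predict(xtest_tfv) returns
# the same argmax indices directly.
```

Read this way, each row has a single largest entry, and that entry is above the 1/7 baseline even though it is below 0.5. Whether those top classes are the correct ones still has to be checked against the true labels of the test texts.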
