I have two questions about computing the log-likelihood in semi-supervised Naive Bayes.
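According to several documents I read online, the log-likelihood of semi-supervised Naive Bayes increases with every EM iteration (i.e., the change from the previous iteration is positive). Is this always true? For my text-classification problem, I get the following log-likelihood values: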
```
                  previous loglh     current loglh             diff
M: #iteration  2  -36268.3096003 -> -89209.1178494  (-52940.8082491 )
M: #iteration  3  -89209.1178494 -> -34633.3568107  ( 54575.7610387 )
M: #iteration  4  -34633.3568107 -> -38624.6148215  (-3991.25801086 )
M: #iteration  5  -38624.6148215 -> -32929.3134083  ( 5695.30141321 )
M: #iteration  6  -32929.3134083 -> -36901.1324845  (-3971.81907618 )
M: #iteration  7  -36901.1324845 -> -33105.8190786  ( 3795.31340593 )
M: #iteration  8  -33105.8190786 -> -35887.8113077  (-2781.99222912 )
M: #iteration  9  -35887.8113077 -> -33249.0299832  ( 2638.78132451 )
M: #iteration 10  -33249.0299832 -> -35094.6821847  (-1845.65220157 )
M: #iteration 11  -35094.6821847 -> -33459.5111152  ( 1635.17106958 )
M: #iteration 12  -33459.5111152 -> -34587.8807293  (-1128.36961412 )
M: #iteration 13  -34587.8807293 -> -33661.1108938  ( 926.769835475 )
M: #iteration 14  -33661.1108938 ->  -34252.017022  (-590.906128148 )
M: #iteration 15   -34252.017022 -> -33804.2917848  (  447.72523711 )
M: #iteration 16  -33804.2917848 -> -34025.8914036  (-221.599618742 )
M: #iteration 17  -34025.8914036 -> -33851.2573206  ( 174.634083003 )
M: #iteration 18  -33851.2573206 -> -33911.2395915  (-59.9822709405 )
M: #iteration 19  -33911.2395915 -> -33871.2589912  (  39.980600331 )
M: #iteration 20  -33871.2589912 -> -33843.8767245  ( 27.3822666886 )
```
As you can see, the log-likelihood improves in some iterations and gets worse in others, and the two keep alternating. This seems strange to me.
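With exact E- and M-steps, EM never decreases the quantity it maximizes: the observed-data log-likelihood, or the log-likelihood plus the log prior when the M-step applies Laplace (add-one) smoothing. A drop larger than rounding noise therefore points either to a bug or to the logged number not being that quantity. The sketch below scans the values from the log for drops; the helper name `find_decreases` and its relative tolerance are hypothetical choices for illustration.

```python
# Values copied from the log above: entry 0 is the "previous" value printed on
# the iteration-2 line; entry i (i >= 1) is the "current" value of iteration i + 1.
trace = [
    -36268.3096003, -89209.1178494, -34633.3568107, -38624.6148215,
    -32929.3134083, -36901.1324845, -33105.8190786, -35887.8113077,
    -33249.0299832, -35094.6821847, -33459.5111152, -34587.8807293,
    -33661.1108938, -34252.017022, -33804.2917848, -34025.8914036,
    -33851.2573206, -33911.2395915, -33871.2589912, -33843.8767245,
]


def find_decreases(values, rel_tol=1e-9):
    """Return (iteration, change) for every step where the tracked value drops.

    Entry i of `values` is taken to be the value after iteration i + 1.
    Drops smaller than rel_tol * |previous value| are treated as rounding noise.
    """
    drops = []
    for i in range(1, len(values)):
        change = values[i] - values[i - 1]
        if change < -rel_tol * abs(values[i - 1]):
            drops.append((i + 1, change))
    return drops


for iteration, change in find_decreases(trace):
    print(f"iteration {iteration}: log-likelihood fell by {-change:.4f}")
```

On the logged values it reports a drop at every even iteration from 2 to 18, which is exactly the alternating pattern described above.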
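Given the number of labeled (resp. unlabeled) documents, the number of classes, and the class of each labeled document, I compute the log-likelihood as the sum of the following two log-likelihoods. Is this computation correct?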
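For comparison, here is a minimal sketch of the usual two-term computation, assuming a multinomial Naive Bayes over word counts with add-one (Laplace) smoothing: each labeled document $d$ contributes $\log P(y_d) + \sum_w n_{dw} \log P(w \mid y_d)$, and each unlabeled document contributes $\log \sum_c P(c) \prod_w P(w \mid c)^{n_{dw}}$, evaluated with log-sum-exp to avoid underflow. The toy data, the two-class word distributions, and all function names below are hypothetical and chosen only for illustration; this is not the poster's code. Because add-one smoothing makes the M-step a MAP estimate under a Dirichlet prior, the quantity EM is guaranteed not to decrease is the log-likelihood plus the log prior, so the sketch tracks both.

```python
import math
import random

# Hypothetical two-class word distributions, used only to generate toy data.
TRUE_THETA = [[0.3, 0.3, 0.1, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1, 0.1, 0.3, 0.3]]
VOCAB_SIZE = len(TRUE_THETA[0])
N_CLASSES = len(TRUE_THETA)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def doc_log_joint(counts, c, log_pi, log_theta):
    """log P(c) + sum_w n_w * log P(w|c). The multinomial coefficient is
    dropped because it does not depend on the parameters."""
    return log_pi[c] + sum(n * lt for n, lt in zip(counts, log_theta[c]))

def log_likelihood(labeled, labels, unlabeled, log_pi, log_theta):
    """Labeled term (class observed) + unlabeled term (class summed out)."""
    ll_labeled = sum(doc_log_joint(x, y, log_pi, log_theta)
                     for x, y in zip(labeled, labels))
    ll_unlabeled = sum(
        log_sum_exp([doc_log_joint(x, c, log_pi, log_theta) for c in range(N_CLASSES)])
        for x in unlabeled)
    return ll_labeled + ll_unlabeled

def log_prior(log_pi, log_theta):
    """Log Dirichlet(2, ..., 2) density up to a constant; add-one smoothing in
    the M-step is the MAP estimate under this prior."""
    return sum(log_pi) + sum(sum(row) for row in log_theta)

def e_step(unlabeled, log_pi, log_theta):
    """Posterior class probabilities P(c|d) for each unlabeled document."""
    resp = []
    for x in unlabeled:
        joint = [doc_log_joint(x, c, log_pi, log_theta) for c in range(N_CLASSES)]
        z = log_sum_exp(joint)
        resp.append([math.exp(j - z) for j in joint])
    return resp

def m_step(labeled, labels, unlabeled, resp):
    """Add-one-smoothed estimates from hard counts (labeled documents) and
    soft counts weighted by resp (unlabeled documents)."""
    class_mass = [1.0] * N_CLASSES
    word_mass = [[1.0] * VOCAB_SIZE for _ in range(N_CLASSES)]
    hard = [[1.0 if c == y else 0.0 for c in range(N_CLASSES)] for y in labels]
    for x, r in zip(labeled + unlabeled, hard + resp):
        for c in range(N_CLASSES):
            class_mass[c] += r[c]
            for w, n in enumerate(x):
                word_mass[c][w] += r[c] * n
    total = sum(class_mass)
    log_pi = [math.log(m / total) for m in class_mass]
    log_theta = []
    for row in word_mass:
        row_total = sum(row)
        log_theta.append([math.log(m / row_total) for m in row])
    return log_pi, log_theta

def run_em(labeled, labels, unlabeled, n_iter=20):
    """Return (log-likelihood, log-likelihood + log prior) after each M-step."""
    log_pi, log_theta = m_step(labeled, labels, [], [])  # labeled data only
    history = []
    for _ in range(n_iter):
        resp = e_step(unlabeled, log_pi, log_theta)
        log_pi, log_theta = m_step(labeled, labels, unlabeled, resp)
        ll = log_likelihood(labeled, labels, unlabeled, log_pi, log_theta)
        history.append((ll, ll + log_prior(log_pi, log_theta)))
    return history

def make_data(seed=0, n_labeled=4, n_unlabeled=40, doc_len=20):
    """Sample toy word-count vectors; labeled classes alternate 0, 1, 0, 1."""
    rng = random.Random(seed)

    def sample(c):
        counts = [0] * VOCAB_SIZE
        for w in rng.choices(range(VOCAB_SIZE), weights=TRUE_THETA[c], k=doc_len):
            counts[w] += 1
        return counts

    labels = [i % N_CLASSES for i in range(n_labeled)]
    labeled = [sample(c) for c in labels]
    unlabeled = [sample(rng.randrange(N_CLASSES)) for _ in range(n_unlabeled)]
    return labeled, labels, unlabeled

if __name__ == "__main__":
    labeled, labels, unlabeled = make_data()
    for it, (ll, obj) in enumerate(run_em(labeled, labels, unlabeled), start=1):
        print(f"iteration {it:2d}: loglik={ll:.4f}  loglik+logprior={obj:.4f}")
```

Running it prints both quantities per iteration. The second column should never go down (up to rounding); the plain log-likelihood is not strictly guaranteed to be monotone once smoothing is involved. Both stay negative, since every term is the log of a probability of discrete counts, so the log-likelihood itself cannot be positive.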