Log-likelihood calculation in semi-supervised Naive Bayes

Cross Validated classification likelihood naive-bayes unsupervised-learning semi-supervised-learning
2022-03-17 02:07:13

I have the following two questions about computing the log-likelihood in semi-supervised Naive Bayes.

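  1. I have read in several documents online that the log-likelihood improves (that is, its change is positive) in every EM iteration of semi-supervised Naive Bayes. Is this always true? In my text classification problem I get the following log-likelihoods: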

                     previous loglh     current loglh         diff 
    M: #iteration 2  -36268.3096003 -> -89209.1178494 (-52940.8082491   )
    M: #iteration 3  -89209.1178494 -> -34633.3568107 ( 54575.7610387   )
    M: #iteration 4  -34633.3568107 -> -38624.6148215 ( -3991.25801086  )
    M: #iteration 5  -38624.6148215 -> -32929.3134083 (  5695.30141321  )
    M: #iteration 6  -32929.3134083 -> -36901.1324845 ( -3971.81907618  )
    M: #iteration 7  -36901.1324845 -> -33105.8190786 (  3795.31340593  )
    M: #iteration 8  -33105.8190786 -> -35887.8113077 ( -2781.99222912  )
    M: #iteration 9  -35887.8113077 -> -33249.0299832 (  2638.78132451  )
    M: #iteration 10 -33249.0299832 -> -35094.6821847 ( -1845.65220157  )
    M: #iteration 11 -35094.6821847 -> -33459.5111152 (  1635.17106958  )
    M: #iteration 12 -33459.5111152 -> -34587.8807293 ( -1128.36961412  )
    M: #iteration 13 -34587.8807293 -> -33661.1108938 (   926.769835475 )
    M: #iteration 14 -33661.1108938 -> -34252.017022  (  -590.906128148 )
    M: #iteration 15 -34252.017022  -> -33804.2917848 (   447.72523711  )
    M: #iteration 16 -33804.2917848 -> -34025.8914036 (  -221.599618742 )
    M: #iteration 17 -34025.8914036 -> -33851.2573206 (   174.634083003 )
    M: #iteration 18 -33851.2573206 -> -33911.2395915 (   -59.9822709405)
    M: #iteration 19 -33911.2395915 -> -33871.2589912 (    39.980600331 )
    M: #iteration 20 -33871.2589912 -> -33843.8767245 (    27.3822666886)
    

    As you can see, it improves in some iterations and degrades in others. The two alternate, which seems strange to me...

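  2. If $L$ ($U$) is the number of labeled (unlabeled) documents, $C$ is the number of classes, and $class_{d_i}$ is the class of labeled document $i$, I compute the log-likelihood as the sum of the following two terms. Is this calculation correct?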

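    $$\operatorname{loglik}(h_{\text{labeled}}) = \sum_{i=1}^{L} \log\big(\operatorname{prob}(class_{d_i})\,\operatorname{prob}(d_i \mid class_{d_i})\big)$$

    $$\operatorname{loglik}(h_{\text{unlabeled}}) = \sum_{i=1}^{U} \sum_{j=1}^{C} \log\big(\operatorname{prob}(class_j)\,\operatorname{prob}(d_i \mid class_j)\big)$$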
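For comparison, here is a minimal sketch of how the two terms are commonly computed for a multinomial Naive Bayes model. It is not the code that produced the log above. All names (`joint_log_prob`, `log_likelihood`, `m_step`, `run_em`) and the pseudo-count `s` are assumptions made for illustration. Note one deliberate difference from the unlabeled term as written in question 2: the sketch puts the sum over classes inside the logarithm, which is the usual marginal likelihood of an unlabeled document. When the M-step uses smoothing, the quantity that EM is guaranteed not to decrease is the log-likelihood plus the log prior implied by that smoothing, so that is what the sketch tracks.

```python
import numpy as np

def joint_log_prob(X, log_prior, log_word):
    """log(prob(class_j) * prob(d_i | class_j)) for every document i and class j.

    X: (n_docs, n_words) word counts; log_prior: (C,); log_word: (C, n_words).
    The multinomial coefficient is dropped: it does not depend on the parameters.
    """
    return X @ log_word.T + log_prior

def log_likelihood(X_lab, y_lab, X_unl, log_prior, log_word):
    # Labeled documents: only the known class contributes.
    lab = joint_log_prob(X_lab, log_prior, log_word)[np.arange(len(y_lab)), y_lab].sum()
    # Unlabeled documents: sum over classes INSIDE the log (log-sum-exp for stability).
    joint = joint_log_prob(X_unl, log_prior, log_word)
    m = joint.max(axis=1, keepdims=True)
    unl = (m[:, 0] + np.log(np.exp(joint - m).sum(axis=1))).sum()
    return lab + unl

def m_step(X, resp, s):
    # MAP estimates with pseudo-count s (add-s smoothing); s is an assumed setting.
    class_counts = resp.sum(axis=0) + s
    word_counts = resp.T @ X + s
    log_prior = np.log(class_counts / class_counts.sum())
    log_word = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))
    return log_prior, log_word

def run_em(X_lab, y_lab, X_unl, n_classes, n_iter=20, s=1.0):
    one_hot = np.eye(n_classes)[y_lab]          # labeled responsibilities stay fixed
    log_prior, log_word = m_step(X_lab, one_hot, s)  # initialise from labeled data only
    X_all = np.vstack([X_lab, X_unl])
    history = []
    for _ in range(n_iter):
        # Objective = log-likelihood + log prior implied by the smoothing.
        penalty = s * (log_prior.sum() + log_word.sum())
        history.append(log_likelihood(X_lab, y_lab, X_unl, log_prior, log_word) + penalty)
        # E-step: posterior class probabilities of the unlabeled documents.
        joint = joint_log_prob(X_unl, log_prior, log_word)
        joint -= joint.max(axis=1, keepdims=True)
        resp = np.exp(joint)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate from labeled and soft-labeled documents together.
        log_prior, log_word = m_step(X_all, np.vstack([one_hot, resp]), s)
    return np.array(history)
```

Calling `run_em` on count matrices and checking `np.diff(history)` gives a quick sanity test: under these assumptions the differences should never be meaningfully negative. If a tracked value alternates between improving and degrading, one possible cause is that the quantity being logged is not the one the E-step and M-step actually optimize.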

0 answers
No replies found.