机器算法验证 - 使用 VGAM 和 rms 包的 R 中序数逻辑回归中的比例赔率假设 - 吾爱随笔录

序数逻辑回归的假设是比例优势假设。使用 R 和提到的 2 个包，我有 2 种方法可以检查，但每种方法都有问题。

1) 使用 rms 包

给定下一个命令

library(rms)
ddist <- datadist(Ki67,Cyclin_E)
options(datadist='ddist')
f <- lrm(grade ~Ki67+Cyclin_E);f
sf <- function(y)
c('Y>=1'=qlogis(mean(y >= 1)),'Y>=2'=qlogis(mean(y >= 2)),'Y>=3'=qlogis(mean(y >= 3)))
s <- summary(grade ~Ki67+Cyclin_E, fun=sf)
plot(s,which=1:3,pch=1:3,xlab='logit',main='',xlim=c(-2.5,2.5))

我有

lrm(formula = grade ~ Ki67 + Cyclin_E)

Frequencies of Missing Values Due to Each Variable
   grade     Ki67 Cyclin_E 
       0        0        3 


                     Model Likelihood     Discrimination    Rank Discrim.    
                        Ratio Test            Indexes          Indexes       

Obs            42    LR chi2     11.38    R2       0.268    C       0.728    
 1             11    d.f.            2    g        1.279    Dxy     0.456    
 2             15    Pr(> chi2) 0.0034    gr       3.592    gamma   0.458    
 3             16                         gp       0.192    tau-a   0.308    
max |deriv| 1e-07                         Brier    0.166                     


         Coef    S.E.   Wald Z Pr(>|Z|)
y>=2     -0.1895 0.8427 -0.22  0.8221  
y>=3     -2.0690 0.9109 -2.27  0.0231  
Ki67      0.0971 0.0330  2.94  0.0033  
Cyclin_E -0.0076 0.0227 -0.33  0.7387

该s表给出：（不幸的是，我不知道如何上传用 R 制作的图表）

grade    N=45

+--------+-------+--+----+---------+----------+
|        |       |N |Y>=1|Y>=2     |Y>=3      |
+--------+-------+--+----+---------+----------+
|Ki67    |[ 2, 9)|12|Inf |0.6931472|-1.0986123|
|        |[ 9,16)|12|Inf |0.3364722|-2.3978953|
|        |[16,24)|10|Inf |2.1972246| 0.0000000|
|        |[24,44]|11|Inf |2.3025851| 1.5040774|
+--------+-------+--+----+---------+----------+
|Cyclin_E|[ 3,16)|15|Inf |1.0116009|-0.1335314|
|        |[16,22)| 7|Inf |1.7917595|-0.9162907|
|        |[22,33)|10|Inf |1.3862944|-0.8472979|
|        |[33,80]|10|Inf |0.4054651|-0.4054651|
|        |Missing| 3|Inf |      Inf| 0.6931472|
+--------+-------+--+----+---------+----------+
|Overall |       |45|Inf |1.1284653|-0.4054651|
+--------+-------+--+----+---------+----------+

对于 Ki67，我看到 4 个差异 logit(P[Y> = 2])-logit(P[Y> = 3])中有 3 个接近 2。只有最后一个差异相当低（大约 0.8）。但是这里的 Ki67 是连续的，不是分类的，所以我不知道表格的结果是否正确，也没有任何 p 值可以决定。顺便说一句，我在 SPSS 中运行上述内容，我并没有拒绝这个假设。

2）使用VGAM包

在这里使用下一个命令，我在比例赔率的假设下拥有模型

library(VGAM)
fit1 <- vglm(grade ~Ki67+Cyclin_E,family=cumulative(parallel=T))
summary(fit1)

和结果

Coefficients:
                Estimate Std. Error  z value
(Intercept):1  0.1894723   0.820442  0.23094
(Intercept):2  2.0690395   0.886732  2.33333
Ki67          -0.0970972   0.032423 -2.99467
Cyclin_E       0.0075887   0.021521  0.35261

Number of linear predictors:  2 

Names of linear predictors: logit(P[Y< = 1]), logit(P[Y< = 2])

Dispersion Parameter for cumulative family:   1

Residual deviance: 79.86801 on 80 degrees of freedom

Log-likelihood: -39.93401 on 80 degrees of freedom

Number of iterations: 5

在使用下一个命令时，我有没有假设比例赔率假设的模型

fit2 <- vglm(grade ~Ki67+Cyclin_E,family=cumulative(parallel=F))

不幸的是，我收到了下一条消息

警告信息：在 vglm.fitter(x = x, y = y, w = w, offset = offset, Xm2 = Xm2, : 在 30 次迭代中未获得收敛

但是，如果我输入summary(fit2)我会得到结果，但我不知道它们是否正确。我的意图是使用下一个命令并获得答案，但我知道这是否正确（顺便说一句，如果我这样做，我会得到p-value=0.6.

pchisq(deviance(fit1)-deviance(fit2),
df=df.residual(fit1)-df.residual(fit2),lower.tail=FALSE)

那么，关于上面提到的方法，有没有人知道我得到的结果是否有效，或者在 VGAM 包的情况下，有没有办法增加迭代次数？还有其他方法可以检查吗？