我已经使用glm.nbfrom MASSpackage 有一段时间了。然而,有些事情我似乎不太明白。假设我有一个如下所示的数据:
表达物种时间点复制 40 A T1 R1 60 A T1 R2 48 A T1 R3 52 A T2 R1 58 A T2 R2 64 A T2 R3 39 B T1 R1 48 B T1 R2 54 B T1 R3 448 B T2 R1 490 B T2 R2 378 B T2 R3
现在,如果我想检查时间点和 之间speciesA是否存在表达差异,那么,我会:speciesBT1T2
require(MASS)
df <- data.frame( Expression=c(40,60,48,52,58,64,39,48,54,448,490,378), Species=c(rep("A",6), rep("B",6)), timePoint=rep(c(rep("T1",3), rep("T2",3)), 2), Replicate=rep(c("R1","R2","R3"),4), stringsAsFactors=T)
nb.fit <- glm.nb( Expression ~ Species * timePoint, data=df, control=glm.control(maxit=25, trace=T) )
summary(nb.fit)
Call:
glm.nb(formula = Expression ~ Species * timePoint, data = df,
control = glm.control(maxit = 25, trace = T), init.theta = 163.3237449,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.57348 -0.78584 0.06399 0.71550 1.27660
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.89860 0.09380 41.565 <2e-16 ***
SpeciesB -0.04845 0.13391 -0.362 0.717
timePointT2 0.16184 0.12879 1.257 0.209
SpeciesB:timePointT2 2.07175 0.16888 12.268 <2e-16 ***
(Dispersion parameter for Negative Binomial(163.3237) family taken to be 1)
Null deviance: 947.708 on 11 degrees of freedom
Residual deviance: 10.024 on 8 degrees of freedom
AIC: 102.06
Number of Fisher Scoring iterations: 1
Theta: 163
Std. Err.: 138
2 x log-likelihood: -92.06
现在,estimate可以通过 log( T2/T1 of B) - log( T2/T1 of A) 计算得到,如下所示:
> meanVal <- c( t( sapply( split(df, df[,2:3] ), function(x) mean(x[,1] ) ) ) )
> estimate <- log( meanVal[4]/meanVal[2] ) - log( meanVal[3]/meanVal[1] )
> estimate
> [1] 2.071749
直到这我跟随。但是,从这里,我想知道这些:
1)如何估计标准误差?
3)负二项分布的拟合如何影响标准。误差、z 值还是 p 值?我的意思是,dispersion parameter计算在哪里使用?
我已经阅读并尝试从很多教程和书籍中理解。但我似乎不明白。如果你们中的任何人能为我简化它,我将不胜感激。
谢谢!