机器算法验证 - 如何解释 GLMM 结果？ - 吾爱随笔录

如何解释 GLMM 结果？

机器算法验证 r 混合模式解释咕噜咕噜零通胀

2022-04-09 03:25:14

我的问题与我之前的文章Extract variance of the fixed effect in a glmm 有关。但是，在这种情况下，我更改了 GLMM 遵循的模型。它遵循一个对数族，并且由于我的数据集中有很多零，所以我使用了零膨胀方法。

我想获得每个栖息地内发生率（inc.）的变化（方差分量），同时注意季节和地点等随机因素

这是我的数据集：

## Incidence:

Incidence <- data.frame(Inc. = c(0.4400, 0.5102, 0.2979, 0.2667, 0.0000, 0.0000,
                                 0.0200, 0.0213, 0.0000, 0.0238, 0.0256, 0.0000,
                                 0.0000, 0.1538, 0.0417, 0.0000, 0.0734, 0.0000,
                                 0.0000, 0.0000, 0.1293, 0.0072, 0.0000, 0.0078,
                                 0.0000, 0.0000, 0.0000, 0.0068, 0.0000, 0.0000,
                                 0.0068), 
                        Habitat = c("Crop", "Crop", "Crop", "Crop", "Edge", "Edge", 
                                    "Edge", "Edge", "Edge", "Edge", "Edge", "Edge", 
                                    "Edge", "Edge", "Edge", "Oakwood", "Oakwood", 
                                    "Oakwood", "Oakwood", "Oakwood", "Oakwood", 
                                    "Oakwood", "Oakwood", "Wasteland", "Wasteland", 
                                    "Wasteland", "Wasteland", "Wasteland", "Wasteland", 
                                    "Wasteland", "Wasteland"), 
                        Season = c("Summer", "Summer", "Summer", "Summer", "Autumn", 
                                   "Autumn", "Autumn", "Autumn", "Spring", "Spring", 
                                   "Spring", "Spring", "Summer", "Summer", "Summer", 
                                   "Autumn", "Autumn", "Autumn", "Autumn", "Spring",
                                   "Spring", "Spring", "Spring", "Autumn", "Autumn", 
                                   "Autumn", "Autumn", "Spring", "Spring", "Spring", 
                                   "Spring"), 
                        Site = c("M1", "M2", "M3", "M4", "L1", "L2", "L3", "L4", 
                                 "L1", "L2", "L3", "L4", "L1", "L2", "L3", "Q1", 
                                 "Q2", "Q3", "Q4", "Q1", "Q2", "Q3", "Q4", "E1", 
                                 "E2", "E3", "E4", "E1", "E2", "E3", "E4"))

为了获得变化，我之前通过 shapiro wilk 测试检查了 Rstudio 我的数据集的分布情况。

shapiro.test(x = Incidence$Inc.):

       Shapiro-Wilk normality test
       data:  Incidence$Incidence
       W = 0.56708, p-value = 2.092e-08

此外，我通过 levene 测试得到了同方差性：

leveneTest(y = Incidence$Inc., group = Incidence$Habitat, center = "median")

     Levene's Test for Homogeneity of Variance (center = "median")
           Df F value   Pr(>F)   
     group  3  6.3481 0.002129 **
     27                    
     ---
     Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

之后我检查分布是如何使用的：

Input_2 <- Incidence$Inc.
library(rriskDistributions)
Prueba <- fit.cont(as.vector(t(Input_2)))

我得到了一个日志分布

然后我在 R 中执行了这个数据集的 glmm：

GlM_habitats <- glmmTMB(Inc.~ Habitat + (1|Season)+ (1|Site),
                        data = Incidence,
                        ziformula = ~1,
                        family = poisson(link = "log")) 
 
#Warning message:
#In glmmTMB(Inc.~ Habitat + (1 | Season) + (1 | Site), data = Incidence,  :
#non-integer counts in a poisson model

Anova(GlM_habitats)
Analysis of Deviance Table (Type II Wald chisquare tests)

Response: Incidence
         Chisq Df Pr(>Chisq)
Habitat 3.0632  3      0.382 

summary(GlM_habitats)
   Family: poisson  ( log )
   Formula:          Inc.~ Habitat + (1 | Season) + (1 | Site)
   Zero inflation:             ~1
   Data: Incidence

    AIC      BIC   logLik deviance df.resid 
    23.5     33.5     -4.7      9.5       24 

 Random effects:

 Conditional model:
  Groups Name        Variance  Std.Dev.
  Season (Intercept) 5.656e-13 7.52e-07
  Site   (Intercept) 1.176e-13 3.43e-07
 Number of obs: 31, groups:  Season, 3; Site, 16

 Conditional model:
                  Estimate Std. Error z value Pr(>|z|)
 (Intercept)       -0.9710     0.8125  -1.195    0.232
 HabitatEdge       -2.6780     2.0382  -1.314    0.189
 HabitatOakwood    -2.6696     2.3290  -1.146    0.252
 HabitatWasteland  -4.9528     6.8841  -0.720    0.472

 Zero-inflation model:
             Estimate Std. Error z value Pr(>|z|)
 (Intercept)    -24.1    43216.9  -0.001        1

然后就像在上一篇文章中一样，我试图提取固定效应的方差：

 # Variance of random effects: 
 vc <- lme4::VarCorr(GlM_habitats)
 print(vc,comp=c("Variance","Std.Dev."),digits=2)

 Conditional model:
  Groups Name        Variance Std.Dev.
  Season (Intercept) 5.7e-13  7.5e-07 
  Site   (Intercept) 1.2e-13  3.4e-07 

  # Variance-Covariance Matrix of fixed effects: 
  vc_fixed <- as.matrix(vcov(GlM_habitats))
  # Variance of fixed effects: 
  var_fixed <- diag(vc_fixed); var_fixed

  [[1]]
                   (Intercept) HabitatEdge HabitatOakwood HabitatWasteland
  (Intercept)         0.660153   -0.660153      -0.660153        -0.660153
  HabitatEdge        -0.660153    4.154245       0.660153         0.660153
  HabitatOakwood     -0.660153    0.660153       5.424338         0.660153
  HabitatWasteland   -0.660153    0.660153       0.660153        47.390362



  # Standard errors of fixed effects: 
  se_fixed <- sqrt(var_fixed); se_fixed

当我执行此分析时，我得到了这个

 Error in sqrt(var_fixed) : non-numeric argument to mathematical function

我想知道如何解释这个结果并知道它们是否执行得很好。我不敢相信，Season方差Site非常低，方差分析结果给出的 ap 值并不显着。此外，我不知道为什么固定效果的标准误差不起作用。

我究竟做错了什么？

1个回答

这里有两个主要问题：

与其他线性模型一样，结果变量不需要在线性混合效应模型中呈正态分布。因此shapiro.test(x = Incidence$Inc.)，任何试图找到结果分布的程序（例如fit.cont您使用的程序）都是浪费时间 - 这些东西可能对理论家感兴趣，但它们对应用研究的价值非常有限。然而，我们希望残差至少近似正态分布。
您已经拟合了泊松模型。泊松模型适用于具有计数（整数）结果的数据。您有一个数字变量，因此要拟合的第一个模型是标准线性混合效应模型。
您只有 3 个级别的Season. 这应该是一个固定的效果。

因此，根据您的数据，我们可以拟合：

> m0 <- lmer(Inc.~ Habitat + (1|Season)+ (1|Site),
+            data = Incidence)
> summary(m0)
Linear mixed model fit by REML ['lmerMod']
Formula: Inc. ~ Habitat + (1 | Season) + (1 | Site)
   Data: Incidence

REML criterion at convergence: -78.9

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.45229 -0.30319 -0.01575  0.20558  2.53994 

Random effects:
 Groups   Name        Variance  Std.Dev.
 Site     (Intercept) 0.0031294 0.05594 
 Season   (Intercept) 0.0005702 0.02388 
 Residual             0.0008246 0.02872 
Number of obs: 31, groups:  Site, 16; Season, 3

Fixed effects:
                 Estimate Std. Error t value
(Intercept)       0.35450    0.03607   9.827
HabitatEdge      -0.32669    0.04475  -7.301
HabitatOakwood   -0.31616    0.04637  -6.818
HabitatWasteland -0.33973    0.04637  -7.326

然后我们可以检查残差直方图：

hist(residuals(m0))

看起来不错。无需对正态性进行统计检验。

请注意，您可能应该将其建模Season为固定效应，而不是随机效应。

其它你可能感兴趣的问题

上一篇面试题：如何衡量线性回归的性能？下一篇为什么引导 p 值不仅仅是在引导样本中找到空值？