Output differences between SAS's proc genmod and R's glm

Tags: r, logistic, generalized linear model, sas
2022-03-20 04:13:48

I am trying to replicate in R a model that was fitted in SAS, but the fit I get gives me slightly different coefficients and standard errors.

Data:

testdata <- data.frame(matrix(c("f","Test", 1.75,   16, 0,  16, 0,  1,  1,
"m",    "Test", 1.75,   15, 1,  16, 6.25,   1,  0,
"f",    "Test", 2.75,   4,  12, 16, 75, 1,  1,
"m",    "Test", 2.75,   9,  6,  15, 40, 1,  0,
"f",    "WHO",  1.75,   15, 1,  16, 6.25,   0,  1,
"m",    "WHO",  1.75,   14, 2,  16, 12.5,   0,  0,
"f",    "WHO",  2.75,   2,  13, 15, 86.6667,    0,  1,
"m",    "WHO",  2.75,   3,  13, 16, 81.25,  0,  0
), ncol=9, byrow=TRUE))
names(testdata) <- c("sex", "vaccine", "dose", "not_p", "para", "n", "pct", 
                     "vacnum", "sexno")

SAS:

proc genmod data=model_data;
class sex;
model para/n  = dose sex vacnum

  /dist=bin 
  link=logit
  type3;
run;

Analysis Of Maximum Likelihood Parameter Estimates 
Parameter   DF Estimate Std Error Wald 95% Conf Lim Wald Chi-Square Pr > ChiSq 
Intercept   1 -9.4020   1.6220    -12.5810  -6.2230  33.60           <.0001 
dose        1  3.9208   0.6460      2.6546   5.1870  36.83           <.0001 
sex f       1  0.5574   0.5184     -0.4587   1.5735   1.16           0.2823 
sex m       0  0.0000   0.0000      0.0000   0.0000    .              . 
vacnum      1 -1.3221   0.5483     -2.3967  -0.2475   5.81           0.0159 
Scale       0  1.0000   0.0000      1.0000   1.0000     

R:

testdata$sexno <- as.factor(testdata$sexno)    
a <- contr.treatment(2, base = 1, contrasts = TRUE)

contrasts(testdata$sexno) <- a

fitreduced <- glm(para/n ~ dose + as.factor(sex) + vacnum, 
                  family=quasibinomial(link="logit"), data=testdata)

coef(summary(fitreduced))

                  Estimate Std. Error   t value    Pr(>|t|)
(Intercept)     -9.4013750  1.7613982 -5.337450 0.005935450
dose             3.9173794  0.7001133  5.595351 0.005007179
as.factor(sex)1  0.5704671  0.5568436  1.024466 0.363525300
vacnum          -1.3336100  0.5887552 -2.265135 0.086189704

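I believe I have the right contrasts to give me Type III SS, but the values differ slightly. Is there something I am missing here?

1 Answer

I notice a few things here.

First, when you enter your data via matrix, all of the data must be of the same type. They are therefore coerced to the most inclusive type, character strings, which data.frame in turn coerces to factors by default (in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE). Note: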

testdata <- data.frame(matrix(c("f","Test", 1.75,   16, 0,  16, 0,  1,  1,
...
sapply(testdata, class)
#      sex  vaccine     dose    not_p     para        n      pct   vacnum    sexno
# "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor" "factor"

Try using read.table(text='...', sep=",") instead:

testdata <- read.table(text='"f", "Test", 1.75,   16,   0,  16,  0,      1,  1
"m", "Test", 1.75,   15,   1,  16,  6.25,   1,  0
"f", "Test", 2.75,    4,  12,  16, 75,      1,  1
"m", "Test", 2.75,    9,   6,  15, 40,      1,  0
"f", "WHO",  1.75,   15,   1,  16,  6.25,   0,  1
"m", "WHO",  1.75,   14,   2,  16, 12.5,    0,  0
"f", "WHO",  2.75,    2,  13,  15, 86.6667, 0,  1
"m", "WHO",  2.75,    3,  13,  16, 81.25,   0,  0', sep=",")
names(testdata) <- c("sex", "vaccine", "dose", "not_p", "para", "n", "pct", 
                     "vacnum", "sexno")
sapply(testdata, class)
#      sex   vaccine      dose     not_p      para         n       pct    vacnum 
# "factor"  "factor" "numeric" "integer" "integer" "integer" "numeric" "integer" 
#     sexno 
# "integer" 
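
As a small illustration of the coercion point, here is a minimal sketch that builds the same data column by column instead of through matrix. The column values are copied from the question; placing "m" first in the levels of sex is my own assumption, made so that "m" is the reference level as in the SAS output.

```r
# A matrix holds a single type, so mixing strings and numbers gives character.
m <- matrix(c("f", 1.75, 16), ncol = 3)
typeof(m)  # "character"

# Building the data frame column by column keeps each column's own type.
testdata <- data.frame(
  # Assumption: "m" is listed first so it becomes the reference level.
  sex     = factor(c("f", "m", "f", "m", "f", "m", "f", "m"),
                   levels = c("m", "f")),
  vaccine = c("Test", "Test", "Test", "Test", "WHO", "WHO", "WHO", "WHO"),
  dose    = c(1.75, 1.75, 2.75, 2.75, 1.75, 1.75, 2.75, 2.75),
  para    = c(0, 1, 12, 6, 1, 2, 13, 13),
  n       = c(16, 16, 16, 15, 16, 16, 15, 16),
  vacnum  = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# dose, para, n and vacnum stay numeric; only sex is a factor.
sapply(testdata, class)
```

This avoids converting columns back from factor or character after the fact.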

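(That's small potatoes.) The next pitfall to worry about is that SAS and R code logistic regression on binomial data differently. SAS uses "events over trials", but R uses the odds form, successes and failures. So your model formula should be: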

form <- as.formula("cbind(para, n-para) ~ dose + sex + vacnum")
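
For comparison, glm also accepts a proportion as the response when the number of trials is passed through weights, which is closer in spirit to the SAS events/trials syntax. The sketch below is only an illustration under that assumption: it rebuilds the relevant columns by hand (values copied from the question) and checks that both codings give the same coefficients. Leaving out weights, as in the question's para/n call, treats every row as a single observation and fits a different model.

```r
# Data rebuilt by hand from the question; "m" is made the reference level
# (an assumption, chosen to mirror the SAS output).
testdata <- data.frame(
  sex    = factor(c("f", "m", "f", "m", "f", "m", "f", "m"),
                  levels = c("m", "f")),
  dose   = c(1.75, 1.75, 2.75, 2.75, 1.75, 1.75, 2.75, 2.75),
  para   = c(0, 1, 12, 6, 1, 2, 13, 13),
  n      = c(16, 16, 16, 15, 16, 16, 15, 16),
  vacnum = c(1, 1, 1, 1, 0, 0, 0, 0)
)

# Successes/failures coding.
fit_cbind <- glm(cbind(para, n - para) ~ dose + sex + vacnum,
                 family = binomial(link = "logit"), data = testdata)

# Proportion coding, with the number of trials supplied as weights.
fit_prop <- glm(para / n ~ dose + sex + vacnum,
                family = binomial(link = "logit"),
                weights = n, data = testdata)

# The two codings describe the same likelihood, so the estimates agree.
all.equal(coef(fit_cbind), coef(fit_prop))
```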

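Finally, you specified family=quasibinomial in your R code (i.e., quasi-binomial), but DIST=BIN in your SAS code (i.e., binomial). To match the SAS output, use binomial instead. So your final model is: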

fitreduced <- glm(form, family=binomial(link="logit"), data=testdata)
coef(summary(fitreduced))
#               Estimate Std. Error   z value     Pr(>|z|)
# (Intercept) -9.4020028  1.6219570 -5.796703 6.763131e-09
# dose         3.9207805  0.6460193  6.069138 1.285986e-09
# sexf         0.5574087  0.5184112  1.075225 2.822741e-01
# vacnum      -1.3221011  0.5482645 -2.411430 1.589012e-02

This appears to match the SAS estimates and standard errors.
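
To see what the choice of family changes on its own, the following sketch fits the same formula with binomial and quasibinomial. It assumes the data as rebuilt by hand below (values copied from the question, with "m" set as the reference level). With identical response and weights, the point estimates agree, and the quasi-binomial standard errors are the binomial ones multiplied by the square root of the estimated dispersion.

```r
# Data rebuilt by hand from the question; "m" is the reference level (assumption).
testdata <- data.frame(
  sex    = factor(c("f", "m", "f", "m", "f", "m", "f", "m"),
                  levels = c("m", "f")),
  dose   = c(1.75, 1.75, 2.75, 2.75, 1.75, 1.75, 2.75, 2.75),
  para   = c(0, 1, 12, 6, 1, 2, 13, 13),
  n      = c(16, 16, 16, 15, 16, 16, 15, 16),
  vacnum = c(1, 1, 1, 1, 0, 0, 0, 0)
)
form <- cbind(para, n - para) ~ dose + sex + vacnum

fit_bin   <- glm(form, family = binomial(link = "logit"), data = testdata)
fit_quasi <- glm(form, family = quasibinomial(link = "logit"), data = testdata)

# Same point estimates under both families.
all.equal(coef(fit_bin), coef(fit_quasi))

# Standard errors differ only by the square root of the dispersion estimate.
se_bin   <- coef(summary(fit_bin))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]
phi      <- summary(fit_quasi)$dispersion
all.equal(se_quasi, se_bin * sqrt(phi))
```

The quasi-binomial summary also reports t statistics rather than z statistics, which is why the p-values in the question's output differ in kind from those in the SAS table.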