您可以复制加州大学洛杉矶分校关于比例的常见问题解答(以百分比作为因变量),如下所示:
require(foreign);require(lmtest);require(sandwich)
meals <- read.dta("http://www.ats.ucla.edu/stat/stata/faq/proportion.dta")
fitperc <- glm(meals ~ yr_rnd + parented + api99, family = binomial, data=meals)
## Warning message:
## In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
我不知道上面的警告是否是这里的问题。由于某种原因,截距在 R 和 Stata 中不匹配,但由于我们通常不会在 logit/probit 中解释它,所以它应该无关紧要。
summary(fitperc)
##
## Call:
## glm(formula = meals ~ yr_rnd + parented + api99, family = binomial,
## data = meals, na.action = na.exclude)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.77722 -0.18995 -0.01649 0.18692 1.60959
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.801683 0.231914 29.329 <2e-16 ***
## yr_rndYes 0.048253 0.104210 0.463 0.643
## parented -0.766260 0.090733 -8.445 <2e-16 ***
## api99 -0.007305 0.000506 -14.435 <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1953.94 on 4256 degrees of freedom
## Residual deviance: 395.81 on 4253 degrees of freedom
## (164 observations deleted due to missingness)
## AIC: 2936.7
##
## Number of Fisher Scoring iterations: 5
在 R 中,使用的小样本校正与 Stata 中的不同,但稳健的 SE 非常相似:
coeftest(fitperc, function(x) vcovHC(x, type = "HC1"))
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.80168270 0.07240299 93.9420 <2e-16 ***
## yr_rndYes 0.04825266 0.03218271 1.4993 0.1338
## parented -0.76625982 0.03908528 -19.6048 <2e-16 ***
## api99 -0.00730460 0.00021564 -33.8748 <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
要使用完全相同的小样本校正,您需要关注这篇文章:
sandwich1 <- function(object, ...) sandwich(object) * nobs(object) / (nobs(object) - 1)
coeftest(fitperc, vcov = sandwich1)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.80168270 0.07237747 93.9751 <2e-16 ***
## yr_rndYes 0.04825266 0.03217137 1.4999 0.1336
## parented -0.76625982 0.03907151 -19.6117 <2e-16 ***
## api99 -0.00730460 0.00021556 -33.8867 <2e-16 ***
## ---
## Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
对数似然和置信区间(略有不同,因为估计过程似乎不同):
logLik(fitperc)
## 'log Lik.' -1464.363 (df=4)
confint(fitperc)
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 6.352788748 7.262067304
## yr_rndYes -0.155529338 0.253123151
## parented -0.944775733 -0.588903012
## api99 -0.008303668 -0.006319185
要获得预测:
meals_pred <- data.frame(api99=rep(c(500,600,700), 2),
yr_rnd=rep(c("No", "Yes"), times=1, each=3),
parented=rep(2.5, 6))
cbind(meals_pred, pred=predict(fitperc, meals_pred, "response"))
## api99 yr_rnd parented pred
## 1 500 No 2.5 0.7744710
## 2 600 No 2.5 0.6232278
## 3 700 No 2.5 0.4434458
## 4 500 Yes 2.5 0.7827873
## 5 600 Yes 2.5 0.6344891
## 6 700 Yes 2.5 0.4553849
有关相关讨论,请参阅此问题: