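I ran some GLMs and linear models with an offset. Each row in the dataset is a healthcare user, and the data contain each user's medical payments and ICU days between 2000 and 2007. Because users were in contact with the facility for different numbers of years (for example, some came to the facility from 2001 to 2003, while others came in every year), I figured I should use the number of years in the "observation period" as an offset. It is also common sense that the longer the contact, the higher the payments and the ICU days.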
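With a log link, an offset of log(years) enters the linear predictor with its coefficient fixed at 1, so the Gamma, Poisson, and negative binomial models below describe the expected value per year of observation. Here \(y_i\) stands for payment_amt or icu_days, and \(\text{male}_i\) is the 0/1 indicator behind the `as.factor(gender)M` row:

$$
\log \mathbb{E}[y_i] = \log(\text{years}_i) + \beta_0 + \beta_1\,\text{male}_i + \beta_2\,\text{age}_i
\quad\Longleftrightarrow\quad
\frac{\mathbb{E}[y_i]}{\text{years}_i} = \exp\left(\beta_0 + \beta_1\,\text{male}_i + \beta_2\,\text{age}_i\right)
$$

Under this form, \(e^{\beta_1}\) is the ratio of expected per-year values for men versus women of the same age, and \(e^{\beta_2}\) is the multiplicative change per additional year of age.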
Gamma:
```
Call:
glm(formula = payment_amt ~ offset(log(years)) +
    as.factor(gender) + age,
    family = Gamma(link = "log"), data = pm, control = glm.control(maxit = 50))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.8787  -1.2142  -0.5339   0.1904  15.1442

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)        4.6718536  0.0134132   348.3   <2e-16 ***
as.factor(gender)M 0.7800695  0.0024625   316.8   <2e-16 ***
age                0.0642834  0.0001908   337.0   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Gamma family taken to be 1.238685)

    Null deviance: 1520252  on 852449  degrees of freedom
Residual deviance: 1251859  on 852447  degrees of freedom
AIC: 20497443

Number of Fisher Scoring iterations: 8
```
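Because the Gamma model uses a log link with offset(log(years)), exponentiating a coefficient gives a ratio of expected payment per year of observation. A minimal sketch using the estimates printed above; with the fitted model stored in an object (here hypothetically called `fit_gamma`), `exp(coef(fit_gamma))` would give the same numbers:

```r
# Estimates copied from the Gamma (log link, offset(log(years))) output above
gamma_coef <- c(intercept = 4.6718536, male = 0.7800695, age = 0.0642834)

# exp() maps each log-scale coefficient to a multiplicative effect on the
# expected payment per year of observation
gamma_ratio <- exp(gamma_coef)
print(round(gamma_ratio, 3))

# The same effects expressed as a percentage change in expected yearly payment
print(round(100 * (gamma_ratio[c("male", "age")] - 1), 1))
```

Read this way, and assuming F is the reference level (as the `as.factor(gender)M` row suggests), men have roughly 2.18 times the expected yearly payment of women of the same age, and each additional year of age multiplies it by about 1.066. The intercept, exp(4.672) ≈ 107, is the expected yearly payment for the reference gender at age 0, which is an extrapolation.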
OLS:
```
Call:
lm(formula = payment_amt ~ offset(years) +
    as.factor(gender) + age,
    data = pm)

Residuals:
    Min      1Q  Median      3Q     Max
-170257  -53628  -23808   15835 8808825

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)        -206943.18    1425.83  -145.1   <2e-16 ***
as.factor(gender)M   48794.00     261.77   186.4   <2e-16 ***
age                   3547.31      20.28   174.9   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 118300 on 852447 degrees of freedom
Multiple R-squared:  0.07811,  Adjusted R-squared:  0.0781
F-statistic: 3.611e+04 on 2 and 852447 DF,  p-value: < 2.2e-16
```
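In lm() there is no link function, so the coefficients are additive effects in the units of payment_amt. Also, offset(years) (unlike offset(log(years)) in the GLMs) enters with a coefficient fixed at 1 on the scale of payment_amt: the model is fitted to payment_amt - years, not to payment per year. A minimal sketch on simulated data (all values below are made up) checks that equivalence:

```r
set.seed(1)
n <- 1000

# Hypothetical stand-ins for the real columns
years  <- sample(1:8, n, replace = TRUE)
gender <- factor(sample(c("F", "M"), n, replace = TRUE))
age    <- runif(n, 20, 80)
payment_amt <- 5000 * years + 40000 * (gender == "M") + 3000 * age +
  rnorm(n, sd = 20000)

# An offset in lm() is a term whose coefficient is fixed at 1 ...
fit_offset <- lm(payment_amt ~ offset(years) + gender + age)
# ... so it yields the same coefficients as subtracting it from the response
fit_diff <- lm(I(payment_amt - years) ~ gender + age)

print(all.equal(coef(fit_offset), coef(fit_diff)))
```

Since years is at most 8 here (2000 to 2007) while the residual standard error above is 118300, this offset barely changes the OLS fit.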
Poisson:
```
Call:
glm(formula = icu_days ~ offset(log(years)) + as.factor(gender) +
    age, family = poisson(link = "log"),
    data = pm)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-56.95  -15.11   -7.11    3.22  747.64

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)        -5.518e-01  9.058e-04  -609.2   <2e-16 ***
as.factor(gender)M  6.357e-01  1.341e-04  4738.9   <2e-16 ***
age                 6.395e-02  1.246e-05  5130.9   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)
```
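The Poisson deviance residuals above reach about 748, and the negative binomial fit estimates theta ≈ 0.93; both hint that the ICU-day counts are far more variable than a Poisson model assumes, which would make the Poisson standard errors too small. A minimal sketch of the usual Pearson dispersion check, on simulated overdispersed counts (the data and true coefficients are hypothetical):

```r
set.seed(42)
n <- 5000

# Hypothetical stand-ins for the real columns
years  <- sample(1:8, n, replace = TRUE)
gender <- factor(sample(c("F", "M"), n, replace = TRUE))
age    <- runif(n, 20, 80)

# Overdispersed counts: negative binomial around a made-up per-year rate
rate     <- exp(-3 + 0.6 * (gender == "M") + 0.05 * age)
icu_days <- rnbinom(n, size = 0.93, mu = years * rate)

fit_pois <- glm(icu_days ~ offset(log(years)) + gender + age,
                family = poisson(link = "log"))

# Pearson dispersion: near 1 if the Poisson variance assumption holds,
# well above 1 under overdispersion
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
print(dispersion)
```

On the real data the same ratio can be computed from the fitted Poisson model; a value far above 1 favors the negative binomial model (or a quasi-Poisson fit via `family = quasipoisson`).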
Negative binomial:
```
Call:
glm.nb(formula = icu_days ~ offset(log(years)) +
    as.factor(gender) +
    age, data = pm, init.theta = 0.9279403178,
    link = log)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.7720  -1.0788  -0.4652   0.1641  17.2095

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)        -1.0038131  0.0126237  -79.52   <2e-16 ***
as.factor(gender)M  0.5977179  0.0023017  259.69   <2e-16 ***
age                 0.0708916  0.0001795  394.97   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.9279) family taken to be 1)
```
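For the two count models, the same exponentiation gives rate ratios for ICU days per year of observation. A minimal sketch using the printed estimates; with fitted objects, `exp(coef(...))` on each model would give the same values:

```r
# Estimates copied from the Poisson and negative binomial outputs above
pois_coef <- c(male = 6.357e-01, age = 6.395e-02)
nb_coef   <- c(male = 0.5977179, age = 0.0708916)

# Rate ratios: multiplicative change in expected ICU days per year observed
rate_ratios <- rbind(poisson = exp(pois_coef), negbin = exp(nb_coef))
print(round(rate_ratios, 3))
```

Under the negative binomial fit, men are expected to have about 1.82 times as many ICU days per year observed as women of the same age, and each additional year of age multiplies that rate by about 1.073; the Poisson point estimates are similar, but its standard errors are much smaller.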
How should I interpret these coefficients?