我正在为其他人处理时间序列数据,这些数据计算了 48 个月内发生关闭的紧急部门相关事件,并希望调查这些关闭的影响。测量是事件的计数或每个时间点的比例。
最初他们要求我使用 Prais-Winsten 时间序列回归,它解释了重复测量之间的自相关。他们现在要求我对计数结果执行负二项式时间序列,以检查建模方法之间的一致性(我最初提出了 Prais-Winsten 的替代方案,例如 GARCH 或 ARIMA 模型,但他们不感兴趣)。
我现在已经执行了要求的分析,但我正在努力协调这两种方法的结果,非常感谢您的建议。
数据
> dput(sample)
structure(list(relative.month = 1:48, value = c(2407, 2019, 2623,
2443, 2043, 2605, 2623, 2365, 2822, 2558, 2144, 2601, 3034, 2878,
2883, 1995, 2323, 2861, 2193, 2729, 2859, 3118, 2064, 2350, 1701,
1665, 1848, 1428, 1371, 2043, 1218, 790, 1502, 1591, 1661, 2074,
713, 1531, 1460, 1444, 1391, 1632, 1623, 934, 1718, 1217, 1439,
1377), step = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), town = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1)), .Names = c("relative.month", "value", "step", "town"), row.names = c(NA, -48L), class = "data.frame")
## Pre and post Means
> group_by(sample, step) %>%
summarise(mean = mean(value, na.rm = TRUE),
sd = sd(value, na.rm = TRUE))
step mean sd
<dbl> <dbl> <dbl>
1 0 2522.500 336.5390
2 1 1473.792 334.4394
## Difference in means Before (0) and After (1)
> 2522.500 - 1473.792
[1] 1048.708
## Percentage difference in means
> 100* (2522.500 - 1473.792) / 2522.500
[1] 41.57415
快速绘制数据图......
普拉斯-温斯滕
一步运行一个非常简单的 Prais-Winsten 回归并考虑自相关......
## Simple Prais-Winsten Regression with 1-step
model <- reformulate(response = 'value',
termlabels = c('step'))
pwr.1step <- panelAR(formula = model,
data = sample,
panelVar = 'town',
timeVar = 'relative.month',
autoCorr = 'ar1',
panelCorrMethod = 'pcse',
rho.na.rm = TRUE,
seq.times = FALSE)
pwr.1step %>% summary()
Panel Regression with AR(1) Prais-Winsten correction and panel-corrected standard errors
Balanced Panel Design:
Total obs.: 48 Avg obs. per panel 48
Number of panels: 1 Max obs. per panel 48
Number of times: 48 Min obs. per panel 48
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2524.55 59.99 42.09 < 2e-16 ***
step -1051.90 85.00 -12.38 3.05e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-squared: 0.7598
Wald statistic: 153.1524, Pr(>Chisq(1)): 0
我对此的解释是,该步骤导致观察到的事件数量平均减少,大约为 908 个事件(/平均值),这与 (0) 之前和 (1) 之后的均值差异相差不远即 904(或减少 37.56%)。
包括(相对)时间作为协变量来估计随时间的变化......
## Simple Prais-Winsten Regression with 1-step and time as co-variate
model <- reformulate(response = 'value',
termlabels = c('closure', 'relative.month'))
pwr.1step.time <- panelAR(formula = model,
data = sample,
panelVar = 'town',
timeVar = 'relative.month',
autoCorr = 'ar1',
panelCorrMethod = 'pcse',
rho.na.rm = TRUE,
seq.times = FALSE)
pwr.1step.time %>% summary()
Panel Regression with AR(1) Prais-Winsten correction and panel-corrected standard errors
Balanced Panel Design:
Total obs.: 48 Avg obs. per panel 48
Number of panels: 1 Max obs. per panel 48
Number of times: 48 Min obs. per panel 48
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2526.3522 97.9934 25.781 < 2e-16 ***
step -1048.3859 172.1547 -6.090 2.3e-07 ***
relative.month -0.1453 6.2160 -0.023 0.981
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-squared: 0.7596
Wald statistic: 152.9394, Pr(>Chisq(2)): 0
与更简单的模型一样,该步骤之后的事件数量仍然减少,但包含时间变量表明,在整个时间序列中(即在 (0) 之前和 (1) 之后),往往会减少随着时间的推移事件的数量,虽然这在步骤之后更加明显。
负二项式时间序列
只需一步即可在非常简单的模型上运行负二项式时间序列,但允许连续观察之间的相关性......
## Negative Binomial Time-Series with 1-step
> outcome <- sample[,'value']
> covariates <- dplyr::select(sample, step)
> negbin.1step <- tsglm(ts = outcome,
link = 'log',
model = list(past_obs = 1),
xreg = covariates,
distr = 'nbinom')
> negbin.1step %>% summary()
Call:
tsglm(ts = outcome, model = list(past_obs = 1), xreg = covariates,
link = "log", distr = "nbinom")
Coefficients:
Estimate Std.Error CI(lower) CI(upper)
(Intercept) 8.3258 1.0195 6.328 10.324
beta_1 -0.0629 0.1301 -0.318 0.192
step -0.5715 0.0888 -0.746 -0.397
sigmasq 0.0349 NA NA NA
Standard errors and confidence intervals (level = 95 %) obtained
by normal approximation.
Link function: log
Distribution family: nbinom (with overdispersion coefficient 'sigmasq')
Number of coefficients: 4
Log-likelihood: -351.8314
AIC: 711.6627
BIC: 719.1476
QIC: 3760.517
## Exponentiate the coefficients to obtain a percentage change
> negbin.1step %>% coefficients() %>% exp()
(Intercept) beta_1 step
4128.9061822 0.9389960 0.5646913
通读tscount 小插图后,我认为这意味着在该步骤之后每个月观察到的事件数量(大约)减少了 44%,并且事件数量减少了小幅(~7%)随着时间的推移(beta_1
系数)。
添加时间作为协变量以匹配 Prais-Winsten 回归给出...
## Negative Binomial Time-Series with 1-step and time
> outcome <- sample[,'value']
> covariates <- dplyr::select(sample, step, relative.month)
> negbin.1step.time <- tsglm(ts = outcome,
link = 'log',
model = list(past_obs = 1),
xreg = covariates,
distr = 'nbinom')
> negbin.1step.time %>% summary()
Call:
tsglm(ts = outcome, model = list(past_obs = 1), xreg = covariates,
link = "log", distr = "nbinom")
Coefficients:
Estimate Std.Error CI(lower) CI(upper)
(Intercept) 8.335323 1.04206 6.29292 10.37773
beta_1 -0.063606 0.13211 -0.32255 0.19533
step -0.563206 0.12753 -0.81316 -0.31325
relative.month -0.000357 0.00412 -0.00843 0.00771
sigmasq 0.035660 NA NA NA
Standard errors and confidence intervals (level = 95 %) obtained
by normal approximation.
Link function: log
Distribution family: nbinom (with overdispersion coefficient 'sigmasq')
Number of coefficients: 5
Log-likelihood: -351.7691
AIC: 713.5383
BIC: 722.8943
QIC: 3923.615
> negbin.1step.time %>% coefficients() %>% exp()
(Intercept) beta_1 step relative.month
4168.5475445 0.9383745 0.5693807 0.9996430
在这一步有大致相同的减少,并且考虑到随着时间的推移轻微减少的相关性 (the beta_1
),relative.month
协变量几乎没有随时间变化。
问题
等效性 - 结果告诉我大致相同的事情,即由步骤引起的事件数量减少,并且随着时间的推移,事件数量减少。
beta_1
考虑到它估计连续观察对之间的相关性,将负二项式中的分量解释为大致相当于相对月份的影响是否合理?如果是这样,是否不需要relative.month
在负二项式时间序列中包含协变量?当观察到的事件数以千计时,使用负二项式时间序列模型是否合理?有人建议这样做可能不合适,因为负二项式分布适用于过度分散(即由于可能的偏斜而导致高方差)的计数。
从上一个问题开始,哪种方法最适合报告?