R中的ARMA建模

机器算法验证 r 时间序列
2022-03-29 02:18:56

我有一个时间序列。我想使用 ARMA 对其进行建模,它将用于预测。

在 RI 中使用arima()函数来获取系数。arima()需要 order(p,d,q) 作为输入。在 R 中,为 p 和 q(d = 0)得出一个好的值,这样我就不会过度拟合,最简单的方法是什么?

2个回答

值的最简单方法是使用forecast中的函数。在任何统计数据包中都没有最简单的方法可以得出好的值。主要原因是没有普遍的好定义。pqauto.arima

由于您提到过拟合,一种可能的方法是为的不同值拟合 arima 模型,然后根据您的过拟合标准(例如,样本预测性能)选择最佳模型。基本相同,您可以在 AIC、AICC 和 BIC 之间进行选择,以选择最佳模型。pqauto.arimaauto.arima

一种选择是使用以下组合来拟合一系列 ARMA 模型pq并使用最“适合”的模型。在这里,我使用 BIC 评估“拟合”以尝试惩罚过于复杂的拟合。下面显示了内置 Mauna Loa 的示例CO2浓度数据集

## load the data
data(co2)
## take only data up to end of 1990 - predict for remaining data later
CO2 <- window(co2, end = c(1990, 12))

## Set up the parameter sets over which we want to operate
CO2.pars <- expand.grid(ar = 0:2, diff = 1, ma = 0:2, sar = 0:1,
                        sdiff = 1, sma = 0:1)
## As you are only wanting ARMA, then you would need something like
## pars <- expand.grid(ar = 0:4, diff = 0, ma = 0:4)
## and where you choose the upper and lower limits - here 0 and 4

## A vector to hold the BIC values for each combination of model
CO2.bic <- rep(0, nrow(CO2.pars))

## loop over the combinations, fitting an ARIMA model and recording the BIC
## for that model. Note we use AIC() with extra penalty given by `k`
for (i in seq(along = CO2.bic)) {
    CO2.bic[i] <- AIC(arima(CO2, unlist(CO2.pars[i, 1:3]), 
                            unlist(CO2.pars[i, 4:6])),
                      k = log(length(CO2)))
}

## identify the model with lowest BIC
CO2.pars[which.min(CO2.bic), ]

## Refit the model with lowest BIC
CO2.mod <- arima(CO2, order = c(0, 1, 1), seasonal = c(0, 1, 1))
CO2.mod
## Diagnostics plots
tsdiag(CO2.mod, gof.lag = 36)

## predict for the most recent data
pred <- predict(CO2.mod, n.ahead = 7 * 12)
upr <- pred$pred + (2 * pred$se) ## upper and lower confidence intervals
lwr <- pred$pred - (2 * pred$se) ## approximate 95% pointwise

## plot what we have done
ylim <- range(co2, upr, lwr)
plot(co2, ylab = ylab, main = expression(bold(Mauna ~ Loa ~ CO[2])),
     xlab = "Year", ylim = ylim)
lines(pred$pred, col = "red")
lines(upr, col = "red", lty = 2)
lines(lwr, col = "red", lty = 2)
legend("topleft", legend = c("Observed", "Predicted", "95% CI"),
       col = c("black", "red", "red"), lty = c(1, 1, 2), bty = "n")