机器算法验证 - R中时间序列的ACF的“手动”计算 - 接近但不完全 - 吾爱随笔录

R中时间序列的ACF的“手动”计算 - 接近但不完全

机器算法验证 r 时间序列自相关

2022-03-27 19:34:01

我试图了解在时间序列中计算 ACF 值背后的“机制”。作为“向自己证明自己的练习” [注意：我更新了此链接中的代码以反映已接受答案中的信息]，我并不关注代码的优雅性，而是故意使用循环。问题是，虽然我得到的值与生成的值接近acf()，但它们并不相等，而且在某些时候，并不是那么接近。有什么概念上的误解吗？

正如上面链接的在线注释中所记，数据是根据这个在线示例生成的。

set.seed(0)                          # To reproduce results
x = seq(pi, 10 * pi, 0.1)            # Creating time-series as sin function.
y = 0.1 * x + sin(x) + rnorm(x)      # Time-series sinusoidal points with noise.
y = ts(y, start = 1800)              # Labeling time axis.
model = lm(y ~ I(1801:2083))         # Detrending (0.1 * x)
st.y = y - predict(model)            # Final de-trended ts (st.y)
ACF = 0                              # Empty vector to capture the auto-correlations.
ACF[1] = cor(st.y, st.y)             # The first entry in the ACF is the cor with itself.
for(i in 1:24){                      # Took 24 points to parallel the output of `acf()`
  lag = st.y[-c(1:i)]                # Introducing lags in the stationary ts.
  clipped.y = st.y[1:length(lag)]    # Compensating by reducing length of ts.
  ACF[i + 1] = cor(clipped.y, lag)   # Storing each correlation.
}
w = acf(st.y)                        # Extracting values of acf without plotting.
all.equal(ACF, as.vector(w$acf))     # Checking for equality in values.
# Pretty close to manual calc: Mean relative difference: 0.03611463"

为了对相对输出有一个切实的了解，这是手动计算的自相关的样子：

1.0000000 0.3195564 0.3345448 0.2877745 0.2783382 0.2949996 ... 
... -0.1262182 -0.1795683 -0.1941921 -0.1352814 -0.2153883 -0.3423663

与 Racf()函数相反：

1.0000000 0.3187104 0.3329545 0.2857004 0.2745302 0.2907426 ...
... -0.1144625 -0.1621018 -0.1737770 -0.1203673 -0.1924761 -0.3069342

正如所建议的那样，这很可能与以下行中的调用中的代码相比得到了解释acf()：

acf <- .Call(C_acf, x, lag.max, type == "correlation")

如何C_acf访问此功能？

我与 PACF 有类似的问题，我认为这是相关的。PACF 值的代码在这里。[注意：在这种情况下，我怀疑它实际上是一个四舍五入的差异]。

1个回答

显然cor并acf没有使用相同的除数。acf使用观察次数作为除数，可以如下所示重现。具体可以C_acf在文件中找到代码R-3.3.2/src/library/stats/src/filter.c（procedures acfand acf0）：

set.seed(0)                          # To reproduce results
x = seq(pi, 10 * pi, 0.1)            # Creating time-series as sin function.
y = 0.1 * x + sin(x) + rnorm(x)      # Time-series sinusoidal points with noise.
y = ts(y, start = 1800)              # Labeling time axis.
model = lm(y ~ I(1801:2083))         # Detrending (0.1 * x)
st.y = y - predict(model)  
lag.max <- 24
ydm <- st.y - mean(st.y)
n <- length(ydm)
ACF <- rep(NA, lag.max+1)
for (tau in seq(0, lag.max))
  ACF[tau+1] <- sum(ydm * lag(ydm, -tau)) / n
ACF <- ACF/ACF[1]
names(ACF) <- paste0("lag", seq.int(0, lag.max))
ACF
head(ACF)
# lag0      lag1      lag2      lag3      lag4      lag5 
# 1.0000000 0.3187104 0.3329545 0.2857004 0.2745302 0.2907426 
tail(ACF)
# lag19      lag20      lag21      lag22      lag23      lag24 
# -0.1144625 -0.1621018 -0.1737770 -0.1203673 -0.1924761 -0.3069342 
all.equal(ACF, acf(st.y, lag.max=lag.max, plot=FALSE)$acf[,,1], check.names=FALSE)
# [1] TRUE

其它你可能感兴趣的问题

上一篇简单神经网络的最小训练规模下一篇我们应该在 sklearn 中使用 VarianceThreshold 之前进行标准化吗？