机器算法验证 - 为什么高斯过程是时间序列预测的有效统计模型？ - 吾爱随笔录

为什么高斯过程是时间序列预测的有效统计模型？

机器算法验证时间序列预测自相关高斯过程

2022-03-07 22:05:20

重复免责声明：我知道这个问题

但这不是重复的，因为这个问题只涉及对协方差函数的修改，而我认为实际上噪声项也必须修改。

问题：在时间序列模型中，非参数回归（iid 数据）的通常假设失败，因为残差是自相关的。现在，高斯过程回归是非参数回归的一种形式，因为基础模型是，假设我们有一个独立同分布的样本 $D=\{(t_i,y_i)\}_{i=1}^N$ ：

y_{i} = G P (t_{i}) + ϵ_{i}, i = 1, \dots, N

$y_i = \mathcal{GP}(t_i)+\epsilon_i,\ i=1,\dots,N$

在哪里 $\mathcal{GP}(t)$ 是一个高斯过程，具有平均函数 $mu(t)$ （通常假定为 0）和协方差函数 $k(t,t')$ ，尽管 $\epsilon\sim\mathcal{N}(0,\sigma)$ . 然后我们使用贝叶斯推理来计算后验预测分布，给定样本 $D$ .

但是，在时间序列数据中，样本不是独立同分布的。因此：

我们如何证明使用这种模型的合理性？
由于使用 GP 进行时间序列预测的均值函数通常被假定为零，因此当我在未来计算足够远的预测时，我预计它将恢复为数据的均值。这似乎是一个特别糟糕的选择，因为我希望（原则上）能够预测尽可能远的未来，并且模型设法使整体时间趋势正确，只是增加了预测不确定性（参见下面的 ARIMA 模型案例）：

在使用 GP 进行时间序列预测时，这是如何处理的？

2个回答

一些相关概念可能会出现在问题中，为什么在 GAM 中包含纬度和经度来解释空间自相关？

如果您在回归中使用高斯处理，那么您将趋势包含在模型定义中 $y(t) = f(t,\theta) + \epsilon(t)$ 这些错误在哪里 $\epsilon(t) \sim \mathcal{N}(0,{\Sigma})$ 和 $\Sigma$ 取决于点之间距离的某些函数。

就您的数据而言，CO2 水平可能是周期性分量比具有周期性相关性的噪声更具系统性，这意味着您最好将其合并到模型中

`DiceKriging`使用R 中的模型进行演示。

第一张图片显示了趋势线的拟合 $y(t) = \beta_0 + \beta_1 t + \beta_2 t^2 +\beta_3 \sin(2 \pi t) + \beta_4 \sin(2 \pi t)$ .

95% 的置信区间与 arima 图像相比要小得多。但请注意，残差项也非常小，并且有很多数据点。为了比较，还做了其他三个拟合。

适合具有较少数据点的更简单（线性）模型。在这里，您可以看到趋势线误差的影响，导致预测置信区间在外推更远时增加（该置信区间也仅与模型正确一样正确）。
一个普通的最小二乘模型。您可以看到它提供了与高斯过程模型或多或少相同的置信区间
一个普通的克里金模型。这是一个不包括趋势的高斯过程。当您外推很远时，预测值将等于平均值。误差估计很大，因为残差项（数据均值）很大。

library(DiceKriging)
library(datasets)


# data
y <- as.numeric(co2)
x <- c(1:length(y))/12

# design-matrix 
# the model is a linear sum of x, x^2, sin(2*pi*x), and cos(2*pi*x)
xm <- cbind(rep(1,length(x)),x, x^2, sin(2*pi*x), cos(2*pi*x))
colnames(xm) <- c("i","x","x2","sin","cos")

# fitting non-stationary Gaussian processes 
epsilon <- 10^-3
fit1 <- km(formula= ~x+x2+sin+cos, 
          design = as.data.frame(xm[,-1]), 
          response = as.data.frame(y),
          covtype="matern3_2", nugget=epsilon)

# fitting simpler model and with less data (5 years)
epsilon <- 10^-3
fit2 <- km(formula= ~x, 
           design = data.frame(x=x[120:180]), 
           response = data.frame(y=y[120:180]),
           covtype="matern3_2", nugget=epsilon)

# fitting OLS
fit3 <- lm(y~1+x+x2+sin+cos, data = as.data.frame(cbind(y,xm)))

# ordinary kriging 
epsilon <- 10^-3
fit4 <- km(formula= ~1, 
           design = data.frame(x=x), 
           response = data.frame(y=y),
           covtype="matern3_2", nugget=epsilon)


# predictions and errors
newx <- seq(0,80,1/12/4)
newxm <- cbind(rep(1,length(newx)),newx, newx^2, sin(2*pi*newx), cos(2*pi*newx))
colnames(newxm) <- c("i","x","x2","sin","cos")
# using the type="UK" 'universal kriging' in the predict function
# makes the prediction for the SE take into account the variance of model parameter estimates
newy1 <- predict(fit1, type="UK", newdata = as.data.frame(newxm[,-1]))
newy2 <- predict(fit2, type="UK", newdata = data.frame(x=newx))
newy3 <- predict(fit3, interval = "confidence", newdata = as.data.frame(x=newxm))
newy4 <- predict(fit4, type="UK", newdata = data.frame(x=newx))

# plotting
plot(1959-1/24+newx, newy1$mean,
 col = 1, type = "l",
 xlim = c(1959, 2039), ylim=c(300, 480),
 xlab = "time [years]", ylab = "atmospheric CO2 [ppm]")
polygon(c(rev(1959-1/24+newx), 1959-1/24+newx), c(rev(newy1$lower95), newy1$upper95), 
        col = rgb(0,0,0,0.3), border = NA)
points(1959-1/24+x, y, pch=21, cex=0.3, col=1, bg="white")
title("Gausian process with polynomial + trigonometric function for trend")

# plotting
plot(1959-1/24+newx, newy2$mean,
 col = 2, type = "l",
 xlim = c(1959, 2010), ylim=c(300, 380),
 xlab = "time [years]", ylab = "atmospheric CO2 [ppm]")
polygon(c(rev(1959-1/24+newx), 1959-1/24+newx), c(rev(newy2$lower95), newy2$upper95), 
        col = rgb(1,0,0,0.3), border = NA)
points(1959-1/24+x, y, pch=21, cex=0.5, col=1, bg="white")
points(1959-1/24+x[120:180], y[120:180], pch=21, cex=0.5, col=1, bg=2)
title("Gausian process with linear function for trend")

# plotting
plot(1959-1/24+newx, newy3[,1],
     col = 1, type = "l",
     xlim = c(1959, 2039), ylim=c(300, 480),
     xlab = "time [years]", ylab = "atmospheric CO2 [ppm]")
polygon(c(rev(1959-1/24+newx), 1959-1/24+newx), c(rev(newy3[,2]), newy3[,3]), 
        col = rgb(0,0,0,0.3), border = NA)
points(1959-1/24+x, y, pch=21, cex=0.3, col=1, bg="white")
title("Ordinory linear regression with polynomial + trigonometric function for trend")


# plotting
plot(1959-1/24+newx, newy4$mean,
 col = 1, type = "l",
 xlim = c(1959, 2039), ylim=c(300, 480),
 xlab = "time [years]", ylab = "atmospheric CO2 [ppm]")
polygon(c(rev(1959-1/24+newx), 1959-1/24+newx), c(rev(newy4$lower95), newy4$upper95), 
        col = rgb(0,0,0,0.3), border = NA, lwd=0.01)
points(1959-1/24+x, y, pch=21, cex=0.5, col=1, bg="white")
title("ordinary kriging")

GP 的主要假设之一是数据应该是固定的。您的数据具有明显的趋势，因此它不是静止的。在时间序列（以及任何其他类型的数据）中使用 GP 的正确方法是首先删除一些明显的趋势，然后将 GP 应用于残差。

其它你可能感兴趣的问题

为什么高斯过程是时间序列预测的有效统计模型？

DiceKriging使用R 中的模型进行演示。

`DiceKriging`使用R 中的模型进行演示。