一些相关概念可能会出现在问题中,为什么在 GAM 中包含纬度和经度来解释空间自相关?
如果您在回归中使用高斯处理,那么您将趋势包含在模型定义中y(t)=f(t,θ)+ϵ(t)这些错误在哪里ϵ(t)∼N(0,Σ)和Σ取决于点之间距离的某些函数。
就您的数据而言,CO2 水平可能是周期性分量比具有周期性相关性的噪声更具系统性,这意味着您最好将其合并到模型中
DiceKriging
使用R 中的模型进行演示。
第一张图片显示了趋势线的拟合y(t)=β0+β1t+β2t2+β3sin(2πt)+β4sin(2πt).
95% 的置信区间与 arima 图像相比要小得多。但请注意,残差项也非常小,并且有很多数据点。为了比较,还做了其他三个拟合。
- 适合具有较少数据点的更简单(线性)模型。在这里,您可以看到趋势线误差的影响,导致预测置信区间在外推更远时增加(该置信区间也仅与模型正确一样正确)。
- 一个普通的最小二乘模型。您可以看到它提供了与高斯过程模型或多或少相同的置信区间
- 一个普通的克里金模型。这是一个不包括趋势的高斯过程。当您外推很远时,预测值将等于平均值。误差估计很大,因为残差项(数据均值)很大。
library(DiceKriging)
library(datasets)
# data
y <- as.numeric(co2)
x <- c(1:length(y))/12
# design-matrix
# the model is a linear sum of x, x^2, sin(2*pi*x), and cos(2*pi*x)
xm <- cbind(rep(1,length(x)),x, x^2, sin(2*pi*x), cos(2*pi*x))
colnames(xm) <- c("i","x","x2","sin","cos")
# fitting non-stationary Gaussian processes
epsilon <- 10^-3
fit1 <- km(formula= ~x+x2+sin+cos,
design = as.data.frame(xm[,-1]),
response = as.data.frame(y),
covtype="matern3_2", nugget=epsilon)
# fitting simpler model and with less data (5 years)
epsilon <- 10^-3
fit2 <- km(formula= ~x,
design = data.frame(x=x[120:180]),
response = data.frame(y=y[120:180]),
covtype="matern3_2", nugget=epsilon)
# fitting OLS
fit3 <- lm(y~1+x+x2+sin+cos, data = as.data.frame(cbind(y,xm)))
# ordinary kriging
epsilon <- 10^-3
fit4 <- km(formula= ~1,
design = data.frame(x=x),
response = data.frame(y=y),
covtype="matern3_2", nugget=epsilon)
# predictions and errors
newx <- seq(0,80,1/12/4)
newxm <- cbind(rep(1,length(newx)),newx, newx^2, sin(2*pi*newx), cos(2*pi*newx))
colnames(newxm) <- c("i","x","x2","sin","cos")
# using the type="UK" 'universal kriging' in the predict function
# makes the prediction for the SE take into account the variance of model parameter estimates
newy1 <- predict(fit1, type="UK", newdata = as.data.frame(newxm[,-1]))
newy2 <- predict(fit2, type="UK", newdata = data.frame(x=newx))
newy3 <- predict(fit3, interval = "confidence", newdata = as.data.frame(x=newxm))
newy4 <- predict(fit4, type="UK", newdata = data.frame(x=newx))
# plotting
plot(1959-1/24+newx, newy1$mean,
col = 1, type = "l",
xlim = c(1959, 2039), ylim=c(300, 480),
xlab = "time [years]", ylab = "atmospheric CO2 [ppm]")
polygon(c(rev(1959-1/24+newx), 1959-1/24+newx), c(rev(newy1$lower95), newy1$upper95),
col = rgb(0,0,0,0.3), border = NA)
points(1959-1/24+x, y, pch=21, cex=0.3, col=1, bg="white")
title("Gausian process with polynomial + trigonometric function for trend")
# plotting
plot(1959-1/24+newx, newy2$mean,
col = 2, type = "l",
xlim = c(1959, 2010), ylim=c(300, 380),
xlab = "time [years]", ylab = "atmospheric CO2 [ppm]")
polygon(c(rev(1959-1/24+newx), 1959-1/24+newx), c(rev(newy2$lower95), newy2$upper95),
col = rgb(1,0,0,0.3), border = NA)
points(1959-1/24+x, y, pch=21, cex=0.5, col=1, bg="white")
points(1959-1/24+x[120:180], y[120:180], pch=21, cex=0.5, col=1, bg=2)
title("Gausian process with linear function for trend")
# plotting
plot(1959-1/24+newx, newy3[,1],
col = 1, type = "l",
xlim = c(1959, 2039), ylim=c(300, 480),
xlab = "time [years]", ylab = "atmospheric CO2 [ppm]")
polygon(c(rev(1959-1/24+newx), 1959-1/24+newx), c(rev(newy3[,2]), newy3[,3]),
col = rgb(0,0,0,0.3), border = NA)
points(1959-1/24+x, y, pch=21, cex=0.3, col=1, bg="white")
title("Ordinory linear regression with polynomial + trigonometric function for trend")
# plotting
plot(1959-1/24+newx, newy4$mean,
col = 1, type = "l",
xlim = c(1959, 2039), ylim=c(300, 480),
xlab = "time [years]", ylab = "atmospheric CO2 [ppm]")
polygon(c(rev(1959-1/24+newx), 1959-1/24+newx), c(rev(newy4$lower95), newy4$upper95),
col = rgb(0,0,0,0.3), border = NA, lwd=0.01)
points(1959-1/24+x, y, pch=21, cex=0.5, col=1, bg="white")
title("ordinary kriging")