对于黄土回归,我作为非统计学家的理解是,您可以根据视觉解释选择跨度(具有大量跨度值的图可以选择看起来合适的平滑量最少的图),或者您可以使用交叉验证(CV) 或广义交叉验证 (GCV)。下面是我用于黄土回归的 GCV 的代码,该代码基于 Takezawa 的优秀书籍Introduction to Nonparametric Regression(来自 p219)中的代码。
locv1 <- function(x1, y1, nd, span, ntrial)
{
locvgcv <- function(sp, x1, y1)
{
nd <- length(x1)
assign("data1", data.frame(xx1 = x1, yy1 = y1))
fit.lo <- loess(yy1 ~ xx1, data = data1, span = sp, family = "gaussian", degree = 2, surface = "direct")
res <- residuals(fit.lo)
dhat2 <- function(x1, sp)
{
nd2 <- length(x1)
diag1 <- diag(nd2)
dhat <- rep(0, length = nd2)
for(jj in 1:nd2){
y2 <- diag1[, jj]
assign("data1", data.frame(xx1 = x1, yy1 = y2))
fit.lo <- loess(yy1 ~ xx1, data = data1, span = sp, family = "gaussian", degree = 2, surface = "direct")
ey <- fitted.values(fit.lo)
dhat[jj] <- ey[jj]
}
return(dhat)
}
dhat <- dhat2(x1, sp)
trhat <- sum(dhat)
sse <- sum(res^2)
cv <- sum((res/(1 - dhat))^2)/nd
gcv <- sse/(nd * (1 - (trhat/nd))^2)
return(gcv)
}
gcv <- lapply(as.list(span1), locvgcv, x1 = x1, y1 = y1)
#cvgcv <- unlist(cvgcv)
#cv <- cvgcv[attr(cvgcv, "names") == "cv"]
#gcv <- cvgcv[attr(cvgcv, "names") == "gcv"]
return(gcv)
}
并使用我的数据,我做了以下事情:
nd <- length(Edge2$Distance)
xx <- Edge2$Distance
yy <- lcap
ntrial <- 50
span1 <- seq(from = 0.5, by = 0.01, length = ntrial)
output.lo <- locv1(xx, yy, nd, span1, ntrial)
#cv <- output.lo
gcv <- output.lo
plot(span1, gcv, type = "n", xlab = "span", ylab = "GCV")
points(span1, gcv, pch = 3)
lines(span1, gcv, lwd = 2)
gpcvmin <- seq(along = gcv)[gcv == min(gcv)]
spangcv <- span1[pgcvmin]
gcvmin <- cv[pgcvmin]
points(spangcv, gcvmin, cex = 1, pch = 15)
抱歉,代码相当草率,这是我第一次使用 R,但它应该让您了解如何对黄土回归进行 GSV,以比简单的视觉检查更客观的方式找到最佳跨度。在上图中,您对最小化函数的跨度感兴趣(绘制的“曲线”上的最低值)。