Kolmogorov-Smirnov test in R: the value at which Dmax occurs

Tags: r, distributions, kolmogorov-smirnov-test
2022-04-07 11:29:18

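I am using the Kolmogorov-Smirnov test in R (ks.test()) to compare two distributions, and I would like to know the value at which Dmax occurs. As far as I can tell, it is not part of the output of ks.test(), so I was wondering whether anyone has ideas on how to find it. Thanks!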

2 Answers

Something like this? Dmax occurs at the value max.at.

set.seed(12345)

x <- rnorm(10000, 5, 5)
y <- rnorm(10000, 7, 6.5)

# remove any missings from the data

x <- x[!is.na(x)]
y <- y[!is.na(y)]

ecdf.x <- ecdf(x)
ecdf.y <- ecdf(y)

plot(ecdf.x, xlim=c(min(c(x,y)), max(c(x,y))), verticals=TRUE, cex.lab=1.2, cex.axis=1.3,
     las=1, col="skyblue4", lwd=2, main="")

plot(ecdf.y, verticals=TRUE, add=TRUE, do.points=FALSE, cex.lab=1.2,
     cex.axis=1.3, col="red", lwd=2)

n.x <- length(x)
n.y <- length(y)

n <- n.x * n.y/(n.x + n.y)
w <- c(x, y)

z <- cumsum(ifelse(order(w) <= n.x, 1/n.x, -1/n.y))

max(abs(z)) # Dmax
# [1] 0.1664

ks.test(x,y)$statistic # the same
#      D 
# 0.1664

# Value on the x-axis at which the maximum distance occurs
max.at <- sort(w)[which(abs(z) == max(abs(z)))]
max.at
# [1] 9.082877

# Draw vertical line

abline(v=max.at, lty=2)

lines(abs(z)~sort(w), col="purple", lwd=2)

legend("topleft", legend=c("x", "y", "|Distance|"), col=c("skyblue4", "red", "purple"), lwd=c(2,2,2), bty="n")

[Figure: visualization of Dmax]
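As a cross-check on the cumulative-sum approach above, the same quantity can be sketched by evaluating the two empirical CDFs at every pooled observation. This is only a minimal sketch: the names d, d.max and max.at are chosen here for illustration, and which.max() returns the first location if the maximum is attained more than once.

```r
set.seed(12345)

x <- rnorm(10000, 5, 5)
y <- rnorm(10000, 7, 6.5)

# Pooled, sorted observations: the ECDF difference can only change at these points
w <- sort(c(x, y))

# Absolute vertical distance between the two empirical CDFs at each pooled point
d <- abs(ecdf(x)(w) - ecdf(y)(w))

d.max  <- max(d)           # two-sided KS statistic (Dmax)
max.at <- w[which.max(d)]  # first value at which Dmax is attained
```

Because ecdf() computes each proportion directly rather than accumulating increments, comparing d.max with ks.test(x, y)$statistic is safer with all.equal() than with ==.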

You could also build on @COOLSerdash's answer and use environments to make the ks.test function return the value directly, as follows:

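ks.test.2 <- function(x, y, ..., alternative = c("two.sided", "less", "greater"),
                      exact = NULL)
{
    # Run a copy of ks.test in a fresh environment so that `return` can be overridden
    e <- new.env()
    ks.test.2 <- ks.test
    environment(ks.test.2) <- e
    # Unexported C entry points used by ks.test; these internal names
    # may differ between R versions
    e$C_pkstwo <- stats:::C_pkstwo
    e$C_psmirnov2x <- stats:::C_psmirnov2x
    e$C_pkolmogorov2x <- stats:::C_pkolmogorov2x
    # Replacement for return(): read w and z from the frame of ks.test
    # and attach the location of the maximum distance to the result
    e$return <- function(x) {
        w <- get("w", envir = parent.frame())
        z <- get("z", envir = parent.frame())
        x$max.at <- sort(w)[which(abs(z) == max(abs(z)))]
        return(x)
    }
    # Forward the caller's arguments instead of resetting them to the defaults
    ks.test.2(x, y, ..., alternative = alternative, exact = exact)
}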

The function ks.test.2 should behave exactly like ks.test, except that it now also returns the desired component max.at.

set.seed(12345)

x <- rnorm(10000, 5, 5)
y <- rnorm(10000, 7, 6.5)

ks.test.2(x,y)$max.at
# [1] 9.082877

This only works for the two-sided alternative, but you could extend it to handle the one-sided alternatives if needed.
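One possible way to cover the one-sided alternatives, sketched here without touching the internals of ks.test, is a small standalone helper. The function ks_max_at below is hypothetical (it is not part of stats). It assumes the two-sample convention that "greater" uses the largest value of F_x - F_y and "less" uses the largest value of F_y - F_x, where F_x and F_y are the empirical CDFs.

```r
# Hypothetical helper, not part of the stats package
ks_max_at <- function(x, y, alternative = c("two.sided", "less", "greater")) {
    alternative <- match.arg(alternative)

    # Drop missing values, as ks.test does
    x <- x[!is.na(x)]
    y <- y[!is.na(y)]

    # Signed difference of the empirical CDFs at every pooled observation
    w <- sort(c(x, y))
    z <- ecdf(x)(w) - ecdf(y)(w)

    # Distance that is maximised under each alternative
    stat <- switch(alternative,
                   two.sided = abs(z),
                   greater   = z,
                   less      = -z)

    # which.max() picks the first location if the maximum is not unique
    list(statistic = max(stat), max.at = w[which.max(stat)])
}

set.seed(12345)
x <- rnorm(10000, 5, 5)
y <- rnorm(10000, 7, 6.5)

ks_max_at(x, y, alternative = "greater")$max.at
```

The returned statistic can be compared against ks.test(x, y, alternative = "greater")$statistic with all.equal() to confirm that the helper and ks.test agree on a given data set.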