
How do I get percentiles from an empirical density in R?

Cross Validated: r, quantiles, kernel-smoothing, nonparametric, density

2022-03-26 04:28:16

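The density() function in R lets me pass in observations and get back an empirical density whose x and y values I can plot. I like it because it lets me weight the observations by their importance, and it lets me specify the smoothing bandwidth I want.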

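My question is: once I have run density(), how do I get percentiles from the resulting density? Note that this is not the same as taking sample percentiles from my data, because I want to apply weights to the observations.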

2 Answers

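The density() command, although very useful for a quick look at a KDE, is also quite restrictive because it only returns values on a grid. I prefer to code my own KDE (usually with a Gaussian kernel). It can be obtained as follows (the estimator itself is 1 line of code):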

# Constructing your own KDE
set.seed(123)
sample <- rnorm(1000, 10, 1)
# Bandwidth used by default in density()
hT <- bw.nrd0(sample)
# Gaussian KDE: average of normal densities centred at each observation
kde <- Vectorize(function(x) mean(dnorm((x - sample) / hT) / hT))
# Comparison with the output of density()
plot(density(sample))
curve(kde, 6, 13, add = TRUE, col = "red")

The corresponding nonparametric estimator of the CDF can be obtained as follows:

# Obtaining the corresponding kernel distribution estimator
KDE <- Vectorize(function(x) mean(pnorm((x - sample) / hT)))
curve(KDE, 6, 13, col = "blue")
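Written out as formulas, the two estimators above look as follows. This is only a restatement of the code, with notation that is assumed here rather than taken from the answer: x_i are the n observations, h is the bandwidth (hT in the code), and φ and Φ are the standard normal density and CDF. Integrating the density estimator term by term gives the distribution estimator.

```latex
\hat f_h(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h}\,\phi\!\left(\frac{x - x_i}{h}\right),
\qquad
\hat F_h(x) = \int_{-\infty}^{x}\hat f_h(t)\,dt
            = \frac{1}{n}\sum_{i=1}^{n}\Phi\!\left(\frac{x - x_i}{h}\right).
```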

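If you can provide an interval that contains the quantile of interest, these functions can be used to approximate percentiles numerically: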

# Manual calculation of the percentile
# (requires the probability p and an interval containing the quantile)
QKDE <- function(p, Interval) {
  # The quantile is the root of KDE(t) - p within the given interval
  tempf <- function(t) KDE(t) - p
  return(uniroot(tempf, Interval)$root)
}

QKDE(0.5, c(8, 12))

This may not be the most efficient approach, but it works, and it is fast and accurate. I hope this helps.
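The question asks for weighted observations, which the code above does not use. One way to extend the same idea, offered only as a sketch, is to replace the plain average by a weighted sum with weights normalized to sum to 1. The names w, wKDE and wQKDE are hypothetical, and the weights here are simulated purely for illustration.

```r
# Weighted Gaussian kernel CDF estimator and its quantiles (illustrative sketch)
set.seed(123)
x <- rnorm(1000, 10, 1)   # observations
w <- runif(1000)          # hypothetical importance weights (simulated)
w <- w / sum(w)           # normalize so the weights sum to 1
h <- bw.nrd0(x)           # default bandwidth rule of density(); ignores the weights

# Weighted kernel CDF: weighted sum of normal CDFs centred at each observation
wKDE <- Vectorize(function(t) sum(w * pnorm((t - x) / h)))

# Quantile by root finding; the interval must bracket the quantile
wQKDE <- function(p, interval) {
  uniroot(function(t) wKDE(t) - p, interval)$root
}

# Search a little beyond the data range so the median is bracketed
q50 <- wQKDE(0.5, range(x) + c(-3, 3) * h)
q50
```

With equal weights (w = 1/n) this reduces to the unweighted estimator above. For quantiles far in the tails, the search interval may need to be widened.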

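Why reinvent the wheel? I suggest using the ewcdf function from the spatstat library. If I have understood your question correctly, it does exactly what you want: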

library(spatstat)
x <- rnorm(100)      # data
w <- runif(100)      # weights
a1 <- ewcdf(x, w)    # empirical *weighted* cdf and quantile function
quantile(a1, 0.2)    # quantile of the weighted cdf
# which differs from the plain sample quantile because of the weights:
quantile(x, 0.2)
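To make explicit what a weighted empirical quantile computes, here is a base-R sketch under one common definition: the smallest observation whose cumulative normalized weight reaches p. The helper name weighted_quantile is hypothetical, and spatstat's own method may follow a different convention, so small differences from its output are possible.

```r
# Weighted empirical quantile by hand (illustrative sketch, base R only)
set.seed(1)
x <- rnorm(100)   # data
w <- runif(100)   # weights

weighted_quantile <- function(x, w, p) {
  o  <- order(x)                  # sort the observations
  cw <- cumsum(w[o]) / sum(w)     # cumulative share of total weight
  x[o][which(cw >= p)[1]]         # first observation reaching share p
}

q20 <- weighted_quantile(x, w, 0.2)
q20
```

Unlike the kernel-based estimator in the first answer, this returns one of the observed values and involves no smoothing.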
