How can I draw random samples from a non-parametric estimated distribution?

Cross Validated r sampling kernel-smoothing

2022-01-20 02:46:52

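I have a sample of 100 points, which are continuous and one-dimensional. I estimated its non-parametric density using kernel methods. How can I draw random samples from this estimated distribution?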

1 Answer

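A kernel density estimate is a mixture distribution: there is one kernel for every observation. If the kernel is a scaled density, this leads to a simple algorithm for sampling from the kernel density estimate: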

repeat nsim times:
  sample (with replacement) a random observation from the data
  sample from the kernel, and add the previously sampled random observation
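
Written out as a formula, this is a sketch in assumed notation that does not appear in the question: $n$ observations $x_1, \dots, x_n$, a bandwidth $h$, and a kernel $K$ that is itself a density. The estimate is then an equally weighted mixture:

$$\hat f(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\!\left(\frac{x - x_i}{h}\right)$$

so one draw can be generated as $X = x_I + h\,\varepsilon$, where $I$ is uniform on $\{1, \dots, n\}$ and $\varepsilon \sim K$ is drawn independently of $I$. The two lines of the loop above correspond to drawing $I$ and then $\varepsilon$.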

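If (for example) you used a Gaussian kernel, your density estimate is a mixture of 100 normals, each centred at one of your sample points and all having a standard deviation $h$ equal to the estimated bandwidth. To draw a sample, you can just sample with replacement one of your sample points (say $x_i$) and then sample from a $N(\mu = x_i, \sigma = h)$. In R: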

# Original distribution is exp(rate = 5)
N = 1000
x <- rexp(N, rate = 5)

hist(x, prob = TRUE)
lines(density(x))

# Store the bandwidth of the estimated KDE
bw <- density(x)$bw

# Draw from the sample and then from the kernel
means <- sample(x, N, replace = TRUE)
hist(rnorm(N, mean = means, sd = bw), prob = TRUE)
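
The same two steps can be wrapped in a small function and checked numerically. The sketch below assumes a Gaussian kernel and the default bandwidth returned by `density()`; the helper name `rkde_gauss` is hypothetical, not a base R function. The check relies on the fact that a Gaussian KDE has the same mean as the data and a variance equal to the (divide-by-$n$) data variance plus the squared bandwidth.

```r
# Hypothetical helper (not part of base R): draw n_draws values from a
# Gaussian KDE built on `data` with bandwidth `h`.
rkde_gauss <- function(n_draws, data, h) {
  # Pick mixture components uniformly, with replacement
  idx <- sample.int(length(data), n_draws, replace = TRUE)
  # Add Gaussian kernel noise around each chosen data point
  rnorm(n_draws, mean = data[idx], sd = h)
}

set.seed(1)                       # arbitrary seed, for reproducibility only
x  <- rexp(1000, rate = 5)
bw <- density(x)$bw               # default bandwidth rule of density()
draws <- rkde_gauss(1e5, x, bw)

# Moments of the draws versus moments implied by the KDE mixture;
# each pair should agree up to Monte Carlo error.
kde_mean <- mean(x)
kde_var  <- mean((x - mean(x))^2) + bw^2
c(sample_mean = mean(draws), kde_mean = kde_mean)
c(sample_var  = var(draws),  kde_var  = kde_var)
```

Indexing with `sample.int()` rather than calling `sample(data, ...)` directly is a small defensive choice: `sample()` treats a single number as the range `1:n`, which would give the wrong result for a data vector of length one.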

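Strictly speaking, given that the mixture's components are equally weighted, you could avoid the sampling-with-replacement step and simply draw a sample of size $M$ from each component of the mixture: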

# M draws per data point: `mean = x` is recycled across the N * M draws
M = 10
hist(rnorm(N * M, mean = x, sd = bw))

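If for some reason you cannot draw from your kernel (for example, your kernel is not a density), you can try importance sampling or MCMC. For example, using importance sampling: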

# Draw from proposal distribution which is normal(mu, sd = 1)
sam <- rnorm(N, mean(x), 1)

# Weight the sample using ratio of target and proposal densities
w <- sapply(sam, function(input) sum(dnorm(input, mean = x, sd = bw)) / 
                                 dnorm(input, mean(x), 1))

# Resample according to the weights to obtain an un-weighted sample
finalSample <- sample(sam, N, replace = TRUE, prob = w)

hist(finalSample, prob = TRUE)
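
For the MCMC route mentioned above, one possibility is a random-walk Metropolis sampler that only needs to evaluate the estimated density up to a constant. This is a minimal sketch, not a tuned implementation: the chain length, proposal standard deviation, and burn-in length are assumed values, and the names `log_kde`, `chain`, and `mcmcSample` are introduced here for illustration.

```r
set.seed(2)                       # arbitrary seed, for reproducibility only
x  <- rexp(1000, rate = 5)
bw <- density(x)$bw

# Unnormalised log-density of the Gaussian KDE at a single point z
log_kde <- function(z) log(sum(dnorm(z, mean = x, sd = bw)))

n_iter  <- 20000                  # chain length (assumed)
step_sd <- 0.2                    # proposal standard deviation (assumed)
burn_in <- 1000                   # initial draws to discard (assumed)

chain   <- numeric(n_iter)
current <- mean(x)                # start in a high-density region
for (i in seq_len(n_iter)) {
  proposal <- current + rnorm(1, sd = step_sd)
  # Accept with probability min(1, target(proposal) / target(current));
  # the symmetric proposal cancels out of the ratio.
  if (log(runif(1)) < log_kde(proposal) - log_kde(current)) {
    current <- proposal
  }
  chain[i] <- current
}

mcmcSample <- chain[-seq_len(burn_in)]
```

Unlike the resampled importance sample, successive values in `mcmcSample` are serially correlated, so thinning or a longer chain may be needed if approximately independent draws are required. The result can be inspected with `hist(mcmcSample, prob = TRUE)` in the same way as the other samples.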

P.S. My thanks to Glen_b, who contributed to the answer.
