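Cross Validated - Problem with a simulation study of the repeated-experiments interpretation of a 95% confidence interval - where am I going wrong? - 吾爱随笔录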

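I'm trying to write an R script that simulates the repeated-experiments interpretation of a 95% confidence interval. I've found that it overestimates the proportion of times the true population value of a proportion falls inside a sample's 95% CI. The difference is small (about 96% versus 95%), but it still interests me.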

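My function draws a sample of size samp_n from a Bernoulli distribution with probability pop_p, then computes a 95% confidence interval, either with prop.test() using the continuity correction or, more exactly, with binom.test(). It returns 1 if the true population proportion pop_p lies inside the 95% CI. I've written two functions, one using prop.test() and one using binom.test(), and both give similar results: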

in_conf_int_normal <- function(pop_p = 0.3, samp_n = 1000, correct = TRUE) {
    ## uses normal approximation to calculate confidence interval
    ## returns 1 if the CI contains the pop proportion
    ## returns 0 otherwise
    samp <- rbinom(samp_n, 1, pop_p)
    ## pass `correct` through so the continuity correction can be switched off
    pt_result <- prop.test(sum(samp == 1), samp_n, correct = correct)
    lb <- pt_result$conf.int[1]
    ub <- pt_result$conf.int[2]
    if (pop_p < ub && pop_p > lb) {
        return(1)
    } else {
        return(0)
    }
}
in_conf_int_binom <- function(pop_p = 0.3, samp_n = 1000, correct = TRUE) {
    ## uses Clopper and Pearson method
    ## returns 1 if the CI contains the pop proportion
    ## returns 0 otherwise
    ## note: binom.test() has no continuity correction, so `correct` is unused
    samp <- rbinom(samp_n, 1, pop_p)
    pt_result <- binom.test(sum(samp == 1), samp_n)
    lb <- pt_result$conf.int[1]
    ub <- pt_result$conf.int[2]
    if (pop_p < ub && pop_p > lb) {
        return(1)
    } else {
        return(0)
    }
}

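I've found that when you repeat the experiment a few thousand times, the proportion of times pop_p falls inside the sample's 95% CI is closer to 0.96 than to 0.95.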

set.seed(1234)
times <- 10000
results <- replicate(times, in_conf_int_binom())
mean(results)  # proportion of intervals that contain pop_p
[1] 0.9562
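
Before looking for a bug, it may help to check whether a gap of this size could be plain simulation noise. The sketch below computes the Monte Carlo standard error of a coverage estimate from 10000 replications, assuming the true coverage were exactly 0.95. The names `observed`, `nominal`, `mc_se` and `z_score` are illustrative and not part of the script above.

```r
## Monte Carlo standard error of an estimated coverage proportion.
## `times` and `observed` are taken from the simulation run above;
## `nominal` is the coverage expected if the interval were exactly 95%.
times    <- 10000
observed <- 0.9562
nominal  <- 0.95

## Standard error of a proportion estimated from `times` independent trials
mc_se <- sqrt(nominal * (1 - nominal) / times)

## How many standard errors the observed coverage sits above the nominal one
z_score <- (observed - nominal) / mc_se

cat("Monte Carlo SE:", signif(mc_se, 3), "\n")
cat("Distance from nominal in SE units:", signif(z_score, 3), "\n")
```

With these numbers the observed coverage is about 2.8 standard errors above 0.95, so the excess is unlikely to be explained by random variation in the simulation alone.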

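My thoughts so far on why this is happening are:

My code is wrong (but I've checked it a lot).
I initially thought it was due to the normal approximation, but then I found the same thing with binom.test().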
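
One way to separate a coding error from a property of the interval itself is to compute the coverage exactly instead of simulating it. The number of successes can only take the values 0 to samp_n, so the exact coverage is the sum of the binomial probabilities of every count whose interval contains pop_p. This is a minimal sketch under the same defaults as the functions above; `exact_coverage` is a hypothetical helper, not part of the original script.

```r
## Exact coverage of the binom.test() (Clopper-Pearson) interval.
## `exact_coverage` is a hypothetical helper introduced for illustration.
exact_coverage <- function(pop_p = 0.3, samp_n = 1000, conf_level = 0.95) {
    x <- 0:samp_n  # every possible number of successes

    ## TRUE where the interval built from k successes contains pop_p
    ## (same strict inequalities as in the simulation functions)
    covers <- vapply(x, function(k) {
        ci <- binom.test(k, samp_n, conf.level = conf_level)$conf.int
        pop_p > ci[1] && pop_p < ci[2]
    }, logical(1))

    ## Weight each covering outcome by its binomial probability
    sum(dbinom(x[covers], samp_n, pop_p))
}

coverage <- exact_coverage()
print(coverage)
```

If the exact value also comes out above 0.95, the excess would be a feature of the method and not a bug: because the binomial count is discrete, the Clopper-Pearson interval is constructed to cover the true proportion at least 95% of the time, and it usually does so somewhat more often.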

Any suggestions?