95% t 分布 CI 没有获得 95% 的覆盖率

机器算法验证 置信区间 t分布 覆盖概率
2022-03-05 00:59:57

我正在模拟一组从正态分布中提取的样本的 95% 置信区间。既然数据是正常的,那么我认为,我 95% 的置信度应该转化为 95% 的覆盖概率。但是,我得到了大约 94% 的结果。具体来说,我采用 1000 个大小为 n=10 的样本来制作一堆 CI 并估计覆盖概率,然后重复1000次以获得覆盖概率的 CI。我的覆盖概率的 5 sigma CI 原来是 ~(0.9384, 0.9408)。是否有一些统计原因,或者我做错了什么?

这是我的模拟代码:

    import numpy as np
    import scipy.stats as stats

    def CI_coverage(alpha, dist, n, n_samples):
        ''' creates n_samples samples of size n
            creates an 1-alpha confidence interval for each
            computes the fraction of those that contain mu '''
        # get samples
        samples = np.stack([dist.rvs(size=n) for i in range(n_samples)])
        
        # summary stats
        mu = dist.mean()
        xbar = samples.mean(axis=1)
        s = samples.std(axis=1)
        
        # compute CIs... note that xbar, s, CI_low, CI_high are arrays size n_samples
        t = stats.t.ppf(1 - alpha/2, n-1)
        interval_width = t * s / np.sqrt(n)
        CI_low = xbar - interval_width
        CI_high = xbar + interval_width
        
        coverage_p = np.sum(np.logical_and(CI_low < mu, mu < CI_high)) / samples.shape[0]
        return coverage_p

    mu = 1
    sigma = 0.5
    norm_dist = stats.norm(loc=mu, scale=sigma)

    n = 10
    n_samples = 1000
    n_CI_samples = 1000
    # compute the empirical coverage probability many times
    CI_coverages = [CI_coverage(0.05, norm_dist, n, n_samples) for i in range(n_CI_samples)]

    # use this to get a CI for the coverage probabilities
    CI_c_mean = np.mean(CI_coverages)
    CI_c_std = np.std(CI_coverages)

    print(CI_c_mean - 5*CI_c_std / np.sqrt(n_CI_samples), CI_c_mean + 5*CI_c_std / np.sqrt(n_CI_samples))
2个回答

根据@whuber 的评论,np.std()提供了样本标准差的有估计幸运的是,该函数允许您通过使用参数指定多个自由度来纠正这一点ddof

s = samples.std(axis=1, ddof=1)

修复此问题可提供与预期 95% CI 一致的覆盖概率:(0.9485, 0.9508)

在 R 中,使用$-notation 从输出中选择 95% CI ,我从 100,000 次迭代中t.test得到0.949±0.001,

set.seed(2021)
CI = replicate( 10^5, t.test(rnorm(10))$conf.int)
mean(CI[1,] <= 0 & CI[2,] >= 0)
[1]  0.94907
sd(CI[1,]<=0 & CI[2,]>=0)/sqrt(10^5)
[1] 0.0006952454  # aprx 95% margin of simulation error