机器算法验证 - 获得 2 个连续正面的有偏硬币投掷次数的预测区间 - 吾爱随笔录

获得 2 个连续正面的有偏硬币投掷次数的预测区间

机器算法验证置信区间二项分布常客

2022-03-27 07:55:02

假设给了我一个可能有偏差的硬币，它出现正面的概率是未知。 $p$

最初我掷硬币 10 次（例如生成数据 = TTHTTHHTHH）。

接下来，我必须反复掷硬币，直到连续出现 2 个正面。让我在连续获得 2 个正面之前执行的翻转次数为。 $F_{HH}$

我的问题：如何计算的最频繁预测区间（假设我已经进行了 10 次抛硬币）？ $F_{HH}$

注意：在我看来，这个问题的等效贝叶斯答案相当简单（？？）——但我不确定最好的最常见方法。为清楚起见，我将贝叶斯方法包括在下面。

贝叶斯方法：

我们有：。计算应该很简单（例如，使用像这样的蛮力方法：在一系列抛硬币中击中正面和反面模式所花费的时间）

P (F_{H H} | d a t a) = \int_{0}^{1} P (F_{H H} | p) P (p | d a t a) d p

$P(F_{HH} | data) = \int_{0}^{1} P(F_{HH}|p)P(p|data)~dp$

P (F_{H H} | p)

$P(F_{HH}|p)$

我们可以使用贝叶斯定理其中我们指定先验（例如统一先验），从二项分布计算似然只是一个归一化常数。 $P(p|data)$

P (p | d a t a) = L (d a t a | p) P (p) / P (d a t a)

$P(p|data) = L(data|p)P(p)/P(data)$

P (p)

$P(p)$

L (d a t a | p)

$L(data|p)$

P (d a t a)

$P(data)$

鉴于此，我们可以计算积分以获得分布，从中可以构建各种点/区间估计。 $P(F_{HH}|data)$

但是，我们避免指定的最常见解决方案又如何呢？ $P(p)$

1个回答

这个答案实现了一种类似于 whuber 评论的方法，其中从一开始的 10 次投掷中天真地估计，然后“插入”到以获得预测间隔。 $p$ $P(F_{HH}|p)$

该方法没有明确考虑的不确定性，这在某些情况下会导致性能不佳，如下所示。如果我们有超过 10 次可用于估计的投掷，那么这种方法可能工作得很好。不确定性的其他方法会很有趣。 $p$ $p$ $p$

此答案中的所有代码都在 R 中。

第 1 步：计算 $P(F_{HH}|p)$

首先，我们需要能够计算。以下代码以分析方式执行此操作（因为模拟方法对于小非常低效）： $P(F_{HH}|p)$ $p$

pmf_FHH<-function(p, Nout){
#############################################################
#
# Analytically compute the probability mass function for F_HH
# F_HH = number of coin flips required to give 2 consecutive heads
#
#
# p = probability of heads (length 1 vector)
# Nout = integer vector of values for which we want the pmf 
#

# Quick exit
if(p==0) return(Nout*0)
if(p==1) return((Nout==2))
if(max(Nout)==1) return(0*Nout)

# Recursively compute the pmf
N=max(Nout)

# Storage
PrN_T=rep(NA,N) # Probability that we got to the N'th flip without 2 consecutive heads, AND the N'th flip was a tail
PrN_2H=rep(NA,N) # Probability that the N'th flip gives 2 consecutive heads (for the first time)

# First flip
PrN_T[1]=(1-p) # Probability that we got to the first flip and it was a tail
PrN_2H[1]=0 # Can't have 2 heads on 1st flip

# Second flip
PrN_T[2] =(1- p) # Probability we get to the second flip and it was a tail
PrN_2H[2]=p*p # Probability that we get 2 heads after 2 flips

# Third flip and above
for(i in 3:length(PrN_2H)){
    # 'Probability that we got to the i'th flip, and it was a tail
    # = [1-(probability that we have terminated by i-1) ]*(1-p)
    PrN_T[i] = (1-sum(PrN_2H[1:(i-1)]))*(1-p)
    # Probability that flip i-2 was a tail, and i-1 and i were heads
    PrN_2H[i]=PrN_T[i-2]*p*p 
}

return(PrN_2H[Nout]) 
}

为了测试上述函数并为以后用于测试预测区间，我们编写了另一个函数来模拟抛硬币过程。

sim_FHH_p<-function(p,n=round(1e+04/p**3), pattern='11'){
# Simulate many coin toss sequences, ending in the first occurrence of pattern
#
# p = probability of 1 (1=heads)
# n = number of individual tosses to sample (split into sequences ending in pattern)
# pattern = pattern to split on (1=heads)
#
# returns vector with the length of each toss sequence

# Make a data string of many coin flips e.g. '011010011'.
random_data=paste(sample(c(0,1),n,replace=T,prob=c(1-p,p)),
                  collapse="")

# Split up by occurrence of pattern, count characters, and add the number of characters in pattern.
# Each element of random_FHH gives a number of coin-tosses to get pattern
random_FHH=nchar(unlist(strsplit(random_data,pattern)))+nchar(pattern)

# The last string may not have ended in pattern. Remove it. 
random_FHH=random_FHH[-length(random_FHH)]

return(random_FHH)
}

现在我运行一个测试来检查模拟结果和分析结果是否“相同”（增加 1e+07 以获得更好的一致性）。

set.seed(1)
p=0.3
# Simulate coin-toss
qq=sim_FHH_p(p,n=1e+07, pattern='11')

Nmax=round(10/p**2) # Convenient upper limit where we check pmf_FHH
empirical_pmf=rep(NA,Nmax)
for(i in 1:Nmax) empirical_pmf[i] = (sum(qq==i)/length(qq))

png('test_analytical_relation.png',width=6,height=5,res=200,units='in')
plot(1:Nmax,empirical_pmf,main='Test of analytical relation',ylab='pmf')
points(1:Nmax,pmf_FHH(p, 1:Nmax),col='red',t='l')
legend('topright', c('Approximate empirical pmf', 'Analytical pmf'), pch=c(1,NA),lty=c(NA,1),col=c(1,2))
dev.off()

看起来不错。 $$F_{HH}$ 的分析概率质量函数与模拟近似值的比较。$

第 2 步：计算预测区间的代码，假设已知。 $p$

如果 p 已知，那么我们可以直接使用得到的预测区间。对于单边 (1- ) 预测区间，我们只需要获得 ) 分位数。代码是： $P(F_{HH}|p)$ $F_{HH}$ $\alpha$ $\alpha$ $P(F_{HH}|p)$

ci_FHH<-function(p, alpha=0.1,Nmax=round(10/max(p,0.001)**2), two.sided=FALSE){
## Compute a prediction interval for FHH, assuming p
## is known exactly
##
## By default, compute 1-sided prediction interval to bound the upper values of FHH
if(p==0){
    return(c(Inf, Inf, NA, NA))
}else if(p==1){
    return(c(2, 2, 0, 1))
}else{
    cdf_FHH=cumsum(pmf_FHH(p, 1:Nmax))
    if(two.sided){ 
        lowerInd=max(which(cdf_FHH<(alpha/2)))+1 
        upperInd=min(which(cdf_FHH>(1-alpha/2)))
    }else{
        lowerInd=2 
        upperInd=min(which(cdf_FHH>(1-alpha)))
    }
    return(c(lowerInd,upperInd, cdf_FHH[lowerInd-1],cdf_FHH[upperInd]))
}

第 3 步：测试预测区间覆盖率估计正确，我们预计上面开发的预测区间非常好，但如果不是，则可能非常糟糕。为了测试覆盖率，以下函数假设的真实值已知，然后根据 10 次掷硬币（使用观察到的正面的分数）的估计值计算预测区间. $p$ $p$ $p$ $p$

test_ci_with_estimated_p<-function(true_p=0.5, theoretical_coverage=0.9, len_data=10, Nsim=100){

# Simulate many coin-toss experiments
simRuns=sim_FHH_p(true_p,n=1e+07)

# Simulate many prediction intervals with ESTIMATED p, and see what their
# coverage is like
store_est_p=rep(NA,Nsim)
store_coverage=rep(NA,Nsim) 
for(i in 1:Nsim){
    # Estimate p from a sample of size len_data
    mysim=rbinom(len_data,1,true_p)
    est_p = mean(mysim) # sample estimate of p
    myci=ci_FHH(est_p,alpha=(1-theoretical_coverage))

    #
    store_est_p[i]=est_p
    store_coverage[i] = sum(simRuns>=myci[1] & simRuns<=myci[2])/length(simRuns)
}
return(list(est_p=store_est_p,coverage=store_coverage, simRuns=simRuns))
}

一些测试证实，当估计正确时，覆盖率几乎是正确的，但如果不是，则可能非常糟糕。这些图显示了实际 = 0.2 和 0.5（垂直线）以及 0.9（水平线）的理论覆盖率的测试。很明显，如果估计的太高，那么预测区间往往会覆盖，而如果估计的太低，它们就会过度覆盖，除非估计的为零，在这种情况下，我们无法计算任何预测间隔（因为使用插件估计，磁头应该永远不会出现）。只有 10 个样本来估计，通常覆盖率远低于理论水平。 $p$ $p$ $p$ $p$ $p$ $p$

t5=test_ci_with_estimated_p(0.5,theoretical_coverage=0.9)
t2=test_ci_with_estimated_p(0.2,theoretical_coverage=0.9)

png('test_CI.png',width=12,height=10,res=300,units='in')
par(mfrow=c(2,2))
plot(t2$est_p,t2$coverage,xlab='Estimated value of p', ylab='Coverage',cex=2,pch=19,main='CI performance when p=0.2')
abline(h=0.9)
abline(v=0.2)
plot(t5$est_p,t5$coverage,xlab='Estimated value of p', ylab='Coverage',cex=2,pch=19,main='CI performance when p=0.5')
abline(h=0.9)
abline(v=0.5)
#dev.off()

barplot(table(t2$est_p),main='Estimated p when p=0.2')
barplot(table(t5$est_p),main='Estimated p when p=0.5')
dev.off()

CI性能测试

= 0.5（87% 与期望的 90% 相比）时，平均覆盖率非常接近期望的覆盖率 = 0.2 时（71% 对 90%）就不太好。 $p$ $p$

# Compute mean coverage + other stats
summary(t2$coverage)
summary(t5$coverage)

其它你可能感兴趣的问题

上一篇使用直方图进行图像分类下一篇多项逻辑回归的过采样校正