获得 2 个连续正面的有偏硬币投掷次数的预测区间

机器算法验证 置信区间 二项分布 常客
2022-03-27 07:55:02

假设给了我一个可能有偏差的硬币,它出现正面的概率是未知p

最初我掷硬币 10 次(例如生成数据 = TTHTTHHTHH)。

接下来,我必须反复掷硬币,直到连续出现 2 个正面。让我在连续获得 2 个正面之前执行的翻转次数为FHH

我的问题:如何计算的最频繁预测区间(假设我已经进行了 10 次抛硬币)?FHH

注意:在我看来,这个问题的等效贝叶斯答案相当简单(??)——但我不确定最好的最常见方法。为清楚起见,我将贝叶斯方法包括在下面。

贝叶斯方法:

我们有:计算应该很简单(例如,使用像这样的蛮力方法:在一系列抛硬币中击中正面和反面模式所花费的时间

P(FHH|data)=01P(FHH|p)P(p|data) dp
P(FHH|p)

我们可以使用贝叶斯定理 其中我们指定先验(例如统一先验) ,从二项分布计算似然只是一个归一化常数。P(p|data)

P(p|data)=L(data|p)P(p)/P(data)
P(p)L(data|p)P(data)

鉴于此,我们可以计算积分以获得分布,从中可以构建各种点/区间估计。P(FHH|data)

但是,我们避免指定的最常见解决方案又如何呢? P(p)

1个回答

这个答案实现了一种类似于 whuber 评论的方法,其中从一开始的 10 次投掷中天真地估计,然后“插入”到以获得预测间隔。pP(FHH|p)

该方法没有明确考虑的不确定性,这在某些情况下会导致性能不佳,如下所示。如果我们有超过 10 次可用于估计的投掷,那么这种方法可能工作得很好。不确定性的其他方法会很有趣ppp

此答案中的所有代码都在 R 中。

第 1 步:计算P(FHH|p)

首先,我们需要能够计算以下代码以分析方式执行此操作(因为模拟方法对于小非常低效):P(FHH|p)p

pmf_FHH<-function(p, Nout){
#############################################################
#
# Analytically compute the probability mass function for F_HH
# F_HH = number of coin flips required to give 2 consecutive heads
#
#
# p = probability of heads (length 1 vector)
# Nout = integer vector of values for which we want the pmf 
#

# Quick exit
if(p==0) return(Nout*0)
if(p==1) return((Nout==2))
if(max(Nout)==1) return(0*Nout)

# Recursively compute the pmf
N=max(Nout)

# Storage
PrN_T=rep(NA,N) # Probability that we got to the N'th flip without 2 consecutive heads, AND the N'th flip was a tail
PrN_2H=rep(NA,N) # Probability that the N'th flip gives 2 consecutive heads (for the first time)

# First flip
PrN_T[1]=(1-p) # Probability that we got to the first flip and it was a tail
PrN_2H[1]=0 # Can't have 2 heads on 1st flip

# Second flip
PrN_T[2] =(1- p) # Probability we get to the second flip and it was a tail
PrN_2H[2]=p*p # Probability that we get 2 heads after 2 flips

# Third flip and above
for(i in 3:length(PrN_2H)){
    # 'Probability that we got to the i'th flip, and it was a tail
    # = [1-(probability that we have terminated by i-1) ]*(1-p)
    PrN_T[i] = (1-sum(PrN_2H[1:(i-1)]))*(1-p)
    # Probability that flip i-2 was a tail, and i-1 and i were heads
    PrN_2H[i]=PrN_T[i-2]*p*p 
}

return(PrN_2H[Nout]) 
}

为了测试上述函数并为以后用于测试预测区间,我们编写了另一个函数来模拟抛硬币过程。

sim_FHH_p<-function(p,n=round(1e+04/p**3), pattern='11'){
# Simulate many coin toss sequences, ending in the first occurrence of pattern
#
# p = probability of 1 (1=heads)
# n = number of individual tosses to sample (split into sequences ending in pattern)
# pattern = pattern to split on (1=heads)
#
# returns vector with the length of each toss sequence

# Make a data string of many coin flips e.g. '011010011'.
random_data=paste(sample(c(0,1),n,replace=T,prob=c(1-p,p)),
                  collapse="")

# Split up by occurrence of pattern, count characters, and add the number of characters in pattern.
# Each element of random_FHH gives a number of coin-tosses to get pattern
random_FHH=nchar(unlist(strsplit(random_data,pattern)))+nchar(pattern)

# The last string may not have ended in pattern. Remove it. 
random_FHH=random_FHH[-length(random_FHH)]

return(random_FHH)
}

现在我运行一个测试来检查模拟结果和分析结果是否“相同”(增加 1e+07 以获得更好的一致性)。

set.seed(1)
p=0.3
# Simulate coin-toss
qq=sim_FHH_p(p,n=1e+07, pattern='11')

Nmax=round(10/p**2) # Convenient upper limit where we check pmf_FHH
empirical_pmf=rep(NA,Nmax)
for(i in 1:Nmax) empirical_pmf[i] = (sum(qq==i)/length(qq))

png('test_analytical_relation.png',width=6,height=5,res=200,units='in')
plot(1:Nmax,empirical_pmf,main='Test of analytical relation',ylab='pmf')
points(1:Nmax,pmf_FHH(p, 1:Nmax),col='red',t='l')
legend('topright', c('Approximate empirical pmf', 'Analytical pmf'), pch=c(1,NA),lty=c(NA,1),col=c(1,2))
dev.off()

看起来不错。$F_{HH}$ 的分析概率质量函数与模拟近似值的比较。

第 2 步:计算预测区间的代码,假设已知。p

如果 p 已知,那么我们可以直接使用得到的预测区间。对于单边 (1- ) 预测区间,我们只需要获得 ) 分位数代码是:P(FHH|p)FHHααP(FHH|p)

ci_FHH<-function(p, alpha=0.1,Nmax=round(10/max(p,0.001)**2), two.sided=FALSE){
## Compute a prediction interval for FHH, assuming p
## is known exactly
##
## By default, compute 1-sided prediction interval to bound the upper values of FHH
if(p==0){
    return(c(Inf, Inf, NA, NA))
}else if(p==1){
    return(c(2, 2, 0, 1))
}else{
    cdf_FHH=cumsum(pmf_FHH(p, 1:Nmax))
    if(two.sided){ 
        lowerInd=max(which(cdf_FHH<(alpha/2)))+1 
        upperInd=min(which(cdf_FHH>(1-alpha/2)))
    }else{
        lowerInd=2 
        upperInd=min(which(cdf_FHH>(1-alpha)))
    }
    return(c(lowerInd,upperInd, cdf_FHH[lowerInd-1],cdf_FHH[upperInd]))
}

第 3 步:测试预测区间覆盖率估计正确, 我们预计上面开发的预测区间非常好,但如果不是,则可能非常糟糕。为了测试覆盖率,以下函数假设的真实值已知,然后根据 10 次掷硬币(使用观察到的正面的分数)的估计值计算预测区间.pppp

test_ci_with_estimated_p<-function(true_p=0.5, theoretical_coverage=0.9, len_data=10, Nsim=100){

# Simulate many coin-toss experiments
simRuns=sim_FHH_p(true_p,n=1e+07)

# Simulate many prediction intervals with ESTIMATED p, and see what their
# coverage is like
store_est_p=rep(NA,Nsim)
store_coverage=rep(NA,Nsim) 
for(i in 1:Nsim){
    # Estimate p from a sample of size len_data
    mysim=rbinom(len_data,1,true_p)
    est_p = mean(mysim) # sample estimate of p
    myci=ci_FHH(est_p,alpha=(1-theoretical_coverage))

    #
    store_est_p[i]=est_p
    store_coverage[i] = sum(simRuns>=myci[1] & simRuns<=myci[2])/length(simRuns)
}
return(list(est_p=store_est_p,coverage=store_coverage, simRuns=simRuns))
}

一些测试证实,当估计正确时,覆盖率几乎是正确的,但如果不是,则可能非常糟糕。这些图显示了实际 = 0.2 和 0.5(垂直线)以及 0.9(水平线)的理论覆盖率的测试。很明显,如果估计的太高,那么预测区间往往会覆盖,而如果估计的太低,它们就会过度覆盖,除非估计的为零,在这种情况下,我们无法计算任何预测间隔(因为使用插件估计,磁头应该永远不会出现)。只有 10 个样本来估计,通常覆盖率远低于理论水平。pppppp

t5=test_ci_with_estimated_p(0.5,theoretical_coverage=0.9)
t2=test_ci_with_estimated_p(0.2,theoretical_coverage=0.9)

png('test_CI.png',width=12,height=10,res=300,units='in')
par(mfrow=c(2,2))
plot(t2$est_p,t2$coverage,xlab='Estimated value of p', ylab='Coverage',cex=2,pch=19,main='CI performance when p=0.2')
abline(h=0.9)
abline(v=0.2)
plot(t5$est_p,t5$coverage,xlab='Estimated value of p', ylab='Coverage',cex=2,pch=19,main='CI performance when p=0.5')
abline(h=0.9)
abline(v=0.5)
#dev.off()

barplot(table(t2$est_p),main='Estimated p when p=0.2')
barplot(table(t5$est_p),main='Estimated p when p=0.5')
dev.off()

CI性能测试

= 0.5(87% 与期望的 90% 相比)时,平均覆盖率非常接近期望的覆盖率 = 0.2 时(71% 对 90%)就不太好。pp

# Compute mean coverage + other stats
summary(t2$coverage)
summary(t5$coverage)