这个答案实现了一种类似于 whuber 评论的方法,其中从一开始的 10 次投掷中天真地估计,然后“插入”到以获得预测间隔。pP(FHH|p)
该方法没有明确考虑的不确定性,这在某些情况下会导致性能不佳,如下所示。如果我们有超过 10 次可用于估计的投掷,那么这种方法可能工作得很好。不确定性的其他方法会很有趣。 ppp
此答案中的所有代码都在 R 中。
第 1 步:计算P(FHH|p)
首先,我们需要能够计算。以下代码以分析方式执行此操作(因为模拟方法对于小非常低效):P(FHH|p)p
pmf_FHH<-function(p, Nout){
#############################################################
#
# Analytically compute the probability mass function for F_HH
# F_HH = number of coin flips required to give 2 consecutive heads
#
#
# p = probability of heads (length 1 vector)
# Nout = integer vector of values for which we want the pmf
#
# Quick exit
if(p==0) return(Nout*0)
if(p==1) return((Nout==2))
if(max(Nout)==1) return(0*Nout)
# Recursively compute the pmf
N=max(Nout)
# Storage
PrN_T=rep(NA,N) # Probability that we got to the N'th flip without 2 consecutive heads, AND the N'th flip was a tail
PrN_2H=rep(NA,N) # Probability that the N'th flip gives 2 consecutive heads (for the first time)
# First flip
PrN_T[1]=(1-p) # Probability that we got to the first flip and it was a tail
PrN_2H[1]=0 # Can't have 2 heads on 1st flip
# Second flip
PrN_T[2] =(1- p) # Probability we get to the second flip and it was a tail
PrN_2H[2]=p*p # Probability that we get 2 heads after 2 flips
# Third flip and above
for(i in 3:length(PrN_2H)){
# 'Probability that we got to the i'th flip, and it was a tail
# = [1-(probability that we have terminated by i-1) ]*(1-p)
PrN_T[i] = (1-sum(PrN_2H[1:(i-1)]))*(1-p)
# Probability that flip i-2 was a tail, and i-1 and i were heads
PrN_2H[i]=PrN_T[i-2]*p*p
}
return(PrN_2H[Nout])
}
为了测试上述函数并为以后用于测试预测区间,我们编写了另一个函数来模拟抛硬币过程。
sim_FHH_p<-function(p,n=round(1e+04/p**3), pattern='11'){
# Simulate many coin toss sequences, ending in the first occurrence of pattern
#
# p = probability of 1 (1=heads)
# n = number of individual tosses to sample (split into sequences ending in pattern)
# pattern = pattern to split on (1=heads)
#
# returns vector with the length of each toss sequence
# Make a data string of many coin flips e.g. '011010011'.
random_data=paste(sample(c(0,1),n,replace=T,prob=c(1-p,p)),
collapse="")
# Split up by occurrence of pattern, count characters, and add the number of characters in pattern.
# Each element of random_FHH gives a number of coin-tosses to get pattern
random_FHH=nchar(unlist(strsplit(random_data,pattern)))+nchar(pattern)
# The last string may not have ended in pattern. Remove it.
random_FHH=random_FHH[-length(random_FHH)]
return(random_FHH)
}
现在我运行一个测试来检查模拟结果和分析结果是否“相同”(增加 1e+07 以获得更好的一致性)。
set.seed(1)
p=0.3
# Simulate coin-toss
qq=sim_FHH_p(p,n=1e+07, pattern='11')
Nmax=round(10/p**2) # Convenient upper limit where we check pmf_FHH
empirical_pmf=rep(NA,Nmax)
for(i in 1:Nmax) empirical_pmf[i] = (sum(qq==i)/length(qq))
png('test_analytical_relation.png',width=6,height=5,res=200,units='in')
plot(1:Nmax,empirical_pmf,main='Test of analytical relation',ylab='pmf')
points(1:Nmax,pmf_FHH(p, 1:Nmax),col='red',t='l')
legend('topright', c('Approximate empirical pmf', 'Analytical pmf'), pch=c(1,NA),lty=c(NA,1),col=c(1,2))
dev.off()
看起来不错。
第 2 步:计算预测区间的代码,假设已知。p
如果 p 已知,那么我们可以直接使用得到的预测区间。对于单边 (1- ) 预测区间,我们只需要获得 ) 分位数。代码是:P(FHH|p)FHHααP(FHH|p)
ci_FHH<-function(p, alpha=0.1,Nmax=round(10/max(p,0.001)**2), two.sided=FALSE){
## Compute a prediction interval for FHH, assuming p
## is known exactly
##
## By default, compute 1-sided prediction interval to bound the upper values of FHH
if(p==0){
return(c(Inf, Inf, NA, NA))
}else if(p==1){
return(c(2, 2, 0, 1))
}else{
cdf_FHH=cumsum(pmf_FHH(p, 1:Nmax))
if(two.sided){
lowerInd=max(which(cdf_FHH<(alpha/2)))+1
upperInd=min(which(cdf_FHH>(1-alpha/2)))
}else{
lowerInd=2
upperInd=min(which(cdf_FHH>(1-alpha)))
}
return(c(lowerInd,upperInd, cdf_FHH[lowerInd-1],cdf_FHH[upperInd]))
}
第 3 步:测试预测区间覆盖率估计正确,
我们预计上面开发的预测区间非常好,但如果不是,则可能非常糟糕。为了测试覆盖率,以下函数假设的真实值已知,然后根据 10 次掷硬币(使用观察到的正面的分数)的估计值计算预测区间.pppp
test_ci_with_estimated_p<-function(true_p=0.5, theoretical_coverage=0.9, len_data=10, Nsim=100){
# Simulate many coin-toss experiments
simRuns=sim_FHH_p(true_p,n=1e+07)
# Simulate many prediction intervals with ESTIMATED p, and see what their
# coverage is like
store_est_p=rep(NA,Nsim)
store_coverage=rep(NA,Nsim)
for(i in 1:Nsim){
# Estimate p from a sample of size len_data
mysim=rbinom(len_data,1,true_p)
est_p = mean(mysim) # sample estimate of p
myci=ci_FHH(est_p,alpha=(1-theoretical_coverage))
#
store_est_p[i]=est_p
store_coverage[i] = sum(simRuns>=myci[1] & simRuns<=myci[2])/length(simRuns)
}
return(list(est_p=store_est_p,coverage=store_coverage, simRuns=simRuns))
}
一些测试证实,当估计正确时,覆盖率几乎是正确的,但如果不是,则可能非常糟糕。这些图显示了实际 = 0.2 和 0.5(垂直线)以及 0.9(水平线)的理论覆盖率的测试。很明显,如果估计的太高,那么预测区间往往会覆盖,而如果估计的太低,它们就会过度覆盖,除非估计的为零,在这种情况下,我们无法计算任何预测间隔(因为使用插件估计,磁头应该永远不会出现)。只有 10 个样本来估计,通常覆盖率远低于理论水平。pppppp
t5=test_ci_with_estimated_p(0.5,theoretical_coverage=0.9)
t2=test_ci_with_estimated_p(0.2,theoretical_coverage=0.9)
png('test_CI.png',width=12,height=10,res=300,units='in')
par(mfrow=c(2,2))
plot(t2$est_p,t2$coverage,xlab='Estimated value of p', ylab='Coverage',cex=2,pch=19,main='CI performance when p=0.2')
abline(h=0.9)
abline(v=0.2)
plot(t5$est_p,t5$coverage,xlab='Estimated value of p', ylab='Coverage',cex=2,pch=19,main='CI performance when p=0.5')
abline(h=0.9)
abline(v=0.5)
#dev.off()
barplot(table(t2$est_p),main='Estimated p when p=0.2')
barplot(table(t5$est_p),main='Estimated p when p=0.5')
dev.off()

= 0.5(87% 与期望的 90% 相比)时,平均覆盖率非常接近期望的覆盖率 = 0.2 时(71% 对 90%)就不太好。pp
# Compute mean coverage + other stats
summary(t2$coverage)
summary(t5$coverage)