我知道它不那么优雅,但我不得不模拟它。我不仅构建了一个非常简单的模拟,而且运行起来不优雅且缓慢。不过,这已经足够好了。一个优点是,只要一些基础是正确的,它就会告诉我优雅的方法何时失败。
样本量将随着硬编码值的变化而变化。
所以这里是代码:
#main code
#want 95% CI to be no more than 3% from
# prevalence
#expect prevalence around 15% to 30%
#think sample size is ~1000
my_prev <- seq(from=0.15, to=0.30,
by = 0.002)
samp_sizes <- seq(from=400, to=800, by = 1)
samp_sizes
N_loops <- 2000
store <- matrix(0, nrow =
length(my_prev)*length(samp_sizes),
ncol = 3)
count <- 1
#for each prevalence
for (i in 1:length(my_prev)) {
#for each sample size
for(j in 1:length(samp_sizes)){
temp <- 0
for(k in 1:N_loops){
#draw samples
y <- rbinom(n =
samp_sizes[j],
size = 1,
prob =
my_prev[i])
#compute prevalence, store
temp[k] <- mean(y)
}
#compute 5% and 95% of temp
width <- diff(quantile(x = temp,
probs = c(0.05,0.95)))
#store samp_size, prevalence, and
# CI half-width
store[count, 1] <- my_prev[i]
store[count, 2] <- samp_sizes[j]
store[count, 3] <- width[[1]]
count <- count+1
}
}
store2 <- numeric(length(my_prev))
#go through store
for(i in 1:length(my_prev)){
#for each prevalence
#find first CI half-width below 3%
#store samp_size
idx_p <- which(store[, 1] ==
my_prev[i], arr.ind = T)
idx_p
temp <- store[idx_p, ]
temp
idx_2 <- which(temp[, 3] <= 0.03*2,
arr.ind = T)
idx_2
temp2 <- temp[idx_2, ]
temp2
if (length(temp2[,3])>1){
idx_3 <- which(temp2[, 3]==max(temp2[,
3]), arr.ind = T)
store2[i] <- temp2[idx_3[1], 2]
} else {
store2[i] <- temp2[2]
}
}
#plot it
plot(x=my_prev, y=store2,
xlab = "prevalence",
ylab = "sample size")
lines(smooth.spline(x=my_prev,y=store2),
col="Red")
grid()
And here is the plot of sample size vs. prevalence such that uncertainty in 95% CI for prevalence is as close as possible to $\pm$3% without going over it.
[![sample size vs prevalence][1]][1]
Away from 50%, "somewhat less observations" seem to be required, as kjetil suggested.
I think that you can get a decent estimate of prevalence before 400 samples, and adjust your sampling strategy as you go. I don't think there should be a jog in the middle, and so you might bump N_loops up to 10e3, and bump the "by" in "my_prev" down to 0.001.
[1]: https://i.stack.imgur.com/vNYcH.png