机器算法验证 - 功率计算，连续曝光的逻辑回归——队列 - 吾爱随笔录

我试图通过队列研究中的连续暴露来估计逻辑回归的功效（即，抽样概率的比率为 1）。我有人群累积发病率（概率）和人群暴露变异性和暴露平均值以及预期优势比。我也有一个总样本量。

我正在使用 R，它似乎Hmisc::bpower仅用于二进制曝光的逻辑回归，我似乎找不到任何可以估计连续曝光的二项式功率的包。

我尝试了以下模拟，但考虑到我的总样本量，它非常慢，我不确定它是否正确：

p <- vector()
betahat <- vector()
for(i in 1:1000){
n <- 40000  #total sample size
intercept = log(0.008662265)  #where exp(intercept) = P(D=1)
beta <- log(1.4) #where exp(beta)=OR corresponding to a one unit change in xtest
xtest <- rnorm(n,1.2,.31)  #xtest is vector length 40,000 with mean 1.2 and sd .31
linpred <- intercept + xtest*beta #linear predictor
prob <- exp(linpred)/(1 + exp(linpred)) #link function
runis <- runif(n,0,1) #generate a vector length n from a uniform distribution 0,1
ytest <- ifelse(runis < prob,1,0)  #if a random value from a uniform distribution 0,1 is less than prob, then the outcome is 1.  otherwise the outcome is 0
coefs <- coef(summary(glm(ytest~xtest, family="binomial")))  #run a logistic regression
p[i] <- coefs[2,4] #store the p value
betahat[i] <- coefs[2,1] #store the unexponentiated betahat
}

mean(p < .05)
#power

exp(mean(betahat))
#sanity check, should equal 1.4--it does

这种方法有什么问题吗？

我担心的一个问题是累积发生率（即，在给定时间段内发生事件的概率）来自没有接触 0 的人群。实际上，可以合理地假设我用于截距的值实际上来自具有与我相似的暴露变异性的人群。在这种情况下，给定优势比（以及我会在发表的论文中找到的其他信息），我将如何估计未暴露的概率以用于我的功效计算？