How do I obtain p-values for the coefficients from a bootstrapped regression?

r regression p-value bootstrap
2022-03-26 22:04:14

From Robert Kabacoff's Quick-R I have

# Bootstrap 95% CI for regression coefficients 
library(boot)
# function to obtain regression weights 
bs <- function(formula, data, indices) {
  d <- data[indices,] # allows boot to select sample 
  fit <- lm(formula, data=d)
  return(coef(fit)) 
} 
# bootstrapping with 1000 replications 
results <- boot(data=mtcars, statistic=bs, 
     R=1000, formula=mpg~wt+disp)

# view results
results
plot(results, index=1) # intercept 
plot(results, index=2) # wt 
plot(results, index=3) # disp 

# get 95% confidence intervals 
boot.ci(results, type="bca", index=1) # intercept 
boot.ci(results, type="bca", index=2) # wt 
boot.ci(results, type="bca", index=3) # disp

How do I obtain p-values for the bootstrapped regression coefficients, testing $H_0: \beta_j = 0$?

3 Answers

The community and @BrianDiggs may correct me if I am wrong, but I believe you can obtain a p-value for this problem as follows. The p-value of a two-sided test is defined as

$2\min\left[P(X \le x \mid H_0),\; P(X \ge x \mid H_0)\right]$

So if you sort the bootstrapped coefficients by size and then determine the proportions lying above and below zero, the smaller of the two proportions multiplied by 2 should give you a p-value.
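In symbols, writing $B$ for the number of bootstrap replicates and $\hat{\beta}_j^{*}$ for the bootstrapped values of coefficient $j$ (notation I am introducing just for this explanation), this amounts to

$p_j = 2\,\min\!\left(\frac{\#\{\hat{\beta}_j^{*} > 0\}}{B},\ \frac{\#\{\hat{\beta}_j^{*} < 0\}}{B}\right)$

which is exactly what the function below computes.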

In such situations I usually use the following function:

twosidep <- function(data) {
  # proportion of bootstrap replicates above zero
  p1 <- sum(data > 0) / length(data)
  # proportion of bootstrap replicates below zero
  p2 <- sum(data < 0) / length(data)
  # two-sided p-value: twice the smaller tail proportion
  p <- min(p1, p2) * 2
  return(p)
}
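For example, applied to the results object from the code in the question, where the boot object stores its bootstrap replicates as the matrix results$t (one column per coefficient), something along these lines should work:

# apply twosidep to each column of bootstrap replicates
# (column 1 = intercept, column 2 = wt, column 3 = disp)
apply(results$t, 2, twosidep)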

Just another, somewhat simpler, variant that I think gets the message across without explicitly using the boot library, whose syntax may confuse some people.

We have a linear model: $y = X\beta + \epsilon$, $\quad \epsilon \sim N(0,\sigma^2)$.

The following is a parametric bootstrap of that linear model, meaning we do not resample the original data but instead generate new data from the fitted model. In addition, we assume that the bootstrapped distribution of the regression coefficients $\beta$ is symmetric and translation invariant. (Very roughly speaking, we can shift its axis without affecting its properties.) The idea behind this is that the fluctuations in the $\beta$'s are due to $\epsilon$, so with enough samples they should provide a good approximation of the true distribution of the $\beta$'s. As before, we again test $H_0: \beta_j = 0$, and we define our p-value as "the probability, under the null hypothesis about the data's probability distribution, of obtaining a result as extreme as or more extreme than the one observed" (the observed results here being the $\beta$'s we got from our original model). So here goes:

# Number of bootstrap replications
N           <- 2^12
# Linear model to bootstrap
Model2Boot  <- lm(mpg ~ wt + disp, mtcars)
# Values of the model coefficients
Betas       <- coefficients(Model2Boot)
# Number of coefficients to test against
M           <- length(Betas)
# Matrix of M columns to hold the bootstrapping results
BtStrpRes   <- matrix(rep(0, M * N), ncol = M)

for (i in 1:N) {
  # Simulate data N times from the model we assume to be true
  # and save the resulting coefficients in the i-th row of BtStrpRes
  BtStrpRes[i,] <- coefficients(lm(unlist(simulate(Model2Boot)) ~ wt + disp, mtcars))
}

# Get the p-value for each coefficient
P_val1 <- mean(abs(BtStrpRes[,1] - mean(BtStrpRes[,1])) > abs(Betas[1]))
P_val2 <- mean(abs(BtStrpRes[,2] - mean(BtStrpRes[,2])) > abs(Betas[2]))
P_val3 <- mean(abs(BtStrpRes[,3] - mean(BtStrpRes[,3])) > abs(Betas[3]))

# and some parametric bootstrap confidence intervals (2.5%, 97.5%)
ConfInt1 <- quantile(BtStrpRes[,1], c(0.025, 0.975))
ConfInt2 <- quantile(BtStrpRes[,2], c(0.025, 0.975))
ConfInt3 <- quantile(BtStrpRes[,3], c(0.025, 0.975))

As mentioned before, the whole idea is that the bootstrapped distribution of the $\beta$'s approximates their true one. (Clearly this code is not optimized for speed but for readability. :) )
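As a rough sanity check (a comparison I am adding here, not part of the code above), these parametric-bootstrap p-values and intervals can be put next to the usual t-based output of lm, which should be broadly similar for this model:

# classical t-test p-values and confidence intervals from lm, for comparison
summary(Model2Boot)$coefficients[, "Pr(>|t|)"]
confint(Model2Boot, level = 0.95)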

The bootstrap can be used to calculate $p$-values, but it would need a substantial change to your code. As I am not familiar with R, I can only give you a reference in which you can look up what you would need to do: chapter 4 of (Davison and Hinkley 1997).

Davison, A. C. and Hinkley, D. V. 1997. Bootstrap Methods and Their Application. Cambridge: Cambridge University Press.