I need the quadratic and linear coefficients in a GLM with a binary response. What is the best option?

Cross Validated  r  logistic  generalized-linear-model  binary-data
2022-03-17 11:00:54

I have three predictor variables and one response. What should I do if my response variable is binary?

1 Answer

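You can add a quadratic term in logistic regression just as you would in plain old linear regression. It is a simple way to include "curvature" in the model, but make sure you understand what it implies: the curve is a parabola on the log-odds scale, not on the probability scale. I suspect you want an R tutorial, which is off topic on CV. The basic way to add a quadratic term in R is to include I(x^2) in the formula. Here is a simple example: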

lo.to.p = function(lo){                 # we need this function to generate the data
  odds = exp(lo)
  prob = odds/(1+odds)
  return(prob)
}
set.seed(4649)                          # this makes the example exactly reproducible
x1 = runif(100, min=0, max=10)          # you have 3, largely uncorrelated predictors
x2 = runif(100, min=0, max=10)
x3 = runif(100, min=0, max=10)
lo = -78 + 35*x1 - 3.5*(x1^2) + .1*x2   # there is a quadratic relationship w/ x1, a
p  = lo.to.p(lo)                        #  linear relationship w/ x2 & no relationship
y  = rbinom(100, size=1, prob=p)        #  w/ x3
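To make the shape of this data-generating process concrete, here is a minimal sketch of the true curve implied by the formula for lo above. The helper name true_lo and the choice to hold x2 at 5 are illustrative assumptions, not part of the original answer; plogis() is base R's inverse logit and returns the same probabilities as lo.to.p() here.

```r
# True log odds as a function of x1, using the coefficients from the
# simulation above. x2 is held at 5 (an arbitrary, illustrative value).
true_lo <- function(x1, x2 = 5) -78 + 35 * x1 - 3.5 * x1^2 + 0.1 * x2

# Vertex of the parabola a*x^2 + b*x + c is at -b / (2a); here a = -3.5, b = 35
x1_peak <- -35 / (2 * -3.5)

# True probability of y = 1 along a grid of x1 values
x1_grid <- seq(0, 10, by = 0.1)
p_true  <- plogis(true_lo(x1_grid))

# x1 values where the true probability crosses 0.5 (log odds = 0).
# polyroot() takes coefficients in increasing order of power.
crossings <- sort(Re(polyroot(c(-78 + 0.1 * 5, 35, -3.5))))

print(x1_peak)
print(round(crossings, 3))
```

So the probability of y = 1 is close to 0 at both ends of the x1 range and close to 1 in a band around x1 = 5. Calling plot(x1_grid, p_true, type = "l") draws the curve.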

model = glm(y~x1+I(x1^2)+x2+x3, family=binomial)
summary(model)
# Call:
# glm(formula = y ~ x1 + I(x1^2) + x2 + x3, family = binomial)
# 
# Deviance Residuals: 
#      Min        1Q    Median        3Q       Max  
# -1.74280  -0.00387   0.00000   0.04145   1.74573  
# 
# Coefficients:
#              Estimate Std. Error z value Pr(>|z|)   
# (Intercept) -53.65462   19.65288  -2.730  0.00633 **
# x1           24.78164    8.92910   2.775  0.00551 **
# I(x1^2)      -2.49888    0.89344  -2.797  0.00516 **
# x2            0.03318    0.20198   0.164  0.86952   
# x3           -0.09277    0.18650  -0.497  0.61890   
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# (Dispersion parameter for binomial family taken to be 1)
# 
#     Null deviance: 128.207  on 99  degrees of freedom
# Residual deviance:  18.647  on 95  degrees of freedom
# AIC: 28.647
# 
# Number of Fisher Scoring iterations: 10
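Two follow-up checks are often useful with a fit like this; the sketch below is one possible way to do them, not part of the original answer. It regenerates the same data, compares the model with and without the quadratic term using a likelihood ratio test, and computes the x1 value at which the fitted log odds peak. The names m_lin, m_quad, lrt and x1_peak are hypothetical.

```r
# Regenerate the data as in the example above
set.seed(4649)
x1 <- runif(100, min = 0, max = 10)
x2 <- runif(100, min = 0, max = 10)
x3 <- runif(100, min = 0, max = 10)
lo <- -78 + 35 * x1 - 3.5 * (x1^2) + 0.1 * x2
p  <- exp(lo) / (1 + exp(lo))            # same computation as lo.to.p()
y  <- rbinom(100, size = 1, prob = p)

# Nested models: linear in x1 only, and linear plus quadratic in x1
m_lin  <- glm(y ~ x1 + x2 + x3, family = binomial)
m_quad <- glm(y ~ x1 + I(x1^2) + x2 + x3, family = binomial)

# Likelihood ratio test for the quadratic term (1 degree of freedom)
lrt <- anova(m_lin, m_quad, test = "Chisq")
print(lrt)

# Location of the peak of the fitted parabola on the log-odds scale:
# -b1 / (2 * b2), where b1 and b2 are the x1 and I(x1^2) coefficients
b <- coef(m_quad)
x1_peak <- -b[["x1"]] / (2 * b[["I(x1^2)"]])
print(x1_peak)
```

With the coefficients printed in the summary above, the peak works out to 24.78164 / (2 * 2.49888), or about 4.96, close to the true value of 5 used to simulate the data. Because the classes are nearly separated along x1 in this example, glm() may warn that fitted probabilities are numerically 0 or 1. The Wald z-tests in the summary can be unreliable in that situation, which is one reason to prefer the likelihood ratio test.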