Extracting the model equation and other data from the 'glm' function in R

Tags: data-mining, r, logistic-regression, glm

2022-03-05 21:15:17

I ran a logistic regression in R, together with the pROC package, to combine two independent variables, and I got this:

> summary(fit)

Call: glm(formula = Case ~ X + Y, family = "binomial", data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.5751  -0.8277  -0.6095   1.0701   2.3080  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.153731   0.538511  -0.285 0.775281    
X           -0.048843   0.012856  -3.799 0.000145 ***
Y            0.028364   0.009077   3.125 0.001780 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 287.44  on 241  degrees of freedom
Residual deviance: 260.34  on 239  degrees of freedom
AIC: 266.34

Number of Fisher Scoring iterations: 4

> fit

Call:  glm(formula = Case ~ X + Y, family = "binomial", data = data)

Coefficients:
(Intercept)            X            Y  
   -0.15373     -0.04884      0.02836  

Degrees of Freedom: 241 Total (i.e. Null);  239 Residual
Null Deviance:      287.4 
Residual Deviance:  260.3        AIC: 266.3

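Now I need to extract some information from these results, but I am not sure how to do it. First, I need the model equation: suppose the fit defines a combined predictor called CP; is it CP = -0.15 - 0.05X + 0.03Y?

Second, the combined predictor obtained from the regression should have a median value, so that I can compare its median in the two groups, Cases and Controls, that I used to run the regression. In other words, my X and Y variables have N = N1 + N2 observations, where N1 is the number of Controls (for which Case = 0) and N2 is the number of Cases (for which Case = 1).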

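1 Answer

To extract data from a fitted glm model object, you first need to find out where that data lives (use the documentation and str() for that). Some data is available from the summary.glm object, while more detailed data is available from the glm object itself. To extract the model parameters, you can use the coef() function or access the object's structure directly.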
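As a rough sketch of what that could look like for the two things asked in the question: the asker's `data` is not shown, so the block below simulates a stand-in data set (all values hypothetical) with the same column names. It reads the coefficients with coef(), writes the fitted equation as text, and computes the combined predictor on the logit scale, which is the scale on which the coefficients are additive. The names `b`, `eq`, `CP`, `CP_manual` and `cp_medians` are only for illustration.

```r
# Simulated stand-in for the asker's data (hypothetical values; the real `data` is not shown)
set.seed(1)
n <- 242
data <- data.frame(X = rnorm(n, 50, 20), Y = rnorm(n, 60, 30))
data$Case <- rbinom(n, 1, plogis(-0.15 - 0.05 * data$X + 0.03 * data$Y))

fit <- glm(Case ~ X + Y, family = "binomial", data = data)

# Inspect where things are stored in the fitted object and in its summary
names(fit)
names(summary(fit))

# Coefficients as a named numeric vector
b <- coef(fit)

# Full coefficient table: estimates, standard errors, z values, p-values
coef_table <- coef(summary(fit))

# Equation of the linear predictor (log-odds of Case = 1), written as text
eq <- sprintf("CP = %.3f %+.3f*X %+.3f*Y", b["(Intercept)"], b["X"], b["Y"])
print(eq)

# Combined predictor for every observation, on the logit scale
CP <- predict(fit, type = "link")

# The same quantity computed by hand from the coefficients
CP_manual <- b["(Intercept)"] + b["X"] * data$X + b["Y"] * data$Y

# Median of the combined predictor in Controls (Case = 0) and Cases (Case = 1)
cp_medians <- tapply(CP, data$Case, median)
print(cp_medians)
```

With the real data, only the glm() call and the lines after it would be needed. Rounding the coefficients to two decimals, as in the question, changes the value of CP slightly, so it may be safer to keep the full-precision values from coef(fit) when computing the predictor.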

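UPDATE:

From the GLM section of Princeton's* introduction to R course website (see it for details and examples):

Functions that can be used to extract results from the fit include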
- 'residuals' or 'resid', for the deviance residuals
- 'fitted' or 'fitted.values', for the fitted values (estimated probabilities)
- 'predict', for the linear predictor (estimated logits)
- 'coef' or 'coefficients', for the coefficients, and
- 'deviance', for the deviance. 
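Some of these functions have optional arguments; for example, you can extract five different types of residuals, called "deviance", "pearson", "response" (response - fitted value), "working" (the working dependent variable in the IRLS algorithm - linear predictor), and "partial" (a matrix of working residuals formed by omitting each term in the model). You specify the one you want using the type argument, for example residuals(lrfit,type="pearson").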
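The extractor functions quoted above can be tried out along these lines. This is a minimal sketch on simulated stand-in data (hypothetical values, since the original data is not available); `lrfit` follows the name used in the quoted passage, and the other variable names are made up for the example.

```r
# Simulated stand-in data (hypothetical; the asker's real data is not shown)
set.seed(1)
n <- 242
data <- data.frame(X = rnorm(n, 50, 20), Y = rnorm(n, 60, 30))
data$Case <- rbinom(n, 1, plogis(-0.15 - 0.05 * data$X + 0.03 * data$Y))
lrfit <- glm(Case ~ X + Y, family = "binomial", data = data)

logits <- predict(lrfit)                      # linear predictor (default type = "link")
probs  <- fitted(lrfit)                       # estimated probabilities
probs2 <- predict(lrfit, type = "response")   # the same probabilities via predict()

# The inverse logit maps the linear predictor to the fitted probabilities
all.equal(unname(plogis(logits)), unname(probs))

# Residual types named in the quoted passage
res_dev      <- residuals(lrfit)                      # deviance residuals (the default)
res_pearson  <- residuals(lrfit, type = "pearson")
res_response <- residuals(lrfit, type = "response")   # observed response minus fitted value
res_working  <- residuals(lrfit, type = "working")
res_partial  <- residuals(lrfit, type = "partial")    # matrix, one column per model term

# The residual deviance equals the sum of squared deviance residuals
all.equal(deviance(lrfit), sum(res_dev^2))
```

Note that predict() returns logits unless type = "response" is given, so a combined predictor built from it lives on the log-odds scale rather than the probability scale.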

*) More precisely, the website is provided by Germán Rodríguez of Princeton University.
