Cross Validated - How to calculate the standard errors of logistic regression coefficients - 吾爱随笔录

How to calculate the standard errors of logistic regression coefficients

logistic python standard-error regression-coefficients scikit-learn

2022-01-16 15:43:34

I am using Python's scikit-learn to train and test a logistic regression.

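scikit-learn returns the regression coefficients of the independent variables, but it does not provide the standard errors of those coefficients. I need these standard errors to compute a Wald statistic for each coefficient and then compare the coefficients with each other.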

I found a description of how to compute the standard errors of logistic regression coefficients (here), but it is somewhat hard to follow.

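If you happen to know of a simple, succinct explanation of how to compute these standard errors and/or can provide one, I would really appreciate it! I don't mean specific code (though feel free to post any code that might help), but rather an algorithmic explanation of the steps involved.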

3 answers

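The standard errors of the model coefficients are the square roots of the diagonal entries of the covariance matrix. Consider the following: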

Design matrix:

$\textbf{X} = \begin{bmatrix} 1 & x_{1,1} & \ldots & x_{1,p} \\ 1 & x_{2,1} & \ldots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \ldots & x_{n,p} \end{bmatrix}$, where $x_{i,j}$ is the value of the $j$-th predictor for the $i$-th observation.

(Note: this assumes that the model has an intercept.)

$\textbf{V} = \begin{bmatrix} \hat{\pi}_{1}(1 - \hat{\pi}_{1}) & 0 & \ldots & 0 \\ 0 & \hat{\pi}_{2}(1 - \hat{\pi}_{2}) & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \hat{\pi}_{n}(1 - \hat{\pi}_{n}) \end{bmatrix}$, a diagonal matrix where $\hat{\pi}_{i}$ is the predicted probability of class membership for observation $i$.
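Each diagonal entry of $\textbf{V}$ is the Bernoulli variance $\hat{\pi}_i(1 - \hat{\pi}_i)$ of observation $i$. A small sketch of how those entries can be obtained, assuming a binary problem and a two-column probability array laid out like scikit-learn's `predict_proba` output (column 0 is $P(y=0)$, column 1 is $P(y=1)$); the probabilities themselves are made up for illustration:

```python
import numpy as np

# Hypothetical predicted probabilities for 3 observations, in the layout
# returned by predict_proba for a binary model: [P(y=0), P(y=1)] per row.
pred_probs = np.array([[0.9, 0.1],
                       [0.4, 0.6],
                       [0.5, 0.5]])

pi_hat = pred_probs[:, 1]            # predicted P(y = 1) for each observation
weights = pi_hat * (1.0 - pi_hat)    # Bernoulli variance pi(1 - pi) = diagonal of V

# Because each row sums to 1, the product of the two columns gives the same values.
assert np.allclose(weights, np.prod(pred_probs, axis=1))

# Dense n x n form of V; only practical when n is small.
V = np.diag(weights)
print(weights)
```

Keeping only the vector `weights` and never forming the dense `V` is usually preferable for large $n$, since $\textbf{X}^{T}\textbf{V}\textbf{X}$ can be computed by scaling the rows of $\textbf{X}$ by `weights`.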

The covariance matrix can then be written as:

$(\textbf{X}^{T}\textbf{V}\textbf{X})^{-1}$
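Where this formula comes from, sketched with standard maximum-likelihood theory (a reasoning step the answer leaves implicit): with $\mathbf{x}_i^{T}$ the $i$-th row of $\textbf{X}$, $\eta_i = \mathbf{x}_i^{T}\boldsymbol{\beta}$ and $\pi_i = 1/(1 + e^{-\eta_i})$, the derivative $\partial \pi_i / \partial \eta_i = \pi_i(1 - \pi_i)$ makes the Hessian of the log-likelihood equal to $-\textbf{X}^{T}\textbf{V}\textbf{X}$. Inverting the Fisher information evaluated at $\hat{\boldsymbol{\beta}}$ gives the usual large-sample covariance estimate, and the Wald statistic the question asks about follows directly:

```latex
\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \right]

\nabla \ell(\boldsymbol{\beta}) = \textbf{X}^{T} (\mathbf{y} - \boldsymbol{\pi}),
\qquad
\nabla^{2} \ell(\boldsymbol{\beta}) = -\textbf{X}^{T} \textbf{V} \textbf{X}

\widehat{\operatorname{Cov}}(\hat{\boldsymbol{\beta}}) = \left( \textbf{X}^{T} \textbf{V} \textbf{X} \right)^{-1},
\qquad
\operatorname{SE}(\hat{\beta}_j) = \sqrt{ \left[ \left( \textbf{X}^{T} \textbf{V} \textbf{X} \right)^{-1} \right]_{jj} },
\qquad
W_j = \left( \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)} \right)^{2}
```

Here $\textbf{V}$ is evaluated at $\hat{\boldsymbol{\beta}}$, i.e. with the $\hat{\pi}_i$ defined above. Under $H_0: \beta_j = 0$, $W_j$ is approximately $\chi^2_1$-distributed in large samples.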

This can be implemented with the following code:

import numpy as np
from sklearn import linear_model

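# Initiate logistic regression object.
# NOTE: scikit-learn applies L2 regularization by default (C=1.0), but the covariance
# formula above is only valid for the unpenalized maximum-likelihood fit.
# penalty=None requires scikit-learn >= 1.2 (older versions: penalty='none' or a very large C).
logit = linear_model.LogisticRegression(penalty=None)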

# Fit model. Let X_train = matrix of predictors (n x p), y_train = vector of binary (0/1) outcomes.
# NOTE: Do not include a column for the intercept when fitting the model;
# sklearn estimates the intercept separately (fit_intercept=True by default).
resLogit = logit.fit(X_train, y_train)

# Calculate matrix of predicted class probabilities.
# Check resLogit.classes_ to make sure that sklearn ordered your classes as expected
predProbs = resLogit.predict_proba(X_train)

# Design matrix -- add column of 1's at the beginning of your X_train matrix
X_design = np.hstack([np.ones((X_train.shape[0], 1)), X_train])

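# Initiate matrix of 0's, fill diagonal with each predicted observation's variance.
# For a binary model the two columns of predProbs are (1 - pi) and pi, so their product is pi * (1 - pi).
V = np.diagflat(np.prod(predProbs, axis=1))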

# Covariance matrix (X^T V X)^-1; the @ operator does matrix multiplication in Python 3.5+.
# On older Python, use np.dot(np.dot(X_design.T, V), X_design) instead.
covLogit = np.linalg.inv(X_design.T @ V @ X_design)
print("Covariance matrix: ", covLogit)

# Standard errors
print("Standard errors: ", np.sqrt(np.diag(covLogit)))

# Wald statistic (coefficient / s.e.) ^ 2
logitParams = np.insert(resLogit.coef_, 0, resLogit.intercept_)
print("Wald statistics: ", (logitParams / np.sqrt(np.diag(covLogit))) ** 2)
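The code above depends on scikit-learn and materializes a dense $n \times n$ matrix `V`, which becomes impractical for large $n$. A NumPy-only sketch of the same computation, under these assumptions: the model is fitted by Newton-Raphson (a swapped-in fitting routine for the unpenalized MLE, not what scikit-learn does internally), $\textbf{X}^{T}\textbf{V}\textbf{X}$ is formed by scaling the rows of the design matrix by the weights, and two-sided p-values use the normal approximation $P(|Z| > |z|) = \operatorname{erfc}(|z|/\sqrt{2})$. The helper names `fit_logit_newton` and `logit_standard_errors` and the synthetic data are hypothetical:

```python
import math
import numpy as np

def fit_logit_newton(X_design, y, n_iter=50, tol=1e-10):
    """Unpenalized logistic regression via Newton-Raphson (equivalent to IRLS here).

    Hypothetical helper; X_design must already contain the leading column of 1's.
    """
    beta = np.zeros(X_design.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-X_design @ beta))  # predicted P(y = 1)
        w = pi * (1.0 - pi)                          # diagonal of V
        grad = X_design.T @ (y - pi)                 # score vector
        hess = X_design.T @ (X_design * w[:, None])  # X^T V X without building V
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def logit_standard_errors(X_design, beta):
    """Hypothetical helper: standard errors from (X^T V X)^-1 evaluated at beta."""
    pi = 1.0 / (1.0 + np.exp(-X_design @ beta))
    w = pi * (1.0 - pi)
    cov = np.linalg.inv(X_design.T @ (X_design * w[:, None]))
    return np.sqrt(np.diag(cov)), cov

# Hypothetical synthetic data: 2 predictors, arbitrary "true" coefficients.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
true_beta = np.array([-0.5, 1.0, -2.0])          # intercept, x1, x2
X_design = np.column_stack([np.ones(n), X])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X_design @ true_beta))).astype(float)

beta_hat = fit_logit_newton(X_design, y)
se, cov = logit_standard_errors(X_design, beta_hat)
wald = (beta_hat / se) ** 2                       # Wald chi-square statistics
p_values = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta_hat, se)]

for name, b, s, wj, p in zip(["const", "x1", "x2"], beta_hat, se, wald, p_values):
    print(f"{name:>5}: coef={b: .3f}  se={s:.3f}  wald={wj:.2f}  p={p:.3g}")
```

Row-scaling with `X_design * w[:, None]` keeps memory at $O(np)$ instead of $O(n^2)$; the resulting covariance matrix is the same $(\textbf{X}^{T}\textbf{V}\textbf{X})^{-1}$ as in the code above, provided both fits are unpenalized.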

That said, statsmodels may be a better package if you want access to a large number of "out-of-the-box" diagnostics.

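Does your software give you the parameter covariance (or variance-covariance) matrix? If so, the standard errors are the square roots of the diagonal of that matrix. You may want to consult a textbook (or search for university lecture notes) to see how to obtain the $V_\beta$ matrix for linear and generalized linear models.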
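For reference, the textbook forms of $V_\beta$ this answer alludes to, stated from standard linear-model and GLM theory rather than from the answer itself. Here $\textbf{X}$ includes the intercept column, $p$ is the number of predictors, $\hat{\phi}$ is the estimated dispersion, $v(\mu)$ is the variance function, and $\eta_i = \mathbf{x}_i^{T}\boldsymbol{\beta}$ is the linear predictor:

```latex
% Linear model (ordinary least squares)
V_\beta = \hat{\sigma}^{2} \left( \textbf{X}^{T} \textbf{X} \right)^{-1},
\qquad
\hat{\sigma}^{2} = \frac{1}{n - p - 1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}

% Generalized linear model fitted by maximum likelihood (W diagonal)
V_\beta = \hat{\phi} \left( \textbf{X}^{T} \textbf{W} \textbf{X} \right)^{-1},
\qquad
w_{ii} = \frac{1}{v(\hat{\mu}_i)} \left( \frac{\partial \mu_i}{\partial \eta_i} \right)^{2}

% Logistic regression: \phi = 1, v(\mu) = \mu(1 - \mu), \partial\mu / \partial\eta = \mu(1 - \mu),
% so w_{ii} = \hat{\pi}_i (1 - \hat{\pi}_i) and W equals the matrix V of the first answer.
```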

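If you are interested in inference, you may want to look at statsmodels, which provides standard errors and common statistical tests. Here is a logistic regression example.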
