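I am using Python's scikit-learn to train and test a logistic regression.

scikit-learn returns the regression coefficients of the independent variables, but it does not provide the standard errors of those coefficients. I need these standard errors to compute a Wald statistic for each coefficient and, in turn, compare the coefficients to each other.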





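  • Design matrix:

$$X = \begin{bmatrix} 1 & x_{1,1} & \cdots & x_{1,p} \\ 1 & x_{2,1} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \cdots & x_{n,p} \end{bmatrix},$$ where $x_{i,j}$ is the value of the $j$th predictor for the $i$th observation.


  • $$V = \begin{bmatrix} \hat{\pi}_1(1-\hat{\pi}_1) & 0 & \cdots & 0 \\ 0 & \hat{\pi}_2(1-\hat{\pi}_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat{\pi}_n(1-\hat{\pi}_n) \end{bmatrix},$$ where $\hat{\pi}_i$ is the predicted probability of class membership for observation $i$.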
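
For reference, with $X$ and $V$ as defined above, the usual maximum-likelihood result (the inverse of the estimated Fisher information) gives the estimated covariance matrix of the coefficients, and the standard errors and Wald statistics follow from it. This is the textbook formula for an unpenalized fit, written out here as a sketch; it is not something scikit-learn reports itself. The index $j = 0, \dots, p$ runs over the columns of $X$, with $j = 0$ being the intercept.

```latex
\widehat{\operatorname{Cov}}(\hat{\beta}) = (X^{\top} V X)^{-1}, \qquad
\operatorname{SE}(\hat{\beta}_j) = \sqrt{\left[(X^{\top} V X)^{-1}\right]_{jj}}, \qquad
W_j = \left(\frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}\right)^{2}
```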




import numpy as np
from sklearn import linear_model

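# Initiate logistic regression object.
# NOTE: scikit-learn applies L2 regularization by default (C=1.0), while the
# covariance formula below assumes an unpenalized maximum-likelihood fit, so
# a very large C is used here to make the penalty negligible.
logit = linear_model.LogisticRegression(C=1e9)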

# Fit model. Let X_train = matrix of predictors, y_train = vector of binary class labels.
# NOTE: Do not include a column for the intercept when fitting the model.
resLogit = logit.fit(X_train, y_train)

# Calculate matrix of predicted class probabilities.
# Check resLogit.classes_ to make sure that sklearn ordered your classes as expected
predProbs = resLogit.predict_proba(X_train)

# Design matrix -- add column of 1's at the beginning of your X_train matrix
X_design = np.hstack([np.ones((X_train.shape[0], 1)), X_train])

# Initiate matrix of 0's, fill diagonal with each observation's predicted variance p_i * (1 - p_i).
# (np.prod replaces the np.product alias, which was removed in NumPy 2.0.)
V = np.diagflat(np.prod(predProbs, axis=1))

# Covariance matrix
# Note that the @ operator does matrix multiplication in Python 3.5+, so if you're running
# Python 3.5+, you can replace the covLogit line below with the more readable:
# covLogit = np.linalg.inv(X_design.T @ V @ X_design)
covLogit = np.linalg.inv(np.dot(np.dot(X_design.T, V), X_design))
print("Covariance matrix: ", covLogit)

# Standard errors
print("Standard errors: ", np.sqrt(np.diag(covLogit)))

# Wald statistic (coefficient / s.e.) ^ 2
logitParams = np.insert(resLogit.coef_, 0, resLogit.intercept_)
print("Wald statistics: ", (logitParams / np.sqrt(np.diag(covLogit))) ** 2)
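
One practical caveat with the code above: `V` is an n x n matrix, which can get large for big training sets. Below is a minimal sketch of the same computation that only keeps the diagonal of `V` as a vector. The function name `logit_standard_errors` and the demo data are made up for illustration; only NumPy is assumed.

```python
import numpy as np


def logit_standard_errors(X, probs):
    """Standard errors from (X_design^T V X_design)^-1 without building the n x n matrix V.

    X     : (n, p) predictor matrix WITHOUT an intercept column.
    probs : (n,) predicted probabilities of one of the two classes.
    """
    X = np.asarray(X, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Design matrix with a leading column of ones for the intercept.
    X_design = np.hstack([np.ones((X.shape[0], 1)), X])
    # Diagonal of V: per-observation variance p_i * (1 - p_i).
    v = probs * (1.0 - probs)
    # X^T V X, computed by scaling row i of X_design by v_i (broadcasting).
    fisher_info = X_design.T @ (X_design * v[:, None])
    cov = np.linalg.inv(fisher_info)
    return np.sqrt(np.diag(cov)), cov


# Tiny made-up example: 6 observations, 1 predictor, invented probabilities.
X_demo = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
probs_demo = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 0.9])
se_demo, cov_demo = logit_standard_errors(X_demo, probs_demo)

# Cross-check against the explicit diagonal-matrix version.
X_design_demo = np.hstack([np.ones((6, 1)), X_demo])
V_demo = np.diagflat(probs_demo * (1 - probs_demo))
cov_full = np.linalg.inv(X_design_demo.T @ V_demo @ X_design_demo)
print("Matches explicit version:", np.allclose(cov_demo, cov_full))
print("Standard errors:", se_demo)
```

With the fitted model from above, the call would look like `logit_standard_errors(X_train, resLogit.predict_proba(X_train)[:, 1])`. Because p(1 - p) is symmetric in the two classes, either probability column gives the same result in the binary case.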


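Does your software give you a parameter covariance (or variance-covariance) matrix? If so, the standard errors are the square roots of the diagonal of that matrix. You may want to consult a textbook (or google for university lecture notes) on how to get the $V_\beta$ matrix for linear and generalized linear models.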
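
As an illustration of that point for the linear-model case, here is a minimal sketch using the textbook result $V_\beta = \hat{\sigma}^2 (X^{\top} X)^{-1}$ for ordinary least squares. The data and variable names are invented for the example, and only NumPy is assumed. For a generalized linear model such as logistic regression, the analogous matrix is $(X^{\top} W X)^{-1}$, where $W$ is the diagonal matrix of per-observation weights.

```python
import numpy as np

# Made-up data for illustration only.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept column.
X = np.hstack([np.ones((n, 1)), x])

# Ordinary least squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual variance estimate with n - (number of coefficients) degrees of freedom.
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])

# V_beta for the linear model: sigma^2 * (X^T X)^-1.
V_beta = sigma2_hat * np.linalg.inv(X.T @ X)

# Standard errors are the square roots of the diagonal of V_beta.
std_errors = np.sqrt(np.diag(V_beta))
print("Coefficients:   ", beta_hat)
print("Standard errors:", std_errors)
```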
