Cross Validated - Computing the coefficients of a logistic regression in R - 吾爱随笔录

Computing the coefficients of a logistic regression in R

Cross Validated, logistic regression, coefficients

2022-02-16 14:08:07

In multiple linear regression, the coefficients can be found with the following formula:

$b = (X'X)^{-1}(X')Y$

beta = solve(t(X) %*% X) %*% (t(X) %*% Y) ; beta

For example:

> y <- c(9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6, 7.6, 6.1)
> x0 <- c(1,1,1,1,1,1,1,1,1,1) 
> x1 <-  c(100,50,100,100,50,80,75,65,90,90)
> x2 <- c(4,3,4,2,2,2,3,4,3,2)
> Y <- as.matrix(y)
> X <- as.matrix(cbind(x0,x1,x2))

> beta = solve(t(X) %*% X) %*% (t(X) %*% Y);beta
         [,1]
x0 -0.8687015
x1  0.0611346
x2  0.9234254
> model <- lm(y ~ x1 + x2) ; model$coefficients
(Intercept)          x1          x2 
 -0.8687015   0.0611346   0.9234254
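Explicitly inverting X'X with solve() works for a small example like this, but a commonly recommended alternative is to solve the normal equations X'Xb = X'Y directly, or to use a QR decomposition of X (which is what lm() does internally). A minimal sketch using the question's data; the object names b_normal and b_qr are only illustrative:

```r
# Data from the question
y  <- c(9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6, 7.6, 6.1)
x1 <- c(100, 50, 100, 100, 50, 80, 75, 65, 90, 90)
x2 <- c(4, 3, 4, 2, 2, 2, 3, 4, 3, 2)
X  <- cbind(x0 = 1, x1, x2)   # design matrix with an intercept column

# Solve the normal equations X'X b = X'y without forming (X'X)^{-1}
b_normal <- solve(crossprod(X), crossprod(X, y))

# Least-squares solution via a QR decomposition of X
b_qr <- qr.solve(X, y)

# Both should agree with lm()
b_lm <- coef(lm(y ~ x1 + x2))
print(cbind(normal = drop(b_normal), qr = b_qr, lm = b_lm))
```

All three columns should match the coefficients shown in the question up to floating-point rounding.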

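I would like to know how to compute the betas of a logistic regression in the same "manual" way. Of course, y is 1 or 0. Assume I am using the binomial family with a logit link.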

2 answers

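The OLS estimator in the linear regression model is unusual in that it can be written in closed form, that is, without having to be expressed as the optimizer of a function. It is, however, the optimizer of a function (the residual sum of squares function) and can be computed as such.

The MLE in the logistic regression model is likewise the optimizer of a suitably defined log-likelihood function, but since it cannot be expressed in closed form, it must be computed as an optimizer.

Most statistical estimators can only be expressed as the optimizers of suitably constructed functions of the data, called criterion functions. Computing such optimizers requires appropriate numerical optimization algorithms. In R, they can be computed with the optim() function, which provides several general-purpose optimization algorithms, or with one of the more specialized packages such as optimx. Knowing which optimization algorithm to use for different types of models and statistical criterion functions is key.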

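Linear regression residual sum of squares

The OLS estimator is defined as the optimizer of the well-known residual sum of squares function: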

$\begin{aligned} \hat{\boldsymbol{\beta}} &= \arg\min_{\boldsymbol{\beta}}\left(\boldsymbol{Y} - \mathbf{X}\boldsymbol{\beta}\right)'\left(\boldsymbol{Y} - \mathbf{X}\boldsymbol{\beta}\right) \\ &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{Y} \end{aligned}$

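For a twice-differentiable convex function such as the residual sum of squares, most gradient-based optimizers do a good job. Here I will use the BFGS algorithm.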

#================================================
# reading in the data & pre-processing
#================================================
urlSheatherData = "http://www.stat.tamu.edu/~sheather/book/docs/datasets/MichelinNY.csv"
dfSheather = as.data.frame(read.csv(urlSheatherData, header = TRUE))

# create the design matrices
vY = as.matrix(dfSheather['InMichelin'])
mX = as.matrix(dfSheather[c('Service','Decor', 'Food', 'Price')])

# add an intercept to the predictor variables
mX = cbind(1, mX)

# the number of variables and observations
iK = ncol(mX)
iN = nrow(mX)

#================================================
# compute the linear regression parameters as 
#   an optimal value
#================================================
# the residual sum of squares criterion function
fnRSS = function(vBeta, vY, mX) {
  return(sum((vY - mX %*% vBeta)^2))
}

# arbitrary starting values
vBeta0 = rep(0, ncol(mX))

# minimise the RSS function to get the parameter estimates
optimLinReg = optim(vBeta0, fnRSS,
                   mX = mX, vY = vY, method = 'BFGS', 
                   hessian=TRUE)

#================================================
# compare to the LM function
#================================================
linregSheather = lm(InMichelin ~ Service + Decor + Food + Price,
                    data = dfSheather)

This yields:

> print(cbind(coef(linregSheather), optimLinReg$par))
                    [,1]         [,2]
(Intercept) -1.492092490 -1.492093965
Service     -0.011176619 -0.011176583
Decor        0.044193000  0.044193023
Food         0.057733737  0.057733770
Price        0.001797941  0.001797934
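The Michelin data have to be downloaded from the URL above. As a self-contained check, here is a minimal sketch that applies the same idea to the toy data from the question. As an addition that is not part of the answer, it passes the analytic gradient of the RSS, -2X'(Y - Xβ), to optim() through its gr argument, and it tightens reltol because x1 is on a much larger scale than the intercept column. The helper names fnRSS_toy and grRSS_toy are illustrative only.

```r
# Toy data from the question
y  <- c(9.3, 4.8, 8.9, 6.5, 4.2, 6.2, 7.4, 6, 7.6, 6.1)
x1 <- c(100, 50, 100, 100, 50, 80, 75, 65, 90, 90)
x2 <- c(4, 3, 4, 2, 2, 2, 3, 4, 3, 2)
mX <- cbind(1, x1, x2)

# Residual sum of squares and its gradient -2 X'(y - X beta)
fnRSS_toy <- function(vBeta, vY, mX) sum((vY - mX %*% vBeta)^2)
grRSS_toy <- function(vBeta, vY, mX) drop(-2 * crossprod(mX, vY - mX %*% vBeta))

# Tighter reltol than the default: the columns of mX are on very different
# scales, so the RSS surface is ill-conditioned
fit <- optim(rep(0, ncol(mX)), fnRSS_toy, grRSS_toy,
             vY = y, mX = mX, method = "BFGS",
             control = list(reltol = 1e-12, maxit = 1000))

print(cbind(optim = fit$par, lm = coef(lm(y ~ x1 + x2))))
```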

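Logistic regression log-likelihood

The criterion function corresponding to the MLE in the logistic regression model is the log-likelihood function: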

$\log L_n(\boldsymbol{\beta}) = \sum_{i=1}^n \left(Y_i \log \Lambda(\boldsymbol{X}_i'\boldsymbol{\beta}) + (1-Y_i)\log(1 - \Lambda(\boldsymbol{X}_i'\boldsymbol{\beta}))\right)$

where $\Lambda(k) = 1/(1+ \exp(-k))$ is the logistic function. The parameter estimates are the optimizers of this function:

$\hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}}\log L_n(\boldsymbol{\beta})$
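Setting the gradient of this log-likelihood to zero shows why there is no closed form, and why the weighted-regression iterations described in the second answer arise naturally. A short derivation (standard calculus, not part of the original answer), using $\Lambda'(k) = \Lambda(k)(1 - \Lambda(k))$, writing $\mu_i = \Lambda(\boldsymbol{X}_i'\boldsymbol{\beta})$ for the fitted probabilities and $\mathbf{W} = \operatorname{diag}(\mu_i(1-\mu_i))$, with $\boldsymbol{\mu}$ and $\mathbf{W}$ evaluated at the current iterate $\boldsymbol{\beta}^{(t)}$:

$$\begin{aligned}
\frac{\partial \log L_n(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} &= \sum_{i=1}^n \boldsymbol{X}_i\left(Y_i - \mu_i\right) = \mathbf{X}'(\boldsymbol{Y} - \boldsymbol{\mu}) \\
\frac{\partial^2 \log L_n(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}'} &= -\sum_{i=1}^n \mu_i(1-\mu_i)\,\boldsymbol{X}_i\boldsymbol{X}_i' = -\mathbf{X}'\mathbf{W}\mathbf{X} \\
\boldsymbol{\beta}^{(t+1)} &= \boldsymbol{\beta}^{(t)} + (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'(\boldsymbol{Y} - \boldsymbol{\mu}) = (\mathbf{X}'\mathbf{W}\mathbf{X})^{-1}\mathbf{X}'\mathbf{W}\boldsymbol{z}, \qquad \boldsymbol{z} = \mathbf{X}\boldsymbol{\beta}^{(t)} + \mathbf{W}^{-1}(\boldsymbol{Y} - \boldsymbol{\mu})
\end{aligned}$$

The score equation $\mathbf{X}'(\boldsymbol{Y} - \boldsymbol{\mu}) = \mathbf{0}$ is nonlinear in $\boldsymbol{\beta}$, because $\boldsymbol{\mu}$ depends on $\boldsymbol{\beta}$ through $\Lambda$, so it cannot be solved in closed form. The Newton-Raphson step in the last line is a weighted least-squares regression of the working response $\boldsymbol{z}$ on $\mathbf{X}$.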

Again, I show how to construct the criterion function and optimize it with optim() using the BFGS algorithm.

#================================================
# compute the logistic regression parameters as 
#   an optimal value
#================================================
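# define the logistic transformation Lambda(X'beta) = exp(X'beta)/(1 + exp(X'beta)),
# i.e. the inverse of the logit (it is not called by the code below)
logit = function(mX, vBeta) {
  return(exp(mX %*% vBeta)/(1+ exp(mX %*% vBeta)) )
}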

# stable parametrisation of the log-likelihood function
# Note: The negative of the log-likelihood is being returned, since we will be
# /minimising/ the function.
logLikelihoodLogitStable = function(vBeta, mX, vY) {
  return(-sum(
    vY*(mX %*% vBeta - log(1+exp(mX %*% vBeta)))
    + (1-vY)*(-log(1 + exp(mX %*% vBeta)))
    ) 
  ) 
}

# initial set of parameters
vBeta0 = c(10, -0.1, -0.3, 0.001, 0.01) # arbitrary starting parameters

# minimise the (negative) log-likelihood to get the logit fit
optimLogit = optim(vBeta0, logLikelihoodLogitStable,
                   mX = mX, vY = vY, method = 'BFGS', 
                   hessian=TRUE)

#================================================
# test against the implementation in R
# NOTE glm uses IRWLS: 
# http://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
# rather than the BFGS algorithm that we have reported
#================================================
logitSheather = glm(InMichelin ~ Service + Decor + Food + Price,
                    data = dfSheather,
                    family = binomial, x = TRUE)

This yields:

> print(cbind(coef(logitSheather), optimLogit$par))
                    [,1]         [,2]
(Intercept) -11.19745057 -11.19661798
Service      -0.19242411  -0.19249119
Decor         0.09997273   0.09992445
Food          0.40484706   0.40483753
Price         0.09171953   0.09175369
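The Michelin data have to be downloaded from the URL above. As a self-contained check, here is a minimal sketch on simulated data (the data and the true coefficients are made up for illustration). It makes two changes that are additions, not part of the answer: the log-likelihood is evaluated with plogis(..., log.p = TRUE), which stays finite even where exp(X'β) would overflow to Inf, and the analytic gradient -X'(Y - Λ(Xβ)) of the negative log-likelihood is passed to optim() through its gr argument. The names mXsim, vYsim, negLogLikStable and negScore are illustrative.

```r
set.seed(1)

# Simulated data, made up for illustration: intercept plus two
# standard-normal predictors with arbitrarily chosen true coefficients
n     <- 500
mXsim <- cbind(1, matrix(rnorm(n * 2), n, 2))
vYsim <- rbinom(n, 1, plogis(drop(mXsim %*% c(-0.5, 1, -2))))

# Negative log-likelihood computed on the log scale:
#   log Lambda(eta)      = plogis(eta, log.p = TRUE)
#   log(1 - Lambda(eta)) = plogis(eta, lower.tail = FALSE, log.p = TRUE)
negLogLikStable <- function(vBeta, mX, vY) {
  eta <- drop(mX %*% vBeta)
  -sum(vY * plogis(eta, log.p = TRUE) +
       (1 - vY) * plogis(eta, lower.tail = FALSE, log.p = TRUE))
}

# Gradient of the negative log-likelihood: -X'(Y - Lambda(X beta))
negScore <- function(vBeta, mX, vY) {
  drop(-crossprod(mX, vY - plogis(drop(mX %*% vBeta))))
}

optimSim <- optim(rep(0, ncol(mXsim)), negLogLikStable, negScore,
                  mX = mXsim, vY = vYsim, method = "BFGS",
                  control = list(reltol = 1e-12, maxit = 1000))

# mXsim already contains the intercept column, hence "- 1"
glmSim <- glm(vYsim ~ mXsim - 1, family = binomial)
print(cbind(optim = optimSim$par, glm = coef(glmSim)))

# The maximised log-likelihoods should agree as well
print(c(optim = -optimSim$value, glm = as.numeric(logLik(glmSim))))
```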

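A word of caution: numerical optimization algorithms must be used with care, or you can end up with all sorts of pathological solutions. Until you understand them well, it is best to use the available packaged options, which let you concentrate on specifying the model rather than worrying about how to compute the estimates numerically.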
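In that spirit, a few cheap checks are worth running on any optim() result: the convergence code should be 0, the gradient should be close to zero at the reported solution, and the Hessian (which the answer's code already requests with hessian = TRUE) should be positive definite. A minimal sketch on simulated logistic data; the data are made up for illustration, and the log-scale plogis() form of the negative log-likelihood is an assumption rather than the answer's code.

```r
set.seed(2)

# Simulated data, made up for illustration (coefficients are arbitrary)
n     <- 400
mXsim <- cbind(1, rnorm(n))
vYsim <- rbinom(n, 1, plogis(drop(mXsim %*% c(0.3, 1.2))))

# Negative log-likelihood and its gradient for the logit model
negLogLik <- function(b) {
  eta <- drop(mXsim %*% b)
  -sum(vYsim * plogis(eta, log.p = TRUE) +
       (1 - vYsim) * plogis(eta, lower.tail = FALSE, log.p = TRUE))
}
negScore <- function(b) drop(-crossprod(mXsim, vYsim - plogis(drop(mXsim %*% b))))

fit <- optim(c(0, 0), negLogLik, negScore, method = "BFGS",
             hessian = TRUE, control = list(reltol = 1e-12, maxit = 1000))

# 1. optim() reports success only when convergence == 0
print(fit$convergence)
# 2. the gradient should be (numerically) zero at the solution
print(max(abs(negScore(fit$par))))
# 3. a positive-definite Hessian of the negative log-likelihood indicates
#    a strict local minimum rather than a saddle point or a flat ridge
print(eigen(fit$hessian, symmetric = TRUE, only.values = TRUE)$values)
```

For the logit model the negative log-likelihood is convex, so a point that passes these checks is, up to numerical tolerance, its global minimum.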

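You can't get there from here. The solutions for both the general linear model and the logistic model come from solving their respective maximum likelihood equations, but only the linear model has a closed-form solution.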

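If you consult McCullagh and Nelder's book, you can see how the solution is obtained in the logistic case (or for other generalized models). In practice, the solution is produced iteratively, with each iteration involving the solution of a weighted linear regression. The weights depend partly on the link function.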
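The iterative scheme described here is iteratively reweighted least squares (IRLS), which is also the default fitting method behind glm(). A minimal sketch for the logit link only, assuming a full-rank design matrix and no perfect separation; irls_logit is a hypothetical helper name and the simulated data are made up for illustration. Each iteration regresses the working response z = η + (y - μ)/(μ(1 - μ)) on X with weights μ(1 - μ), where η = Xβ and μ = Λ(η); for the logit link this is exactly a Newton-Raphson step on the log-likelihood.

```r
# Minimal IRLS for logistic regression (logit link only; hypothetical helper)
irls_logit <- function(X, y, tol = 1e-10, maxit = 50) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(maxit)) {
    eta <- drop(X %*% beta)
    mu  <- plogis(eta)            # fitted probabilities Lambda(X beta)
    w   <- mu * (1 - mu)          # IRLS weights for the logit link
    z   <- eta + (y - mu) / w     # working response
    # Weighted least-squares step: solve (X'WX) beta = X'Wz
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    if (max(abs(beta_new - beta)) < tol) return(beta_new)
    beta <- beta_new
  }
  warning("IRLS did not converge")
  beta
}

# Simulated data, made up for illustration
set.seed(3)
n <- 300
X <- cbind(1, rnorm(n), runif(n))
y <- rbinom(n, 1, plogis(drop(X %*% c(0.5, -1, 2))))

b_irls <- irls_logit(X, y)
b_glm  <- coef(glm(y ~ X[, 2] + X[, 3], family = binomial))
print(cbind(irls = b_irls, glm = b_glm))
```

With another link function, only the expressions for mu, the weights and the working response change, which is the sense in which the weights depend on the link.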
