机器算法验证 - 具有岭或融合岭惩罚的非负身份链接泊松回归 - 吾爱随笔录

我想用岭或融合岭惩罚来拟合非负身份链接泊松回归模型，即对拟合系数、泊松误差噪声和拟合系数的岭或融合岭惩罚具有非负性约束。

在标准非负最小二乘岭回归的情况下，我知道这可以很容易地拟合，方法是用对角矩阵沿对角线对原始协变量矩阵进行行扩充，然后用零Xp x psqrt(lambdas)yp扩充结果向量，然后进行非负最小二乘 ( nnls)使用这些增强矩阵/向量进行回归。

不过有两个问题：

1）这个配方是否也适用于非负身份链接泊松模型，我的想法是然后我会简单地从nnls拟合变为nnpois拟合（使用 EM 算法拟合，使用addreg包）。是否有人能够分析证明这确实等同于优化岭惩罚非负身份链接泊松似然？

2）这样的配方是否也适用于具有融合岭惩罚的非负最小二乘或非负泊松模型，其中协变量矩阵的行增强部分将不再diag(sqrt(lambda),p))像常规岭回归那样，而是sqrt(lambda)*(t(D) %*% D)具有D=diff(diag(p), 1)。如果我尝试这个，如果我不施加非负性约束（例如，使用常规lm.fit），它会起作用，但是一旦我将它插入到nnlsor中，它就会失败nnpois，因为它们只采用具有所有正元素的协变量矩阵。有一个简单的解决方法吗？

关于问题 1）我做了一些快速的数值测试，我确实得到了非常相似的结果，基于使用nnpois协变量矩阵行，diag(sqrt(lambda),p))并基于直接优化惩罚 ML 目标，但我正在寻找一个分析证明这是正确的做法（或者如果它不准确，那么它至少会非常接近实际的惩罚性 ML 目标）......有什么想法吗？

数值示例，这里是在试图从具有已知脉冲位置但未知模糊核的模糊脉冲的叠加中估计峰值形状函数的上下文中，并且模糊峰值的测量叠加上的噪声将是泊松：

# simulated data
require(Matrix)
n = 200
x = 1:n
npeaks = 20
set.seed(123)
u = sample(x, npeaks, replace=FALSE) # peak locations, which are assumed to be known here
peakhrange = c(10,1E3) # peak height range
h = 10^runif(npeaks, min=log10(min(peakhrange)), max=log10(max(peakhrange))) # peak heights, which are assumed to be known here
a = rep(0, n) # locations of spikes of simulated spike train, which are assumed to be known here
a[u] = h
gauspeak = function(x, u, w, h=1) h*exp(((x-u)^2)/(-2*(w^2))) # simulated peak shape, assumed to be unknown
bM = do.call(cbind, lapply(1:n, function (u) gauspeak(x, u=u, w=5, h=1) )) # banded matrix with theoretical peak shape function used
p = 50 # desired size of peak shape model
g_theor = gauspeak(x=1:p, u=p/2, w=5, h=1) # theoretical unknown peak shape function
y_nonoise = as.vector(bM %*% a) # noiseless simulated signal = linear convolution of spike train with peak shape function
y = rpois(n, y_nonoise) # simulated signal with random poisson noise on it
G = matrix(0, n + p - 1, p)
G = Matrix(G, sparse=TRUE)
for (k in 1:p) G[(1:n) + k - 1, k] = a 
G = G[(1:n) + floor(p/2) - 1, ]
par(mfrow=c(1,1))
plot(y, type="l", ylab="Signal", xlab="x", main="Known spike train (red) convoluted with unknown blur kernel and Poisson noise")
lines(a, type="h", col="red")

# peak shape function estimated using weighted nnls ridge regression via nnls with a row augmented covariate matrix
weights = 1/(y+1) # 1/variance obs weights to account for Poisson noise in weighted nonnegative least square approximation
weights = n*weights/sum(weights)
library(nnls)
lambda = 1E4 # regularization parameter for ridge penalty
library(microbenchmark)
microbenchmark(g_wnnls <- nnls(A=as.matrix(rbind(G*sqrt(weights), diag(sqrt(lambda),p))), b=c(y*sqrt(weights),rep(0,p)))$x) # 1.4 ms
g_wnnls = (g_wnnls-min(g_wnnls))/(max(g_wnnls)-min(g_wnnls))
dev.off()
par(mfrow=c(2,1))
plot(g_theor, type="l", lwd=7, col="grey", main="Ground truth (grey), nonnegative weighted LS ridge estimate (red)", ylab="Peak shape", xlab="x")
lines(g_wnnls, col="red", lwd=2) 

# peak shape function estimated using nonnegative Poisson adaptive ridge regression via nnpois with a row augmented covariate matrix
library(addreg)
lambda = 1E6 # regularization parameter for ridge penalty
adpenweights = (1/(g_wnnls^2+1E-5)) # adaptive penalty weights to do adaptive ridge regression using wnnls estimates as truncated Gaussian prior
adpenweights = n*adpenweights/sum(adpenweights)
lambdas = lambda*adpenweights # adaptive penalties for adaptive ridge penalty
system.time(g_nnpoisridge <- nnpois(y=c(y,rep(0,p)), # nonnegative identity link Poisson adaptive ridge regression, solved using EM algo
                       x=as.matrix(rbind(G, diag(sqrt(lambdas),p))),
                       standard=rep(1,p),
                       offset=0,
                       start=rep(1E-5, p), # or we can use g_wnnls+1E-5
                       accelerate="squarem")$coefficients) # 0.02
g_nnpoisridge = (g_nnpoisridge-min(g_nnpoisridge))/(max(g_nnpoisridge)-min(g_nnpoisridge))
plot(gauspeak(x=1:p, u=25, w=5, h=1), type="l", lwd=7, col="grey", main="Ground truth (grey), nonnegative Poisson adaptive ridge estimate (red)", ylab="Peak shape", xlab="x")
lines(g_nnpoisridge, col="red", lwd=2)
# PS: instead of using nnpois we could also use nnlm (NNLM package) with loss="mkl" (Kullback-Leibler) which is then solved using coordinate descent and gives almost the same result

# check that nonnegative poisson adaptive ridge regression is calculated correctly using ML maximization with port algo (Quasi-Newton BFGS)
# negative log-likelihood for identity link poisson ridge regression
NLL_RIDGEPOIS = function(coefs, X, y, lambdas) { 
  preds = X %*% coefs
  LLs = stats::dpois(y, lambda=preds, log=TRUE) # log likelihood contributions of each observation
  n = nrow(X)
  -(1/n)*sum(LLs) - lambdas*sum(coefs^2)
}
g_nnpoisridgeML = nlminb(start = g_nnpoisridge, # we use our estimates above as starting values 
                         objective = NLL_RIDGEPOIS, X = as.matrix(G), y = y, lambdas=lambdas,
                         control = list(iter.max=1000, abs.tol=1E-20, rel.tol=1E-15), 
                         lower = rep(0,p) # we use nonnegativity constraints 
)$par 
max(g_nnpoisridgeML) # not sure why I get such a large value here, 6.269765e+68, should be around 1 as above - maybe mistake here?
# as a quick fix I rescale coefficients
g_nnpoisridgeML = (g_nnpoisridgeML-min(g_nnpoisridgeML))/(max(g_nnpoisridgeML)-min(g_nnpoisridgeML))
max(abs(g_nnpoisridgeML-g_nnpoisridge)) # 0.01726672 - ML estimate is close to solution above, but not identical - could be optimizer issue though or mistake in code...
par(mfrow=c(1,1))
plot(g_theor,col="grey", type="l", lwd=5, main="Ground truth (grey), nonnegative Poisson adaptive ridge estimate (red),\n nonnegative Poisson adaptive ridge ML estimate (blue)", ylab="Peak shape", xlab="x")
lines(g_nnpoisridge,col="red",lwd=2)
lines(g_nnpoisridgeML,col="blue",lwd=2)

编辑：只是认为最终它可能是准确的-原因是您可以使用迭代重新加权的非负最小二乘法轻松拟合非负 GLM（就像常规 IRLS 算法一样，但用 weighted 代替常规 solve / lm.wfit nnls）和身份链接你只需要回归yhat你的协变量矩阵，权重W只是 1/预期方差（即在 IRLS 算法中，http: //bwlewis.github.io/GLM/z=yhat和W=1/expected variance=1/yhat泊松噪声）。

因此，鉴于无论如何我们只是在进行加权（非负）最小二乘，通过行增强在每次迭代中添加岭惩罚是没有问题的（行增强部分在每次迭代中也将保持不变，并且Ws 将被视为一个对于增强的部分）并且结果是准确的。作为方差权重的初始化1/(y+eps)（R实际上eps=0.1在glm.fit源代码中使用），这也给出了已经非常接近第一次迭代中最终估计的估计（这就是我上面的加权非负最小二乘估计的原因效果很好 - 它对应于迭代重新加权的 nnls 的 1 次迭代，以适应非负身份链接 Poisson GLM）。