机器算法验证 - 为什么二次（正态/拉普拉斯）逼近在多级模型上失败？ - 吾爱随笔录

在 Statistical Rethinking，第 2 版，第 13.1 节中，Richard McElreath 说：

为什么简单的二次逼近（例如使用 quap）不能与多级模型一起使用？当先验本身是参数的函数时，存在两个级别的不确定性。这意味着数据的概率（以参数为条件）必须在每个级别上平均。普通的二次逼近无法处理可能性的平均，因为通常不可能得出解析解。这意味着没有用于计算对数后验的统一函数。所以你的计算机不能直接找到它的最小值（后验的最大值）。

对应rethinking代码：

m12.2 <- ulam(
    alist(
        surv ~ dbinom(density, p),
        logit(p) <- a_tank[tank],
        a_tank[tank] ~ dnorm(a, sigma),
        a ~ dnorm(0, 1),
        sigma ~ dcauchy(0, 1)
    ),
    data=d, iter=4000, chains=4
)

对应Stan代码：

data{
    int<lower=1> N;
    int<lower=1> N_tank;
    int surv[N];
    int density[N];
    int tank[N];
}
parameters{
    vector[N_tank] a_tank;
    real a;
    real<lower=0> sigma;
}
model{
    vector[N] p;
    sigma ~ cauchy( 0 , 1 );
    a ~ normal( 0 , 1 );
    a_tank ~ normal( a , sigma );
    for ( i in 1:N ) {
        p[i] = a_tank[tank[i]];
    }
    surv ~ binomial_logit( density , p );
}
generated quantities{
    vector[N] p;
    for ( i in 1:N ) {
        p[i] = a_tank[tank[i]];
    }
}

我无法弄清楚“先验本身就是参数的函数”可能是一个问题。看起来它们只是组合成一个先验，所以与单级模型没有什么不同？

这是我对拉普拉斯近似的推导，也许我在某些步骤中错了，所以我无法理解理查德的陈述？请纠正我。

来自贝叶斯数据分析，第 3 版，P84。

\begin{aligned} p (θ | y) \approx N (\hat{θ}, (I (\hat{θ}))^{- 1}) \\ I (θ) = - \frac{d^{2}}{d θ^{2}} \log p (θ | y) \end{aligned}

$\begin{align*} p(\theta|y) \approx N(\hat{\theta}, (I(\hat{\theta}))^{-1}) \\ I(\theta) = -\frac{d^2}{d\theta^2} \log p(\theta|y) \end{align*}$

所以

I (θ) = - \frac{d^{2}}{d θ^{2}} \log p (θ | y) = - \frac{d^{2}}{d θ^{2}} (\log p (θ, y) - \log p (y)) = - \frac{d^{2}}{d θ^{2}} \log p (θ, y)

$I(\theta) = -\frac{d^2}{d\theta^2} \log p(\theta|y) = -\frac{d^2}{d\theta^2} (\log p(\theta,y) - \log p(y)) = -\frac{d^2}{d\theta^2} \log p(\theta,y)$

正如代码刚刚指定的联合分布 $\log p(\theta,y)$ ，喜欢：

function logp_theta_y(){
    target = 0
    target += dcauchy(sigma, 0, 1, log=TRUE);
    target += dnorm(a, 0, 1, log=TRUE);
    target += dnorm(a_tank, a, sigma, log=TRUE);
    for ( i in 1:N ) {
        target += dbinom(surv[i], density[i], logistic(a_tank[tank[i]]), log=TRUE);
    }
    return target;
}

我看不到导致二次逼近不起作用的任何缺失材料...

事实上，我运行以下Julia代码（对不起，我对 R 不太熟悉）来获得拉普拉斯近似值，发现它们彼此接近：

using CSV
using DataFrames
using Distributions
using DistributionsAD
using ForwardDiff
using Optim
using LinearAlgebra

df = DataFrame(CSV.File("reedfrogs.csv")) # https://github.com/rmcelreath/rethinking/blob/master/data/reedfrogs.csv

S = df.surv
N = df.density

N_tank = length(S)

logistic(x) = 1/(1+exp(-x))

function logp_y_p(par)
    a_tank = par[1:N_tank]
    a = par[N_tank+1]
    sigma = exp(par[N_tank+2])
    
    target = 0
    target += logpdf(Cauchy(0, 1), sigma) + par[N_tank+2]
    target += logpdf(Normal(0, 1), a)
    target += logpdf.(Normal.(a, sigma), a_tank) |> sum
    target += logpdf.(Binomial.(N, logistic.(a_tank)), S) |> sum
    
    target
end

f(x) = -logp_y_p(x)
x0 = randn(50)

res = optimize(f, x0, LBFGS(); autodiff = :forward)

par_est = Optim.minimizer(res)
FI = -ForwardDiff.hessian(logp_y_p, par_est)
Sigma = inv(FI)

function normal_to_lognormal(mu, sigma)
    exp(mu+sigma^2/2), sqrt((exp(sigma^2)-1) * exp(2*mu+sigma^2))
end

mu_sigma, sigma_sigma = normal_to_lognormal(par_est[end], sqrt(diag(Sigma)[end]))

mu_vec = copy(par_est)
mu_vec[end] = mu_sigma

sigma_vec = sqrt.(diag(Sigma))
sigma_vec[end] = sigma_sigma

[mu_vec sigma_vec]