与 MCMC Metropolis-Hastings 变体混淆:Random-Walk、Non-Random-Walk、Independent、Metropolis

机器算法验证 马尔可夫链蒙特卡罗 大都会黑斯廷斯
2022-02-03 23:26:37

在过去的几周里,我一直在尝试理解 MCMC 和 Metropolis-Hastings 算法。每次我认为我理解它时,我意识到我错了。我在网上找到的大多数代码示例都实现了与描述不一致的东西。即:他们说他们实现了 Metropolis-Hastings,但他们实际上实现了随机游走大都会。其他人(几乎总是)默默地跳过黑斯廷斯校正比率的实施,因为他们使用的是对称提议分布。实际上,到目前为止,我还没有找到一个简单的例子来计算比率。这让我更加困惑。有人可以给我以下代码示例(任何语言):

  • Vanilla Non-Random Walk Metropolis-Hastings Algorithm with Hastings Correction ratio 计算(即使在使用对称提议分布时最终为 1)。
  • Vanilla Random Walk Metropolis-Hastings 算法。
  • Vanilla Independent Metropolis-Hastings 算法。

不需要提供 Metropolis 算法,因为如果我没记错的话,Metropolis 和 Metropolis-Hastings 之间的唯一区别是第一个总是从对称分布中采样,因此它们没有 Hastings 校正比率。无需对算法进行详细解释。我确实了解基础知识,但我对 Metropolis-Hastings 算法的不同变体的所有不同名称以及如何在 Vanilla 非随机游走 MH 上实际实现 Hastings 校正率感到困惑。请不要复制部分回答我的问题的粘贴链接,因为我很可能已经看过它们。这些链接使我感到困惑。谢谢你。

2个回答

给你 - 三个例子。为了使逻辑更清晰(我希望),我已经使代码的效率低于实际应用程序中的效率。

# We'll assume estimation of a Poisson mean as a function of x
x <- runif(100)
y <- rpois(100,5*x)  # beta = 5 where mean(y[i]) = beta*x[i]

# Prior distribution on log(beta): t(5) with mean 2 
# (Very spread out on original scale; median = 7.4, roughly)
log_prior <- function(log_beta) dt(log_beta-2, 5, log=TRUE)

# Log likelihood
log_lik <- function(log_beta, y, x) sum(dpois(y, exp(log_beta)*x, log=TRUE))

# Random Walk Metropolis-Hastings 
# Proposal is centered at the current value of the parameter

rw_proposal <- function(current) rnorm(1, current, 0.25)
rw_p_proposal_given_current <- function(proposal, current) dnorm(proposal, current, 0.25, log=TRUE)
rw_p_current_given_proposal <- function(current, proposal) dnorm(current, proposal, 0.25, log=TRUE)

rw_alpha <- function(proposal, current) {
   # Due to the structure of the rw proposal distribution, the rw_p_proposal_given_current and
   # rw_p_current_given_proposal terms cancel out, so we don't need to include them - although
   # logically they are still there:  p(prop|curr) = p(curr|prop) for all curr, prop
   exp(log_lik(proposal, y, x) + log_prior(proposal) - log_lik(current, y, x) - log_prior(current))
}

# Independent Metropolis-Hastings
# Note: the proposal is independent of the current value (hence the name), but I maintain the
# parameterization of the functions anyway.  The proposal is not ignorable any more
# when calculation the acceptance probability, as p(curr|prop) != p(prop|curr) in general.

ind_proposal <- function(current) rnorm(1, 2, 1) 
ind_p_proposal_given_current <- function(proposal, current) dnorm(proposal, 2, 1, log=TRUE)
ind_p_current_given_proposal <- function(current, proposal) dnorm(current, 2, 1, log=TRUE)

ind_alpha <- function(proposal, current) {
   exp(log_lik(proposal, y, x)  + log_prior(proposal) + ind_p_current_given_proposal(current, proposal) 
       - log_lik(current, y, x) - log_prior(current) - ind_p_proposal_given_current(proposal, current))
}

# Vanilla Metropolis-Hastings - the independence sampler would do here, but I'll add something
# else for the proposal distribution; a Normal(current, 0.1+abs(current)/5) - symmetric but with a different
# scale depending upon location, so can't ignore the proposal distribution when calculating alpha as
# p(prop|curr) != p(curr|prop) in general

van_proposal <- function(current) rnorm(1, current, 0.1+abs(current)/5)
van_p_proposal_given_current <- function(proposal, current) dnorm(proposal, current, 0.1+abs(current)/5, log=TRUE)
van_p_current_given_proposal <- function(current, proposal) dnorm(current, proposal, 0.1+abs(proposal)/5, log=TRUE)

van_alpha <- function(proposal, current) {
   exp(log_lik(proposal, y, x)  + log_prior(proposal) + ind_p_current_given_proposal(current, proposal) 
       - log_lik(current, y, x) - log_prior(current) - ind_p_proposal_given_current(proposal, current))
}


# Generate the chain
values <- rep(0, 10000) 
u <- runif(length(values))
naccept <- 0
current <- 1  # Initial value
propfunc <- van_proposal  # Substitute ind_proposal or rw_proposal here
alphafunc <- van_alpha    # Substitute ind_alpha or rw_alpha here
for (i in 1:length(values)) {
   proposal <- propfunc(current)
   alpha <- alphafunc(proposal, current)
   if (u[i] < alpha) {
      values[i] <- exp(proposal)
      current <- proposal
      naccept <- naccept + 1
   } else {
      values[i] <- exp(current)
   }
}
naccept / length(values)
summary(values)

对于香草采样器,我们得到:

> naccept / length(values)
[1] 0.1737
> summary(values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.843   5.153   5.388   5.378   5.594   6.628 

这是一个较低的接受概率,但仍然......调整提案将在这里有所帮助,或者采用不同的提案。这是随机游走提案的结果:

> naccept / length(values)
[1] 0.2902
> summary(values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.718   5.147   5.369   5.370   5.584   6.781 

类似的结果,正如人们所希望的那样,以及更好的接受概率(一个参数的目标是~50%。)

并且,为了完整起见,独立采样器:

> naccept / length(values)
[1] 0.0684
> summary(values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.990   5.162   5.391   5.380   5.577   8.802 

因为它不“适应”后验的形状,所以它往往具有最差的接受概率,并且最难针对这个问题进行调整。

请注意,一般而言,我们更喜欢尾巴较粗的提案,但这是另一个话题。

看:

通过构造,该算法不依赖于归一化常数,因为重要的是 pdf 的比率。提案中算法的变化 pdfq()不对称是由于 Hasting (1970),因此该算法通常也称为 Metropolis-Hasting。此外,这里描述的是全局 Metropolis 算法,与局部算法不同,在局部算法中,循环仅影响X.

维基百科文章是一个很好的补充阅读。正如你所看到的,Metropolis 也有一个“修正率”,但是,如上所述,Hastings 引入了一个允许非对称提案分布的修改。

Metropolis 算法mcmc在命令下的 R 包中实现metrop()

其他代码示例:

http://www.mas.ncl.ac.uk/~ndjw1/teaching/sim/metrop/

http://pcl.missouri.edu/jeff/node/322

http://darrenjw.wordpress.com/2010/08/15/metropolis-hastings-mcmc-algorithms/