Metropolis-Hastings 用于线性回归,先于 sigma?

机器算法验证 回归 贝叶斯 马尔可夫链蒙特卡罗 异方差
2022-04-03 23:08:55

出于好奇,我正在尝试为贝叶斯线性回归构建一个 Metropolis-Hastings 采样器。下面,您将注意到我的脚本,更具体地说,是内联注释,注意注释掉/注释多行以更改脚本的行为。

照原样,采样器迭代地建议对 b(斜率)或 a(斜率截距)进行更改。但是,线性函数 y=bx+a 的这些参数不会同时更新。效果很好!但是,照原样,该脚本不建议对线性函数的标准偏差 s 进行更改。当我详细更改代码块时,采样器无法更改。它只是停滞在初始可能值。

我的问题是:

(1) 采样不同的 sigma 值有什么好处?我对带有 s 常数的 b 和 a 有了很好的理解。
(2) 我提议修改 s 错了吗?我知道它不能是负数,但它也需要从对称分布中采样,允许增加和减少。我使用了当前值的绝对值加上一些随机变化。(-0.15 -> 0.15) (3) sigma 有更好的优先选择吗?我正在使用反伽玛。另外,你会注意到我的分布函数都没有涉及标准化常数,因为这在 MH 中通常是不必要的。

我的代码:

import numpy as np
import random

def normalPDF(x,mu,sigma):
  num = np.exp((x-mu)**2/-2*sigma**2)
  return num

def invGamma(x,a,b):
  non_zero = int(x>=0)
  func = x**(a-1)*np.exp(-x/b)
  return non_zero*func

def lr_mcmc(X,Y,hops=10_000):
  samples = []
  
  curr_b = 1
  curr_a = 1
  curr_s = 1

  prior_b_curr = normalPDF(x=curr_b,mu=2,sigma=1)
  prior_a_curr = normalPDF(x=curr_a,mu=1,sigma=1)
  prior_s_curr = invGamma(x=curr_s, a=2,b=2)
  
  log_lik_curr = sum([np.log(normalPDF(x=curr_b*x + curr_a,mu=y,sigma=curr_s)) for x,y in zip(X,Y)])
  current_numerator =  log_lik_curr + np.log(prior_a_curr) + np.log(prior_b_curr) + np.log(prior_s_curr)

  count = 0
  for i in range(hops):
    samples.append((curr_b,curr_a,curr_s))    

    if count == 0:
      mov_b = curr_b + random.uniform(-0.25,0.25)
      mov_a = curr_a 
      mov_s = curr_s
      count += 1

    elif count == 1:
      mov_a = curr_a + random.uniform(-0.25,0.25)
      mov_b = curr_b
      mov_s = curr_s

      # to change behavior:
      # count += 1 # uncomment line 
      count = 0    # comment line out

      # to change behavior, uncomment below code block:
    # else:
    #   mov_s = np.abs(curr_s + random.uniform(-0.25,0.25))
    #   mov_b = curr_b
    #   mov_a = curr_a
    #   count = 0

    prior_b_mov = normalPDF(x=mov_b,mu=2,sigma=1)
    prior_a_mov = normalPDF(x=mov_a,mu=1,sigma=1)
    prior_s_mov = invGamma(x=mov_s,a=2,b=2)
    log_lik_mov = sum([np.log(normalPDF(x=mov_b*x + mov_a,mu=y,sigma=mov_s)) for x,y in zip(X,Y)])
    movement_numerator = log_lik_mov + np.log(prior_a_mov) + np.log(prior_b_mov) + np.log(prior_s_mov)
  
    ratio = np.exp(movement_numerator - current_numerator)
    event = random.uniform(0,1)
    if event <= ratio:
      curr_b = mov_b
      curr_a = mov_a
      current_numerator = movement_numerator
      
  return samples

test2 = lr_mcmc(Y=y,X=x,hops=25_000)
sns.kdeplot([test2[i][0] for i in range(len(test2))],[test2[i][1] for i in range(len(test2))],cmap="inferno",shade=True)  

我的情节在没有代码块更改的情况下成功运行。x 轴 = 斜率,y 轴 = y 截距。 图形

以及我更改代码时的错误

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:57: RuntimeWarning: invalid value encountered in double_scalars
/usr/local/lib/python3.6/dist-packages/statsmodels/nonparametric/kernels.py:128: RuntimeWarning: divide by zero encountered in true_divide
  return (1. / np.sqrt(2 * np.pi)) * np.exp(-(Xi - x)**2 / (h**2 * 2.))
/usr/local/lib/python3.6/dist-packages/statsmodels/nonparametric/kernels.py:128: RuntimeWarning: invalid value encountered in true_divide
  return (1. / np.sqrt(2 * np.pi)) * np.exp(-(Xi - x)**2 / (h**2 * 2.))
/usr/local/lib/python3.6/dist-packages/matplotlib/contour.py:1483: UserWarning: Warning: converting a masked element to nan.
  self.zmax = float(z.max())
/usr/local/lib/python3.6/dist-packages/matplotlib/contour.py:1484: UserWarning: Warning: converting a masked element to nan.
  self.zmin = float(z.min())
/usr/local/lib/python3.6/dist-packages/matplotlib/contour.py:1132: RuntimeWarning: invalid value encountered in less
  under = np.nonzero(lev < self.zmin)[0]
/usr/local/lib/python3.6/dist-packages/matplotlib/contour.py:1134: RuntimeWarning: invalid value encountered in greater
  over = np.nonzero(lev > self.zmax)[0]
<matplotlib.axes._subplots.AxesSubplot at 0x7f614fe62ba8>

当我查看样本时,它只是所有 25,00 次迭代的一个 b,a,s 组合。

[(1, 1, 1),
 (1, 1, 1),
 (1, 1, 1),
 (1, 1, 1),
 (1, 1, 1),
 (1, 1, 1),
 (1, 1, 1),
 ...
]
1个回答

您的采样器存在一些小问题。首先,您应该始终以对数比例计算 PDF 或 PMF,因为对于任何重要的问题,您都可能会遇到上溢/下溢问题(根据经验,这种情况发生得非常快)。我已将这些功能切换为对数刻度,但值得检查我是否正确完成了此操作。

您的采样器在您发布时无法正常工作的原因主要是因为您没有分配curr_smov_s(即,如果 Metropolis-Hastings 比率高于统一值,您没有更新当前值)。

对于对称非负提案,我见过人们使用对数正态分布。但是,您也可以将接受率调整为 Metropolis-Hastings 比率而不是 Metropolis 比率(即,包括提案的密度比率以及后验比率,请参阅以获得快速指南)。我不认为截断的均匀分布是对称的,因此您的采样器在 sigma < 0.25 值时的行为可能不正确。

您也没有定义您的数据 ( X, y) 或导入您使用的所有库,这总是很好的,因为它可以更轻松地加载和调试您的代码。

如果我可以大胆地建议一些简单的后续步骤 - 允许任意设计矩阵而不是简单的斜率和截距将非常简单。

至于你的问题,

采样不同的 sigma 值有什么好处?

对于模型,yiN(Xiβ,σ2),在大多数情况下,很自然地假设您不知道条件均值周围的噪声大小。这个幅度由σ2. 我很难想到回归系数(β) 是未知的,但残差的大小ϵi=yiXiβ是已知的。

import numpy as np
import random
import seaborn as sns

def normalPDF(x,mu,sigma):
  num = np.exp(-1/2*((x-mu)/sigma)**2)
  den = np.sqrt(2*np.pi)*sigma
  return num/den

def invGamma(x,a,b):
  non_zero = int(x>=0)
  func = x**(a-1)*np.exp(-x/b)
  return non_zero*func

def lr_mcmc(X,Y,hops=10_000):
  samples = []
  curr_a = random.gauss(1,1)
  curr_b = random.gauss(2,1)
  curr_s = random.uniform(3,1)

  prior_a_curr = normalPDF(x=curr_a,mu=1,sigma=1)
  prior_b_curr = normalPDF(x=curr_b,mu=2,sigma=1)
  prior_s_curr = invGamma(x=curr_s,a=3,b=1)
  
  log_lik_curr = sum([np.log(normalPDF(x=curr_b*x + curr_a,mu=y,sigma=curr_s)) for x,y in zip(X,Y)])
  current_numerator =  log_lik_curr + np.log(prior_a_curr) + np.log(prior_b_curr) + np.log(prior_s_curr)

  count = 0
  for i in range(hops):
    samples.append((curr_b,curr_a,curr_s))    

    if count == 0: #propose movement to b
      mov_a = curr_a
      mov_b = curr_b + random.uniform(-0.25,0.25) 
      mov_s = curr_s
      count += 1

    elif count == 1: #propose movement to a
      mov_a = curr_a + random.uniform(-0.25,0.25)
      mov_b = curr_b
      mov_s = curr_s
      count += 1

    else: #propose movement to s
      mov_a = curr_a
      mov_b = curr_b
      mov_s = curr_s + random.uniform(-0.25,0.25)
      count = 0

    prior_b_mov = normalPDF(x=mov_b,mu=2,sigma=1)
    prior_a_mov = normalPDF(x=mov_a,mu=1,sigma=1)
    prior_s_mov = invGamma(x=mov_s,a=3,b=1)
    if prior_s_mov <=0: 
      continue #automatically reject because variance cannot equal 0.
    
    log_lik_mov = sum([np.log(normalPDF(x=mov_b*x + mov_a,mu=y,sigma=mov_s)) for x,y in zip(X,Y)])
    movement_numerator = log_lik_mov + np.log(prior_a_mov) + np.log(prior_b_mov) + np.log(prior_s_mov)
  
    ratio = np.exp(movement_numerator - current_numerator)
    event = random.uniform(0,1)
    if event <= ratio:
      curr_b = mov_b
      curr_a = mov_a
      curr_s = mov_s
      current_numerator = movement_numerator
      
  return samples


beta = np.random.normal(0, 1, [1, ])
X = np.random.normal(0, 1, [20, 1])
y = np.matmul(X, beta)

test2 = lr_mcmc(X=X, y=y, hops=25_000)



sns.kdeplot([test2[i][0] for i in range(len(test2))],[test2[i][1] for i in range(len(test2))],cmap="inferno",shade=True)  

plt.show()
```