机器算法验证 - 使用 MCMC 从已知密度的双变量分布中采样 - 吾爱随笔录

我试图从双变量密度进行模拟 $p(x,y)$ 在 R 中使用 Metropolis 算法并没有运气。密度可以表示为 $p(y|x)p(x)$ ，在哪里 $p(x)$ 是 Singh-Maddala 分布

$p(x)=\dfrac{aq x^{a-1}}{b^a (1 + (\frac{x}{b})^a)^{1+q}}$

带参数 $a$ , $q$ , $b$ ，和 $p(y|x)$ 是对数正态的，对数均值是 $x$ , 和 log-sd 一个常数。为了测试我的样本是否是我想要的样本，我查看了 $x$ ，应该是 $p(x)$ . 我从 R 包 MCMCpack、mcmc 和 dream 中尝试了不同的 Metropolis 算法。我丢弃了老化，使用了减薄，使用了尺寸高达百万的样品，但由此产生的边际密度从来不是我提供的。

这是我使用的代码的最终版本：

logvrls <- function(x,el,sdlog,a,scl,q.arg) {
    if(x[2]>0) {
         dlnorm(x[1],meanlog=el*log(x[2]),sdlog=sdlog,log=TRUE)+
         dsinmad(x[2],a=a,scale=scl,q.arg=q.arg,log=TRUE)
    }
    else -Inf    
}

a <- 1.35
q <- 3.3
scale <- 10/gamma(1 + 1/a)/gamma(q - 1/a)*  gamma(q) 

Initvrls <- function(pars,nseq,meanlog,sdlog,a,scale,q) {
    cbind(rlnorm(nseq,meanlog,sdlog),rsinmad(nseq,a,scale,q))
}

library(dream)
aa <- dream(logvrls,
        func.type="logposterior.density",
        pars=list(c(0,Inf),c(0,Inf)),
        FUN.pars=list(el=0.2,sdlog=0.2,a=a,scl=scale,q.arg=q),
        INIT=Initvrls,
        INIT.pars=list(meanlog=1,sdlog=0.1,a=a,scale=scale,q=q),
        control=list(nseq=3,thin.t=10)
        )

我已经选择了梦想包，因为它采样直到收敛。我已经通过三种方式测试了我是否有正确的结果。使用 KS 统计量，比较分位数，并从结果样本中以最大似然估计 Singh-Maddala 分布的参数：

ks.test(as.numeric(aa$Seq[[2]][,2]),psinmad,a=a,scale=scale,q.arg=q)

lsinmad <- function(x,sample)
    sum(dsinmad(sample,a=x[1],scale=x[2],q.arg=x[3],log=TRUE))
 optim(c(2,20,2),lsinmad,method="BFGS",sample=aa$Seq[[1]][,2])

 qq <- eq(0.025,.975,by=0.025)   
 tst <- cbind(qq,
              sapply(aa$Seq,function(l)round(quantile(l[,2],qq),3)),
              round(qsinmad(qq,a,scale,q),3))
 colnames(tst) <- c("Quantile","S1","S2","S3","True")

 library(ggplot2)
 qplot(x=Quantile,y=value,
       data=melt(data.frame(tst),id=1), 
       colour=variable,group=variable,geom="line")

当我查看这些比较的结果时，KS 统计量几乎总是拒绝零假设，即样本来自 Singh-Maddala 分布并提供参数。最大似然估计参数有时会接近其真实值，但通常离舒适区太远，无法接受采样过程是成功的。分位数也是如此，经验分位数不是太远，而是太远了。

我的问题是我做错了什么？我自己的假设：

MCMC 不适用于此类采样
由于理论上的原因，MCMC 不能收敛（分布函数不满足所需的属性，无论它们是什么）
我没有正确使用 Metropolis 算法
我的分布测试不正确，因为我没有独立样本。