机器算法验证 - 在 JAGS/BUGS 中指定零膨胀（障碍）伽马模型 - 吾爱随笔录

在 JAGS/BUGS 中指定零膨胀（障碍）伽马模型

机器算法验证 r 回归贝叶斯锯齿错误

2022-03-27 10:30:42

我正在尝试使用零膨胀伽马模型（或伽马“障碍”模型）。该模型是逻辑回归和广义线性建模的混合体。我可以分两步进行此分析：1）对存在/不存在数据进行逻辑回归，然后 2）使用具有正值伽马分布的广义线性模型。我正在尝试将模型设置为，如果 y = 0，则 E(y) = p，但如果 y > 0，则 E(y) 是伽马分布的。

我正在尝试在 BUGS/JAGS 中进行设置，因为我之前已经看到这些模型适用于 poisson-distributions。我试图将该链接中的代码调整为伽马分布，但我似乎无法让它工作。我知道代码不起作用，因为我的数据为零，并且似然函数不能用零进行评估。但是，即使限制为正值，我也会收到无效父值的错误。即便如此，我什至不确定模型是否正确指定。

这是模型：

# For the ones trick
C <- 10000

# for every observation
for(i in 1:N){
    # log-likelihood of the observation from the gamma likelihood
    LogPos[i] <- (shape[i] - 1)*log(y[i]) - y[i]/scale[i] - N*shape[i]*log(scale[i]) - N*loggam(scale[i])
    #likelihood
    Lpos[i] <- exp(LogPos[i])

    # redefine the shape and rate parameters as a function of the mean and sd
    shape[i] <- pow(mu[i], 2) / pow(sd, 2)
    scale[i] <- mu[i] / pow(sd, 2)

    # mu is a function of MTD: use the inverse link
    #mu[i] <- 1/eta[i]
    mu[i] <- beta0 + beta1*MTD[i]


    # zero-inflated part, where w[i] is the probability of being zero
    w[i] <- exp(zeta[i]) / (1 + exp(zeta[i]))
    zeta[i] <- gamma0 + gamma1*MPD[i]

    # ones trick
    p[i] <- Lpos[i] / C
    ones[i] ~ dbern(p[i])

    # Full likelihood
    Lik[i] <- Lpos[i]*(1 - w[i]) + equals(y[i], 0)*w[i]
  } 

# PRIORS
beta0 ~ dnorm(0, 0.001)
beta1 ~ dnorm(0, 0.001)

gamma0 ~ dnorm(0, 0.001)
gamma1 ~ dnorm(0, 0.001)

sd ~ dunif(0, 100)

有没有人像这样设置模型或对如何正确设置有任何建议？

更新

我尝试了一组类似但略有不同的新代码。我还没有让它工作

model{

  # For the ones trick
  C <- 10000

  # for every observation
  for(i in 1:N){

    # make a dummy variable that is 0 if y is < 0.0001 and 1 if y > 0.0001. This is essentially a presence
    # absence dummy variable
    z[i] <- step(y[i] - 0.0001)

    # define the logistic regression model, where w is the probability of occurance.
    # use the logistic transformation exp(z)/(1 + exp(z)), where z is a linear function
    w[i] <- exp(zeta[i]) / (1 + exp(zeta[i]))
    zeta[i] <- gamma0 + gamma1*MPD[i]

    # define the gamma regression model for the mean. use the log link the ensure positive, non-zero mu
    mu[i] <- exp(eta[i])
    eta[i] <- beta0 + beta1*MTD[i]

    # redefine the mu and sd of the continuous part into the shape and scale parameters
    shape[i] <- pow(mu[i], 2) / pow(sd, 2)
    scale[i] <- mu[i] / pow(sd, 2)

    # for readability, define the log-likelihood of the gamma here
    logGamma[i] <- (shape[i] - 1)*log(y[i]) - y[i]/scale[i] - shape[i]*log(scale[i]) - loggam(scale[i])

    # define the total likelihood, where the likelihood is (1 - w) if y < 0.0001 (z = 0) or
    # the likelihood is w * gammalik if y >= 0.0001 (z = 1). So if z = 1, then the first bit must be
    # 0 and the second bit 1. Use 1 - z, which is 0 if y > 0.0001 and 1 if y < 0.0001
    logLik[i] <- (1 - z[i]) * log(1 - w[i]) + z[i] * ( log(w[i]) + logGamma[i] )

    # Use the ones trick
    p[i] <- logLik[i] / C
    ones[i] ~ dbern(p[i])
  } 

  # PRIORS
  beta0 ~ dnorm(0, 0.001)
  beta1 ~ dnorm(0, 0.001)

  gamma0 ~ dnorm(0, 0.001)
  gamma1 ~ dnorm(0, 0.001)

  sd ~ dgamma(1, 2)

}

更新 2

我通过删除 logGamma[i] 的定义并将其直接放入似然函数中来运行它，现在读取为：

logLik[i] <- (1 - z[i]) * log(1 - w[i]) + z[i] * ( log(w[i]) + (shape[i] - 1)*log(y[i]) - y[i]/scale[i] - shape[i]*log(scale[i]) - loggam(scale[i]) )

问题是 JAGS 试图首先评估所有观测值的对数似然性，导致 NA 为 0。以新的方式，它仅在 z = 1 时评估它（我认为）。我仍然无法让参数估计对齐。例如，伽玛几乎与相同形式的逻辑回归相同（万岁！）。但是 beta 与正值的伽马 GLM 相去甚远。我不知道这是否正常，但我怀疑我的模型规格仍然存在问题。

1个回答

在 Martyn Plummer 的帮助下，我找到了答案。我的代码使用伽马模型的反向链接（并且没有预测变量的逆）。此外，此代码需要 JAGS 的“glm”模块。

model{

  # For the ones trick
  C <- 10000

  # for every observation
  for(i in 1:N){

    # define the logistic regression model, where w is the probability of occurance.
    # use the logistic transformation exp(z)/(1 + exp(z)), where z is a linear function
    logit(w[i]) <- zeta[i]
    zeta[i] <- gamma0 + gamma1*MPD[i] + gamma2*MTD[i] + gamma3*int[i] + gamma4*MPD[i]*int[i] + gamma5*MTD[i]*int[i]

    # define the gamma regression model for the mean. use the log link the ensure positive, non-zero mu
    mu[i] <- pow(eta[i], -1)
    eta[i] <- beta0 + beta1*MPD[i] + beta2*MTD[i] + beta3*int[i] + beta4*MPD[i]*int[i] + beta5*MTD[i]*int[i]

    # redefine the mu and sd of the continuous part into the shape and scale parameters
    shape[i] <- pow(mu[i], 2) / pow(sd, 2)
    rate[i] <- mu[i] / pow(sd, 2)

    # for readability, define the log-likelihood of the gamma here
    logGamma[i] <- log(dgamma(y[i], shape[i], rate[i]))

    # define the total likelihood, where the likelihood is (1 - w) if y < 0.0001 (z = 0) or
    # the likelihood is w * gammalik if y >= 0.0001 (z = 1). So if z = 1, then the first bit must be
    # 0 and the second bit 1. Use 1 - z, which is 0 if y > 0.0001 and 1 if y < 0.0001
    logLik[i] <- (1 - z[i]) * log(1 - w[i]) + z[i] * ( log(w[i]) + logGamma[i] )

    Lik[i] <- exp(logLik[i])

    # Use the ones trick
    p[i] <- Lik[i] / C
    ones[i] ~ dbern(p[i])
  }

  # PRIORS
  beta0 ~ dnorm(0, 0.0001)
  beta1 ~ dnorm(0, 0.0001)
  beta2 ~ dnorm(0, 0.0001)
  beta3 ~ dnorm(0, 0.0001)
  beta4 ~ dnorm(0, 0.0001)
  beta5 ~ dnorm(0, 0.0001)

  gamma0 ~ dnorm(0, 0.0001)
  gamma1 ~ dnorm(0, 0.0001)
  gamma2 ~ dnorm(0, 0.0001)
  gamma3 ~ dnorm(0, 0.0001)
  gamma4 ~ dnorm(0, 0.0001)
  gamma5 ~ dnorm(0, 0.0001)

  sd ~ dgamma(2, 2)

}

其它你可能感兴趣的问题

上一篇来自 flexsurvreg 的标准错误下一篇两个正态分布变量的乘积分布何时接近正态分布？