Weibull survival model in R

cross-validated r survival

2022-02-27 03:47:34

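If I run a Weibull survival model in R using the code

survreg(Surv(t, delta) ~ explanatory variables, dist = "weibull")

how do I interpret the model output? That is, is the form of the model just $1-\exp(-(x/\lambda)^k)$, with $\lambda$ the scale and $k$ the shape, or does it take a different form?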

I found something which says that the output is of the form

$\exp(-\exp(-\alpha_0-\alpha_1-\ldots)^kx^k),$

where $\alpha_i$ are the coefficients of the covariates. If so, the output would give me the parameters

$k=k \quad \mbox{and} \quad \lambda=\frac{1}{\exp(-\alpha_0-\alpha_1-\ldots)}.$

3 Answers

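OK, so I will post the answer here, using the R help as described by DWin. The rweibull function in R uses the usual form of the Weibull distribution, whose cumulative distribution function is:

$F(x)=1-\exp\left(-\left(\frac{x}{b}\right)^a\right)$

So we denote the shape parameter of rweibull by $a$ and the scale parameter of rweibull by $b$.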

Now the problem is that the output of survreg gives parameters that are not the same as the shape and scale parameters of rweibull. Let us denote the intercept reported by survreg by $a_s$ and the scale reported by survreg by $b_s$.

Then from ?survreg we have:

survreg's scale = 1/(rweibull shape)

survreg's intercept = log(rweibull scale)

So this gives us:

$a=\frac{1}{b_s}\quad \mbox{and} \quad b=\exp(a_s)$

So if we suppose that we run survreg with $n-1$ covariates, then the output will be $\alpha_0,\ldots,\alpha_{n-1}$, the intercept and the coefficients of the covariates, and a scale parameter $k$. Writing $z_1,\ldots,z_{n-1}$ for the covariate values, the Weibull model in standard form is then given by:

$F(x)=1-\exp\left(-\left(\frac{x}{\exp(\alpha_0+\alpha_1 z_1+\ldots+\alpha_{n-1}z_{n-1})}\right)^{\frac{1}{k}}\right)$
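
The mapping between the two parameterizations can be checked on simulated data. The following is a minimal sketch, assuming the survival package is installed; the true values (shape 2, scale 5), the sample size, and the seed are arbitrary choices for illustration, and no observations are censored.

```r
library(survival)

set.seed(1)                       # arbitrary seed, for reproducibility
true_shape <- 2                   # rweibull shape (a), chosen for illustration
true_scale <- 5                   # rweibull scale (b), chosen for illustration
y <- rweibull(5000, shape = true_shape, scale = true_scale)

# Intercept-only Weibull fit; Surv(y) treats every observation as an event
fit0 <- survreg(Surv(y) ~ 1, dist = "weibull")

est_shape <- 1 / fit0$scale               # a = 1 / (survreg scale)
est_scale <- exp(unname(coef(fit0)))      # b = exp(survreg intercept)

# Both estimates should land close to the true values used above
c(shape = est_shape, scale = est_scale)
```

With covariates in the model, the same conversion applies per subject, with the intercept replaced by that subject's linear predictor.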

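For further clarity, I want to add an answer with a code example.

What we are basically after is taking the survreg output model and deriving the survival function from it. To avoid the usual notation confusion, I will go straight to the code that does this: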

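library(survival)  # provides survreg(), Surv() and the stanford2 data

fit <- survreg(Surv(time, status) ~ age, data = stanford2)  # this is the survreg output model
survreg.lp <- predict(fit, type = "lp")   # linear predictor for each subject
survreg.scale <- fit$scale                # survreg's scale (1 / Weibull shape)

# this is the survival function!
S_t <- function(t, survreg.scale, survreg.lp) {
  shape <- 1 / survreg.scale
  scale <- exp(survreg.lp)
  # upper tail of the Weibull CDF, i.e. S(t) = 1 - F(t)
  pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)
}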

As mentioned by vkehayas, R's pweibull parameterization is:

$F(x) = 1-\exp\left(-\left(\frac{x}{b}\right)^a\right)$

where a is the Weibull shape and b is the scale.

Then we get that a = 1/fit$scale and b = exp(predict(fit, type = "lp")).
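
As a quick numerical sanity check of this parameterization, the closed form $S(t)=\exp(-(t/b)^a)$ can be compared with pweibull. This is only a sketch in base R: the values standing in for the survreg scale and linear predictor are made up for illustration and do not come from a fitted model.

```r
# Hypothetical survreg output values, for illustration only
survreg_scale <- 0.5        # plays the role of fit$scale
survreg_lp    <- log(100)   # plays the role of predict(fit, type = "lp") for one subject

a <- 1 / survreg_scale      # Weibull shape
b <- exp(survreg_lp)        # Weibull scale

t_grid <- c(10, 50, 100, 200)

# Closed-form survival function versus the upper tail of pweibull
S_closed   <- exp(-(t_grid / b)^a)
S_pweibull <- pweibull(t_grid, shape = a, scale = b, lower.tail = FALSE)

all.equal(S_closed, S_pweibull)
```

At t = b the survival probability equals exp(-1) whatever the shape, which gives a convenient way to read the scale off a fitted curve.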

We can verify the derived survival function below.

# next let's verify it's correct:
library(survival)
fit <- survreg(Surv(time, status) ~ age, data = stanford2) # this is the survreg output model

# this is the survival function!
S_t <- function(t, survreg.scale, survreg.lp) {
  shape <- 1 / survreg.scale
  scale <- exp(survreg.lp)
  pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)
}

new_dat <- data.frame(age = c(0, seq(min(stanford2$age), max(stanford2$age), length.out = 10)))

pct <- seq(0.01, 0.99, 0.01)

# one row per row of new_dat, one column per survival probability in pct
surv_curves <- sapply(pct,
                      function(x) predict(fit, type = "quantile", p = 1 - x,
                                          newdata = new_dat))

matplot(y = pct, t(surv_curves), type = "l")

# you can vary the below subject_i variable to see it works for all of them
subject_i <- 1
single_curve <- surv_curves[subject_i, ]
plot(single_curve, pct, type = "l") # this is what we know to be true

times <- round(seq(1, max(single_curve), length.out = 100))
lp <- predict(fit, newdata = new_dat, type = "lp")[subject_i]
surv <- sapply(times, function(t) S_t(t, survreg.scale = fit$scale, survreg.lp = lp))
lines(times, surv, col = "red", lty = 2) # this is the new S_t function
# They match!

So, to summarize:

a = 1/fit$scale and b = exp(predict(fit, type = "lp"))

Hope this helps. I know I pulled out a few hairs before figuring this out.

The help page ?Weibull says:

The Weibull distribution with shape parameter a and scale parameter b has density given by

f(x) = (a/b) (x/b)^(a-1) exp(- (x/b)^a)

And the help page ?survreg says:

# There are multiple ways to parameterize a Weibull distribution. The survreg 
# function embeds it in a general location-scale family, which is a 
# different parameterization than the rweibull function, and often leads
# to confusion.
#   survreg's scale  =    1/(rweibull shape)
#   survreg's intercept = log(rweibull scale)

How this is handled is described in detail in the examples section of ?survreg.distributions.
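
The location-scale embedding can also be checked directly with the survival package's own distribution functions. A minimal sketch, assuming the survival package is installed; the values a = 2 and b = 5 and the evaluation points are arbitrary.

```r
library(survival)

a <- 2                      # pweibull shape, arbitrary
b <- 5                      # pweibull scale, arbitrary
q <- c(0.5, 1, 2, 5, 10)    # points at which to compare the two CDFs

# survreg parameterization: location = log(b), scale = 1 / a
p_survreg <- psurvreg(q, mean = log(b), scale = 1 / a, distribution = "weibull")

# Usual Weibull parameterization
p_weibull <- pweibull(q, shape = a, scale = b)

all.equal(p_survreg, p_weibull)
```

Here psurvreg works on the log-time scale with a location and a scale, which is why its arguments are log(b) and 1/a rather than b and a.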
