机器算法验证 - 冠状病毒数学模型的详细描述/脚本 - 吾爱随笔录

冠状病毒数学模型的详细描述/脚本

机器算法验证 r 造型随机过程流行病学

2022-03-15 23:21:51

几乎是标题，我正在寻找对帝国理工学院和《柳叶刀》论文中使用的模型的更深入的解释。在第二个中，他们使用了一种叫做分支过程模型的东西，但他们是从我从未听说过的奇怪分布（如串行间隔分布）中采样的。

如果有人有很好的资源，或者更好的工作R, Matlab or Pyhton脚本，任何帮助将不胜感激。我想把模型放在我的Shiny仪表板上，我的团队用它来为我的银行报告/创建图表。

编辑：我主要对这张图感兴趣：所以我的同事可以使用这些参数并创建不同的场景。

2个回答

柳叶刀文章

Lancet 文章中的方法非常简单。他们通过模拟每个受感染个体的传播来模拟病毒的传播。对于每个感染者，他们（随机）计算他们将感染多少其他人以及这些人感染其他人需要多长时间（或者其他人因隔离等政策措施而被控制的可能性有多大）。

他们使用这个模型来估计在给定时间后新病例的潜在数量的变化，方法是为每组特定的模型参数计算随机模型一千次。如果数量很大，那么他们认为爆发不受控制，然后这个随机模型可以用来表达控制给定参数集的爆发的概率。

因此，确定性模型和随机模型之间的区别如下：

确定性病毒以恒定的数量和速度传播。例如，每个人在给定的特定时间间隔内将病毒传染给另外两个人，然后增长将像 1、2、4、8、16 等。
随机病毒的传播是随机的。它增加多少是随机的，而不是每次都是相同的因素。有些人传播了很多其他人只传播了一点点。例如，有时一个人将其传递给其他三个人，有时只传递给一个人（但平均相同，即两个人）。然后增长将是随机的，它可能很高（当它翻了三倍时）它可能很低（当只有一个人得到它时）。这种随机性是通过多次重复模型来表达的，然后看看它在所有这些情况下是如何结束的。

序列区间分布在图 2a 中解释。我没有详细阅读这篇文章，但在快速浏览之后，在我看来，连续时间是一个人感染另一个人的时刻之间的时间。序列时间分布就是那些时间的分布。它不是特定发行版的名称。

请注意，柳叶刀文章的代码可在线获取。https://github.com/cmmid/ringbp/tree/master/R

帝国理工的文章

感染不会呈指数增长。这还只是开始。感染率降低的原因是因为您无法感染以前已经感染过的人。因此，传播病毒的可能性会随着时间的推移而变小。（此外，感染率也取决于天气/季节，有时称为呼吸季节，我相信这些模型中没有包含）

考虑到感染率降低的一个众所周知的模型是SIR 模型（它已经生成了您正在寻找的图形）。然而，该模型假设均匀混合，这不是很现实。所以他们在帝国理工学院使用的模型使用了许多较小的隔间，这些隔间是学校、工作场所、家庭，可能还有更多。然后，传播在不同的水平/距离发生不同。室友已经生病了不能传染，所以很多时候只有一个人会传染给家里其他所有成员（而且那个人传染性比较高），其他人传播的相对较少（但是可以通过关于其他地方的病毒，比如学校工作教堂等。如果这些还没有饱和。

这不容易复制。您需要做的是真实地模拟空间结构。像家庭中的年龄分布和谁去哪个工作/学校/教堂等的网络。在参考文献之一中给出了这项工作的描述。该模型通常用于流感。https://www.pnas.org/content/suppl/2008/02/28/0706849105.DC1

我制作了一个展示这些效果的玩具模型（但不是真实的分布）。你得到的不是指数增长，而是更像幂律。分布在空间中增长，并在受感染人群的边缘传播。这有点像一个圆的面积随着它的周长而增长。但随后是分形维数结构。 $dA/dt = \text{constant} \times \text{circumference}$

玩具模型的结果是一条曲线，一开始是指数型的（均匀混合增长），然后变成幂律关系（在某个几何图形的边缘增长）。在任何情况下，增长都不是以连续速率呈指数增长，而是动态正在发生变化（在本例中，增长仅在前 5 代中呈指数增长）。

# create 500x500 people in matrix
set.seed(1)
L <- 5*10^2
people <- matrix(rep(0,(L)^2),L)

# trackers for the locations of the people that got sick:
# we start with index patient in the middle
orderx <- c(round(L/2))
ordery <- c(round(L/2))
generation <- c(1) 
spread <- 0
R0 <- 3
R1 <- 0.25  # a probabiliy to spread the virus on long distance, e.g. due to travel.


##### run the virus ######


# compute probability density function 
# for probabilty of spreading out to nearby locations
Lr <- 7
Lspread <- 1+Lr*2
# targets will be in a cube of LrxLr around the patient
targets <- matrix(1:Lspread^2,Lspread)
xt <- matrix(rep(c(1:Lspread)-(Lspread+1)/2,Lspread),Lspread)
yt <- t(xt)
# ps is some probability to get infected as function of distance
ps <- c(exp(-c(Lr:1)*0.2),0,exp(-c(1:Lr)*0.2)) 
probs  <- ps[xt+(Lspread+1)/2]*ps[yt+(Lspread+1)/2]  

### plot for visualization of the spread

plot(orderx,ordery,xlim=c(1,L),ylim=c(1,L), 
     xlab = "", ylab= "",
     col=1,bg = 1,cex=0.2,pch=21)

# itterate all the patients untill all have been dealt with 
# during this loop the number of patients increases
while (spread < length(generation)) {
  spread <- spread + 1
  x <- orderx[spread]
  y <- ordery[spread]
  g <- generation[spread]
  
  # selecting Rn people in the neighbourhood of the patient
  # Rn is sampled from a Poisson distribution with mean R0
  Rn <- rpois(1,R0)
  if (Rn>0) {
    sel <- sample(targets,Rn, prob = probs)
    xt[sel]
    yt[sel]
    ## this loop picks out the R0 people 
    ## these are gonna become new patients if they are susceptible
    for (i in 1:Rn) {
      #the modulo is to patch left with right and top with bottom
      xq <- (x+xt[sel[i]]-1)%%L+1  
      yq <- (y+yt[sel[i]]-1)%%L+1
      # if the 'target' is not sick yet then add it as new patient
      if  (people[xq,yq] == 0) {  
        generation <- c(generation,g+1)
        orderx <- c(orderx,xq)
        ordery <- c(ordery,yq)
        people[xq,yq] <- g+1
        colv <- (g+1)/30-floor((g+1)/30)
        points(xq,yq,
               col=hsv(colv,1,1),bg = hsv(colv,1,1),cex=0.1,pch=21)
      }
    }
  }
  ### additionally make R1 random people from far away sick
  nfar <- rpois(1,R1)
  ifar <- 0
  while (ifar<nfar) {
    ifar = ifar +1
    xq <- sample(1:L,1)
    yq <- sample(1:L,1)
    if  ((people[xq,yq] == 0)*(rbinom(1,1,0.1)==1)) {  
      generation <- c(generation,g+1)
      orderx <- c(orderx,xq)
      ordery <- c(ordery,yq)
      people[xq,yq] <- g+1
      colv <- (g+1)/30-floor((g+1)/30)
      points(xq,yq,
             col=hsv(colv,1,1),bg = hsv(colv,1,1),cex=0.1,pch=21)
    }
  }
}

# ratio of people that got sick
spread/L^2

# plot the spread in colours
colv <- (generation+1)/40-floor((generation+1)/40)
plot(orderx,ordery,xlim=c(1,L),ylim=c(1,L), 
     xlab = "", ylab= "",
     col=hsv(colv,1,1),bg = hsv(colv,1,1),cex=0.1,pch=21)

# plot the epidemiological curve
I <- sapply(1:50, FUN = function(x) sum(generation == x))
plot(I, log = 'xy', 
     xlab = "x, generation", ylab = "number of infectious people", type = "l",
     ylim = c(1,5*10^4), xlim = c(1,70))
gen <- 1:50
colv <- (gen+1)/40-floor((gen+1)/40)
points(I,pch=21,col = 1, bg = hsv(colv,1,1))
lines((R0+R1)^c(0:50), lty=2)
sm <- 4:50
lines(sm,0.5*sm^3.5, lty = 3)
lines(sm,0.002*sm^6, lty = 4)

legend(1,5*10^4, c(expression((R[0]+R[1])^x),expression(0.5*x^3.5),
                   expression(0.002*x^6)), lty = c(2,3,4), 
       xjust = 0, cex = 0.7)

我知道答案不应该只是链接，但这是我能做的最好的。

此页面链接到在 R 中完成的一些公共 covid 19 研究。

https://refind.com/znmeb/r-tools-for-coronavirus

这个重点深入流行病学模型。

https://timchurches.github.io/blog/posts/2020-02-18-analysing-covid-19-2019-ncov-outbreak-data-with-r-part-1/

直接链接到一些很酷的东西。

https://www.statsandr.com/blog/top-r-resources-on-covid-19-coronavirus/

其它你可能感兴趣的问题

上一篇可视化队列中患者的包含和排除的图表的名称是什么？下一篇是奇异拟合，在 +/- 1 附近没有相关性或方差为零，在 lme4 中为误报