机器算法验证 - 如何生成随机自动相关的二进制时间序列数据？ - 吾爱随笔录

如何生成随机自动相关的二进制时间序列数据？

机器算法验证时间序列随机变量模拟随机生成

2022-02-09 09:35:36

如何生成二进制时间序列：

指定观察 1 的平均概率（例如 5%）；
一次观察到 1 的条件概率 $t$ 给定值 $t-1$ （如果说 30% $t-1$ 值为 1)？

4个回答

使用两态马尔可夫链。

如果状态被称为 0 和 1，那么链可以用 2x2 矩阵表示 $P$ 给出状态之间的转移概率，其中 $P_{ij}$ 是从状态移动的概率 $i$ 陈述 $j$ . 在此矩阵中，每一行的总和应为 1.0。

从陈述 2，我们有 $P_{11} = 0.3$ ，然后简单的守恒说 $P_{10} = 0.7$ .

根据陈述 1，您希望长期概率（也称为平衡或稳态）为 $P_1 = 0.05$ . 这说

P_{1} = 0.05 = 0.3 P_{1} + P_{01} (1 - P_{1})

$P_1 = 0.05 = 0.3 P_1 + P_{01}(1-P_1)$ 解决给

P_{01} = 0.0368421

$P_{01} = 0.0368421$ 和一个转移矩阵

P = (\begin{array}{cc} 0.963158 & 0.0368421 \\ 0.7 & 0.3 \end{array})

$P = \left( \begin{array}{cc} 0.963158 & 0.0368421 \\ 0.7 & 0.3 \end{array} \right)$

（您可以通过将转换矩阵提高到高次方来检查它的正确性——在这种情况下，14 可以完成这项工作——结果的每一行都给出了相同的稳态概率）

现在在你的随机数程序中，从随机选择状态 0 或 1 开始；这选择哪一行 $P$ 你正在使用。然后使用一个统一的随机数来确定下一个状态。吐出那个数字，冲洗，必要时重复。

我在 R 中对@Mike Anderson 的答案进行了编码。我不知道如何使用 sapply 来完成它，所以我使用了一个循环。我稍微改变了概率以获得更有趣的结果，我使用“A”和“B”来表示状态。让我知道你的想法。

set.seed(1234)
TransitionMatrix <- data.frame(A=c(0.9,0.7),B=c(0.1,0.3),row.names=c('A','B'))
Series <- c('A',rep(NA,99))
i <- 2
while (i <= length(Series)) {
    Series[i] <- ifelse(TransitionMatrix[Series[i-1],'A']>=runif(1),'A','B')
    i <- i+1
}
Series <- ifelse(Series=='A',1,0)
> Series
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1
 [38] 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [75] 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1

/edit：为了回应保罗的评论，这里有一个更优雅的表述

set.seed(1234)

createSeries <- function(n, TransitionMatrix){
  stopifnot(is.matrix(TransitionMatrix))
  stopifnot(n>0)

  Series <- c(1,rep(NA,n-1))
  random <- runif(n-1)
  for (i in 2:length(Series)){
    Series[i] <- TransitionMatrix[Series[i-1]+1,1] >= random[i-1]
  }

  return(Series)
}

createSeries(100, matrix(c(0.9,0.7,0.1,0.3), ncol=2))

刚学R的时候写了原始代码，所以请放过我一点。;-)

以下是给定系列的估计转换矩阵的方法：

Series <- createSeries(100000, matrix(c(0.9,0.7,0.1,0.3), ncol=2))
estimateTransMatrix <- function(Series){
  require(quantmod)
  out <- table(Lag(Series), Series)
  return(out/rowSums(out))
}
estimateTransMatrix(Series)

   Series
            0         1
  0 0.1005085 0.8994915
  1 0.2994029 0.7005971

顺序与我的原始转换矩阵交换，但它得到了正确的概率。

这是一个基于markovchain包的答案，可以推广到更复杂的依赖结构。

library(markovchain)
library(dplyr)

# define the states
states_excitation = c("steady", "excited")

# transition probability matrix
tpm_excitation = matrix(
  data = c(0.2, 0.8, 0.2, 0.8), 
  byrow = TRUE, 
  nrow = 2,
  dimnames = list(states_excitation, states_excitation)
)

# markovchain object
mc_excitation = new(
  "markovchain",
  states = states_excitation,
  transitionMatrix = tpm_excitation,
  name = "Excitation Transition Model"
)

# simulate
df_excitation = data_frame(
  datetime = seq.POSIXt(as.POSIXct("01-01-2016 00:00:00", 
                                   format = "%d-%m-%Y %H:%M:%S", 
                                   tz = "UTC"), 
                        as.POSIXct("01-01-2016 23:59:00", 
                                   format = "%d-%m-%Y %H:%M:%S", 
                                   tz = "UTC"), by = "min"),
  excitation = rmarkovchain(n = 1440, mc_excitation))

# plot
df_excitation %>% 
  ggplot(aes(x = datetime, y = as.numeric(factor(excitation)))) + 
  geom_step(stat = "identity") + 
  theme_bw() + 
  scale_y_discrete(name = "State", breaks = c(1, 2), 
                   labels = states_excitation)

这给了你：

我已经忘记了描述这种方法的论文，但是这里有。

将转移矩阵分解为

\begin{aligned} T & = (1 - p_{t}) [\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}] + p_{t} [\begin{matrix} p_{0} & p_{0} \\ (1 - p_{0}) & (1 - p_{0}) \end{matrix}] \\ = (1 - p_{t}) I + p_{t} E \end{aligned}

$\begin{aligned} T &= (1-p_t) \left[ \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right] + p_t \left[ \begin{matrix} p_0 & p_0 \\ (1-p_0) & (1-p_0) \end{matrix} \right] \\ &= (1-p_t) I + p_t E \end{aligned}$

直觉上，这对应于存在一定概率的想法 $1-p_t$ 系统保持在相同的状态，并且概率 $p_t$ 状态被随机化，其中随机化意味着从下一个状态的均衡分布中独立抽取（ $p_0$ 是处于第一种状态的平衡概率）。

请注意，根据您指定的数据，您需要解决 $p_t$ 从指定的 $T_{11}$ 通过 $T_{11} = (1-p_t)+p_t(1-p_0)$ .

这种分解的一个有用特征是它可以非常直接地推广到高维问题中的相关马尔可夫模型类。

其它你可能感兴趣的问题

上一篇什么是实际良好的数据分析过程？下一篇计量经济学和统计学有什么区别？