The table() function returns a matrix-like object:
> symptom <- sample(c("yes","no"), 100, prob=c(0.2, 0.8), rep=TRUE)
> disease <- sample(c("yes","no"), 100, prob=c(0.2, 0.8), rep=TRUE)
> dataset <- data.frame(symptom, disease)
> dst_S_D <-with(dataset, table(symptom, disease))
> dst_S_D
       disease
symptom no yes
    no  65  13
    yes 17   5
So Pr(D="yes" | S="yes") is the count in the "yes"/"yes" cell divided by the total of the symptom "yes" row:
> probD_Sy <- dst_S_D[2, 2]/sum(dst_S_D[2, ] )
> probD_Sy
[1] 0.2272727
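If you want all of the conditional probabilities at once rather than one cell, prop.table() with margin = 1 divides each row by its row total. A minimal sketch, assuming the counts from the table printed above (I rebuild the table by hand here, as tab, so the example does not depend on the random draw):

```r
# Rebuild the 2x2 table shown above from its printed counts (filled column-wise)
tab <- as.table(matrix(c(65, 17, 13, 5), nrow = 2,
                       dimnames = list(symptom = c("no", "yes"),
                                       disease = c("no", "yes"))))

# margin = 1 normalises within each row, i.e. conditions on symptom
cond <- prop.table(tab, margin = 1)
cond

# Same value as dst_S_D[2, 2] / sum(dst_S_D[2, ]) for these counts
cond["yes", "yes"]
```

Indexing by the dimnames ("yes", "yes") instead of by position [2, 2] also protects you if the level order ever changes.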
I changed the problem because the first time I ran it with your parameters, I got:
> dst_S_D <-with(dataset, table(symptom, disease)); dst_S_D
       disease
symptom   no  yes
    no  9954   22
    yes   24    0
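And I thought a Pr(D="yes" | S="yes") of 0 was rather boring. If you are going to run this many times, you should wrap the simulation in a function and call that function with replicate().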
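Something along these lines would do it. This is only a sketch: the function name sim_prob, its arguments, the defaults, and the seed are all my own choices for illustration, not anything from your question.

```r
# Hypothetical helper: simulate one dataset of size n and return
# the estimated Pr(D = "yes" | S = "yes") for that dataset
sim_prob <- function(n = 100, p_symptom = 0.2, p_disease = 0.2) {
  symptom <- sample(c("yes", "no"), n, replace = TRUE,
                    prob = c(p_symptom, 1 - p_symptom))
  disease <- sample(c("yes", "no"), n, replace = TRUE,
                    prob = c(p_disease, 1 - p_disease))
  # Fixed factor levels keep the table 2x2 even if a level never occurs
  tab <- table(symptom = factor(symptom, levels = c("no", "yes")),
               disease = factor(disease, levels = c("no", "yes")))
  # Returns NaN (0/0) if no subject has the symptom
  tab["yes", "yes"] / sum(tab["yes", ])
}

set.seed(1)                           # arbitrary seed, for reproducibility only
probs <- replicate(1000, sim_prob())  # 1000 independent simulations
summary(probs)
```

Because symptom and disease are drawn independently here, the estimates should scatter around the marginal disease probability. With small n or a rare symptom, expect some runs to return 0 or NaN.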
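Here is one way to build a dataset in which the disease probability depends on the symptom: the symptomatic group gets a probability (0.15) that is 3 times the one used in the asymptomatic group (0.05):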
symptom <- sample(c("yes","no"), 10000, prob=c(0.02, 0.98), rep=TRUE)
dataset <- data.frame(symptom, disease=NA)
dataset$disease[dataset$symptom == "yes"] <-
    sample(c("yes","no"), sum(dataset$symptom == "yes"), prob=c(0.15, 1-0.15), rep=TRUE)
dataset$disease[dataset$symptom == "no"] <-
    sample(c("yes","no"), sum(dataset$symptom == "no"), prob=c(0.05, 1-0.05), rep=TRUE)
dst_S_D <- with(dataset, table(symptom, disease)); dst_S_D
#        disease
# symptom   no  yes
#     no  9284  509
#     yes  176   31
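As a sanity check, you can compare the disease rate in the two symptom groups and confirm that the ratio lands near the intended factor of 3. A small sketch, again rebuilding the table from the counts printed above (tab, risk and risk_ratio are just names I picked):

```r
# Counts copied from the output above (filled column-wise)
tab <- as.table(matrix(c(9284, 176, 509, 31), nrow = 2,
                       dimnames = list(symptom = c("no", "yes"),
                                       disease = c("no", "yes"))))

# Pr(D = "yes" | S) for each symptom group
risk <- tab[, "yes"] / rowSums(tab)
risk

# Ratio of symptomatic to asymptomatic disease rates
risk_ratio <- unname(risk["yes"] / risk["no"])
risk_ratio
```

For this particular draw the two rates come out close to 0.15 and 0.05, so the ratio is a bit under 3. It will move around from run to run because only about 2% of subjects have the symptom.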