I have simulated data tracking the incidence of a rare disease in 2500 people over 20 years:
year number_affected
1 0
2 0
3 1
4 0
5 0
6 0
7 1
8 0
9 1
10 0
11 1
12 0
13 0
14 1
15 1
16 0
17 1
18 0
19 2
20 1
What test can I apply to show that the disease is becoming more common?
Edit: As suggested by @Wrzlprmft, I tried a simple correlation using the Spearman and Kendall methods:
Spearman's rank correlation rho
data: year and number_affected
S = 799.44, p-value = 0.08145
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.3989206
Warning message:
In cor.test.default(year, number_affected, method = "spearman") :
Cannot compute exact p-value with ties
>
Kendall's rank correlation tau
data: year and number_affected
z = 1.752, p-value = 0.07978
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.3296319
Warning message:
In cor.test.default(year, number_affected, method = "kendall") :
Cannot compute exact p-value with ties
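To make clear what the Kendall statistic above measures when many years share the same count, here is a minimal Python sketch (the output above comes from R's cor.test). It counts concordant and discordant pairs by hand and applies the tau-b tie correction. The function name kendall_tau_b and the variable names are illustrative assumptions, not part of any library.

```python
import math
from itertools import combinations

# Yearly counts copied from the table in the question (years 1..20).
counts = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 2, 1]
years = list(range(1, len(counts) + 1))


def kendall_tau_b(x, y):
    """Kendall's tau-b, computed by brute force over all pairs (illustrative helper)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0:
            ties_x += 1          # pair tied on x
        if dy == 0:
            ties_y += 1          # pair tied on y
        if dx * dy > 0:
            concordant += 1      # both variables move in the same direction
        elif dx * dy < 0:
            discordant += 1      # variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    # The tau-b denominator shrinks with the number of tied pairs.
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    return (concordant - discordant) / denom


tau = kendall_tau_b(years, counts)
print(round(tau, 7))
```

On these data 83 of the 190 year pairs are tied on the count, which is why R cannot compute an exact p-value. The sketch only reproduces the point estimate (about 0.3296), not the p-value.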
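Are these good enough for this kind of data? The Mann-Kendall test, using the method shown by @AWebb, gives a p-value of 0.04319868. The Poisson regression suggested by @dsaxton gives the following results: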
Call:
glm(formula = number_affected ~ year, family = poisson, data = mydf)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3187  -0.8524  -0.6173   0.5248   1.2158
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.79664    0.85725  -2.096   0.0361 *
year         0.09204    0.05946   1.548   0.1217
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 16.636 on 19 degrees of freedom
Residual deviance: 14.038 on 18 degrees of freedom
AIC: 36.652
Number of Fisher Scoring iterations: 5
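For readers without R, the Poisson fit above can be approximated with a short Python sketch that maximises the same log-likelihood, log(mu) = b0 + b1 * year, by Newton-Raphson. This is an assumption-laden illustration, not what glm does internally line for line. The helper fit_poisson_trend and its centring of the year variable are my own choices.

```python
import math

# Yearly counts copied from the table in the question (years 1..20).
counts = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 2, 1]
years = list(range(1, len(counts) + 1))


def fit_poisson_trend(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson (illustrative helper)."""
    x_mean = sum(x) / len(x)
    xc = [xi - x_mean for xi in x]        # centring keeps the 2x2 system well conditioned
    a = math.log(sum(y) / len(y))         # start from the constant-rate model
    b = 0.0
    for _ in range(iterations):
        mu = [math.exp(a + b * xi) for xi in xc]
        # Score vector: gradient of the Poisson log-likelihood.
        g_a = sum(yi - mi for yi, mi in zip(y, mu))
        g_b = sum(xi * (yi - mi) for xi, yi, mi in zip(xc, y, mu))
        # Fisher information matrix entries.
        i_aa = sum(mu)
        i_ab = sum(xi * mi for xi, mi in zip(xc, mu))
        i_bb = sum(xi * xi * mi for xi, mi in zip(xc, mu))
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: solve the 2x2 system (information) * delta = score.
        a += (i_bb * g_a - i_ab * g_b) / det
        b += (i_aa * g_b - i_ab * g_a) / det
    se_b = math.sqrt(i_aa / det)          # slope standard error from the inverse information
    return a - b * x_mean, b, se_b        # undo the centring for the intercept


b0, b1, se_b1 = fit_poisson_trend(years, counts)
print(f"intercept={b0:.5f} slope={b1:.5f} se={se_b1:.5f} z={b1 / se_b1:.3f}")
```

If the sketch is right, the slope and its standard error should agree with the year row of the R table to the printed precision.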
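The year term in the Poisson regression is not significant (p = 0.12). What can I ultimately conclude? Also, none of these analyses used the number 2500 (the population denominator). Does this number make no difference? Could we use a simple linear (Gaussian) regression of incidence (number_affected/2500) against year?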
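To make the last question concrete, here is a minimal Python sketch of the proposed linear regression. It assumes the population is exactly 2500 in every year, which the data do not confirm. The names POPULATION and ols_slope are illustrative. It fits an ordinary least-squares line to the raw counts and to the incidence, and compares the two slopes.

```python
# Yearly counts copied from the table in the question (years 1..20).
counts = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 2, 1]
years = list(range(1, len(counts) + 1))

POPULATION = 2500  # assumption: the same denominator applies in every year


def ols_slope(x, y):
    """Least-squares slope of y on x (illustrative helper)."""
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    return s_xy / s_xx


incidence = [c / POPULATION for c in counts]

slope_counts = ols_slope(years, counts)
slope_incidence = ols_slope(years, incidence)

print(slope_counts)                   # cases per year
print(slope_incidence * POPULATION)   # same trend, rescaled back to cases
```

With a constant denominator the incidence slope is just the count slope divided by 2500, so the rescaling by itself does not change the strength of the linear trend. Whether a Gaussian model suits counts this sparse is a separate matter that the sketch does not address.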