Statistical test for an increasing incidence of a rare event

I have simulated data on the incidence of a rare disease in 2500 people followed for 20 years:

year number_affected
1   0
2   0
3   1
4   0
5   0
6   0
7   1
8   0
9   1
10  0
11  1
12  0
13  0
14  1
15  1
16  0
17  1
18  0
19  2
20  1
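For anyone who wants to reproduce the analyses, here is a minimal sketch that puts the table into a data frame. The name `mydf` is taken from the `glm()` call further down; the other variable names are only illustrative, and the person-years figure assumes the population stays at exactly 2500 every year.

```r
# Yearly counts copied from the table above.
mydf <- data.frame(
  year = 1:20,
  number_affected = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0,
                      1, 0, 0, 1, 1, 0, 1, 0, 2, 1)
)

total_cases  <- sum(mydf$number_affected)   # cases over the whole period
person_years <- 2500 * nrow(mydf)           # assumes a constant population of 2500
crude_rate   <- total_cases / person_years  # cases per person-year

# Quick descriptive check: cases in the first versus the second decade.
cases_by_decade <- tapply(mydf$number_affected, mydf$year > 10, sum)
names(cases_by_decade) <- c("years 1-10", "years 11-20")
print(cases_by_decade)
```

This is only a descriptive summary; it does not by itself test for a trend.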

What test can I apply to show that the disease is becoming more common?

Edit: As suggested by @Wrzlprmft, I tried a simple correlation using the Spearman and Kendall methods:

        Spearman's rank correlation rho

data:  year and number_affected
S = 799.44, p-value = 0.08145
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.3989206 

Warning message:
In cor.test.default(year, number_affected, method = "spearman") :
  Cannot compute exact p-value with ties

        Kendall's rank correlation tau

data:  year and number_affected
z = 1.752, p-value = 0.07978
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau 
0.3296319 

Warning message:
In cor.test.default(year, number_affected, method = "kendall") :
  Cannot compute exact p-value with ties
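The two outputs above can be reproduced with calls along the following lines. This is a sketch, not necessarily the exact commands that were run: `exact = FALSE` and `alternative = "greater"` are my additions. Since the question is specifically about an increase, a one-sided alternative may be the more natural choice. The last part recomputes the Mann-Kendall statistic by hand (same S as Kendall's tau, with the usual tie correction to the variance), assuming a one-sided test with continuity correction; under those assumptions it appears to land on the Mann-Kendall p-value quoted just below.

```r
mydf <- data.frame(
  year = 1:20,
  number_affected = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0,
                      1, 0, 0, 1, 1, 0, 1, 0, 2, 1)
)

# exact = FALSE asks for the asymptotic p-value directly, which is what R
# falls back to when ties are present, so the warning is not triggered.
sp <- with(mydf, cor.test(year, number_affected, method = "spearman", exact = FALSE))
kd <- with(mydf, cor.test(year, number_affected, method = "kendall", exact = FALSE))

# One-sided versions: the alternative is a positive association with year.
sp_up <- with(mydf, cor.test(year, number_affected, method = "spearman",
                             exact = FALSE, alternative = "greater"))
kd_up <- with(mydf, cor.test(year, number_affected, method = "kendall",
                             exact = FALSE, alternative = "greater"))

# Mann-Kendall by hand: S = sum over i < j of sign(y_j - y_i).
y <- mydf$number_affected
n <- length(y)
d <- outer(y, y, function(a, b) b - a)        # d[i, j] = y[j] - y[i]
S <- sum(sign(d[upper.tri(d)]))
tie_sizes <- as.vector(table(y))              # sizes of tied groups in y
var_S <- (n * (n - 1) * (2 * n + 5) -
          sum(tie_sizes * (tie_sizes - 1) * (2 * tie_sizes + 5))) / 18
z_cc <- (S - sign(S)) / sqrt(var_S)           # continuity-corrected z
p_mk_one_sided <- pnorm(z_cc, lower.tail = FALSE)

print(c(spearman = sp$p.value, kendall = kd$p.value,
        spearman_up = sp_up$p.value, kendall_up = kd_up$p.value,
        mann_kendall_up = p_mk_one_sided))
```

If this reading is right, the gap between the Kendall and Mann-Kendall p-values comes from one-sided versus two-sided testing, not from a different statistic.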

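Are these good enough for this kind of data? The Mann-Kendall test, using the method shown by @AWebb, gives a p-value of 0.04319868. The Poisson regression suggested by @dsaxton gives the following result: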

Call:
glm(formula = number_affected ~ year, family = poisson, data = mydf)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3187  -0.8524  -0.6173   0.5248   1.2158  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept) -1.79664    0.85725  -2.096   0.0361 *
year         0.09204    0.05946   1.548   0.1217  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 16.636  on 19  degrees of freedom
Residual deviance: 14.038  on 18  degrees of freedom
AIC: 36.652

Number of Fisher Scoring iterations: 5
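The summary above reports a Wald z-test for `year`. As a hedged alternative, the same fit can be compared with an intercept-only model through a likelihood-ratio test, which uses the deviance drop shown in the output (16.636 - 14.038 on 1 degree of freedom). The sketch below refits the model from the `glm()` call in the output; `fit0`, `lr_stat`, `lr_p` and `rate_ratio` are illustrative names of my own.

```r
mydf <- data.frame(
  year = 1:20,
  number_affected = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0,
                      1, 0, 0, 1, 1, 0, 1, 0, 2, 1)
)

# Model from the output above, plus an intercept-only model for comparison.
fit  <- glm(number_affected ~ year, family = poisson, data = mydf)
fit0 <- glm(number_affected ~ 1,    family = poisson, data = mydf)

# Likelihood-ratio (deviance) test for the year term.
print(anova(fit0, fit, test = "Chisq"))

# The same test computed directly from the two deviances.
lr_stat <- fit$null.deviance - fit$deviance
lr_p    <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

# Estimated multiplicative change in the expected count per year.
rate_ratio <- exp(coef(fit)[["year"]])
print(c(lr_stat = lr_stat, lr_p = lr_p, rate_ratio = rate_ratio))
```

With only 10 events in total, neither the Wald nor the likelihood-ratio p-value should be read as strong evidence either way.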

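The year term is not significant here. What can I ultimately conclude? Also, none of these analyses used the number 2500 (the denominator population). Does this number make no difference? Could we use a simple linear (Gaussian) regression of incidence (number_affected/2500) on year?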
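One way to probe the denominator question is to put the population into the Poisson model as an offset and to compare a linear model on the rate with one on the raw count. The sketch below assumes the population is exactly 2500 in every year; the `population` column and the object names are mine. Under that assumption the offset is a constant, so it can only shift the intercept, and dividing the response by a constant only rescales the linear-model slope.

```r
mydf <- data.frame(
  year = 1:20,
  number_affected = c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0,
                      1, 0, 0, 1, 1, 0, 1, 0, 2, 1)
)
mydf$population <- 2500   # assumption: constant denominator in every year

# Poisson regression on counts, without and with a log-population offset.
fit     <- glm(number_affected ~ year, family = poisson, data = mydf)
fit_off <- glm(number_affected ~ year + offset(log(population)),
               family = poisson, data = mydf)

# Gaussian linear regression on the rate and on the raw count.
lm_rate  <- lm(I(number_affected / population) ~ year, data = mydf)
lm_count <- lm(number_affected ~ year, data = mydf)

p_rate  <- summary(lm_rate)$coefficients["year", "Pr(>|t|)"]
p_count <- summary(lm_count)$coefficients["year", "Pr(>|t|)"]

print(rbind(no_offset = coef(fit), with_offset = coef(fit_off)))
print(c(p_rate = p_rate, p_count = p_count))
```

The year slope and its p-value should come out the same with or without the offset, and the two linear models should give the same p-value. The denominator would start to matter if the population changed from year to year. Whether a Gaussian model is appropriate for counts this small is a separate question that this sketch does not address.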