Why does the Phi coefficient approximate Pearson's correlation?

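While browsing the Wiki article on the Phi coefficient, I noticed that for paired binary data, "a Pearson correlation coefficient estimated for two binary variables will return the phi coefficient".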
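For reference, here is a short sketch of why that statement is expected to hold. The cell counts $n_{11}, n_{10}, n_{01}, n_{00}$ and the margins $n_{1\cdot}, n_{0\cdot}, n_{\cdot 1}, n_{\cdot 0}$ are notation introduced here for the 2x2 table of two 0/1-coded variables; they do not appear in the quoted article excerpt.

```latex
% 2x2 table of x and y (both coded 0/1), with n = n_{11} + n_{10} + n_{01} + n_{00}
\begin{align*}
\operatorname{cov}(x, y)
  &= \frac{n_{11}}{n} - \frac{n_{1\cdot}}{n}\,\frac{n_{\cdot 1}}{n}
   = \frac{n_{11} n_{00} - n_{10} n_{01}}{n^{2}}, \\
\operatorname{var}(x) &= \frac{n_{1\cdot}\, n_{0\cdot}}{n^{2}}, \qquad
\operatorname{var}(y) = \frac{n_{\cdot 1}\, n_{\cdot 0}}{n^{2}}, \\
r &= \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\operatorname{var}(y)}}
   = \frac{n_{11} n_{00} - n_{10} n_{01}}
          {\sqrt{n_{1\cdot}\, n_{0\cdot}\, n_{\cdot 1}\, n_{\cdot 0}}}
   = \phi, \\
\chi^{2} &= \frac{n \,(n_{11} n_{00} - n_{10} n_{01})^{2}}
                 {n_{1\cdot}\, n_{0\cdot}\, n_{\cdot 1}\, n_{\cdot 0}}
   \quad\Longrightarrow\quad \phi^{2} = \frac{\chi^{2}}{n}.
\end{align*}
```

The last line uses the uncorrected Pearson chi-squared statistic. Note that $\sqrt{\chi^{2}/n}$ recovers only $|\phi|$, since the sign is lost when squaring.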

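After running a quick simulation, I found that this is not the case. However, the phi coefficient does seem to come close to Pearson's correlation coefficient.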

x <- c(1, 1, 0, 0, 1, 0, 1, 1, 1)
y <- c(1, 1, 0, 0, 0, 0, 1, 1, 1)
cor(x, y)  # Pearson correlation
sqrt(chisq.test(table(x, y))$statistic / length(x))  # phi

# Repeat the same nine pairs 1000 times
x <- rep(x, 1000)
y <- rep(y, 1000)
sqrt(chisq.test(table(x, y))$statistic / length(x))  # phi
# phi now DOES approximate Pearson's correlation
cor(x, y)

But it is not clear to me why (mathematically) this happens.
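One possible explanation worth checking is that `chisq.test()` applies Yates' continuity correction to 2x2 tables by default (`correct = TRUE`), which shrinks the statistic noticeably when n is small and hardly at all when n is large. The sketch below reuses the data from the question and compares the default call with `correct = FALSE`. The variable names (`chi2_yates`, `chi2_plain`, and so on) are illustrative only, and `suppressWarnings()` is used here just to silence the small-expected-count warning.

```r
# Same data as in the question
x <- c(1, 1, 0, 0, 1, 0, 1, 1, 1)
y <- c(1, 1, 0, 0, 0, 0, 1, 1, 1)
n <- length(x)

# Default call: Yates' continuity correction is applied to the 2x2 table
chi2_yates <- suppressWarnings(chisq.test(table(x, y))$statistic)

# Same test with the continuity correction switched off
chi2_plain <- suppressWarnings(
  chisq.test(table(x, y), correct = FALSE)$statistic
)

# phi computed from each statistic (unname() drops the "X-squared" label)
phi_yates <- unname(sqrt(chi2_yates / n))
phi_plain <- unname(sqrt(chi2_plain / n))
r <- cor(x, y)

print(c(pearson = r, phi_plain = phi_plain, phi_yates = phi_yates))
```

If the correction is indeed the cause, `phi_plain` should agree with `cor(x, y)` up to floating-point error, while `phi_yates` comes out smaller. Replicating the data 1000 times makes the correction negligible relative to the counts, which would explain why the two numbers nearly coincide in the second half of the question's code.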