Cross Validated - Generating two variables with an exactly pre-specified correlation

Update: solution

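Thanks to Greg Snow for pointing out the empirical = TRUE argument of mvrnorm (multivariate random normal, from the MASS package)! With it, mu and Sigma specify the empirical mean and covariance of the sample rather than those of the population. Here is the explicit code: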

samples = 200
r = 0.83

library('MASS')
data = mvrnorm(n=samples, mu=c(0, 0), Sigma=matrix(c(1, r, r, 1), nrow=2), empirical=TRUE)
X = data[, 1]  # standard normal (mu=0, sd=1)
Y = data[, 2]  # standard normal (mu=0, sd=1)

# Assess that it works
cor(X, Y)  # yay, r = 0.83!
cor(X*0.01 + 42, Y*3 - 1)  # Linear transformations of X and Y won't change r.
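
Since the question also asks for Python, here is a minimal sketch of the same idea with NumPy. It is not a port of mvrnorm: it assumes a simple whiten-then-recolor approach (center the draws, use a Cholesky factor to remove their sample covariance, then impose the target covariance), and the function name exact_correlated_pair and its seed parameter are made up for illustration.

```python
import numpy as np


def exact_correlated_pair(n, r, seed=None):
    """Return X, Y of length n whose sample Pearson correlation is r
    (up to floating-point rounding). Hypothetical helper, not a library API."""
    rng = np.random.default_rng(seed)
    raw = rng.standard_normal((n, 2))

    # Center each column so the sample means are exactly 0.
    raw = raw - raw.mean(axis=0)

    # Whiten: transform so the sample covariance becomes the identity.
    sample_cov = np.cov(raw, rowvar=False)       # 2x2, uses ddof=1
    chol_sample = np.linalg.cholesky(sample_cov)
    white = np.linalg.solve(chol_sample, raw.T).T

    # Recolor: impose the target covariance [[1, r], [r, 1]].
    target = np.array([[1.0, r], [r, 1.0]])
    data = white @ np.linalg.cholesky(target).T
    return data[:, 0], data[:, 1]


X, Y = exact_correlated_pair(200, 0.83, seed=0)
print(np.corrcoef(X, Y)[0, 1])  # 0.83 up to floating-point rounding
```

As in the R version, both columns end up with sample mean 0 and sample standard deviation 1, and any linear rescaling of X or Y with a positive slope leaves r unchanged.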

Original question

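I would like to generate two variables from (pseudo)random numbers with an exact Pearson's r. How can I do that? Python and/or R solutions would be great!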

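I am able to generate random data in Python whose r approximates a pre-specified value, as shown below. However, I am not looking for an approximation: I want data with exactly the pre-specified r, i.e. r = 0.83000 in the example below: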

samples = 200
r = 0.83

# Generate pearson correlated data with approximately cor(X, Y) = r
import numpy as np
data = np.random.multivariate_normal([0, 0], [[1, r], [r, 1]], size=samples)
X, Y = data[:,0], data[:,1]

# That's it! Now let's take a look at the actual correlation:
import scipy.stats as stats
print('r =', stats.pearsonr(X, Y)[0])

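The motivation for knowing r exactly is that I am testing (Bayesian) statistical models that infer r from data, and they are easier to evaluate when r is specified exactly.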