Question 1

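I found the following article, which addresses this problem: Jiang, Tiefeng (2004). The asymptotic distributions of the largest entries of sample correlation matrices. The Annals of Applied Probability, 14(2), 865-880.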

Jiang shows that the statistic $L_n = \max_{1\leq i<j\leq N} |\rho_{ij}|$, where $\rho_{ij}$ is the correlation between the $i$th and $j$th random vectors of length $n$ (with $i\neq j$), satisfies

$\lim_{n \to \infty} \Pr[ nL_n^2 - 4\log n + \log(\log(n)) \leq y] = \exp\left(-\frac{1}{a^2\sqrt{8\pi}}\exp(-y/2)\right) \,,$

where $a = \lim_{n\to\infty} n/N$ is assumed to exist in the paper, and $N$ is a function of $n$.

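Apparently this result holds for any distribution with a sufficient number of finite moments (edit: see @cardinal's comment below). Jiang points out that this is a Type-I extreme value distribution. The location and scale are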

$\sigma=2,\quad\mu = 2\log\left( \frac{1}{a^2\sqrt{8\pi}} \right).$

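The expected value of the Type-I EV distribution is $\mu + \sigma \gamma$, where $\gamma$ denotes Euler's constant. However, as noted in the comments, convergence in distribution does not by itself guarantee that the means converge to the mean of the limiting distribution.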

If convergence of the means did hold in this case, the asymptotic expected value of $n L_n^2 -4\log n + \log(\log(n))$ would be

$\lim_{n\to\infty} \mathbb E\left[ nL_n^2 - 4\log n + \log(\log(n)) \right] = -2\log\left(a^2\sqrt{8\pi} \right) + 2\gamma \,.$

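Note that this would give the asymptotic expected value of the largest squared correlation, whereas the question asked for the expected value of the largest absolute correlation. So it is not 100% of the way there, but close.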
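As a rough sketch of how the limit could be turned into a number, the R snippet below plugs a finite-sample $a = n/N$ into the Gumbel mean and solves for $E[L_n^2]$. The function names `jiang_mean_sq` and `jiang_mean_abs` are hypothetical, and replacing the limit $a$ by $n/N$ at finite $n$ is an assumption, not something taken from the paper. Taking the square root only gives an upper-bound style approximation to the mean largest absolute correlation, since $E[L_n] \leq \sqrt{E[L_n^2]}$ by Jensen's inequality.

```r
# Hypothetical helper (not from Jiang's paper): substitute a = n / N into
# the limiting Gumbel mean mu + sigma * gamma and solve for E[L_n^2].
jiang_mean_sq <- function(n, N) {
  a <- n / N                                # assumed stand-in for lim n / N
  euler_gamma <- -digamma(1)                # Euler's constant, about 0.5772
  mu <- 2 * log(1 / (a^2 * sqrt(8 * pi)))   # Gumbel location
  sigma <- 2                                # Gumbel scale
  (mu + sigma * euler_gamma + 4 * log(n) - log(log(n))) / n
}

# Square root of the approximation above; by Jensen's inequality this
# approximates an upper bound on E[L_n], not E[L_n] itself.
jiang_mean_abs <- function(n, N) sqrt(jiang_mean_sq(n, N))

jiang_mean_abs(n = 200, N = 50)
```

Because the expression is asymptotic, it should be treated with caution when $N$ is very small relative to $n$, where the term in braces can become small or even negative.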

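I did some brief simulations, which lead me to think that either 1) there is a problem with my simulation (likely), 2) there is a problem with my transcription/algebra (also likely), or 3) the approximation is not valid for the values of $n$ and $N$ that I used. Perhaps the OP can weigh in with some simulation results using this approximation?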

Question 2

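In addition to the answer provided by @jmtroos, here are the details of my simulation, and a comparison with @jmtroos's derivation of the expectation from Jiang (2004), namely: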

$E\left[L_n^2 \right]= \frac{1}{n} \left \{ 2\log\left( \frac{N^2}{n^2\sqrt{8\pi}} \right) + 2\gamma+ 4\log n - \log(\log(n))\right \}$
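As a sketch of where this expression comes from (treating the limit as an approximation at finite $n$ and replacing $a$ by $n/N$, both of which are assumptions rather than statements from the paper), start from the limiting mean of the centered statistic:

$E\left[ nL_n^2 \right] - 4\log n + \log(\log(n)) \approx -2\log\left(a^2\sqrt{8\pi}\right) + 2\gamma = 2\log\left(\frac{1}{a^2\sqrt{8\pi}}\right) + 2\gamma \,.$

Substituting $a \approx n/N$ gives $1/a^2 \approx N^2/n^2$, so

$E\left[ nL_n^2 \right] \approx 2\log\left(\frac{N^2}{n^2\sqrt{8\pi}}\right) + 2\gamma + 4\log n - \log(\log(n)) \,,$

and dividing both sides by $n$ yields the expression for $E\left[L_n^2\right]$ given above.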

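The values of this expectation seem to be above the simulated values for small $N$ and below them for large $N$, and the two appear to diverge slightly as $N$ increases. However, the differences diminish as $n$ increases, as we would expect given that the paper claims the distribution is asymptotic. I have tried various $n \in [100,500]$. The simulation below uses $n=200$. I am pretty new to R, so any hints or suggestions to make my code better would be warmly welcomed.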

set.seed(1)

ns <- 500
# number of simulations for each N

n <- 200
# length of each vector

mu <- 0
sigma <- 1
# parameters for the distribution we simulate from

par(mfrow=c(5,5))
x<-trunc(seq(from=5,to=n, length=20))
#vector of Ns

y<-vector(mode = "numeric")
#vector to store the mean correlations

k<- 1
#index for y

for (N in x) {
# loop over a range of N

    dt <- matrix(nrow=n,ncol=N)

    J <- vector(mode = "numeric")
    # vector to store the simulated largest absolute 
    # correlations for each N

    for (j in 1:ns) {
    # for each N, simulated ns times    

      for (i in 1:N) {
        dt[,i] <- rnorm(n,mu,sigma)
      }
      # perform the simulation

      M<-matrix(cor(dt),nrow=N,ncol=N)
      m <- M
      diag(m) <- NA
      J[j] <- max(abs(m), na.rm=TRUE)   
      # obtain the largest absolute correlation
      # these 3 lines came from stackoverflow
  }

    hist(J,main=paste("N=",N, " n=",n, " N(0,1)", "\nmean=",round(mean(J),4))) 
    y[k]<-mean(J)
    k=k+1
}

lm1 <- lm(y~log(x))
summary(lm1)

logx_sq=log(x)^2
lm2<-lm(y~log(x)+logx_sq)
summary(lm2)
# linear models for these simulations

# Jiang 2004 paper, computation:

gamma = 0.5772
yy <- vector(mode = "numeric")
yy <- sqrt((2*log((x^2)/(sqrt(8*pi)*n^2)) + 2*gamma-(-4*log(n)+log(log(n))))/n)


plot(x,yy,ylim=range(c(y,yy)))
# plot the expectation derived from Jiang (2004)
points(x,y,col='red')
# add the simulated mean largest absolute correlations in red
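Since suggestions on the R code were invited, here is one possible way to tighten the simulation loop. It is only a sketch: the function names `max_abs_cor` and `sim_mean_max_abs_cor` are illustrative, and `ns = 50` is used here just to keep the run short (the simulation above uses 500).

```r
# Illustrative helpers, not part of the original post.
max_abs_cor <- function(n, N, mu = 0, sigma = 1) {
  # n x N matrix of i.i.d. normal draws, one column per random vector
  dt <- matrix(rnorm(n * N, mean = mu, sd = sigma), nrow = n, ncol = N)
  m <- cor(dt)
  # the correlation matrix is symmetric, so the strict upper triangle
  # holds every off-diagonal pair exactly once
  max(abs(m[upper.tri(m)]))
}

sim_mean_max_abs_cor <- function(n, N, ns = 500) {
  # average the largest absolute correlation over ns simulated data sets
  mean(replicate(ns, max_abs_cor(n, N)))
}

set.seed(1)
x <- trunc(seq(from = 5, to = 200, length.out = 20))
y <- vapply(x, function(N) sim_mean_max_abs_cor(n = 200, N = N, ns = 50),
            numeric(1))
```

Generating the whole matrix in one `rnorm` call avoids the column-by-column loop, `upper.tri` replaces the `diag(m) <- NA` step, and `vapply` removes the need for the manual index `k`.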

Answer 1