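It has already been discussed that Intel MKL can behave non-reproducibly under certain conditions. In fact, this is well known, and Intel documents the conditions under which results are reproducible under the name Conditional Numerical Reproducibility (CNR).
A colleague recently ran into this while computing the eigenvalues of a matrix: he noticed that R's eigen function is not deterministic on the server he uses. In particular, the eigenvalues can differ slightly (by less than 1e-12), and the signs of the eigenvectors are not deterministic.
I tested his hypothesis using this matrix: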
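Because each eigenvector is only defined up to its sign, and the eigenvalues may differ in the last few digits, identical() is a very strict test. A minimal sketch of a looser comparison follows, assuming that a per-column sign flip and a small absolute tolerance are acceptable for your use case. The helper name same_eigen and the tolerance values are hypothetical choices, not part of base R.

```r
# Hypothetical helper (not part of base R): compare two results of eigen()
# while ignoring the arbitrary sign of each eigenvector.
same_eigen <- function(e1, e2, tol_values = 1e-10, tol_vectors = 1e-8) {
  # Eigenvalues: absolute tolerance instead of bit-for-bit equality
  if (max(abs(e1$values - e2$values)) > tol_values) return(FALSE)
  # Sign of the dot product between matching columns (+1 or -1)
  signs <- sign(colSums(e1$vectors * e2$vectors))
  # Flip each column of e2 so it points the same way as in e1
  v2 <- sweep(e2$vectors, 2, signs, `*`)
  max(abs(e1$vectors - v2)) <= tol_vectors
}
```

Such a comparison would be expected to succeed even on runs where identical() fails, as long as the discrepancies are only sign flips and differences of the size described above.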
corr <- structure(c(1, 0.250050163743823, 0.23435347961023, 0.482039807584937,
0.260244618444847, 0.245304368749426, 0.486808023309575, 0.479348415857738,
0.250050163743823, 1, 0.24648658148082, 0.246713558484002, 0.249547953702824,
0.249424786521267, 0.247921981042718, 0.493767069248003, 0.23435347961023,
0.24648658148082, 1, 0.0140300592972217, 0.229687680789224, 0.251843633087952,
0.0102624595100619, 0.0160021143836412, 0.482039807584937, 0.246713558484002,
0.0140300592972217, 1, 0.47661703967983, 0.00364166259778339,
0.257242621615393, 0.507588614471663, 0.260244618444847, 0.249547953702824,
0.229687680789224, 0.47661703967983, 1, 0.243407621930236, 0.482342591096063,
0.473425275160343, 0.245304368749426, 0.249424786521267, 0.251843633087952,
0.00364166259778339, 0.243407621930236, 1, 0.00266373582273809,
0.00415353194451873, 0.486808023309575, 0.247921981042718, 0.0102624595100619,
0.257242621615393, 0.482342591096063, 0.00266373582273809, 1,
0.505031653312059, 0.479348415857738, 0.493767069248003, 0.0160021143836412,
0.507588614471663, 0.473425275160343, 0.00415353194451873, 0.505031653312059,
1), .Dim = structure(c(8L, 8L), .Names = c("rt", "rt")), .Dimnames = structure(list(
rt = c("A", "B", "C", "D", "E", "F", "G", "H"
), rt = c("A", "B", "C", "D", "E", "F", "G",
"H")), .Names = c("rt", "rt")))
After repeatedly computing its eigenvectors and eigenvalues, I found that the results are indeed not always identical:
> percDet <- 100 * mean(sapply(1:1000, function(...) {
+ identical(eigen(corr, symmetric = TRUE),
+ eigen(corr, symmetric = TRUE))}))
> message("% eigen determinism: ", percDet, "%")
% eigen determinism: 75.7%
> percDet <- 100 * mean(sapply(1:1000, function(...) {
+ identical(eigen(corr, symmetric = TRUE)$values,
+ eigen(corr, symmetric = TRUE)$values)}))
> message("% eigenvalues determinism: ", percDet, "%")
% eigenvalues determinism: 76.3%
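identical() only reports whether two runs differ, not by how much. To check the claim that the eigenvalue differences stay below 1e-12, a small sketch like the following records the largest absolute deviation from a reference run. The function name max_eigenvalue_drift and the default of 1000 repetitions are hypothetical choices; it calls eigen() exactly as in the test above (full decomposition, symmetric = TRUE) so that the same code path is exercised.

```r
# Hypothetical helper: largest absolute eigenvalue deviation from a
# reference decomposition over n repeated calls to eigen().
max_eigenvalue_drift <- function(m, n = 1000) {
  ref <- eigen(m, symmetric = TRUE)$values
  drift <- vapply(seq_len(n), function(i) {
    vals <- eigen(m, symmetric = TRUE)$values
    max(abs(vals - ref))
  }, numeric(1))
  max(drift)
}
```

For example, max_eigenvalue_drift(corr) returns 0 whenever every run reproduces the reference exactly, so on a deterministic build it should return 0.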
After some investigation, I found that the problem lies in the Intel MKL configuration we use for R. Specifically, our Ansible role configures R as follows:
./configure --enable-R-shlib --with-blas=\"-Wl,--no-as-needed -L${MKLROOT}/lib/intel64 -L{{ mkl_install_dir }}/compiler/lib/intel64 -lmkl_gf_lp64 -lmkl_core -lmkl_intel_thread -liomp5 -lpthread -lm\"
Changing -lmkl_gf_lp64 to -lmkl_intel_lp64 (and dropping the trailing -lm), as follows:
./configure --enable-R-shlib --with-blas=\"-Wl,--no-as-needed -L${MKLROOT}/lib/intel64 -L{{ mkl_install_dir }}/compiler/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -liomp5 -lpthread\"
solved the problem:
> percDet <- 100 * mean(sapply(1:1000, function(...) {
+ identical(eigen(corr, symmetric = TRUE),
+ eigen(corr, symmetric = TRUE))}))
> message("% eigen determinism: ", percDet, "%")
% eigen determinism: 100%
> percDet <- 100 * mean(sapply(1:1000, function(...) {
+ identical(eigen(corr, symmetric = TRUE)$values,
+ eigen(corr, symmetric = TRUE)$values)}))
> message("% eigenvalues determinism: ", percDet, "%")
% eigenvalues determinism: 100%
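I found the related question "Intel MKL - difference between mkl_intel_lp64 and mkl_gf_lp64" helpful, but it does not explain why linking against mkl_gf_lp64 leads to reproducibility problems when multiple threads are used.
So I would like to know whether there is a good explanation for this behaviour, and why changing the mkl_gf_lp64 flag to mkl_intel_lp64 fixes the problem.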