Difference between loadings and the correlations between observed variables and saved factor scores in factor analysis

Cross Validated r factor-analysis
2022-03-20 07:00:11

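I thought that the loadings in a factor analysis were the correlations between the observed variables and the latent factors. However, when I run a factor analysis in R with the psych package, that does not seem to be the case: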

    library(psych)
    set.seed(1)
    X <- matrix(rnorm(200), ncol=10)
    fa1 <- fa(X, nfactors=3, rotate="none", scores=TRUE)

    cor(X, fa1$scores)  #correlations between original variables and factor scores
                   MR2         MR1         MR3
     [1,]  0.465509161  0.87299813  0.03241641
     [2,] -0.010609644 -0.32714571  0.64968725
     [3,] -0.219685860  0.47331827 -0.39132195
     [4,] -0.815516983  0.22669390  0.42273446
     [5,] -0.075178935 -0.40431701 -0.69661843
     [6,] -0.204917832  0.07472006  0.05508017
     [7,]  0.240675941  0.13027263  0.23238220
     [8,]  0.756677687 -0.05621205  0.23746738
     [9,]  0.004384459  0.12095273  0.55100943
    [10,]  0.640507568 -0.67810600  0.18597947

    fa1$loadings[1:10, 1:3]
                   MR2         MR1         MR3
     [1,]  0.433925641  0.82218385  0.02717957
     [2,] -0.009889808 -0.30810366  0.54473104
     [3,] -0.204780777  0.44576800 -0.32810435
     [4,] -0.760186392  0.21349881  0.35444221
     [5,] -0.070078250 -0.38078308 -0.58408054
     [6,] -0.191014719  0.07037085  0.04618204
     [7,]  0.224346738  0.12268990  0.19484113
     [8,]  0.705339180 -0.05294013  0.19910480
     [9,]  0.004086985  0.11391248  0.46199451
    [10,]  0.597050885 -0.63863574  0.15593470

    cor(fa1$scores)  # Check that factor scores are uncorrelated
              MR2          MR1           MR3
    MR2  1.000000e+00 4.266996e-16 -1.299606e-16
    MR1  4.266996e-16 1.000000e+00  1.961151e-16
    MR3 -1.299606e-16 1.961151e-16  1.000000e+00

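The loadings and the correlations are similar, but I expected them to be identical. I tried reading the source code of fa() but could not follow it. Can someone tell me how the loadings differ from the correlations?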

Update: for each factor, the correlations with the observed variables are a constant multiple of the loadings:

    cor(X, fa1$scores)/fa1$loadings[1:10, 1:3]
               MR2      MR1      MR3
     [1,] 1.072786 1.061804 1.192675
     [2,] 1.072786 1.061804 1.192675
     [3,] 1.072786 1.061804 1.192675
     [4,] 1.072786 1.061804 1.192675
     [5,] 1.072786 1.061804 1.192675
     [6,] 1.072786 1.061804 1.192675
     [7,] 1.072786 1.061804 1.192675
     [8,] 1.072786 1.061804 1.192675
     [9,] 1.072786 1.061804 1.192675
    [10,] 1.072786 1.061804 1.192675

2 Answers

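I don't know R well, so I can't follow your code. But factor scores (unless the factors are simply principal components) are always approximate: exact scores cannot be computed, because the values of the unique factors for each case and variable are unobservable. Consequently, the correlations between the computed factor scores and the variables only approximate the true correlations between the factors and the variables, that is, the loadings.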
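The constant ratios in the question's update fit this picture. The following is only a sketch, assuming the scores are regression (Thurstone) estimates with weights W = R^{-1} L, where R is the sample correlation matrix and L the loading matrix. Whether fa() computes its scores in exactly this way internally is an assumption here, and the names R, L, W and scores_hand are purely illustrative. Under that assumption the covariance between the standardized variables and the score estimates equals the loadings, but the score estimates have variance below 1, so each column of correlations is the loading column divided by that score's standard deviation.

```r
library(psych)

set.seed(1)
X <- matrix(rnorm(200), ncol = 10)
fa1 <- fa(X, nfactors = 3, rotate = "none", scores = "regression")

R <- cor(X)                    # sample correlation matrix
L <- unclass(fa1$loadings)     # loadings as a plain 10 x 3 matrix
W <- solve(R, L)               # assumed regression weights R^{-1} L
scores_hand <- scale(X) %*% W  # hand-computed regression score estimates

# Covariance of the standardized variables with the estimates equals L ...
cov_xs <- cov(scale(X), scores_hand)

# ... but the estimates have variance diag(L' R^{-1} L), not 1
score_var <- diag(t(L) %*% W)

# So the correlation/loading ratio is constant within each column
ratio_hand <- cor(X, scores_hand) / L
ratio_pred <- 1 / sqrt(score_var)

print(ratio_hand)
print(ratio_pred)
```

If fa() does use these weights, ratio_pred should reproduce the per-factor constants shown in the question, and 1 / ratio_pred^2 is the share of each factor's variance that the score estimate recovers.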

By default, fa() uses the minres factoring method (fm="minres").

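The loadings coincide with the correlations between the variables and the computed scores only for the principal components method. You can run that with principal():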

    fa1 <- principal(X, nfactors = 3, rotate = 'none')
    cor(X, fa1$scores)
                  PC1         PC2         PC3
     [1,] -0.10920804  0.53177096  0.62089920
     [2,]  0.38040379  0.25737641 -0.61853742
     [3,] -0.63568952 -0.07448425  0.42456182
     [4,] -0.65982013  0.31649913 -0.44502612
     [5,]  0.01177613 -0.74010933  0.10943722
     [6,] -0.23698177  0.22859832  0.21876281
     [7,]  0.22409045  0.43785156  0.36644127
     [8,]  0.69310850  0.26912793  0.47151066
     [9,]  0.15024503  0.65373157 -0.39777599
    [10,]  0.85889193 -0.23091790 -0.02241569
    fa1$loadings[1:10, 1:3]
                  PC1         PC2         PC3
     [1,] -0.10920804  0.53177096  0.62089920
     [2,]  0.38040379  0.25737641 -0.61853742
     [3,] -0.63568952 -0.07448425  0.42456182
     [4,] -0.65982013  0.31649913 -0.44502612
     [5,]  0.01177613 -0.74010933  0.10943722
     [6,] -0.23698177  0.22859832  0.21876281
     [7,]  0.22409045  0.43785156  0.36644127
     [8,]  0.69310850  0.26912793  0.47151066
     [9,]  0.15024503  0.65373157 -0.39777599
    [10,]  0.85889193 -0.23091790 -0.02241569
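
To see why the two tables agree exactly in the principal components case, here is a minimal base R sketch that does not call principal(). It assumes the loadings are the eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues, and that the component scores are standardized to unit variance. The names Z, V, lambda, loadings_pc and scores_pc are illustrative, and the column signs may differ from those that principal() reports.

```r
set.seed(1)
X <- matrix(rnorm(200), ncol = 10)

Z <- scale(X)                         # standardized variables
e <- eigen(cor(X), symmetric = TRUE)  # eigen-decomposition of the correlation matrix
k <- 3
V <- e$vectors[, 1:k]                 # first k eigenvectors
lambda <- e$values[1:k]               # first k eigenvalues

loadings_pc <- V %*% diag(sqrt(lambda))          # component loadings
scores_pc <- Z %*% V %*% diag(1 / sqrt(lambda))  # unit-variance component scores

# Largest absolute difference between correlations and loadings
max_diff <- max(abs(cor(X, scores_pc) - loadings_pc))
print(max_diff)
```

Component scores are exact linear combinations of the observed variables, so nothing has to be estimated and no shrinkage factor appears. That is the difference from the minres scores above.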