FA: Choosing a rotation matrix based on the "simple structure criteria"

Cross Validated: r, algorithms, factor-analysis, psychometrics, matlab

2022-03-30 16:58:35

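One of the most important issues in using factor analysis is its interpretation. Factor analysis often uses factor rotation to make the solution easier to interpret. After a satisfactory rotation, the rotated factor loading matrix L' represents the correlation matrix just as well as the unrotated matrix L, so it can be used as the factor loading matrix in its place.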
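A quick numerical check of that claim, as a minimal sketch for the orthogonal case: if T is an orthogonal rotation, then (L*T)*(L*T)' = L*T*T'*L' = L*L', so the common-factor part of the reproduced correlation matrix does not change. The loading values and the 30-degree angle are made up for illustration. In the code the apostrophe is MATLAB's transpose, so the rotated matrix is named L_rot rather than L'. For oblique rotations the factor correlation matrix also enters, which this sketch does not cover.

```matlab
% Hypothetical unrotated loadings: 4 items, 2 factors (values are made up).
L = [0.70  0.40;
     0.65  0.45;
     0.50 -0.60;
     0.45 -0.55];

theta = pi/6;                       % arbitrary rotation angle (30 degrees)
T = [cos(theta) -sin(theta);
     sin(theta)  cos(theta)];       % orthogonal matrix: T*T' is the identity

L_rot = L * T;                      % rotated loading matrix

D = L*L' - L_rot*L_rot';            % difference between the reproduced matrices
max_diff = max(abs(D(:)));
fprintf('largest absolute difference: %g\n', max_diff);
```

The printed difference should be at the level of floating-point rounding error.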

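The purpose of rotation is to give the rotated factor loading matrix some desirable properties. One approach is to rotate the factor loading matrix so that the rotated matrix has a simple structure.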

L. L. Thurstone introduced the principle of simple structure as a general guide for factor rotation:

Simple structure criteria:

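1. Each row of the factor matrix should contain at least one zero.
2. If there are m common factors, each column of the factor matrix should have at least m zeros.
3. For every pair of columns in the factor matrix, there should be several variables whose entries approach zero in one column but not in the other.
4. For every pair of columns in the factor matrix, a large proportion of the variables should have entries approaching zero in both columns when there are four or more factors.
5. For every pair of columns in the factor matrix, there should be only a small number of variables with nonzero entries in both columns.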

The ideal simple structure is such that:

each item has a high, or meaningful, loading on one factor only, and
each factor has high, or meaningful, loadings for only some of the items.

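The problem is that, once you try several combinations of rotation methods and the parameters each one accepts (especially for the oblique ones), the number of candidate matrices grows and it becomes very difficult to see which one best meets the above criteria.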

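When I first faced this problem I realized that I could not pick the best match merely by "looking" at the matrices, and that I needed an algorithm to help me decide. Under the pressure of project deadlines, the most I could do was write the MATLAB code below, which accepts one rotated matrix at a time and reports (under some assumptions) whether each criterion is met. A new version (if I ever upgrade it) would accept a 3-D array (a set of 2-D matrices) as its argument and return the matrix that best fits the above criteria.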

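How would you extract an algorithm from these criteria? I am just asking for your opinions (I also believe the usefulness of the method itself has been criticized), and perhaps there are better approaches to the rotation matrix selection problem.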

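Also, I would like to know which software you prefer for performing FA. If it is R, which package do you use? (I must admit that if I had to do FA, I would turn to SPSS again.) If someone wants to provide some code, I would prefer R or MATLAB.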

P.S. The above formulation of the simple structure criteria can be found in Pett, M., Lackey, N., Sullivan, J., Making Sense of Factor Analysis.

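P.S.2 (from the same book): "A test of successful factor analysis is the extent to which it can reproduce the original corr matrix. If you are also using oblique solutions, among all select the one that generated the greatest number of highest and lowest factor loadings." This sounds like another constraint that the algorithm could use.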
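The first half of that quote can be turned into a number. The following is a minimal sketch, assuming an orthogonal solution, that measures how well a loading matrix reproduces the off-diagonal correlations via the root mean square residual of R - L*L'. The matrices and the helper names (`offdiag`, `rmsr`) are hypothetical; for an oblique solution with factor correlation matrix Phi, the reproduced part would be L*Phi*L' instead.

```matlab
% Hypothetical loadings (4 items, 2 factors) and a correlation matrix
% constructed so that these loadings reproduce it exactly off the diagonal.
L = [0.8 0.1;
     0.7 0.2;
     0.1 0.9;
     0.2 0.6];
R = L * L';
R(logical(eye(4))) = 1;             % correlations have a unit diagonal

off = ~logical(eye(4));             % mask selecting off-diagonal entries
offdiag = @(M) M(off);              % helper: off-diagonal entries as a vector
rmsr = @(R, L) sqrt(mean(offdiag(R - L*L').^2));   % RMS residual

fit_two_factors = rmsr(R, L);       % both factors: residuals vanish
fit_one_factor  = rmsr(R, L(:, 1)); % first factor only: residuals remain

fprintf('RMSR with 2 factors: %.4f\n', fit_two_factors);
fprintf('RMSR with 1 factor:  %.4f\n', fit_one_factor);
```

Because an orthogonal rotation leaves L*L' unchanged, this measure cannot distinguish between orthogonal rotations of the same solution. It is only useful for comparing solutions that differ in the number of factors or in the extraction method.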

P.S.3 This question has also been asked here. However, I think it is a better fit for this site.

function [] = simple_structure_criteria (my_pattern_table)
%Simple Structure Criteria
%Making Sense of Factor Analysis, page 132

disp(' ');
disp('Simple Structure Criteria (Thurstone):');
disp('1. Each row of the factor matrix should contain at least one zero');
disp( '2. If there are m common factors, each column of the factor matrix should have at least m zeros');
disp( '3. For every pair of columns in the factor matrix, there should be several variables for which entries approach zero in the one column but not in the other');
disp( '4. For every pair of columns in the factor matrix, a large proportion of the variables should have entries approaching zero in both columns when there are four or more factors');
disp( '5. For every pair of columns in the factor matrix, there should be only a small number of variables with nonzero entries in both columns');
disp(' ');
disp( '(additional by Pedhazur and Schmelkin) The ideal simple structure is such that:');
disp( '6. Each item has a high, or meaningful, loading on one factor only and');
disp( '7. Each factor has high, or meaningful, loadings for only some of the items.');

disp(' ')
disp('Start checking...')

%test matrix
%ct=[76,78,16,7;19,29,10,13;2,6,7,8];
%test it by giving: simple_structure_criteria (ct)

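ct = abs(my_pattern_table);        % only the magnitude of a loading matters here

items = size(ct,1);                % rows are items (variables)
factors = size(ct,2);              % columns are factors
my_zero = 0.1;                     % |loading| below this counts as "zero"
approach_zero = 0.2;               % |loading| below this "approaches zero"
several = floor(items / 3);        % minimum count for "several variables"
small_number = ceil(items / 4);    % maximum count for "a small number"
large_proportion = 0.30;           % minimum share for "a large proportion"
meaningful = 0.4;                  % |loading| above this is "high, or meaningful"
some_bottom = 2;                   % fewest items a factor should load on
some_top = floor(items / 2);       % most items a factor should load on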

% CRITERION 1
disp(' ');
disp('CRITERION 1');
for i = 1 : 1 : items
    count = 0;
    for j = 1 : 1 : factors
        if (ct(i,j) < my_zero)
            count = count + 1;
            break
        end
    end
    if (count == 0)
        disp(['Criterion 1 is NOT MET for item ' num2str(i)])
    end
end


% CRITERION 2
disp(' ');
disp('CRITERION 2');
for j = 1 : 1 : factors 
    m=0;
    for i = 1 : 1 : items
        if (ct(i,j) < my_zero)
            m = m + 1;
        end
    end
    if (m < factors)
        disp(['Criterion 2 is NOT MET for factor ' num2str(j) '. m = ' num2str(m)]);
    end
end

% CRITERION 3
disp(' ');
disp('CRITERION 3');
for c1 = 1 : 1 : factors - 1
    for c2 = c1 + 1 : 1 : factors
        test_several = 0;
        for i = 1 : 1 : items
            if ( (ct(i,c1)>my_zero && ct(i,c2)<my_zero) || (ct(i,c1)<my_zero && ct(i,c2)>my_zero) ) % approach zero in one but not in the other
                test_several = test_several + 1;
            end
        end
        disp(['several = ' num2str(test_several) ' for factors ' num2str(c1) ' and ' num2str(c2)]);
        if (test_several < several)
            disp(['Criterion 3 is NOT MET for factors ' num2str(c1) ' and ' num2str(c2)]);
        end
    end
end

% CRITERION 4
disp(' ');
disp('CRITERION 4');
if (factors > 3)
    for c1 = 1 : 1 : factors - 1
        for c2 = c1 + 1 : 1 : factors
            test_several = 0;
            for i = 1 : 1 : items
                if (ct(i,c1)<approach_zero && ct(i,c2)<approach_zero) % approach zero in both
                    test_several = test_several + 1;
                end
            end
            disp(['large proportion = ' num2str((test_several / items)*100) '% for factors ' num2str(c1) ' and ' num2str(c2)]);
            if ((test_several / items) < large_proportion)
                pr = sprintf('%4.2g',  (test_several / items) * 100 );
                disp(['Criterion 4 is NOT MET for factors ' num2str(c1) ' and ' num2str(c2) '. Proportion is ' pr '%']);
            end
        end
    end
end

% CRITERION 5
disp(' ');
disp('CRITERION 5');
for c1 = 1 : 1 : factors - 1
    for c2 = c1 + 1 : 1 : factors
        test_number = 0;
        for i = 1 : 1 : items
            if (ct(i,c1)>approach_zero && ct(i,c2)>approach_zero) % nonzero in both
                test_number = test_number + 1;
            end
        end
        disp(['small number = ' num2str(test_number) ' for factors ' num2str(c1) ' and ' num2str(c2)]);
        if (test_number > small_number)
            disp(['Criterion 5 is NOT MET for factors ' num2str(c1) ' and ' num2str(c2)]);
        end
    end
end

% CRITERION 6
disp(' ');
disp('CRITERION 6');
for i = 1 : 1 : items
    count = 0;
    for j = 1 : 1 : factors
        if (ct(i,j) > meaningful)
            count = count + 1;
        end
    end
    if (count == 0 || count > 1)
        disp(['Criterion 6 is NOT MET for item ' num2str(i)])
    end
end

% CRITERION 7
disp(' ');
disp('CRITERION 7');
for j = 1 : 1 : factors 
    m=0;
    for i = 1 : 1 : items
        if (ct(i,j) > meaningful)
            m = m + 1;
        end
    end
    disp(['some items = ' num2str(m) ' for factor ' num2str(j)]);
    if (m < some_bottom || m > some_top)
        disp(['Criterion 7 is NOT MET for factor ' num2str(j)]);
    end
end
disp(' ')
disp('Checking completed.')
return
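One possible shape for the 3-D version described in the question, as a rough sketch rather than a finished algorithm: stack the candidate pattern matrices along the third dimension, count criterion violations for each, and pick the candidate with the fewest. Only criteria 1 and 6 are counted here, with the same thresholds as above. The unweighted sum of the two counts is an arbitrary choice, and the two candidate matrices are made up.

```matlab
% Two hypothetical 4-item, 2-factor candidates stacked along dimension 3.
candidates = cat(3, ...
    [0.60 0.50; 0.55 0.45; 0.50 0.60; 0.45 0.55], ...  % heavily cross-loaded
    [0.80 0.05; 0.75 0.02; 0.03 0.85; 0.08 0.70]);     % close to simple structure

my_zero    = 0.1;   % same thresholds as in the function above
meaningful = 0.4;

n_candidates = size(candidates, 3);
violations = zeros(1, n_candidates);
for k = 1:n_candidates
    ct = abs(candidates(:, :, k));
    rows_without_zero = sum(~any(ct < my_zero, 2));         % criterion 1
    rows_not_one_high = sum(sum(ct > meaningful, 2) ~= 1);  % criterion 6
    violations(k) = rows_without_zero + rows_not_one_high;
end

[~, best] = min(violations);        % index of the candidate with fewest violations
disp(violations);
disp(best);
```

The remaining criteria could be added as further counts in the same loop. How to weight them against each other is a judgment call that the criteria themselves do not settle.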

4 Answers

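The R package psych includes various routines for factor analysis (whether PCA-, ML- or FA-based), but see my short review on crantastic. Most of the usual rotation techniques are available, as well as algorithms that rely on simple structure criteria; you might want to look at W. Revelle's paper on this topic, Very Simple Structure: An Alternative Procedure for Estimating the Optimal Number of Interpretable Factors (MBR 1979 (14)), and the VSS() function.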
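To give a feel for the idea behind VSS, here is a minimal sketch of a complexity-1 fit as I understand it from the paper: keep only the largest absolute loading in each row, and measure how much of the off-diagonal correlation that simplified matrix S still reproduces, as 1 - sum(r*^2)/sum(r^2) with residuals r* taken from R - S*S'. This is not the psych implementation, which is the reference and may differ in details; the matrices are made up and an orthogonal solution is assumed.

```matlab
% Hypothetical loadings and a correlation matrix they reproduce exactly.
L = [0.8 0.1;
     0.7 0.2;
     0.1 0.9;
     0.2 0.6];
R = L * L';
R(logical(eye(4))) = 1;

% Simplified loadings: keep only the largest |loading| in each row.
[~, idx] = max(abs(L), [], 2);
S = zeros(size(L));
for i = 1:size(L, 1)
    S(i, idx(i)) = L(i, idx(i));
end

off = ~logical(eye(size(R, 1)));    % off-diagonal mask
resid = R - S * S';                 % what the simplified structure misses
vss1 = 1 - sum(resid(off).^2) / sum(R(off).^2);
fprintf('complexity-1 fit: %.3f\n', vss1);
```

A value near 1 means that the single largest loading per item is nearly enough to account for the correlations, which is what a simple structure should deliver.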

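Many authors use orthogonal rotation (VARIMAX) and consider loadings higher than, say, 0.3 or 0.4 (which amounts to 9% or 16% of the variance explained by the factor), because it gives simpler structures for interpretation and scoring purposes (e.g., in quality of life research); others (e.g. Cattell, 1978; Kline, 1979) would recommend oblique rotations since "in the real world, it is not unreasonable to think that factors, as important determiners of behavior, would be correlated" (I am quoting Kline, Intelligence. The Psychometric View, 1991, p. 19).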

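To my knowledge, researchers generally start with FA (or PCA), using a scree plot together with simulated data (parallel analysis) to help choose the right number of factors. I have often found that item cluster analysis and VSS complement this approach nicely. When you are interested in second-order factors, or want to carry on with SEM-based methods, you obviously need to use oblique rotation and factor the resulting correlation matrix.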

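Other packages/software:

lavaan, for latent variable analysis in R;
OpenMx, based on Mx, a general-purpose software that includes a matrix algebra interpreter and a numerical optimizer for structural equation modeling.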

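References
1. Cattell, R.B. (1978). The scientific use of factor analysis in behavioural and life sciences. New York, Plenum.
2. Kline, P. (1979). Psychometrics and Psychology. London, Academic Press.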

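I find myself using parallel analysis regularly (O'Connor, 2000). It neatly solves the problem of how many factors to extract.

See: https://people.ok.ubc.ca/brioconn/nfactors/nfactors.html

O'Connor, B. P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer's MAP test. Behavior Research Methods, Instruments, & Computers, 32, 396-402.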
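For readers who have not seen parallel analysis, the idea can be sketched in a few lines of MATLAB: compare the eigenvalues of the observed correlation matrix with the eigenvalues obtained from random normal data of the same size, and retain the leading components whose observed eigenvalue exceeds the random reference. This is a simplified sketch, not O'Connor's program: the data are simulated, the number of replications is arbitrary, and the mean random eigenvalue is used as the reference where a high percentile is a common alternative.

```matlab
% Hypothetical data: 500 cases, 6 variables driven by 2 independent factors.
n = 500;  p = 6;
true_loadings = [0.8 0; 0.8 0; 0.8 0; 0 0.8; 0 0.8; 0 0.8];
X = randn(n, 2) * true_loadings' + 0.6 * randn(n, p);

obs = sort(eig(corrcoef(X)), 'descend');    % observed eigenvalues

n_sim = 100;                                % arbitrary number of replications
sim = zeros(p, n_sim);
for s = 1:n_sim
    sim(:, s) = sort(eig(corrcoef(randn(n, p))), 'descend');
end
ref = mean(sim, 2);                         % mean random eigenvalue per position

% Count the leading run of observed eigenvalues that beat the reference.
n_retain = sum(cumprod(double(obs > ref)));
disp([obs ref]);
disp(n_retain);
```

With a structure this strong, the first two observed eigenvalues sit well above the random reference and the rest fall below it.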

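I have to second chl's suggestion of the psych package. It is extremely useful and implements both the MAP and the parallel analysis criteria for the number of factors. In my own experience, if you create factor analysis solutions for every number of factors between the values returned by MAP and by parallel analysis, you can normally find a relatively optimal solution.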

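I would also second the use of OpenMx for confirmatory factor analysis, as it seems to give the best results of all of them and copes much better with badly behaved matrices (as mine tend to be). The syntax is also quite nice once you get used to it. The only issue I have with it is that the optimizer is not open source, so the package is not available on CRAN. Apparently they are working on an open-source implementation of the optimizer, so this may not be an issue for much longer.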

Great question. This is not really an answer, just a few thoughts.

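In most of the applications where I have used factor analysis, allowing correlated factors makes more theoretical sense. I tend to rely on the promax rotation method. I used to do this in SPSS, and now I use the factanal function in R.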
