Help with SEM modeling (OpenMx, polycor) - Cross Validated


r modeling multiple-regression structural-equation-modeling

2022-03-18 16:39:07

I am having a lot of trouble with a dataset I am trying to fit an SEM to.

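We assume there are 5 latent factors A, B, C, D, E, each with its own indicators: A1 to A5 (ordered factors), B1 to B3 (quantitative), and C1, D1, E1 (all three ordered factors, E1 with only 2 levels). We are interested in the covariances between all the factors.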

I tried to do this with OpenMx. Here are some of my attempts:

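I first tried using threshold matrices for all the ordinal variables, but the model failed to converge.
I decided to use polychoric/polyserial correlations instead of the raw data, computed with the function hetcor from the polycor library (I plan to bootstrap the sample to get confidence intervals). It failed to converge too!
I tried restricting the data to individuals with complete data; that failed as well!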

My first question: is there a natural way to explain these failures?

My second question: what should I do???

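Edit: for future readers who run into the same problem: after going through the code of polycor, I found that the solution is simply to call hetcor() with the option std.err=FALSE. This gives estimates very similar to the ones StasK gave. I don't have time right now to understand better what is going on here! StasK has answered the questions below quite well.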
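With std.err=FALSE there are no asymptotic standard errors, so the bootstrap mentioned above is one way to get confidence intervals. Below is a minimal percentile-bootstrap sketch in base R. The helper name boot_cor_ci is hypothetical, and Pearson cor() is used only as a stand-in statistic; to bootstrap the heterogeneous correlations you could pass stat = function(d) polycor::hetcor(d, std.err = FALSE)$correlations, assuming every resample still contains the category levels hetcor needs.

```r
# Percentile bootstrap intervals for every cell of a correlation matrix.
# boot_cor_ci is a hypothetical helper; `stat` defaults to Pearson cor() as a
# stand-in for e.g. function(d) polycor::hetcor(d, std.err = FALSE)$correlations
boot_cor_ci <- function(data, stat = cor, B = 500, level = 0.95) {
  n <- nrow(data)
  estimate <- stat(data)
  # replicate() stacks the B resampled p x p matrices into a p x p x B array
  reps <- replicate(B, stat(data[sample.int(n, n, replace = TRUE), , drop = FALSE]))
  alpha <- (1 - level) / 2
  # empirical quantiles of each matrix cell across the bootstrap replicates
  lower <- apply(reps, c(1, 2), quantile, probs = alpha, na.rm = TRUE, names = FALSE)
  upper <- apply(reps, c(1, 2), quantile, probs = 1 - alpha, na.rm = TRUE, names = FALSE)
  list(estimate = estimate, lower = lower, upper = upper)
}

# Usage on simulated data (not the poster's data set)
set.seed(1)
x <- matrix(rnorm(300), ncol = 3)
x[, 2] <- x[, 1] + rnorm(100)
ci <- boot_cor_ci(x, B = 200)
print(round(ci$lower, 2))
print(round(ci$upper, 2))
```

Resampling whole rows keeps the joint structure of each observation, which is what the correlation depends on; with only 62 complete cases, some resamples of an ordinal variable may lose a rare category, which is why na.rm = TRUE is passed to quantile().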

I have other questions, but before that, here is a URL to an RData file containing a data frame, L1, with only the complete cases: data_sem.RData

Here are a few lines of code showing how hetcor fails.

> require("OpenMx")
> require("polycor")
> load("data_sem.RData")
> hetcor(L1)
Error in cut.default(scale(x), c(-Inf, row.cuts, Inf)) : 
  'breaks' are not unique
In addition: There were 11 warnings (use warnings() to see them)
> head(L1)
   A1 A2 A3 A4 A5       B1       B2       B3 C1 D1 E1
1   4  5  4  5  7 -0.82759  0.01884 -3.34641  4  6  1
4   7  5  0  4  6 -0.18103  0.14364  0.35730  0  1  0
7   7  5  7  6  9 -0.61207 -0.18914  0.13943  0  0  0
10  5  5 10  7  3 -1.47414  0.10204  0.13943  2  0  0
11  7  5  8  9  9 -0.61207  0.06044 -0.73203  0  2  0
12  5  5  9 10  5  0.25000 -0.52192  1.44662  0  0  0
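The error itself is raised by base R's cut(), which refuses a vector of breaks containing duplicated values. The sketch below reproduces the message with made-up, heavily tied data for which two quantile-based cut points coincide. That polycor's internal row.cuts end up duplicated in a similar way for this data set is an assumption here, not something verified against the polycor source.

```r
# Made-up data with many ties (not the poster's data)
x <- c(rep(0, 10), 1:5)

# Three quantile-based cut points; the ties make the first two coincide
breaks <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
print(breaks)

# cut() stops because c(-Inf, 0, 0, 1.5, Inf) contains a duplicate
msg <- tryCatch(cut(x, c(-Inf, breaks, Inf)), error = conditionMessage)
print(msg)

# Dropping the duplicate lets cut() succeed, with one interval fewer
tab <- table(cut(x, unique(c(-Inf, breaks, Inf))))
print(tab)
```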

But I can still compute a correlation or covariance matrix in a very dirty way, by treating my ordinal factors as quantitative variables:

> Cor0 <- cor(data.frame(lapply(L1, as.numeric)))

Here is a piece of OpenMx code, along with my next question: is the following model correct? Doesn't it have too many free parameters?

manif <- c("A1","A2","A3","A4","A5", "B1","B2","B3", "C1", "D1", "E1");

model1 <- mxModel(type="RAM",
        manifestVars=manif, latentVars=c("A","B","C","D","E"),
        # factor variance
        mxPath(from=c("A","B","C","D","E"), arrows=2, free=FALSE, values = 1),
        # factor covariance
        mxPath(from="A", to="B",  arrows=2, values=0.5),
        mxPath(from="A", to="C",  arrows=2, values=0.5),
        mxPath(from="A", to="D",  arrows=2, values=0.5),
        mxPath(from="A", to="E",  arrows=2, values=0.5),
        mxPath(from="B", to="C",  arrows=2, values=0.5),
        mxPath(from="B", to="D",  arrows=2, values=0.5),
        mxPath(from="B", to="E",  arrows=2, values=0.5),
        mxPath(from="C", to="D",  arrows=2, values=0.5),
        mxPath(from="C", to="E",  arrows=2, values=0.5),
        mxPath(from="D", to="E",  arrows=2, values=0.5),
        # factors → manifest vars
        mxPath(from="A", to=c("A1","A2","A3","A4","A5"), free=TRUE, values=1),
        mxPath(from="B", to=c("B1","B2","B3"), free=TRUE, values=1),
        mxPath(from="C", to=c("C1"), free=TRUE, values=1),
        mxPath(from="D", to=c("D1"), free=TRUE, values=1),
        mxPath(from="E", to=c("E1"), free=TRUE, values=1),
        # error terms
        mxPath(from=manif, arrows=2, values=1, free=TRUE),
        # data
        mxData(Cor0, type="cor",numObs=dim(L1)[1])
       );
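The "too many free parameters?" part can be checked by counting the free parameters of model1 by hand and comparing them with the number of observed statistics. A small base-R sketch of that arithmetic (no OpenMx needed; it only mirrors the paths declared above):

```r
manif   <- c("A1","A2","A3","A4","A5", "B1","B2","B3", "C1", "D1", "E1")
latents <- c("A","B","C","D","E")

n_cov      <- choose(length(latents), 2)  # free factor covariances (factor variances fixed at 1)
n_loadings <- length(manif)               # one free loading per manifest variable
n_errors   <- length(manif)               # one free error variance per manifest variable
n_free     <- n_cov + n_loadings + n_errors

# The 55 "observed statistics" in the summary equal the number of
# off-diagonal entries of the 11 x 11 correlation matrix, p(p-1)/2
n_obs_stats <- choose(length(manif), 2)

print(c(free = n_free, observed = n_obs_stats, df = n_obs_stats - n_free))
```

This gives 32 free parameters against 55 observed statistics, i.e. 23 degrees of freedom, matching the summary. A positive count is necessary but not sufficient for identification, so it does not rule out the problem with the single-indicator factors C, D and E.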

One last question. With this model (let's forget for now the improper way the correlation matrix was computed), I run OpenMx:

> mxRun(model1) -> fit1
Running untitled1 
> summary(fit1)

The summary contains this:

observed statistics:  55 
estimated parameters:  32 
degrees of freedom:  23 
-2 log likelihood:  543.5287 
saturated -2 log likelihood:  476.945 
number of observations:  62 
chi-square:  66.58374 
p:  4.048787e-06
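The chi-square line is the difference between the two -2 log likelihoods, referred to a chi-square distribution with the reported 23 degrees of freedom. A minimal check in base R, using only the numbers printed in the summary:

```r
m2ll_model     <- 543.5287  # -2 log likelihood of model1 (from the summary)
m2ll_saturated <- 476.945   # saturated -2 log likelihood (from the summary)

chisq <- m2ll_model - m2ll_saturated
p     <- pchisq(chisq, df = 23, lower.tail = FALSE)
print(c(chisq = chisq, p = p))
```

A p-value this small says the model-implied correlations differ from Cor0 by more than sampling error with 62 observations would explain.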

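Despite the large number of parameters, the fit seems very bad. What does that mean? Does it mean we should add covariances between the manifest variables?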

Thank you very much for all your answers; I am slowly going crazy...

1 Answer

You must have found a bug in polycor, which you would want to report to John Fox. Everything works fine in Stata with my polychoric package:

    . polychoric *

    Polychoric correlation matrix

               A1          A2          A3          A4          A5          B1          B2          B3          C1          D1          E1
   A1           1
   A2   .34544812           1
   A3   .39920225   .19641726           1
   A4   .09468652   .04343741   .31995685           1
   A5   .30728339   -.0600463   .24367634   .18099061           1
   B1   .01998441  -.29765985   .13740987   .21810968   .14069473           1
   B2  -.19808738   .17745687  -.29049459  -.21054867   .02824307  -.57600551          1
   B3   .17807109  -.18042045   .44605383   .40447746   .18369998   .49883132  -.50906364           1
   C1  -.35973454  -.33099295  -.19920454  -.14631621  -.36058235   .00066762  -.05129489  -.11907687           1
   D1   -.3934594  -.21234022  -.39764587  -.30230591  -.04982743  -.09899428   .14494953   -.5400759   .05427906           1
   E1  -.13284936   .17703745  -.30631236  -.23069382  -.49212315  -.26670382   .24678619  -.47247566    .2956692   .28645516           1

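For the latent variables measured by a single indicator (C, D, E), you need to fix the variance of the indicator in its continuous version; otherwise the scale of the latent variable is not identified. Given that for binary/ordinal responses this variance is fixed at 1 anyway under (ordinal) probit-type links, this probably means that you must either assume that your latent variable equals the observed indicator, or assume standardized loadings. This essentially makes your model equivalent to a CFA model in which your latent factors A and B are measured by {A1-A5, C1, D1, E1} and {B1-B3, C1, D1, E1}, respectively.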
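To see concretely why a single-indicator factor is not identified when both its loading and its indicator's error variance are free, here is a base-R sketch with made-up values for a reduced model: factor A with indicators A1 and A2, factor C with the single indicator C1, and both factor variances fixed at 1. Two different parameter sets, trading the loading of C1 against the A-C covariance, give the same implied covariance matrix Lambda %*% Phi %*% t(Lambda) + Theta, so no data can tell them apart. The helper implied_cov is hypothetical.

```r
# Implied covariance of (A1, A2, C1) for a hypothetical two-factor model:
# A1, A2 load on factor A; C1 is the only indicator of factor C.
implied_cov <- function(lambda, phi, theta) {
  Lambda <- matrix(c(lambda[1], lambda[2], 0,   # column 1: loadings on A
                     0,         0, lambda[3]),  # column 2: loading on C
                   ncol = 2)
  Phi <- matrix(c(1, phi, phi, 1), 2, 2)        # factor variances fixed at 1
  Lambda %*% Phi %*% t(Lambda) + diag(theta)
}

# Set 1: loading of C1 = 0.8, cov(A, C) = 0.5, error variance of C1 = 0.36
sigma1 <- implied_cov(c(0.7, 0.6, 0.8), phi = 0.5, theta = c(0.51, 0.64, 0.36))
# Set 2: loading of C1 = 0.5, cov(A, C) = 0.8, error variance of C1 = 0.75
sigma2 <- implied_cov(c(0.7, 0.6, 0.5), phi = 0.8, theta = c(0.51, 0.64, 0.75))

print(all.equal(sigma1, sigma2))  # same implied matrix from different parameters
```

Only the product of the C1 loading and the A-C covariance enters the model, and the C1 error variance absorbs the rest of var(C1); fixing one of these quantities, as the answer suggests, removes the trade-off.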
