
Interpreting partial correlation

Cross Validated · correlation · interpretation · partial-correlation

2022-03-12 13:33:19

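I am computing the correlation between two variables (A and B), and it shows that they are highly correlated. I know that one of them is also highly correlated with a third variable (C), so I ran a partial correlation between A and B, controlling for C. Now the correlation between A and B is even higher than before. How should I interpret this?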

2 answers

To understand this, I always prefer the Cholesky decomposition of the correlation matrix.
Suppose the correlation matrix $R$ of three variables $X, Y, Z$ is
$$ R = \left[ \begin{array}{rrr} 1.00 & -0.29 & -0.45 \\ -0.29 & 1.00 & 0.93 \\ -0.45 & 0.93 & 1.00 \end{array} \right] $$


Then the Cholesky decomposition $L$ of $R$ is
$$ L = \left[ \begin{array}{c} X \\ Y \\ Z \end{array} \right] = \left[ \begin{array}{rrr} 1.00 & 0.00 & 0.00 \\ -0.29 & 0.96 & 0.00 \\ -0.45 & 0.83 & 0.32 \end{array} \right] $$
If the variables are viewed as vectors from the origin, the rows of $L$ give the coordinates of the three variables in a Euclidean space in which the first axis is identified with the variable/vector X, the second axis lies in the plane spanned by X and Y, and so on.
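The decomposition can be checked numerically. The following is a minimal sketch in plain Python; the helper names `cholesky` and `dot` are introduced here for illustration and are not part of the original answer.

```python
import math

def cholesky(matrix):
    """Return the lower-triangular factor L such that matrix = L L^T."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Contribution of the columns already computed.
            partial_sum = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial_sum)
            else:
                lower[i][j] = (matrix[i][j] - partial_sum) / lower[j][j]
    return lower

def dot(u, v):
    """Dot product of two coordinate rows."""
    return sum(a * b for a, b in zip(u, v))

R = [
    [1.00, -0.29, -0.45],
    [-0.29, 1.00, 0.93],
    [-0.45, 0.93, 1.00],
]
L = cholesky(R)
x_row, y_row, z_row = L

for name, row in zip("XYZ", L):
    print(name, [round(value, 3) for value in row])

# Each row has unit length, and dot products of rows give back the entries of R.
print(round(dot(y_row, y_row), 6), round(dot(y_row, z_row), 6))
```

The printed rows should agree with the matrix $L$ to within rounding in the second decimal (the entry shown as 0.83 comes out closer to 0.835).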



Because $R = L L^{\top}$, the correlation between two variables is the dot product of their coordinate rows, for example $\operatorname{corr}(X,Y) = x_1 y_1 + x_2 y_2 + x_3 y_3$. We see immediately that $\operatorname{corr}(X,Y) = -0.29$, because of the zeros and the unit factor in the row of X. We also see immediately that $\operatorname{corr}(X,Z) = -0.45$, again because of the zeros and the unit factor. The correlation between Y and Z, however, is $\operatorname{corr}(Y,Z) = (-0.29) \cdot (-0.45) + 0.96 \cdot 0.83 \approx 0.93$.

The partial correlation (after X is removed) is the part in which no variance of the X variable is present, so $\operatorname{corr}(Y,Z)_{\cdot X} = 0.96 \cdot 0.83$. Strictly, $0.96 \cdot 0.83$ is the partial covariance of Y and Z; dividing by the lengths of the two residual vectors, $0.96$ for Y and $\sqrt{0.83^2 + 0.32^2} \approx 0.89$ for Z, gives the partial correlation, roughly $0.93$. The rescaling does not change the sign, so the argument is unaffected.

Now imagine that the value $0.83$ were $-0.83$ instead. Then the partial correlation would be negative, and the correlation between Y and Z would be $0.29 \cdot 0.45 - 0.96 \cdot 0.83 \approx -0.67$.

What we see is that the partial correlation is partly independent of the overall correlation (although only within certain limits).
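The same numbers can be obtained from the standard first-order partial correlation formula, $r_{AB \cdot C} = (r_{AB} - r_{AC}\, r_{BC}) / \sqrt{(1 - r_{AC}^2)(1 - r_{BC}^2)}$, which the answer does not write out. The sketch below applies it to the example values; the function name `partial_corr` and the "flipped" scenario variable are illustrative assumptions.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of A and B, controlling for C."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Correlations from the example matrix; X plays the role of the control variable.
r_xy, r_xz, r_yz = -0.29, -0.45, 0.93
partial_original = partial_corr(r_yz, r_xy, r_xz)
print("raw:", r_yz, "partial:", round(partial_original, 3))

# Hypothetical scenario from the text: the entry 0.83 becomes -0.83,
# so only the X-free part of the Y-Z association changes sign.
r_yz_flipped = (-0.29) * (-0.45) + 0.96 * (-0.83)
partial_flipped = partial_corr(r_yz_flipped, r_xy, r_xz)
print("raw:", round(r_yz_flipped, 3), "partial:", round(partial_flipped, 3))
```

In both scenarios the partial correlation is larger in magnitude than the raw one, because the part of the Y-Z association that runs through X has the opposite sign to, or is weaker than, the part that does not.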

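@Gottfried Helms has given you a good answer. If you are looking for a more intuitive explanation, the standard one is this: imagine regressing A on C and regressing B on C, and saving the residuals in both cases. The partial correlation of A and B controlling for C is the correlation between those two sets of residuals. In other words, it indexes the strength of the linear association between the parts of the variability in A and B that cannot be explained by appealing to the variability in C. This can be contrasted with the part (or semipartial) correlation, in which the residuals of only one of A or B are correlated with the other variable in its entirety. For an example of how this is used, see Baron & Kenny (1986) and Kenny's web page on mediation. If you want to learn more about these topics, I discuss them here, there is a decent Wikipedia page, and I especially like this web page.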
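The residual description can be tried out directly. The following sketch uses simulated data, which is an assumption made purely for illustration (the coefficients 0.8, -0.6 and 0.7 are arbitrary and do not come from the question), together with small hypothetical helpers `pearson` and `residuals`.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def pearson(u, v):
    """Sample Pearson correlation of two equally long sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def residuals(y, x):
    """Residuals from the simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Simulated data (illustrative assumption): C pushes A up and B down,
# while A and B share a positive association that does not involve C.
rng = random.Random(0)
C = [rng.gauss(0, 1) for _ in range(2000)]
A = [0.8 * c + rng.gauss(0, 1) for c in C]
B = [-0.6 * c + 0.7 * a + rng.gauss(0, 1) for a, c in zip(A, C)]

r_ab, r_ac, r_bc = pearson(A, B), pearson(A, C), pearson(B, C)

# Partial correlation: residuals of A on C versus residuals of B on C.
partial = pearson(residuals(A, C), residuals(B, C))
# Semipartial (part) correlation: residuals of A on C versus B itself.
semipartial = pearson(residuals(A, C), B)
# The closed-form expression should agree with the residual-based value.
formula = (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

print("raw r(A,B):       ", round(r_ab, 3))
print("partial r(A,B.C): ", round(partial, 3))
print("semipartial:      ", round(semipartial, 3))
print("formula:          ", round(formula, 3))
```

In this setup C works against the direct A-B association, so removing C should leave a partial correlation that is higher than the raw one, which is the situation described in the question.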

