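I am analysing a very simple dataset with a numeric dependent variable y and an independent variable x. The dataset also contains z, a categorical variable with two levels, A and B.
If I run two separate correlations for levels A and B, I get two very different values (rA = 0.87 and rB = 0.28), which seems to point to an interaction with z. However, if I fit a regression model, the effect vanishes (interaction β = -0.1591 ± 0.23).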
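My understanding is that the discrepancy arises because correlation does not take the scale of the data into account (i.e., it normalizes the data), whereas regression does (i.e., by default it uses the raw data).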
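But even though I understand why the coefficients are so different, I don't understand how I should interpret the discrepancy.
Is there an interaction or not? Should I standardize the data in the regression, or report the null result?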
Data and R code:
x = c(140.43,139.19,116.27,137.37,146.00,110.43,137.75,151.81,66.04,87.86,149.50,97.30,206.52,180.41,139.58,111.01,183.72,129.39,126.03,117.50,142.39,126.58,199.74,164.36,112.85,150.72,140.43,139.19,116.27,137.37,146.00,110.43,137.75,151.81,66.04,87.86,149.50,97.30,206.52,180.41,139.58,111.01,183.72,129.39,126.03,117.50,142.39,126.58,199.74,164.36,112.85,150.72)
y = c(154,159,147,161,149,143,162,164,118,147,169,125,182,163,167,144,191,160,152,142,156,141,195,158,133,145,105,105,185,127,103,104,194,134,89,169,114,100,135,138,191,108,197,111,192,111,165,123,179,98,95,90)
z = factor(c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B"))
coef(lm(y~x*z))
#(Intercept) x zB x:zB
# 89.4084893 0.4767568 0.1206448 -0.1591085
cor(x[z=="A"],y[z=="A"]) #0.8708543
cor(x[z=="B"],y[z=="B"]) #0.2766038
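
One way to look at the discrepancy, offered as a sketch rather than a definitive answer: within a single group, the least-squares slope equals the correlation times the ratio of standard deviations, slope = r * sd(y) / sd(x). Two groups can therefore have similar slopes and very different correlations if the scatter of y differs between them. The helper below, `slope_vs_cor` (a hypothetical name, not part of the original code), tabulates both quantities per level of z, together with the slope obtained after standardizing x and y within each group, which should coincide with r.

```r
# Hypothetical helper (an illustration, not from the original post):
# for each level of z, compare the per-group OLS slope with the correlation.
slope_vs_cor <- function(x, y, z) {
  # z-score a vector using its own mean and standard deviation
  zs <- function(v) (v - mean(v)) / sd(v)

  rows <- lapply(levels(factor(z)), function(g) {
    xg <- x[z == g]
    yg <- y[z == g]
    r  <- cor(xg, yg)
    sd_ratio <- sd(yg) / sd(xg)

    # Standardize within the group before refitting
    xs <- zs(xg)
    ys <- zs(yg)

    data.frame(
      group              = g,
      r                  = r,
      sd_ratio           = sd_ratio,
      slope              = unname(coef(lm(yg ~ xg))[2]),  # raw-scale slope
      r_times_sd_ratio   = r * sd_ratio,                  # should equal slope
      slope_standardized = unname(coef(lm(ys ~ xs))[2])   # should equal r
    )
  })
  do.call(rbind, rows)
}
```

Called as `slope_vs_cor(x, y, z)` on the data above, this returns one row per level of z. Taking the output quoted above at face value, the slopes are about 0.48 for A and 0.48 - 0.16 = 0.32 for B, so sd(y)/sd(x) would be roughly 0.55 in A and 1.15 in B: the x values are identical in the two groups, and y is simply much more scattered in B.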
