Best way to visually present relationships from a multiple linear model

Cross Validated  r  regression  data-visualization  multiple-regression  partial-plot
2022-02-05 05:09:04

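I have a linear model with about 6 predictors, and I will be presenting the estimates, F values, p-values, and so on. However, I would like to know the best kind of plot for showing the individual effect of a single predictor on the response variable. A scatterplot? A conditional plot? An effects plot? Something else? And how would I interpret that plot?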

I will be doing this in R, so feel free to provide examples if you can.

EDIT: I am primarily concerned with presenting the relationship between any given predictor and the response variable.

1 Answer

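In my opinion, the model you describe does not really lend itself to plotting, because plots work best when they display complex information that is otherwise hard to understand (e.g., complex interactions). However, if you would like to plot the relationships in your model, you have two main options: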

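  1. Display a series of plots of the bivariate relationship between each predictor of interest and the outcome, with a scatterplot of the raw data points. Draw error envelopes around the fitted lines.
  2. Display the plots from option 1, but instead of the raw data points, show the data points with the other predictors marginalized out (i.e., after subtracting out the contributions of the other predictors).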

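The benefit of option 1 is that it lets the viewer assess the scatter in the raw data. The benefit of option 2 is that it shows the observation-level error that actually produced the standard error of the focal coefficient you are displaying.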
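To make "marginalized out" in option 2 concrete, here is a short derivation for a two-predictor model. The notation is introduced here for illustration and is not part of the original answer: $y_i$ is the response, $x_i$ the focal predictor, $w_i$ the other predictor with sample mean $\bar{w}$, $\hat\beta$ the fitted coefficients, and $e_i$ the residual.

```latex
\begin{aligned}
y_i &= \hat\beta_0 + \hat\beta_x x_i + \hat\beta_w w_i + e_i \\
y_i^{\text{adj}} &= \hat\beta_0 + \hat\beta_x x_i + \hat\beta_w \bar{w} + e_i \\
                 &= y_i - \hat\beta_w \,(w_i - \bar{w})
\end{aligned}
```

So each adjusted point is the raw response with the other predictor's deviation from its mean subtracted out. The vertical distance from an adjusted point to the fitted line at $w = \bar{w}$ is exactly the model residual $e_i$.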

I have included R code and a graph for each option below, using the Prestige dataset from the car package in R.

## Raw data ##

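# Load the car package, which provides the Prestige data
# (in recent versions of car the data come from its dependency carData)
library(car)

mod <- lm(income ~ education + women, data = Prestige)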
summary(mod)

# Create a scatterplot of education against income
plot(Prestige$education, Prestige$income, xlab = "Years of education", 
     ylab = "Occupational income", bty = "n", pch = 16, col = "grey")
# Create a dataframe representing the values on the predictors for which we 
# want predictions
pX <- expand.grid(education = seq(min(Prestige$education), max(Prestige$education), by = .1), 
                  women = mean(Prestige$women))
# Get predicted values
pY <- predict(mod, pX, se.fit = TRUE)

lines(pX$education, pY$fit, lwd = 2) # Prediction line
lines(pX$education, pY$fit - pY$se.fit) # -1 SE
lines(pX$education, pY$fit + pY$se.fit) # +1 SE

Graph using raw data points
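The envelope above is one standard error on either side of the line. If a 95% confidence band is preferred, one possible variant (a sketch, not part of the original answer) asks `predict()` for the interval directly; the names `grid` and `ci` are chosen here for illustration.

```r
library(car)  # attaches carData, which provides Prestige

mod <- lm(income ~ education + women, data = Prestige)

# Prediction grid: vary education, hold percentage women at its mean
grid <- data.frame(
  education = seq(min(Prestige$education), max(Prestige$education), by = 0.1),
  women = mean(Prestige$women)
)

# Matrix with columns "fit", "lwr", "upr" (95% confidence interval for the mean)
ci <- predict(mod, newdata = grid, interval = "confidence", level = 0.95)

plot(Prestige$education, Prestige$income, xlab = "Years of education",
     ylab = "Occupational income", bty = "n", pch = 16, col = "grey")
lines(grid$education, ci[, "fit"], lwd = 2)   # Prediction line
lines(grid$education, ci[, "lwr"], lty = 2)   # Lower 95% limit
lines(grid$education, ci[, "upr"], lty = 2)   # Upper 95% limit
```

This band describes uncertainty in the mean response at each education level, not the spread of individual observations, so most raw points will fall outside it.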

## Adjusted (marginalized) data ##

mod <- lm(income ~ education + women, data = Prestige)
summary(mod)

# Calculate the values of income, marginalizing out the effect of percentage women
margin_income <- coef(mod)["(Intercept)"] + coef(mod)["education"] * Prestige$education + 
    coef(mod)["women"] * mean(Prestige$women) + residuals(mod)

# Create a scatterplot of education against income
plot(Prestige$education, margin_income, xlab = "Years of education", 
     ylab = "Adjusted income", bty = "n", pch = 16, col = "grey")
# Create a dataframe representing the values on the predictors for which we 
# want predictions
pX <- expand.grid(education = seq(min(Prestige$education), max(Prestige$education), by = .1), 
                  women = mean(Prestige$women))
# Get predicted values
pY <- predict(mod, pX, se.fit = TRUE)

lines(pX$education, pY$fit, lwd = 2) # Prediction line
lines(pX$education, pY$fit - pY$se.fit) # -1 SE
lines(pX$education, pY$fit + pY$se.fit) # +1 SE

Graph using adjusted data
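Since the question involves about 6 predictors, writing out the adjustment by hand for each one gets tedious. The following is a minimal sketch of how the same adjustment might be generalized. `adjusted_response()` is a hypothetical helper written for this illustration, and it assumes an `lm` fit with numeric predictors only and no interactions or transformed terms. The three-predictor model is likewise only an example.

```r
library(car)  # attaches carData, which provides Prestige

# Hypothetical helper: the response with every non-focal predictor held at its
# mean, i.e. prediction at (focal value, other means) plus the model residual.
adjusted_response <- function(mod, focal) {
  dat <- model.frame(mod)
  others <- setdiff(attr(terms(mod), "term.labels"), focal)
  held <- dat
  for (v in others) held[[v]] <- mean(dat[[v]])
  predict(mod, newdata = held) + residuals(mod)
}

# Example model with three numeric predictors (illustrative choice)
mod3 <- lm(income ~ education + women + prestige, data = Prestige)

# One adjusted-data panel per predictor
op <- par(mfrow = c(1, 3))
for (v in c("education", "women", "prestige")) {
  x <- model.frame(mod3)[[v]]
  adj <- adjusted_response(mod3, v)
  plot(x, adj, xlab = v, ylab = "Adjusted income",
       bty = "n", pch = 16, col = "grey")
  abline(lm(adj ~ x), lwd = 2)  # slope equals the model coefficient for v
}
par(op)
```

For the two-predictor model used above, `adjusted_response(mod, "education")` reproduces `margin_income`. The car package's `crPlots()` draws closely related component+residual plots, which may be a convenient alternative to hand-rolled panels.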