我对 R 比较陌生,我正在尝试将模型拟合到由分类列和数字(整数)列组成的数据。因变量是一个连续数。
数据格式如下:
predCateg、predIntNum、ResponseVar
数据看起来像这样:
ranking, age_in_years, wealth_indicator
category_A, 99, 1234.56
category_A, 21, 12.34
category_A, 42, 234.56
....
category_N, 105, 77.27
我将如何在 R 中对此进行建模(大概是使用 GLM)?
[[编辑]]
我刚刚想到(在更彻底地分析数据之后),分类自变量实际上是有序的。因此,我将之前提供的答案修改如下:
> fit2 <- glm(wealth_indicator ~ ordered(ranking) + age_in_years, data=amort2)
>
> fit2
Call: glm(formula = wealth_indicator ~ ordered(ranking) + age_in_years,
data = amort2)
Coefficients:
(Intercept) ordered(ranking).L ordered(ranking).Q ordered(ranking).C age_in_years
0.0578500 -0.0055454 -0.0013000 0.0007603 0.0036818
Degrees of Freedom: 39 Total (i.e. Null); 35 Residual
Null Deviance: 0.004924
Residual Deviance: 0.00012 AIC: -383.2
>
> fit3 <- glm(wealth_indicator ~ ordered(ranking) + age_in_years + ordered(ranking)*age_in_years, data=amort2)
> fit3
Call: glm(formula = wealth_indicator ~ ordered(ranking) + age_in_years +
ordered(ranking) * age_in_years, data = amort2)
Coefficients:
(Intercept) ordered(ranking).L ordered(ranking).Q
0.0578500 -0.0018932 -0.0039667
ordered(ranking).C age_in_years ordered(ranking).L:age_in_years
0.0021019 0.0036818 -0.0006640
ordered(ranking).Q:age_in_years ordered(ranking).C:age_in_years
0.0004848 -0.0002439
Degrees of Freedom: 39 Total (i.e. Null); 32 Residual
Null Deviance: 0.004924
Residual Deviance: 5.931e-05 AIC: -405.4
我对输出中的 what和mean 感到有些困惑ordered(ranking).C
,希望能在理解此输出以及如何使用它来预测响应变量方面提供一些帮助。ordered(ranking).Q
ordered(ranking).L