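If you can treat the property as a discrete variable (also called a "dummy variable", "factor", or "one-hot" encoding), you can use linear regression to separate the effects, i.e. you estimate one "line" for each value of the property.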
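Written out, the model fitted in the code below can be sketched as follows. The symbols \beta_0, \beta_1, \beta_2 and \varepsilon_i are notation assumed here for illustration; they do not appear in the original answer.

```latex
\[
  y_i = \beta_0 + \beta_1\,\mathrm{temp}_i + \beta_2\,\mathrm{prop}_i + \varepsilon_i,
  \qquad \mathrm{prop}_i \in \{0, 1\}
\]
\[
  \mathrm{prop}_i = 0:\quad \mathbb{E}[y_i] = \beta_0 + \beta_1\,\mathrm{temp}_i
\]
\[
  \mathrm{prop}_i = 1:\quad \mathbb{E}[y_i] = (\beta_0 + \beta_2) + \beta_1\,\mathrm{temp}_i
\]
```

Under this additive form the dummy variable only adds the constant \beta_2 to the line for prop = 1, so both lines are parallel.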
# Data
df = data.frame(y=c(1,2,3,5,6, 10,12,13,15,16),prop=c(0,0,0,0,0,1,1,1,1,1),temp=c(5,6,7,8,9,6,7,8,9,10))
# Plot data
plot(df$temp[df$prop==1], df$y[df$prop==1],xlim=c(4,10),ylim=c(0,16),xlab="temp",ylab="y")
lines(df$temp[df$prop==1], df$y[df$prop==1])
lines(df$temp[df$prop==0], df$y[df$prop==0], col="blue")

# Linear regression
reg = lm(y~temp+prop,data=df)
summary(reg)
Residuals:
   Min     1Q Median     3Q    Max
  -0.4   -0.2    0.0    0.2    0.4

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.40000    0.55032  -11.63 7.85e-06 ***
temp         1.40000    0.07559   18.52 3.32e-07 ***
prop         8.40000    0.22678   37.04 2.72e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3381 on 7 degrees of freedom
Multiple R-squared:  0.9971, Adjusted R-squared:  0.9963
F-statistic:  1222 on 2 and 7 DF,  p-value: 1.245e-09

# Prediction for "prop" 1, 0
pred0 = predict(reg,newdata=df[df$prop==0,])
pred1 = predict(reg,newdata=df[df$prop==1,])
# Add prediction to plot
lines(df$temp[df$prop==1], pred1, col="red")
lines(df$temp[df$prop==0], pred0, col="purple")

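So you get one "prediction line" for each value of prop.
For prop=0 the prediction is computed as −6.4 + 1.4∗temp + 0∗8.4.
For prop=1 it is computed as −6.4 + 1.4∗temp + 1∗8.4.
In essence: both lines use the same estimated intercept and slope. Only the level ("niveau") of the line shifts with the prop indicator, by the prop coefficient of 8.4, so the line for prop=1 effectively has intercept −6.4 + 8.4 = 2.0.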
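The hand calculation above can be checked against `predict()`. This is a minimal sketch, assuming the same data and model as in the example; `manual_pred` is a hypothetical helper introduced here, not part of the original answer.

```r
# Same data and model as in the example above
df <- data.frame(
  y    = c(1, 2, 3, 5, 6, 10, 12, 13, 15, 16),
  prop = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  temp = c(5, 6, 7, 8, 9, 6, 7, 8, 9, 10)
)
reg <- lm(y ~ temp + prop, data = df)

# Named coefficient vector with entries "(Intercept)", "temp", "prop"
b <- coef(reg)

# Hypothetical helper: intercept + slope * temp + shift * prop
manual_pred <- function(temp, prop) {
  unname(b["(Intercept)"] + b["temp"] * temp + b["prop"] * prop)
}

# Hand-computed values should agree with predict() up to floating-point error
print(all.equal(manual_pred(df$temp, df$prop), unname(predict(reg))))

# Vertical distance between the two lines at a given temp (here 7);
# under this model it equals the prop coefficient for every temp
print(manual_pred(7, 1) - manual_pred(7, 0))
```

Because the model has no `temp:prop` interaction term, the gap between the two lines is the same at every temperature. If the slopes were expected to differ between the two groups, a formula such as `y ~ temp * prop` would be the usual way to allow for that.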