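As I understand it, you have a factor (with levels such as "bad", "good", etc.) and a continuous variable, "invitations". To compare two groups, you can use a two-sample test, such as a t-test or its nonparametric counterpart, the Wilcoxon rank-sum test. To compare all groups at once, you can use a simple linear regression with the factor as the only predictor, of the form: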
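invitations = β0 + β1·satisfaction_2 + β2·satisfaction_3 + ... + u,
where each satisfaction_k is a 0/1 indicator for level k of the factor, and the first level is the baseline absorbed into the intercept β0.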
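To make the indicator coding concrete, here is a minimal sketch of how R expands a factor into the columns of such a regression. The `satisfaction` vector and its "excellent" level are made up for illustration; only "bad" and "good" come from the question.

```r
# Hypothetical factor: the values and the "excellent" level are
# illustrative assumptions, not data from the question.
satisfaction <- factor(c("bad", "good", "excellent", "good"),
                       levels = c("bad", "good", "excellent"))

# model.matrix() builds the design matrix that lm() would use:
# an intercept column plus one 0/1 column per non-baseline level.
X <- model.matrix(~ satisfaction)
X
```

The first level ("bad" here) gets no column of its own, which is why its mean ends up in the intercept.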
Example:
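library(dplyr)

# iris ships with base R (package "datasets"); no extra package is needed
data(iris)
table(iris$Species)

# Optional: drop one species to leave exactly two groups
#iris = iris[!(iris$Species=="versicolor"),]

# Mean sepal length per species
iris %>%
  group_by(Species) %>%
  summarise(Sepal.Length = mean(Sepal.Length, na.rm = TRUE))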
Result (means):
# A tibble: 3 x 2
  Species    Sepal.Length
  <fct>             <dbl>
1 setosa             5.01
2 versicolor         5.94
3 virginica          6.59
Comparing two groups:
# Two-sample Wilcoxon rank-sum test
wilcox.test(iris$Sepal.Length[iris$Species=="setosa"], iris$Sepal.Length[iris$Species=="virginica"])
# The p-value is below the significance level alpha = 0.05, so we conclude that
# sepal length differs significantly between setosa and virginica
Result:
Wilcoxon rank sum test with continuity correction
data: iris$Sepal.Length[iris$Species == "setosa"] and iris$Sepal.Length[iris$Species == "virginica"]
W = 38.5, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
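For comparison, the parametric counterpart can be run on the same two groups. This is a minimal sketch rather than part of the original answer; the variable names `setosa`, `virginica` and `tt` are my own, and it uses the Welch variant because that is the default of `t.test()`.

```r
# Sepal lengths of the two species being compared
setosa    <- iris$Sepal.Length[iris$Species == "setosa"]
virginica <- iris$Sepal.Length[iris$Species == "virginica"]

# Welch two-sample t-test (var.equal = FALSE is the default),
# so equal variances in the two groups are not assumed
tt <- t.test(setosa, virginica)

tt$estimate  # the two group means
tt$p.value   # compare against the chosen significance level
```

The t-test compares means and assumes roughly normal data within each group, while the Wilcoxon test is rank-based, so which one fits depends on how "invitations" is distributed.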
Regression:
# Simple linear regression on the factor
summary(lm(Sepal.Length~Species, data=iris))
# p-values below 0.05 mean that the level's mean differs significantly from the
# mean of the baseline level (setosa), which is represented by the intercept
Result:
Call:
lm(formula = Sepal.Length ~ Species, data = iris)

Residuals:
    Min      1Q  Median      3Q     Max
-1.6880 -0.3285 -0.0060  0.3120  1.3120

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         5.0060     0.0728  68.762  < 2e-16 ***
Speciesversicolor   0.9300     0.1030   9.033 8.77e-16 ***
Speciesvirginica    1.5820     0.1030  15.366  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5148 on 147 degrees of freedom
Multiple R-squared:  0.6187,  Adjusted R-squared:  0.6135
F-statistic: 119.3 on 2 and 147 DF,  p-value: < 2.2e-16
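The interesting column here is Pr(>|t|). If a value in this column is below 0.05, the mean of that level differs significantly from the mean of the baseline category (here "setosa", which is represented by the intercept). For the intercept row itself, the p-value only tests whether the "setosa" mean differs from zero.
In this model, the Estimate column gives the "setosa" mean directly as the intercept (5.0060). The "versicolor" coefficient is 0.9300, so 5.0060 + 0.9300 = 5.9360 is the "versicolor" mean, and likewise 5.0060 + 1.5820 = 6.5880 is the "virginica" mean.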
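The relationship between the coefficients and the group means can be checked directly. This is a minimal sketch; the names `fit`, `from_model` and `direct` are mine, not from the original answer.

```r
# Refit the regression and pull out the coefficients
fit <- lm(Sepal.Length ~ Species, data = iris)
b   <- coef(fit)

# Group means rebuilt from the model: intercept = baseline (setosa) mean,
# each other coefficient = difference from that baseline
from_model <- c(
  setosa     = unname(b["(Intercept)"]),
  versicolor = unname(b["(Intercept)"] + b["Speciesversicolor"]),
  virginica  = unname(b["(Intercept)"] + b["Speciesvirginica"])
)

# Group means computed directly from the data
direct <- sapply(split(iris$Sepal.Length, iris$Species), mean)

# Compare the two, allowing for floating-point rounding
all.equal(from_model, direct)
```

If a different baseline is more natural for your factor, `relevel()` changes which level the intercept represents, and the other coefficients are then differences from that level instead.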