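Yes, you can simply go the t-test route, because these deviations from normality do not matter at sample sizes like these. Bootstrapping is also a perfectly good option, and the following commented R code shows how easy it is: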
# example raw wins in A and B
raw_win_A <- abs(rnorm(100000, mean=5, sd=15))
hist(raw_win_A, xlim=c(-10,100), breaks=20) #skewed
raw_win_B <- abs(rnorm(2000, mean=4.9, sd=20))
hist(raw_win_B, xlim=c(-10,100), breaks=20) #skewed
#compute means of n bootstrap samples of wins in A
n <- 10000
wins_A <- replicate(n, mean(sample(raw_win_A, replace=TRUE)))
#the same with B
wins_B <- replicate(n, mean(sample(raw_win_B, replace=TRUE)))
# show distribution of bootstrapped wins in A and B,
# these are bound to be approximately normally distributed for large samples
hist(wins_A)
hist(wins_B)
# show distribution of wins_A minus wins_B
hist(wins_A - wins_B)
cat("Mean of wins_A minus wins_B: ")
cat(mean(wins_A - wins_B), "\n")
cat("1.96 times standard deviation of that: ")
cat(1.96*sd(wins_A - wins_B), "\n")
cat("Confidence interval lower bound: ")
cat(mean(wins_A-wins_B)-1.96*sd(wins_A - wins_B), "\n")
cat("Confidence interval upper bound: ")
cat(mean(wins_A-wins_B)+1.96*sd(wins_A - wins_B), "\n")
cat("---\nCompare to t-test results:\n")
print(t.test(raw_win_A, raw_win_B))
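This takes a few seconds (less than a minute) to run. With the example data simulated in the first few lines, I get a bootstrap confidence interval from -3.973402 to -2.906095, and the t.test function gives a confidence interval from -3.971132 to -2.895014, even though the data are highly skewed (see the histograms generated by my code). So yes, the t-test is robust to violations of normality as long as the sample size n is large enough. The central limit theorem holds.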
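The interval in the code above is a normal approximation (mean plus or minus 1.96 standard deviations of the bootstrapped differences). As a rough sketch of a common alternative, the percentile bootstrap interval can be read directly off the quantiles of the bootstrapped differences. The seed, the smaller size of group A, the number of replicates, and the names `n_boot`, `boot_diff`, `ci_percentile` and `ci_normal` are my own assumptions for illustration, so the numbers will not match those quoted above.

```r
# Hypothetical, smaller re-creation of the skewed example data.
set.seed(42)  # arbitrary seed, only for reproducibility (an assumption)
raw_win_A <- abs(rnorm(20000, mean = 5, sd = 15))
raw_win_B <- abs(rnorm(2000, mean = 4.9, sd = 20))

# Bootstrap the difference in means: resample each group with replacement.
n_boot <- 2000
boot_diff <- replicate(n_boot,
  mean(sample(raw_win_A, replace = TRUE)) -
  mean(sample(raw_win_B, replace = TRUE)))

# Percentile interval: 2.5% and 97.5% quantiles of the bootstrapped differences.
ci_percentile <- quantile(boot_diff, probs = c(0.025, 0.975))

# Normal-approximation interval, computed the same way as in the code above.
ci_normal <- mean(boot_diff) + c(-1.96, 1.96) * sd(boot_diff)

print(ci_percentile)
print(ci_normal)
```

If the bootstrapped differences are close to normal, as the histograms suggest, the two intervals should nearly coincide; a visible gap between them would be a hint that the normal approximation is not yet good enough.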