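Edit: This might be a better fit for the statistics Stack Exchange, but I'm in a data mining course and we use R, so I'm asking here as well in case someone knows how to do this in R rather than by hand.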
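Here is the problem: A survey of 500 men and 700 women found that 132 of the men and 226 of the women agreed with a particular statement. Use this information to calculate the proportions of men and women who agreed with the statement. This gives you the values of p1 and p2. Use these to calculate q1 and q2. Now calculate the standard error of the difference between the two independent proportions. Then determine the confidence interval for the difference between the two independent proportions at the 95% confidence level.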
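I think my formulas are wrong, because they are neither the standard error of the difference between two independent proportions nor the confidence interval for that difference, as shown in these charts (I enlarged the specific equation at the bottom). I am also still not sure what q1 and q2 refer to.
Here is what I have for the formulas so far: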
p1 = 0.264 (132/500)
p2 = 0.322857 (226/700)
q1 =
q2 =
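For reference, a minimal R sketch of the proportions. It assumes the common textbook convention that q is the complement of p (q = 1 - p, the proportion who did not agree); the problem statement itself does not spell this out, and the variable names are only illustrative.

```r
# Survey counts from the problem statement
x1 <- 132; n1 <- 500   # men who agreed, men surveyed
x2 <- 226; n2 <- 700   # women who agreed, women surveyed

# Sample proportions
p1 <- x1 / n1
p2 <- x2 / n2

# ASSUMPTION: q is the complement of p (proportion who did not agree)
q1 <- 1 - p1
q2 <- 1 - p2

print(c(p1 = p1, q1 = q1, p2 = p2, q2 = q2))
```

Under that assumption, p1 + q1 and p2 + q2 each equal 1.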
Stdev1 = sqrt(p1(1-p1)) = sqrt((1-0.264)*0.264) = sqrt(0.194304) = 0.44079927404
Stdev2 = sqrt(p2(1-p2)) = sqrt((1-0.322857)*0.322857) = sqrt(0.218620357) = 0.4675685586
Std error = standard deviation / sqrt(sample size)
Std error1 = 0.44079927404/sqrt(500) = 0.44079927404/22.360679775 = 0.019713142
Std error2 = 0.4675685586/sqrt(700) = 0.4675685586/26.4575131106 = 0.017672430
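The per-sample standard errors above can be checked in R with a vectorised one-liner. This is only a sketch of the same hand calculation, sqrt(p(1-p)/n) for each group; the names `p`, `n`, and `se` are illustrative.

```r
# Proportions and sample sizes for the two groups
p <- c(men = 132 / 500, women = 226 / 700)
n <- c(men = 500, women = 700)

# Standard error of each sample proportion: sqrt(p * (1 - p) / n)
se <- sqrt(p * (1 - p) / n)

print(round(se, 6))
```

The results should agree with the hand-computed values to about five decimal places (roughly 0.0197 for men and 0.0177 for women).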
Standard deviation: in R it's sd(), which needs a vector of values and computes:
m = mean of the values
x - m = deviation of each value from the mean
sum of squared deviations from the mean = sum((x - m)^2)
sd = sqrt(sum of squared deviations from the mean / (sample size - 1))
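To connect sd() with the proportion formula, here is a small sketch that rebuilds the men's answers as a 0/1 vector (an assumption made purely for illustration, since only the counts are given). It shows that sd() divides by n - 1, so it comes out slightly larger than sqrt(p(1-p)), which divides by n.

```r
# Rebuild the men's responses as a 0/1 vector: 132 agree, 368 do not
x <- c(rep(1, 132), rep(0, 500 - 132))

m <- mean(x)   # equals p1 = 0.264

# Sample standard deviation by hand (n - 1 divisor) and via sd()
sd_by_hand <- sqrt(sum((x - m)^2) / (length(x) - 1))
sd_builtin <- sd(x)

# Value used in the proportion formulas (equivalent to an n divisor)
sd_prop <- sqrt(m * (1 - m))

print(c(by_hand = sd_by_hand, builtin = sd_builtin, sqrt_pq = sd_prop))
```

With samples this large the two versions differ only in the third or fourth decimal place.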
Confidence interval (95%) = p ± margin of error
Margin of error = sqrt(p(1-p)/n) * 1.96, where n = sample size and 1.96 is the z value for a 95% confidence level
Margin error1 = sqrt(0.194304/500) * 1.96 = 0.01971314282 * 1.96 = 0.038637759
Margin error2 = sqrt(0.218620357/700) * 1.96 = 0.01767240787 * 1.96 = 0.034637919
p + margin of error = upper confidence limit
p1 upper = 0.264 + 0.038637759 = 0.302637759
p2 upper = 0.322857 + 0.034637919 = 0.357494919
p - margin of error = lower confidence limit
p1 lower = 0.264 - 0.038637759 = 0.225362241
p2 lower = 0.322857 - 0.034637919 = 0.288219081
P1 CI = 0.225362241 < 0.264 < 0.302637759
P2 CI = 0.288219081 < 0.322857 < 0.357494919
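The intervals above are for each proportion separately, not for the difference. Below is a hedged sketch of the usual unpooled (Wald) formula for the difference, SE = sqrt(p1*q1/n1 + p2*q2/n2), assuming q = 1 - p. Whether this is the same equation as in the charts is an assumption. The result is compared against R's built-in prop.test() with the continuity correction turned off.

```r
x <- c(132, 226)   # men, women who agreed
n <- c(500, 700)   # men, women surveyed
p <- x / n
q <- 1 - p         # ASSUMPTION: q = 1 - p

# Difference in proportions (men minus women) and its standard error
d       <- p[1] - p[2]
se_diff <- sqrt(sum(p * q / n))

# 95% confidence interval using the normal quantile (about 1.96)
z          <- qnorm(0.975)
ci_by_hand <- c(d - z * se_diff, d + z * se_diff)

# Built-in check: two-sample test of proportions without continuity correction
ci_builtin <- prop.test(x, n, correct = FALSE)$conf.int

print(round(c(diff = d, se = se_diff), 6))
print(round(ci_by_hand, 6))
print(round(as.numeric(ci_builtin), 6))
```

Setting correct = FALSE matters here: by default prop.test() applies Yates' continuity correction, which widens the interval slightly compared with the hand formula.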

