机器算法验证 - When does one use $\frac{1}{\sqrt{n}}$ and when does one use $1.96\sqrt{\frac{p(1-p)}{n}}$? - 吾爱随笔录

机器算法验证 self-study confidence-interval proportion sample population

2022-03-31 15:54:15

I'm not sure when to use $\frac{1}{\sqrt{n}}$ and when to use $1.96\sqrt{\frac{p(1-p)}{n}}$.

Are they both used to calculate the confidence interval at 95% for a population proportion?

Here are two questions from my book.

1. In a survey in a large city, 170 households out of 250 owned a pet. Find the 95% confidence interval for the proportion of households in the city who own a pet.
2. From a random sample, 136 out of 400 people experience discomfort after receiving a vaccine. Construct a 95% confidence interval for the population proportion who might experience discomfort.

For each question, do I use $\hat{p}\pm\frac{1}{\sqrt{n}}$ or $\hat{p}\pm 1.96\sqrt{\frac{p(1-p)}{n}}$? And why?

2 Answers

The confidence interval with $\frac{1}{\sqrt{n}}$ is based on the same idea as the confidence interval with $1.96\sqrt{\frac{p(1-p)}{n}}$ but is more "conservative", in the sense that it is wider. The reason is that the function

$f(x) = x \left(1-x\right), x \in [0,1]$

can be shown (with elementary calculus) to be maximal at $x = \frac{1}{2}$, where $f\left(\frac{1}{2}\right) = \frac{1}{4}$. Thus

$1.96\sqrt{\frac{p(1-p)}{n}} \leq 1.96 \sqrt{ \frac{1}{4n}} \approx \frac{1}{\sqrt{n}}$

The confidence interval with $1.96\sqrt{\frac{p(1-p)}{n}}$ will be close to the one with $\frac{1}{\sqrt{n}}$ for $p \approx 1/2$, but since that is not the case in either of your problems ($\hat{p} = 0.68$ and $\hat{p} = 0.34$), I would probably use the classical $1.96\sqrt{\frac{p(1-p)}{n}}$ CI.
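As a rough illustration, the two intervals can be compared on the numbers from the book. This is only a sketch: the function names `wald_ci` and `conservative_ci` are made up here, and the unknown $p$ under the square root is assumed to be replaced by the sample proportion $\hat{p}$.

```python
from math import sqrt


def wald_ci(successes, n, z=1.96):
    """Interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).

    Assumption: the unknown p is estimated by the sample proportion p_hat.
    """
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def conservative_ci(successes, n):
    """Interval p_hat +/- 1 / sqrt(n), the worst-case (p = 1/2) bound at 95%."""
    p_hat = successes / n
    margin = 1 / sqrt(n)
    return p_hat - margin, p_hat + margin


# The two book problems: (label, successes, sample size).
for label, x, n in [("pets", 170, 250), ("vaccine", 136, 400)]:
    lo, hi = wald_ci(x, n)
    c_lo, c_hi = conservative_ci(x, n)
    print(f"{label}: classical ({lo:.3f}, {hi:.3f}), "
          f"conservative ({c_lo:.3f}, {c_hi:.3f})")
```

For the pet survey the classical interval comes out at roughly (0.622, 0.738) and the conservative one at roughly (0.617, 0.743); for the vaccine sample they are roughly (0.294, 0.386) and (0.290, 0.390). The conservative interval is the wider one in both cases.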

For confidence intervals for a proportion we always use the form $\hat{p} \pm z\sqrt{\frac{p(1-p)}{n}}$, where in practice the unknown $p$ under the root is replaced by $\hat{p}$. For the 95% confidence interval, $z=1.96$.

Since the term $z\sqrt{\frac{p(1-p)}{n}}$ depends on $p$, some proportions give wider confidence intervals than others. The worst case is $p=0.5$: it has the most variation, which is why its confidence interval is the widest.

Before we collect any data we might want to estimate how good the results might be. Since we don't yet have data for the proportion, we can just use the worst case scenario. For the worst case scenario the accuracy indicated by the confidence interval is $z\sqrt{\frac{0.5(1-0.5)}{n}}$.

For the 95% confidence interval, where $z=1.96$, this simplifies to $\frac{0.98}{\sqrt{n}}$.

This is where your idea of $\frac{1}{\sqrt{n}}$ comes from. The constant in the numerator is $0.98$, which is close to $1$, but this holds only for the 95% confidence interval; other levels of confidence will have other values.
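To see how the numerator changes with the confidence level, here is a small sketch using only the Python standard library (`statistics.NormalDist`, available from Python 3.8). The helper name `worst_case_constant` is made up for this example.

```python
from statistics import NormalDist


def worst_case_constant(level):
    """Numerator c such that the worst-case margin is c / sqrt(n).

    With p = 0.5, z * sqrt(0.5 * 0.5 / n) = (z / 2) / sqrt(n), so c = z / 2,
    where z is the two-sided normal critical value for the given level.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z / 2


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {worst_case_constant(level):.3f} / sqrt(n)")
```

At 95% the constant is about 0.98, as above; at 90% it is about 0.82 and at 99% about 1.29, so $\frac{1}{\sqrt{n}}$ is a reasonable shortcut only near the 95% level.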

In your questions you have data for every case so you won't need to use the worst case scenario. However, it's still useful to remember for times when you are planning an experiment and you want to see if it is feasible before collecting any data.
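As a sketch of that planning step: solving $z\sqrt{\frac{0.25}{n}} \leq m$ for $n$ gives $n \geq \left(\frac{z}{2m}\right)^2$. The function name `required_sample_size` and the target margins below are illustrative assumptions, not part of the original question.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, level=0.95):
    """Smallest n with worst-case (p = 0.5) margin of error <= margin.

    Derived from z * sqrt(0.25 / n) <= margin, i.e. n >= (z / (2 * margin))**2.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z / (2 * margin)) ** 2)


# Hypothetical target margins of error, chosen only for illustration.
for m in (0.05, 0.03):
    print(f"margin {m}: n >= {required_sample_size(m)}")
```

With these assumed targets, a margin of 0.05 needs about 385 respondents and a margin of 0.03 about 1068 at the 95% level, which gives a quick feasibility check before any data are collected.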
