When does one use $\frac{1}{\sqrt{n}}$ and when does one use $1.96\sqrt{\frac{p(1-p)}{n}}$?

Tags: self-study, confidence-interval, proportion, sample, population
2022-03-31 15:54:15

I'm not sure when to use $\frac{1}{\sqrt{n}}$ and when to use $1.96\sqrt{\frac{p(1-p)}{n}}$.

Are they both used to calculate the 95% confidence interval for a population proportion?

Here are two questions from my book.

  1. In a survey in a large city, 170 households out of 250 owned a pet. Find the 95% confidence interval for the proportion of households in the city who own a pet.

  2. From a random sample, 136 out of 400 people experience discomfort after receiving a vaccine. Construct a 95% confidence interval for the population proportion who might experience discomfort.

For each question, do I use $\hat{p} \pm \frac{1}{\sqrt{n}}$ or $\hat{p} \pm 1.96\sqrt{\frac{p(1-p)}{n}}$?

And why?

2 Answers

The confidence interval with $\frac{1}{\sqrt{n}}$ is based on the same idea as the confidence interval with $1.96\sqrt{\frac{p(1-p)}{n}}$ but is more "conservative", in the sense that it is wider. The reason is that the function

$$f(x) = x(1-x), \quad x \in [0,1]$$

can be shown (with elementary calculus) to be maximal at $x = \frac{1}{2}$, where $f\left(\frac{1}{2}\right) = \frac{1}{4}$. Thus

$$1.96\sqrt{\frac{p(1-p)}{n}} \le 1.96\sqrt{\frac{1}{4n}} \le \frac{1}{\sqrt{n}}$$
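The elementary-calculus step behind this bound can be spelled out as follows (a routine derivation using only the $f$ defined above):

$$
f'(x) = 1 - 2x = 0 \iff x = \tfrac{1}{2}, \qquad f''(x) = -2 < 0, \qquad f\left(\tfrac{1}{2}\right) = \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{4}
$$

So $p(1-p) \le \frac{1}{4}$ for every $p \in [0,1]$, and the last inequality holds because $1.96 \cdot \frac{1}{2} = 0.98 \le 1$.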

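The confidence interval with $1.96\sqrt{\frac{p(1-p)}{n}}$ will be close to the one with $\frac{1}{\sqrt{n}}$ for $p \approx 1/2$, but since that is not the case in either of your problems ($\hat{p} = 170/250 = 0.68$ and $\hat{p} = 136/400 = 0.34$), I would probably use the classical $1.96\sqrt{\frac{p(1-p)}{n}}$ CI.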
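To see how much the two choices actually differ for the book's two questions, here is a minimal Python sketch using only the standard library. The helper names `wald_ci` and `conservative_ci` are hypothetical, and, as is usual for this classical (Wald) interval, $p$ inside the square root is replaced by the sample estimate $\hat{p}$.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Hypothetical helper: classical interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)


def conservative_ci(successes: int, n: int) -> tuple[float, float]:
    """Hypothetical helper: conservative interval p_hat +/- 1 / sqrt(n)."""
    p_hat = successes / n
    margin = 1 / math.sqrt(n)
    return (p_hat - margin, p_hat + margin)


# Counts taken from the two textbook questions
for label, x, n in [("pets", 170, 250), ("vaccine", 136, 400)]:
    lo, hi = wald_ci(x, n)
    clo, chi = conservative_ci(x, n)
    print(f"{label}: Wald ({lo:.3f}, {hi:.3f})  conservative ({clo:.3f}, {chi:.3f})")
```

For the pet survey this gives roughly (0.622, 0.738) with the Wald form versus (0.617, 0.743) with $\pm 1/\sqrt{n}$; for the vaccine question, (0.294, 0.386) versus (0.290, 0.390). The conservative interval is slightly wider in both cases, as the bound above predicts.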

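For the standard (Wald) confidence interval for a proportion we use the form $\hat{p} \pm z\sqrt{\frac{p(1-p)}{n}}$, where in practice $p$ is estimated by $\hat{p}$. For the 95% confidence interval, $z = 1.96$.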

Since the term $z\sqrt{\frac{p(1-p)}{n}}$ depends on $p$, some proportions give wider confidence intervals than others. The worst-case scenario is $p = 0.5$: this is where $p(1-p)$, and hence the variation, is largest, which is why the confidence interval is widest there.
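A quick numerical check of the worst-case claim, as a sketch assuming $n = 250$ (the first question's sample size) and $z = 1.96$; the grid of proportions is chosen arbitrarily for illustration:

```python
import math

n = 250   # sample size borrowed from the pet-survey question
z = 1.96  # 95% critical value

# Margin of error z * sqrt(p * (1 - p) / n) for a few illustrative proportions
margins = {p: z * math.sqrt(p * (1 - p) / n) for p in [0.1, 0.3, 0.5, 0.7, 0.9]}

for p, m in margins.items():
    print(f"p = {p:.1f}: margin = {m:.4f}")
```

The margin peaks at $p = 0.5$ and is symmetric about it, since $p(1-p)$ is unchanged when $p$ is replaced by $1-p$.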

Before we collect any data, we might want to estimate how good the results might be. Since we don't yet have data for the proportion, we can just use the worst-case scenario. In that case the margin of error of the confidence interval is $z\sqrt{\frac{0.5(1-0.5)}{n}}$.

For the 95% confidence interval, where $z = 1.96$, this simplifies to $\frac{0.98}{\sqrt{n}}$.

This is where your $\frac{1}{\sqrt{n}}$ comes from: the constant in the numerator is 0.98, which is close to 1. But this holds only for the 95% confidence interval; other confidence levels give other constants.
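For a general confidence level, the worst-case constant is $z \cdot 0.5$ with $z = \Phi^{-1}(1 - \alpha/2)$ for a two-sided interval. A short sketch using Python's standard `statistics.NormalDist`; the helper `worst_case_constant` is a hypothetical name introduced here:

```python
from statistics import NormalDist


def worst_case_constant(level: float) -> float:
    """Hypothetical helper: z * sqrt(0.5 * (1 - 0.5)) = z / 2 for a two-sided interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    return z * 0.5


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: margin ~ {worst_case_constant(level):.3f} / sqrt(n)")
```

This gives about 0.822 for 90%, 0.980 for 95% and 1.288 for 99%, so $1/\sqrt{n}$ is a reasonable shorthand only near the 95% level.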

In your questions you have data for every case, so you won't need to use the worst-case scenario. However, it's still useful to remember when you are planning an experiment and want to check whether it is feasible before collecting any data.
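For that planning step, the worst-case margin $z\sqrt{0.25/n}$ can be inverted to find the smallest sample size that guarantees a target margin $E$: $n \ge \left(\frac{z}{2E}\right)^2$. A minimal sketch, with `worst_case_sample_size` as a hypothetical helper name:

```python
import math
from statistics import NormalDist


def worst_case_sample_size(margin: float, level: float = 0.95) -> int:
    """Hypothetical helper: smallest n with z * sqrt(0.25 / n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    # Solve z / (2 * sqrt(n)) <= margin for n and round up to a whole respondent
    return math.ceil((z / (2 * margin)) ** 2)


# Target: +/- 5 percentage points at 95% confidence, with no prior guess for p
print(worst_case_sample_size(0.05))
```

For $E = 0.05$ at 95% this gives 385; using the rounded $1/\sqrt{n}$ form instead gives $n = 1/E^2 = 400$, slightly more because it is the more conservative bound.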