Suppose, we have
Based on the samples, we want to estimate the probability that
Intuitively, I would first estimate:
and then calculate:
However
Is this reasoning right?
If so, how can I estimate
Suppose, we have
Based on the samples, we want to estimate the probability that
Intuitively, I would first estimate:
and then calculate:
However
Is this reasoning right?
If so, how can I estimate
The method you are using is very close to the MLE, which has reasonable estimation properties when the underlying parametric model is correct. The MLE has a property called functional invariance, which means that the MLE of a function of the parameters is that function of the MLE. Your method uses the sample variance estimator, which is a bias-corrected version of the MLE of the true variance, but your estimator should have reasonable properties if the underlying model is correct. Of course, you are correct that your estimator involves some variance, but that is true of any estimator in this situation.
If you are confident that your data is from an exchangeable sequence (i.e., it is an IID model) then I would recommend you give serious consideration to instead using the empirical estimator, which is:
This latter estimator also has good properties, but crucially, it does not rely on the assumption that the data are from a normal distribution. The empirical estimator is consistent for any underlying distribution (which your estimator is not) which makes it highly robust to model misspecification.
You can do better.
In the sense that among all the unbiased estimators, you can find the one with the smallest variance.
Our goal is to estimate . An unbiased estimator is
Since this is unbiased, from Lehmann–Scheffé theorem, we know that
A few more step. The above equals to
Now, all we are left to do is to find out the distribution of random variable , which is given precisely here.
For more information, please see "Theory of Point Estimation SECOND EDITION Lehmann & Casella", example 2.2.2.
------------------- a comment on your estimator ------------------
Your estimator is good enough in the sense that it is consistent, namely, when you have a large sample size, your estimator will be very close to the true value. This is because of the consistency of the plug-in method, or it can be shown by the continuous mapping theorem. However, it is a biased estimator.
Suppoxe then
pnorm(0, 1, 10)
[1] 0.4601722
Now suppose you have a random sample x
of size from
set.seed(225)
x = rnorm(50, 1, 10)
mean(x); sd(x)
[1] 2.747168
[1] 10.84025
Then the usual estimates are and Accordingly, you suggest
pnorm(0, mean(x), sd(x))
[1] 0.3999707
There are various ways of assessing the variability of this estimate of the probability. One possibility is to give a 95% parametric bootstrap confidence interval of the probability. There are many styles of bootstraps and bootstrapping is not the only possibility.
To get the discussion started, the simple quantile bootstrap method shown below gives the interval centered near our point estimate Because this is a simulation procedure (starting with known and to get the data for estimation), we know that the true probability is and thus that this CI contains the true probability. However, in an actual application we would not know whether such a bootstrap CI covers (contains) the true probability; we can hope that, for 95% of samples of size it does.
set.seed(2021)
a = mean(x); s = sd(x) # sample 'x' from above
B = 3000; p.re = numeric(B)
for(i in 1:B) {
x.re = rnorm(50, a, s)
p.re[i] = pnorm(0, mean(x.re), sd(x.re))
}
quantile(p.re, c(.025,.975))
2.5% 97.5%
0.2861269 0.5109705
length(unique(p.re))
[1] 3000
Here is a histogram of the 3000 uniquely different re-sampled probability estimates used in making the bootstrap. Sometimes correction for skewness in bootstrap CIs is warranted, but our bootstrap distribution seems roughly symmetrical.
hist(p.re, prob=T, br=20, col="skyblue2")
abline(v = c(0.286, 0.511), col="red", lwd=2, lty="dotted")
I will be interested to see other ideas how to approach this problem.
The variance of your estimates doesn't matter. The mean and variance estimates are consistent, so the functional representation is consistent as well. We use the degree of freedom correction for the estimate of the sample variance because it's oh-so slightly biased, but that bias disappears in moderate samples. Note: the biased estimator (without the degrees of freedom correction) is the maximum likelihood estimator.
MLEs and functions of them are consistent estimators of the parameters and functions of the parameters respectively a result of the continuous mapping theorem. When the normal assumption holds, your proposed estimator is better than the non-parametric estimate (the proportion of that is negative). It likely has lower variance too. To find the variance, you can use the law of total variance, or the delta-method.