How to estimate P(x ≤ 0) given n samples of x?

Tags: variance, sampling, mean, estimators
2022-03-07 10:56:15

Suppose we have $n$ samples $x_i$ of a random variable $x$, where

$x \sim N(\mu, \sigma^2).$

Based on the samples, we want to estimate the probability that $x$ is at most zero,

$P(x \le 0).$

Intuitively, I would first estimate:

$\hat\mu = \frac{1}{n}\sum_i x_i$
$\hat\sigma^2 = \frac{1}{n-1}\sum_i (x_i - \hat\mu)^2$

and then calculate:

$P(x \le 0) = \frac{1}{\sqrt{2\pi\hat\sigma^2}} \int_{-\infty}^{0} e^{-\frac{(x-\hat\mu)^2}{2\hat\sigma^2}} \, dx$
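
In R, this plug-in calculation is just the fitted normal CDF evaluated at 0 (a minimal sketch; x is the vector of samples and the variable names are illustrative):

mu.hat  <- mean(x)                     # sample mean
sig.hat <- sd(x)                       # sample standard deviation (n-1 denominator)
p.hat   <- pnorm(0, mu.hat, sig.hat)   # fitted normal CDF at 0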

However, $\hat\mu$ and $\hat\sigma$ are themselves estimates and carry some variance of their own.

Is this reasoning right?

If so, how can I estimate $P(x \le 0)$ in a way that also accounts for $\mathrm{Var}[\hat\mu]$ and $\mathrm{Var}[\hat\sigma]$?

4 Answers

The method you are using is very close to the MLE, which has reasonable estimation properties when the underlying parametric model is correct. The MLE has a property called functional invariance, which means that the MLE of a function of the parameters is that function of the MLE. Your method uses the sample variance estimator, which is a bias-corrected version of the MLE of the true variance, but your estimator should have reasonable properties if the underlying model is correct. Of course, you are correct that your estimator involves some variance, but that is true of any estimator in this situation.

If you are confident that your data are from an exchangeable sequence (i.e., an IID model), then I would recommend you give serious consideration to instead using the empirical estimator, which is:

$\widehat{P(X \le 0)} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{I}(x_i \le 0).$

This latter estimator also has good properties, but crucially, it does not rely on the assumption that the data are from a normal distribution. The empirical estimator is consistent for any underlying distribution (which your estimator is not), which makes it highly robust to model misspecification.
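
For example, a minimal R sketch (the sample vector x and the variable names are purely illustrative) comparing the empirical estimator with the normal plug-in estimator on the same data:

p.emp  <- mean(x <= 0)                 # empirical estimate: proportion of observations <= 0
p.norm <- pnorm(0, mean(x), sd(x))     # normal plug-in estimate, for comparison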

You can do better.

Specifically, among all unbiased estimators, you can find the one with the smallest variance.

Our goal is to estimate $P(X < 0)$. An unbiased estimator is

$\mathbb{1}(X_1 < 0),$

because $E[\mathbb{1}(X_1 < 0)] = P(X_1 < 0) = P(X < 0)$.

Since this is unbiased, we know from the Lehmann–Scheffé theorem that

$P(X_1 < 0 \mid \bar X, S) = P(X_1 < 0 \mid \hat\mu, \hat\sigma^2)$

is the uniformly minimum-variance unbiased estimator (UMVUE), because $(\bar X, S)$ is complete and sufficient.

A few more steps. The above is equal to

$P\!\left(\frac{X_1 - \bar X}{S} < \frac{-\bar X}{S} \,\middle|\, \bar X, S\right).$

By Basu's theorem, $\frac{X_1 - \bar X}{S}$ is independent of $(\bar X, S)$, because $\frac{X_1 - \bar X}{S}$ is ancillary and $(\bar X, S)$ is a complete sufficient statistic. This means the above is nothing but

$P\!\left(\frac{X_1 - \bar X}{S} < \frac{-\bar X}{S}\right),$

with $-\bar X / S$ held at its observed value.

Now, all we are left to do is to find the distribution of the random variable $T = \frac{X_1 - \bar X}{S}$, which is given precisely here.

For more information, please see Lehmann & Casella, Theory of Point Estimation (2nd edition), Example 2.2.2.
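
If you do not want to work out the density of $T$ in closed form, note that $T$ is ancillary, so its distribution can be simulated from standard normal samples of the same size. A minimal R sketch of an approximate UMVUE (the sample vector x and all variable names are illustrative):

n <- length(x)
xbar <- mean(x);  s <- sd(x)
B <- 1e5
t.sim <- replicate(B, {z <- rnorm(n); (z[1] - mean(z)) / sd(z)})   # draws of T = (X1 - Xbar)/S
p.umvue <- mean(t.sim < -xbar / s)     # Monte Carlo estimate of P(T < -xbar/s)

For moderate $n$ this will typically land close to the plug-in estimate.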

------------------- a comment on your estimator ------------------

Your estimator is good enough in the sense that it is consistent: when you have a large sample size, it will be very close to the true value. This follows from the consistency of the plug-in method, or can be shown via the continuous mapping theorem. However, it is a biased estimator.

Suppose $X \sim \mathrm{Norm}(\mu = 1, \sigma = 10)$. Then $P(X < 0 \mid \mu = 1, \sigma = 10) = 0.4602$.

pnorm(0, 1, 10)
[1] 0.4601722

Now suppose you have a random sample x of size n=50 from Norm(μ=1,σ=10).

set.seed(225)
x = rnorm(50, 1, 10)
mean(x);  sd(x)
[1] 2.747168
[1] 10.84025

Then the usual estimates are $\hat\mu = \bar X = 2.7472$ and $\hat\sigma = S = 10.84025$. Accordingly, your suggested estimate is $\hat P(X < 0 \mid \hat\mu, \hat\sigma) = 0.400$.

pnorm(0, mean(x), sd(x))
[1] 0.3999707

There are various ways of assessing the variability of this estimated probability. One possibility is to give a 95% parametric bootstrap confidence interval for the probability; there are many styles of bootstrap, and bootstrapping is not the only option.

To get the discussion started, the simple quantile bootstrap method shown below gives the interval (0.286,0.511), centered near our point estimate 0.40. Because this is a simulation procedure (starting with known μ and σ to get the data for estimation), we know that the true probability is 0.46, and thus that this CI contains the true probability. However, in an actual application we would not know whether such a bootstrap CI covers (contains) the true probability; we can hope that, for 95% of samples of size n=50, it does.

set.seed(2021)
a = mean(x);  s = sd(x)  # sample 'x' from above
B = 3000;  p.re = numeric(B)
for(i in 1:B) {
  x.re = rnorm(50, a, s)
  p.re[i] = pnorm(0, mean(x.re), sd(x.re))
  }
quantile(p.re, c(.025,.975))
      2.5%     97.5% 
0.2861269 0.5109705 

length(unique(p.re))
[1] 3000

Here is a histogram of the 3000 uniquely different re-sampled probability estimates used in making the bootstrap. Sometimes correction for skewness in bootstrap CIs is warranted, but our bootstrap distribution seems roughly symmetrical.

hist(p.re, prob=T, br=20, col="skyblue2")
 abline(v = c(0.286, 0.511), col="red", lwd=2, lty="dotted")


I will be interested to see other ideas on how to approach this problem.

The variance of your estimates doesn't matter. The mean and variance estimates are consistent, so the functional representation is consistent as well. We use the degrees-of-freedom correction in the sample variance because the uncorrected estimator is ever so slightly biased, but that bias disappears in moderate samples. Note: the biased estimator (without the degrees-of-freedom correction) is the maximum likelihood estimator.
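
To make that note concrete, here is a one-line R comparison (x is an illustrative sample vector):

n <- length(x)
s2.unbiased <- var(x)                  # divides by n-1 (degrees-of-freedom correction)
s2.mle      <- (n - 1) / n * var(x)    # divides by n; the MLE under normality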

MLEs, and functions of them, are consistent estimators of the parameters and of functions of the parameters, respectively, as a result of the continuous mapping theorem. When the normal assumption holds, your proposed estimator is better than the non-parametric estimate (the proportion of the observations that are negative); it likely has lower variance too. To find the variance, you can use the law of total variance or the delta method.
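
As a sketch of the delta-method route (variable names are illustrative; it treats $\hat\mu$ and $\hat\sigma^2$ as independent, which is exact for normal samples): write the estimator as $g(\hat\mu, \hat\sigma^2) = \Phi(-\hat\mu/\hat\sigma)$, differentiate, and plug in $\mathrm{Var}[\hat\mu] = \sigma^2/n$ and $\mathrm{Var}[\hat\sigma^2] = 2\sigma^4/(n-1)$:

# approximate variance of the plug-in estimate pnorm(0, mean(x), sd(x)) by the delta method
n   <- length(x)
mu  <- mean(x);  s2 <- var(x);  s <- sqrt(s2)
z   <- -mu / s                        # standardized cutoff (0 - mu)/sigma
dmu <- -dnorm(z) / s                  # partial derivative of Phi(-mu/sigma) w.r.t. mu
ds2 <- dnorm(z) * mu / (2 * s^3)      # partial derivative w.r.t. sigma^2
v   <- dmu^2 * (s2 / n) + ds2^2 * (2 * s2^2 / (n - 1))
se  <- sqrt(v)                        # approximate standard error of the estimated probability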