Approximating Pr[n≤X≤m]

probability distributions moments approximation saddlepoint-approximation

2022-03-16 23:18:20

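What's the best way to approximate $Pr[n \leq X \leq m]$ for two given integers $m,n$ when you know the mean $\mu$, variance $\sigma^2$, skewness $\gamma_1$ and excess kurtosis $\gamma_2$ of a discrete distribution $X$, and it is clear from the (non-zero) measures of shape $\gamma_1$ and $\gamma_2$ that a normal approximation is not appropriate?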

Ordinarily, I would use a normal approximation with a continuity (integer) correction...

$Pr[n \leq X \leq m] = Pr[n - \tfrac{1}{2} \leq X \leq m + \tfrac{1}{2}] \approx Pr\left[\frac{(n - \tfrac{1}{2})-\mu}{\sigma}\leq Z \leq \frac{(m + \tfrac{1}{2})-\mu}{\sigma}\right] = \Phi\left(\frac{(m + \tfrac{1}{2})-\mu}{\sigma}\right) - \Phi\left(\frac{(n - \tfrac{1}{2})-\mu}{\sigma}\right)$

...if the skewness and excess kurtosis were (closer to) 0, but that's not the case here.
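For reference, the continuity-corrected normal approximation above can be sketched in a few lines of Python. This is only an illustration: the function name `normal_interval_prob` is made up here, and $\Phi$ is computed from the standard library's `math.erf`.

```python
import math


def normal_cdf(z):
    """Standard normal cdf Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_interval_prob(n, m, mu, sigma):
    """Approximate Pr[n <= X <= m] for an integer-valued X.

    Uses only the mean and standard deviation, so it ignores skewness and
    excess kurtosis entirely.
    """
    upper = (m + 0.5 - mu) / sigma  # continuity correction on the upper end
    lower = (n - 0.5 - mu) / sigma  # continuity correction on the lower end
    return normal_cdf(upper) - normal_cdf(lower)
```

With $\mu = 50$ and $\sigma = 5$ (for instance a Binomial(100, 0.5) variable), `normal_interval_prob(45, 55, 50, 5)` evaluates $\Phi(1.1) - \Phi(-1.1)$, which is about 0.7287.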

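I have to perform multiple approximations for different discrete distributions with different values of $\gamma_1$ and $\gamma_2$. So I'm interested in finding out whether there is an established procedure that uses $\gamma_1$ and $\gamma_2$ to select a better approximation than the normal approximation.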

3 answers

This is an interesting question, which doesn't really have a good solution. There are a few different ways of tackling this problem.

1. Assume an underlying distribution and match moments, as suggested in the answers by @ivant and @onestop. One downside is that the multivariate generalisation may be unclear.

2. Saddlepoint approximations. In this paper:

   Gillespie, C.S. and Renshaw, E. An improved saddlepoint approximation. Mathematical Biosciences, 2007.

   we look at recovering a pdf/pmf when given only the first few moments. We found that this approach works when the skewness isn't too large.

3. Laguerre expansions:

   Mustapha, H. and Dimitrakopoulos, R. Generalized Laguerre expansions of multivariate probability densities with moments. Computers & Mathematics with Applications, 2010.

   The results in this paper seem more promising, but I haven't coded them up.

Fitting a distribution to data using the first four moments is exactly what Karl Pearson devised the Pearson family of continuous probability distributions for (maximum likelihood is much more popular these days, of course). It should be straightforward to fit the relevant member of that family and then use the same type of continuity correction as you give above for the normal distribution.
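As a rough sketch of how the "relevant member" might be identified, the standard textbook criterion is Pearson's $\kappa = \frac{\beta_1(\beta_2+3)^2}{4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6)}$, with $\beta_1 = \gamma_1^2$ and $\beta_2 = \gamma_2 + 3$. The function name `pearson_type` and the tolerance handling below are illustrative assumptions, and the code only classifies the type; it does not fit the parameters.

```python
def pearson_type(gamma1, gamma2, tol=1e-9):
    """Return the Pearson family type implied by skewness and excess kurtosis."""
    beta1 = gamma1 ** 2    # squared skewness
    beta2 = gamma2 + 3.0   # ordinary (non-excess) kurtosis
    if beta2 < beta1 + 1.0:
        raise ValueError("impossible moments: need beta2 >= beta1 + 1")

    if beta1 < tol:        # symmetric members, where kappa = 0
        if abs(gamma2) < tol:
            return "normal"
        return "II (symmetric beta)" if gamma2 < 0 else "VII (Student t)"

    d = 2.0 * beta2 - 3.0 * beta1 - 6.0
    if abs(d) < tol:       # kappa is infinite on the gamma line
        return "III (gamma)"

    # 4*beta2 - 3*beta1 is strictly positive whenever beta2 >= beta1 + 1,
    # so the sign of kappa is the sign of d.
    kappa = beta1 * (beta2 + 3.0) ** 2 / (4.0 * (4.0 * beta2 - 3.0 * beta1) * d)
    if kappa < 0:
        return "I (beta)"
    if abs(kappa - 1.0) < tol:
        return "V (inverse gamma)"
    return "IV" if kappa < 1.0 else "VI (beta prime)"
```

For example, a Poisson variable with mean 10 has $\gamma_1 = 10^{-1/2}$ and $\gamma_2 = 0.1$, which lands in type I, while $\gamma_2 = 1.5\gamma_1^2$ lands exactly on the type III (gamma) line.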

I assume you must have a truly enormous sample size? Otherwise, sample estimates of skewness and especially kurtosis are often hopelessly imprecise, as well as being highly sensitive to outliers. In any case, I highly recommend you have a look at L-moments as an alternative; they have several advantages over ordinary moments when fitting distributions to data.
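To make the L-moment suggestion concrete, here is a minimal sketch that computes the first two sample L-moments plus the L-skewness and L-kurtosis ratios from the usual unbiased probability-weighted moments. The function name `sample_l_moments` is an assumption for illustration, and the code needs Python 3.8+ for `math.comb`.

```python
import math


def sample_l_moments(data):
    """Return (l1, l2, t3, t4): L-location, L-scale, L-skewness, L-kurtosis."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")

    # Unbiased probability-weighted moments b_0..b_3; i is the 0-based rank,
    # and math.comb(i, r) is 0 whenever r > i.
    b = []
    for r in range(4):
        total = sum(math.comb(i, r) / math.comb(n - 1, r) * xi
                    for i, xi in enumerate(x))
        b.append(total / n)

    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    if l2 == 0:
        raise ValueError("all observations are equal; ratios are undefined")
    return l1, l2, l3 / l2, l4 / l2
```

For the evenly spaced sample `[1, 2, 3, 4, 5]` this gives an L-location of 3, an L-scale of 1, and L-skewness and L-kurtosis of 0.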

You could try the skew normal distribution and see whether the excess kurtosis of your particular data sets is sufficiently close to the excess kurtosis that the distribution implies for the given skewness. If it is, you can use the skew normal cdf to estimate the probability. If not, you would have to come up with a transformation of the normal (or skew normal) pdf, similar to the one used for the skew normal distribution, that gives you control over both skewness and excess kurtosis.
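A minimal sketch of that check, assuming a method-of-moments fit of the skew normal with density $\frac{2}{\omega}\phi(z)\Phi(\alpha z)$, $z = (x-\xi)/\omega$. All function names are illustrative, and the cdf difference is obtained by integrating the density with Simpson's rule so that only the standard library is needed; a statistics library's skew normal cdf would do the same job.

```python
import math

MAX_ABS_SKEW = 0.995  # skew normal skewness is bounded by about 0.9953


def _m2(gamma1):
    """Return 2*delta^2/pi, the quantity fixed by the target skewness."""
    g = abs(gamma1) ** (2.0 / 3.0)
    c = ((4.0 - math.pi) / 2.0) ** (2.0 / 3.0)
    return g / (g + c)


def skewnorm_excess_kurtosis(gamma1):
    """Excess kurtosis a skew normal must have when its skewness is gamma1."""
    m2 = _m2(gamma1)
    return 2.0 * (math.pi - 3.0) * m2 ** 2 / (1.0 - m2) ** 2


def skewnorm_params(mu, sigma, gamma1):
    """Match mean, standard deviation and skewness; return (xi, omega, alpha)."""
    if abs(gamma1) >= MAX_ABS_SKEW:
        raise ValueError("skewness is outside the skew normal range")
    m2 = _m2(gamma1)
    delta = math.copysign(math.sqrt(math.pi * m2 / 2.0), gamma1)
    alpha = delta / math.sqrt(1.0 - delta ** 2)
    omega = sigma / math.sqrt(1.0 - m2)
    xi = mu - omega * delta * math.sqrt(2.0 / math.pi)
    return xi, omega, alpha


def skewnorm_pdf(x, xi, omega, alpha):
    z = (x - xi) / omega
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 * phi * big_phi / omega


def skewnorm_interval_prob(n, m, mu, sigma, gamma1, steps=2000):
    """Continuity-corrected Pr[n <= X <= m]; steps must be even (Simpson)."""
    xi, omega, alpha = skewnorm_params(mu, sigma, gamma1)
    a, b = n - 0.5, m + 0.5
    h = (b - a) / steps
    total = skewnorm_pdf(a, xi, omega, alpha) + skewnorm_pdf(b, xi, omega, alpha)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * skewnorm_pdf(a + i * h, xi, omega, alpha)
    return total * h / 3.0
```

Before trusting `skewnorm_interval_prob`, compare the observed $\gamma_2$ with `skewnorm_excess_kurtosis(gamma1)`; if the two differ substantially, the skew normal cannot match all four moments. With `gamma1 = 0` the result reduces to the ordinary continuity-corrected normal approximation.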
