机器算法验证 - 移位对数正态分布的稳健参数估计 - 吾爱随笔录

移位对数正态分布的稳健参数估计

机器算法验证分布估计对数正态分布

2022-03-27 08:35:20

我有一个非常适合 logNormal 分布的数据集。（从理论的角度来看，这是一些难以解决的商分布）。

然而，数据很脏，所以参数估计远非微不足道。

现在，我的方法如下：

移动分布，使最小值几乎为 0。
日志空间数据
使用稳健的中值和 MAD 参数估计（有关详细信息，请参阅估计正态分布的参数：中值而不是均值？）

结果明显好于以前（与经验 CDF 的最大差异为 0.034 而不是 0.081 和不使用 MAD 的 0.224）。特别是在我期望异常值的长尾上，它并不完美。额外的位置参数有很大帮助。然而，使用最小值是一种非常粗略的启发式方法。我显然不能期望观察到真正的最小值，但根据样本大小，观察到的最小值总是会小 x 大一些。

您是否知道任何稳健的参数估计方法（如果可能，请提供参考） $e^{\mathcal{N}(\mu, \sigma)} + c$ 分销家庭？

请注意，例如 scipy.stats.lognorm 也确实有这样一个额外的第三个位置参数，就像我正在使用的那个一样，但我正在使用自己的代码在 Java 中工作。

更新：我刚刚遇到了一篇关于这个主题的论文：

估计三参数对数正态分布的参数
Rodrigo J. Aristizabal

其中包括一些相关文献的指针，特别是

通过最大似然估计对数正态分布的参数
AC Cohen, Jr.

但我发现很难从这些出版物中找到一个我可以实施的公式。

1个回答

如果有人仍然感兴趣，我已经设法在 Java 中实现了 Aristizabal 的公式。这比请求的“健壮”代码更具概念验证性，但它是一个起点。

/**
 * Computes the point estimate of the shift offset (gamma) from the given sample. The sample array will be sorted by this method.<p>
 * Cf. Aristizabal section 2.2 ff.
 * @param sample {@code double[]}, will be sorted
 * @return gamma point estimate
 */
public static double pointEstimateOfGammaFromSample(double[] sample) {
    Arrays.sort(sample);
    DoubleUnaryOperator func = x->calculatePivotalOfSortedSample(sample, x)-1.0;
    double upperLimit = sample[0];
    double lowerLimit = 0;
    double gamma = bisect(func, lowerLimit, upperLimit);
    return gamma;
}

/**
 * Cf. Aristizabal's equation (2.3.1)
 * @param sample {@code double[]}, should be sorted in ascending order
 * @param gamma shift offset
 * @return pivotal value of sample
 */
private static double calculatePivotalOfSortedSample(final double[] sample, double gamma) {
    final int n=sample.length;
    final int n3=n/3;
    final double mid = avg(sample, gamma, n3+1, n-n3);
    final double low = avg(sample, gamma, 1, n3);
    final double upp = avg(sample, gamma, n-n3+1, n);
    final double result = (mid-low)/(upp-mid);
    return result;
}

/**
 * Computes average of sample values from {@code sample[l-1]} to {@code sample[u-1]}.
 * @param sample {@code double[]}, should be sorted in ascending order
 * @param gamma shift offset
 * @param l lower limit
 * @param u upper limit
 * @return average
 */
private static double avg(double[] sample, double gamma, int l, int u) {
    double sum = 0.0;
    for (int i=l-1;i<u;sum+=Math.log(sample[i++]-gamma));
    final int n = u-l+1;
    return sum/n;
}

/**
 * Naive bisection implementation. Should always complete if the given values actually straddles the root.
 * Will call {@link #secant(DoubleUnaryOperator, double, double)} if they do not, in which case the
 * call may not complete.
 * @param func Function solve for root value
 * @param lowerLimit Some value for which the given function evaluates < 0
 * @param upperLimit Some value for which the given function evaluates > 0
 * @return x value, somewhere between the lower and upper limits, which evaluates close enough to zero
 */
private static double bisect(DoubleUnaryOperator func, double lowerLimit, double upperLimit) {
    final double eps = 0.000001;
    double low=lowerLimit;
    double valAtLow = func.applyAsDouble(low);
    double upp=upperLimit;
    double valAtUpp = func.applyAsDouble(upp);
    if (valAtLow*valAtLow>0) {
        // Switch to secant method
        return secant(func, lowerLimit, upperLimit);
    }
    System.out.printf("bisect %f@%f -- %f@%f%n", valAtLow, low, valAtUpp, upp);
    double mid;
    while(true) {
        mid = (upp+low)/2;
        if (Math.abs(upp-low)/low<eps)
            break;
        double val = func.applyAsDouble(mid);
        if (Math.abs(val)<eps)
            break;
        if (val<0)
            low=mid;
        else
            upp=mid;
    }
    return mid;
}

/**
 * Naive secant root solver implementation. May not complete if root not found.
 * @param f Function solve for root value
 * @param a Some value for which the given function evaluates
 * @param b Some value for which the given function evaluates
 * @return x value which evaluates close enough to zero
 */
static double secant(final DoubleUnaryOperator f, double a, double b) {
    double fa = f.applyAsDouble(a);
    if (fa==0)
        return a;
    double fb = f.applyAsDouble(b);
    if (fb==0)
        return b;
    System.out.printf("secant %f@%f -- %f@%f%n", fa, a, fb, b);
    if (fa*fb<0) {
        return bisect(f, a, b);
    }
    while ( abs(b-a) > abs(0.00001*a) ) {
          final double m = (a+b)/2;
          final double k = (fb-fa)/(b-a);
          final double fm = f.applyAsDouble(m);
          final double x = m-fm/k;
          if (Math.abs(fa)<Math.abs(fb)) {
              // f(a)<f(b); Choose x and a
              b=x;
              fb=f.applyAsDouble(b);
          } else {
              // f(a)>=f(b); Choose x and b
              a=x;
              fa=f.applyAsDouble(a);
          }
          if (fa==0)
              return a;
          if (fb==0)
              return b;
          if (fa*fb<0) {
              // Straddling root; switch to bisect method
              return bisect(f, a, b);
          }
      }
    return (a+b)/2;

}

其它你可能感兴趣的问题

上一篇通过模拟获得的统计数据的标准误差下一篇是否可以比较模型拟合高斯与二项式 glm