Transformation to increase the kurtosis and skewness of a normal r.v.

Cross Validated: data-transformation, normality-assumption, skewness, kurtosis

2022-02-04 01:41:58

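I am working on an algorithm that relies on the observations $Y$ being normally distributed, and I would like to test empirically how robust the algorithm is to this assumption.

To do this, I am looking for a sequence of transformations $T_1(), \dots, T_n()$ that progressively disrupt the normality of $Y$. For example, if the $Y$s are normal they have skewness $= 0$ and kurtosis $= 3$, and it would be nice to find a sequence of transformations that progressively increases both.

My idea is to simulate some normally distributed data $Y$ and test the algorithm on it, then test the algorithm on each transformed dataset $T_1(Y), \dots, T_n(Y)$ to see how much the output changes.

Note that I do not control the distribution of the simulated $Y$s, so I cannot simulate them from a distribution that generalizes the normal (such as the skewed generalized error distribution).

4 Answers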

This can be done using the sinh-arcsinh transformation from

Jones, M. C. and Pewsey, A. (2009). Sinh-arcsinh distributions. Biometrika 96: 761-780.

The transformation is defined as

$$H(x;\epsilon,\delta)=\sinh[\delta\sinh^{-1}(x)-\epsilon], \tag{$\star$}$$

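where $\epsilon \in {\mathbb R}$ and $\delta \in {\mathbb R}_+$. When this transformation is applied to the normal CDF, $S(x;\epsilon,\delta)=\Phi[H(x;\epsilon,\delta)]$, it produces a unimodal distribution whose parameters $(\epsilon,\delta)$ control skewness and kurtosis, respectively (Jones and Pewsey, 2009), in the sense of van Zwet (1969). In addition, if $\epsilon=0$ and $\delta=1$, we recover the original normal distribution. See the following R code.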

# Density of the sinh-arcsinh distribution: derivative of Phi[H(x; epsilon, delta)]
fs = function(x,epsilon,delta) dnorm(sinh(delta*asinh(x)-epsilon))*delta*cosh(delta*asinh(x)-epsilon)/sqrt(1+x^2)

# (i) Vary the skewness parameter epsilon with delta = 1
vec = seq(-15,15,0.001)

plot(vec,fs(vec,0,1),type="l")
points(vec,fs(vec,1,1),type="l",col="red")
points(vec,fs(vec,2,1),type="l",col="blue")
points(vec,fs(vec,-1,1),type="l",col="red")
points(vec,fs(vec,-2,1),type="l",col="blue")

# (ii) Vary the tail-weight parameter delta with epsilon = 0
vec = seq(-5,5,0.001)

plot(vec,fs(vec,0,0.5),type="l",ylim=c(0,1))
points(vec,fs(vec,0,0.75),type="l",col="red")
points(vec,fs(vec,0,1),type="l",col="blue")
points(vec,fs(vec,0,1.25),type="l",col="red")
points(vec,fs(vec,0,1.5),type="l",col="blue")

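Therefore, by choosing an appropriate sequence of parameters $(\epsilon_n,\delta_n)$, you can generate a sequence of distributions/transformations with different levels of skewness and kurtosis, and make them look as similar to or as different from the normal distribution as you want.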

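The following plots show the output produced by the R code, for (i) $\epsilon=(-2,-1,0,1,2)$ and $\delta=1$, and (ii) $\epsilon=0$ and $\delta=(0.5,0.75,1,1.25,1.5)$.

[Figure: sinh-arcsinh densities for parameter settings (i) and (ii)]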

Simulating from this distribution is straightforward, since you only have to transform a normal sample using the inverse of $(\star)$:

$$H^{-1}(x;\epsilon,\delta)=\sinh[\delta^{-1}(\sinh^{-1}(x)+\epsilon)]$$
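As a rough illustration of the simulation step, the following Python sketch applies $H^{-1}$ to a standard normal sample along a sequence of parameter pairs and reports the sample skewness and kurtosis. The function name `sinh_arcsinh_inverse` and the particular parameter path are assumptions made for this example, not part of the original answer.

```python
import numpy as np
from scipy.stats import skew, kurtosis


def sinh_arcsinh_inverse(z, epsilon, delta):
    """Apply H^{-1}(z; epsilon, delta) = sinh((asinh(z) + epsilon) / delta)."""
    return np.sinh((np.arcsinh(z) + epsilon) / delta)


rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)  # the normal data Y to be perturbed

# Hypothetical parameter path: starts at (0, 1), which leaves the data unchanged,
# then increases epsilon (skewness) and decreases delta (heavier tails).
path = [(0.0, 1.0), (0.25, 0.9), (0.5, 0.8), (0.75, 0.7)]

results = []
for epsilon, delta in path:
    y = sinh_arcsinh_inverse(z, epsilon, delta)
    # fisher=False reports kurtosis on the scale where the normal has kurtosis 3
    results.append((epsilon, delta, skew(y), kurtosis(y, fisher=False)))

for epsilon, delta, s, k in results:
    print(f"epsilon={epsilon:.2f} delta={delta:.2f} skewness={s:.3f} kurtosis={k:.3f}")
```

Each element of the path plays the role of one transformation $T_i$ in the question; the first pair, $(0, 1)$, returns the data unchanged and serves as the baseline.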

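This can be done using Lambert W x F random variables/distributions. A Lambert W x F random variable (RV) is a non-linearly transformed RV X with distribution F.

When F is the normal distribution and $\alpha = 1$, they reduce to Tukey's h distribution. The nice thing about Lambert W x F distributions is that you can also go back from non-normal to normal again; i.e., you can estimate the parameters and Gaussianize() the data.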

They are implemented in the LambertW R package.

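Lambert W x F transformations come in 3 flavors:

skewed (type = 's') with skewness parameter $\gamma \in \mathbb{R}$
heavy-tailed (type = 'h') with tail parameter $\delta \geq 0$ (and optional $\alpha$)
skewed and heavy-tailed (type = 'hh') with left/right tail parameters $\delta_l, \delta_r \geq 0$

See the references on the skewed and heavy-tailed versions (disclaimer: I am the author).

In R, you can simulate, estimate, plot, etc. several Lambert W x F distributions with the LambertW package.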

library(LambertW)
library(RColorBrewer)
# several heavy-tail parameters
delta.v <- seq(0, 2, length = 11)
x.grid <- seq(-5, 5, length = 100)
col.v <- colorRampPalette(c("black", "orange"))(length(delta.v))

plot(x.grid, dnorm(x.grid), lwd = 2, type = "l", col = col.v[1],
     ylab = "")
for (ii in seq_along(delta.v)) {
  lines(x.grid, dLambertW(x.grid, "normal", 
                          theta = list(delta = delta.v[ii], beta = c(0, 1))),
        col = col.v[ii])
}
legend("topleft", paste(delta.v), col = col.v, lty = 1,
       title = "delta = ")

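It works similarly for a sequence of $\gamma$ values to add skewness. If you want to add both skewness and heavy tails, generate a sequence of $\delta_l$ and $\delta_r$ values.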
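For readers who want to apply these transformations to existing normal data without the R package, here is a minimal Python sketch. It assumes the standard definitions of the heavy-tail transform, $u \exp(\tfrac{\delta}{2} (u^2)^{\alpha})$, and the skewed transform, $u \exp(\gamma u)$, applied to standardized data; the function names are hypothetical and this is not the LambertW package API.

```python
import numpy as np
from scipy.stats import skew, kurtosis


def lambert_w_heavy_tail(u, delta, alpha=1.0):
    """Heavy-tail transform of standardized data u (Tukey's h when alpha = 1)."""
    return u * np.exp(0.5 * delta * (u ** 2) ** alpha)


def lambert_w_skew(u, gamma):
    """Skewed transform of standardized data u."""
    return u * np.exp(gamma * u)


rng = np.random.default_rng(1)
u = rng.standard_normal(100_000)  # standardized normal input (mean 0, sd 1)

# Small delta values keep the fourth moment finite (Tukey's h needs delta < 1/4).
for delta in [0.0, 0.05, 0.1, 0.15]:
    y = lambert_w_heavy_tail(u, delta)
    print(f"delta={delta:.2f} kurtosis={kurtosis(y, fisher=False):.3f}")

for gamma in [0.0, 0.05, 0.1, 0.2]:
    y = lambert_w_skew(u, gamma)
    print(f"gamma={gamma:.2f} skewness={skew(y):.3f}")
```

Setting $\delta = 0$ or $\gamma = 0$ returns the input unchanged, so each sequence starts from the original normal data. For data with mean $\mu$ and standard deviation $\sigma$, standardize first and then map the result back with $\mu + \sigma \cdot (\text{transformed } u)$.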

One such sequence is exponentiation to various degrees. For example:

library(moments)
x <- rnorm(1000) #Normal data
x2 <- 2^x #One transformation
x3 <- 2^{x^2} #A stronger transformation
test <- cbind(x, x2, x3) 
apply(test, 2, skewness) #Skewness for the three distributions
apply(test, 2, kurtosis) #Kurtosis for the three distributions

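You could use exponents $x^{1.1}, x^{1.2}, \dots, x^2$ to get intermediate degrees of transformation. Note that a non-integer power of a negative $x$ is not a real number, so in practice the power would have to be applied to $|x|$.

Same answer as @user10525, but in Python: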
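A minimal Python sketch of such an intermediate family is given below. It assumes the exponent is applied to $|x|$ so that non-integer powers are defined; the function name `power_exp` is hypothetical. With $p = 2$ it reproduces $2^{x^2}$, but with $p = 1$ it gives $2^{|x|}$ rather than $2^x$. Sample skewness and kurtosis of data this heavy-tailed can vary a lot between runs, so the printed values are only indicative.

```python
import numpy as np
from scipy.stats import skew, kurtosis


def power_exp(x, p):
    """Hypothetical family 2 ** (|x| ** p); p = 2 gives 2 ** (x ** 2)."""
    return 2.0 ** (np.abs(x) ** p)


rng = np.random.default_rng(2)
x = rng.standard_normal(1000)  # normal data

# Exponents from 1.0 to 2.0 in steps of 0.1
for p in np.linspace(1.0, 2.0, 11):
    y = power_exp(x, p)
    print(f"p={p:.1f} skewness={skew(y):.3f} kurtosis={kurtosis(y, fisher=False):.3f}")
```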

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def sinh_arcsinh_pdf(x, epsilon, delta):
    # Density of the sinh-arcsinh distribution: derivative of Phi[H(x; epsilon, delta)]
    h = delta * np.arcsinh(x) - epsilon
    return norm.pdf(np.sinh(h)) * delta * np.cosh(h) / np.sqrt(1 + np.power(x, 2))

# Grid from -15 to 15 with step 0.001
vec = np.linspace(-15, 15, 30001)

# Vary the skewness parameter epsilon with delta = 1
plt.plot(vec, sinh_arcsinh_pdf(vec, 0, 1))
plt.plot(vec, sinh_arcsinh_pdf(vec, 1, 1), color='red')
plt.plot(vec, sinh_arcsinh_pdf(vec, 2, 1), color='blue')
plt.plot(vec, sinh_arcsinh_pdf(vec, -1, 1), color='red')
plt.plot(vec, sinh_arcsinh_pdf(vec, -2, 1), color='blue')
plt.show()
