机器算法验证 - 在二项式数据上使用 R 的良好拟合时的 NaN p 值 - 吾爱随笔录

在二项式数据上使用 R 的良好拟合时的 NaN p 值

机器算法验证二项分布卡方检验拟合优度

2022-03-02 23:14:06

我正在尝试测试计数数据向量与二项式的拟合优度。为此，我正在使用包goodfit()中的函数vcd。但是，当我运行该函数时，它会返回NaN卡方检验的 p 值。在我的设置中，我有一个包含 75 个元素的计数数据向量。

> library(vcd)
> counts <- c(32, 35, 44, 35, 41, 33, 42, 49, 36, 41, 42, 45, 38, 43, 36, 
35, 40, 40, 43, 34, 39, 31, 40, 39, 36, 37, 37, 37, 32, 48, 41, 
32, 37, 36, 49, 37, 41, 36, 34, 37, 41, 32, 36, 36, 30, 33, 33, 
42, 39, 36, 36, 29, 31, 41, 36, 39, 40, 37, 39, 39, 31, 39, 37, 
40, 33, 41, 34, 46, 35, 41, 44, 38, 44, 34, 42)
> test.gof <- goodfit(counts, type="binomial", 
+                     par=list(size=length(counts), prob=0.5))

一切正常，但是当我检查goodfit()对象时，我得到以下信息：

> summary(test.gof)

 Goodness-of-fit test for binomial distribution

                      X^2 df  P(> X^2)
Pearson               NaN 75       NaN
Likelihood Ratio 21.48322 19 0.3107244
Warning message:
In summary.goodfit(test.gof) : Chi-squared approximation may be incorrect

起初我怀疑这是一个小样本量问题，但我也有一个包含 50 个观察值的数据集，没有返回NaNp 值。我还尝试将方法切换goodfit()到 ML，结果相似。

为什么在这种情况下会产生这个功能NaN？是否有替代函数来计算计数数据的 GOF？

3个回答

您在观察计数中的频率为零。这解释NaN了您的数据中的 s。如果您查看test.gof对象，您会看到：

table(test.gof$observed)

 0  1  2  3  4  5  7  8 10 
56  5  3  2  5  1  1  2  1

你有 56 个零。无论如何，恕我直言，这个问题是针对http://stats.stackexchange.com的。

你会对一个经过手术改变的goodfit对象更满意吗？

> idx <- which(test.gof$observed != 0)
> idx
 [1] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 49 50
> test.gof$par$size <- length(  idx-1)
> test.gof$fitted <- test.gof$fitted[idx]
> test.gof$count <- test.gof$count[idx]
> test.gof$observed <- test.gof$observed[idx]
> summary(test.gof)

     Goodness-of-fit test for binomial distribution

                      X^2 df  P(> X^2)
Pearson               Inf 75 0.0000000
Likelihood Ratio 21.48322 19 0.3107244
Warning message:
In summary.goodfit(test.gof) : Chi-squared approximation may be incorrect

尝试绘制它。你会更好地了解正在发生的事情。如前所述，您得到 NaN 是因为您将 0 个频率传递给 chisq.test()

test.gof <- goodfit(counts, type="binomial", par=list(size=length(counts), prob=0.5)) 
plot(test.gof)
## doesn't look so good 
test.gof <- goodfit(counts, type="binomial", par=list(size=length(counts))) 
plot(test.gof)
## looks a little more clear

其它你可能感兴趣的问题

上一篇两个样本 t 检验的功效下一篇在被推荐的 wiki StatProb.com 上发布是否值得？