Background: imagine a pizza cut into 8 slices.
On each straight edge of every slice, I insert a magnet of opposite polarity, facing outward. If I separate the pieces, keep them from flipping over, and shake them, they should reassemble into a complete pizza.
Now, if I add one more slice (same size, 1/8 of a full pizza), a complete pizza does not always form. The pieces can instead form the following clusters: 4 & 5, 3 & 6, 2 & 7, and 1 & 8.
A model (given by Hosokawa et al. (1994)) gives me the probability of each cluster forming. To validate the model, I ran physical experiments, with 20 trials per experimental condition.
My results are shown below:
Cluster   Theoretical      Physical
3,6       6.01961132827    4
1,8       2.77455224377    5
4,5       6.62198848501    5
2,7       4.58384794294    6
The data above follow a multinomial distribution (similar to the distribution you get from rolling dice). With 9 slices, each trial can end in one of 4 states.
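To make that concrete, here is a minimal Python sketch of the multinomial setup; the state probabilities are simply the model's predicted frequencies divided by the 20 trials, which is an assumption on my part about how the predictions were produced:

```python
from scipy.stats import multinomial

# model probabilities for the four 9-slice states ("3,6", "1,8", "4,5", "2,7"):
# predicted frequencies divided by the 20 trials
p = [6.019611328 / 20, 2.774552244 / 20, 6.621988485 / 20, 4.583847943 / 20]
obs = [4, 5, 5, 6]  # observed counts over the 20 trials

# probability of seeing exactly these counts if the model is right
lik = multinomial.pmf(obs, n=20, p=p)
print(lik)
```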
Besides the 9-slice experiment, I also have data for 40 slices (and several other slice counts). (Let me know if you would like me to include it here.)
Problem: to test goodness of fit, I would run a Pearson chi-squared test. However, because the two distributions' means are "close", I cannot reject the null hypothesis. But failing to reject is not the same as being able to accept the null hypothesis, either.
Question: is there a better way to show how "close" the model is to the physical experiments? A multinomial equivalent of the "standard deviation", or perhaps confidence intervals? A regression with confidence intervals?
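For reference, this is roughly what the plain Pearson test gives on the 9-slice data (a Python sketch; the statistic comes out around 3.3 on 3 degrees of freedom, so the test indeed fails to reject, which with only 20 trials mostly reflects low power rather than model correctness):

```python
from scipy.stats import chisquare

observed = [4, 5, 5, 6]
# model-predicted frequencies; they already sum to the 20 trials
expected = [6.019611328, 2.774552244, 6.621988485, 4.583847943]

stat, pval = chisquare(f_obs=observed, f_exp=expected)
# a non-significant p-value only means "cannot reject", not "model is right"
print(stat, pval)
```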
Update: a colleague of mine suggested the following regression analysis in R:
d=read.csv("data.csv")
length(unique(d$abs_state))
nrow(d)
d$n_componentsf=as.factor(d$n_components)
ncomps=9
dsubs=d[d$n_components==ncomps,]
# using exact multinomial test in EMT (better)
library(EMT)
# using Chi square test statistics
EMT::multinomial.test(dsubs$freq_obs,
dsubs$freq_pred/sum(dsubs$freq_pred),
useChisq=TRUE,MonteCarlo=FALSE)
# using p value calculated via Monte Carlo approach
EMT::multinomial.test(dsubs$freq_obs,
dsubs$freq_pred/sum(dsubs$freq_pred),
ntrial = 100000,
MonteCarlo=TRUE)
# other exact multinomial test implementation
library(RVAideMemoire)
RVAideMemoire::multinomial.test(dsubs$freq_obs,
p=dsubs$freq_pred/sum(dsubs$freq_pred))
# if you're interested in 95% confidence limits
# on your multinomial proportions you can
# calculate these like this
library(DescTools)
MultinomCI(dsubs$freq_obs, conf.level=0.95)
library(ggplot2)
library(lattice)
# if you would like to analyse all data together
# you could also analyse the observed frequency
# counts as having approx Poisson expectation,
# using a Poisson GLM
# (or a binomial GLM if you put cbind(freq_obs,freq_notobs) as your dependent var and use family=binomial)
library(afex)
library(MASS)
library(car)
library(effects)
set_sum_contrasts() # use effect coding, as data is unbalanced
fit=glm(freq_obs ~ freq_pred,
family=quasipoisson, data=d)
summary(fit)
Anova(fit, test="LR", type="III")
plot(allEffects(fit),type="response", rug=F)
plot1=plot(Effect(c("freq_pred"),
mod=fit),type="response",rug=F)
plot2=xyplot(freq_obs ~ freq_pred,
data=d,
type=c("p"),
par.settings=
simpleTheme(col=c("blue","red"),pch=16,cex=1.1))
library(latticeExtra)
plot1+plot2 # nice correspondence between observed and expected frequencies
# sticking model predictions in data frame and
# plotting this in ggplot2 instead
df=data.frame(effect(term="freq_pred",
mod=fit,
xlevels=list(freq_pred=seq(0,15,1)),type="response"))
df
library(ggplot2)
library(ggthemes)
ggplot(data=df) +
geom_ribbon(data = df,
aes(x=freq_pred, ymin=lower, ymax=upper), linetype=2, alpha=0.1) +
geom_line(data = df, aes(x=freq_pred, y=fit)) +
geom_point(data = d, aes(x=freq_pred, y=freq_obs, color=n_componentsf), cex=3) +
ylab("Predicted frequency") +
xlab("Observed frequency") +
theme_few(base_size=16)
data.csv contains:
n_components,abs_state,freq_pred,freq_obs,freq_notobs
9,"3,6",6.019611328,4,16
9,"1,8",2.774552244,5,15
9,"4,5",6.621988485,5,15
9,"2,7",4.583847943,6,14
15,"3,6,6",1.81009773,0,20
15,"4,5,6",3.927622683,7,13
15,"7,8",13.49657189,13,7
15,"5,5,5",0.765707695,0,20
40,"4,5,5,5,6,7,8",0.803987454,0,20
40,"5,6,6,7,8,8",3.376961833,3,17
40,"2,7,7,8,8,8",0.230595232,0,20
40,"5,7,7,7,7,7",0.175319358,0,20
40,"5,6,7,7,7,8",2.709196442,1,19
40,"5,5,5,5,6,7,7",0.170797287,0,20
40,"4,5,5,6,6,7,7",0.847746645,1,19
40,"8,8,8,8,8",0.230119576,0,20
40,"4,6,7,7,8,8",1.795116571,0,20
40,"6,6,6,6,8,8",0.483846181,0,20
40,"5,5,5,5,6,6,8",0.185675465,1,19
40,"4,7,7,7,7,8",0.4072505,0,20
40,"5,5,5,6,6,6,7",0.274814936,1,19
40,"5,5,6,8,8,8",0.708881447,1,19
40,"4,6,6,8,8,8",0.479526825,1,19
40,"4,5,5,5,5,8,8",0.071967085,0,20
40,"5,5,7,7,8,8",1.233981848,1,19
40,"4,5,6,6,6,6,7",0.34756786,1,19
40,"6,6,6,7,7,8",2.208449237,2,18
40,"4,5,5,6,6,6,8",0.494611498,1,19
40,"5,5,5,5,5,7,8",0.061650486,0,20
40,"4,5,5,5,5,5,5,6",0.010322162,0,20
40,"3,6,6,6,6,6,7",0.071274376,0,20
40,"4,6,6,6,6,6,6",0.015244456,0,20
40,"6,6,7,7,7,7",0.656508868,1,19
40,"4,5,5,5,7,7,7",0.155256863,2,18
40,"5,5,6,6,6,6,6",0.04917738,0,20
40,"3,6,7,8,8,8",0.634760319,0,20
40,"3,7,7,7,8,8",0.430171526,1,19
40,"5,5,5,5,5,5,5,5",0.00022644,0,20
That code produces 3 plots that are impressive [to me, at least :)]. However, I am skeptical of their correctness. As I understand it, the regression curve represents the correspondence between predictions and observations, and given that correspondence the curve is presumably the "best fit". What worries me, though, is that most of the corresponding points are bunched up at the very start, near the origin. Is this concern valid?
Regarding the regression curve, how are its confidence intervals computed? I found the paper by Bauer (1972), but could not understand it satisfactorily. Help understanding how the confidence intervals are calculated would be much appreciated.
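I can't speak to Bauer (1972) specifically, but a standard construction for simultaneous multinomial confidence intervals is Goodman's (1965) method, which is one of the options `DescTools::MultinomCI` exposes via its `method` argument. A Python sketch of the formula, under the assumption that Goodman-type intervals are what you are after:

```python
import numpy as np
from scipy.stats import chi2

def goodman_ci(counts, alpha=0.05):
    """Goodman (1965) simultaneous confidence intervals
    for multinomial proportions."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    k = len(counts)
    # Bonferroni-adjusted chi-square quantile with 1 df
    c = chi2.ppf(1 - alpha / k, df=1)
    delta = c * (c + 4 * counts * (n - counts) / n)
    lower = (c + 2 * counts - np.sqrt(delta)) / (2 * (n + c))
    upper = (c + 2 * counts + np.sqrt(delta)) / (2 * (n + c))
    return lower, upper

# observed counts for the 9-slice experiment
lo, hi = goodman_ci([4, 5, 5, 6])
```

With only 20 trials these intervals are wide, which is itself informative about how much data you would need.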
Other (possibly basic) questions:
- What effect do states that were never observed have on the regression plot? For example, states 1 and 4 of the 15-slice experiment were not observed physically.
- In the third plot (the one produced after running the R code), is there a particular reason the axes appear to be flipped?
Finally, what conclusions can I draw from the plots? Most of the corresponding points on the regression plot fall outside the confidence band. Is that a foolproof indication of the model's (in)ability to make predictions?
Lastly, the data are clearly insufficient. To find out how much data is needed, one would have to run enough trials for the confidence interval to shrink to a desired width (and I am starting to realize that choosing this width is not trivial). I tried randomly replicating the predicted frequencies (a poor man's bootstrap?), but again ran into the problem of not understanding how to compute a confidence interval from Pearson's chi-squared statistic.
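On that last point, a parametric bootstrap is straightforward to set up without any analytic confidence-interval formula: draw repeated multinomial samples from the model's probabilities, compute the Pearson statistic for each, and see how often model-generated data look at least as discrepant as yours. A Python sketch for the 9-slice condition (rerunning it with a larger n shows how many trials you would need before a given departure becomes detectable):

```python
import numpy as np

rng = np.random.default_rng(0)

# model probabilities for the four 9-slice states
p = np.array([6.019611328, 2.774552244, 6.621988485, 4.583847943]) / 20.0
obs = np.array([4, 5, 5, 6])
n = obs.sum()

def pearson_stat(counts, p, n):
    expected = n * p
    return ((counts - expected) ** 2 / expected).sum()

stat_obs = pearson_stat(obs, p, n)

# simulate the statistic's distribution under the model
sims = rng.multinomial(n, p, size=100_000)
stats = ((sims - n * p) ** 2 / (n * p)).sum(axis=1)

# Monte Carlo p-value: how often model-generated data look
# at least as discrepant as the real data
p_mc = (stats >= stat_obs).mean()
print(stat_obs, p_mc)
```

This is essentially what `EMT::multinomial.test(..., MonteCarlo=TRUE)` in the R code above does, except that simulating it yourself lets you vary n to plan future experiments.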