I run a lot of bioassays in which, rather than scoring mortality on individuals, I score mortality on groups of individuals (the denominator, i.e. the number of trials, is known):
Sample data:
library(tidyverse)
library(betareg)
dataset <- data.frame(treatment = rep(c("a", "b", "c", "d"), each = 5),
                      success = c(6,6,9,6,5,10,10,9,10,10,6,8,9,9,10,7,8,6,8,7),
                      fail = c(4,3,1,4,5,0,0,0,0,0,4,2,1,1,0,3,2,4,2,3)) %>%
  mutate(n = success + fail,
         treatment = factor(treatment, levels = c("a", "b", "c", "d")))
Normally I would run this as a GLM with a quasibinomial family, but in some cases (like the one provided) I run into a situation where every replicate in one treatment is 0% or 100% mortality. When that happens, the standard errors and confidence-interval estimates for that treatment become enormous. I believe this is similar (or identical) to complete separation, although I have only ever seen complete separation discussed for binomial response data made up of 0s and 1s.
The usual GLM:
data.list <- list(success.fail = cbind(dataset$success, dataset$fail), treatment = dataset$treatment)
binary.glm <- glm(success.fail ~ treatment, data = data.list, family = quasibinomial)
summary(binary.glm)
> summary(binary.glm)

Call:
glm(formula = success.fail ~ treatment, family = quasibinomial,
    data = data.list)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.81457  -0.33811   0.00012   0.54942   1.86737

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.6325     0.2622   2.412   0.0282 *
treatmentb   20.4676  2890.5027   0.007   0.9944
treatmentc    1.0257     0.4271   2.402   0.0288 *
treatmentd    0.3119     0.3801   0.821   0.4239
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasibinomial family taken to be 0.7634611)

    Null deviance: 43.36  on 19  degrees of freedom
Residual deviance: 13.40  on 16  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 18
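As a side note on diagnosing the problem itself: if I understand correctly, the detectseparation package can check a binomial fit for separation before you trust its standard errors. A minimal sketch, assuming detect_separation accepts a two-column success/failure response the same way glm() does (I have not verified this against the package documentation):

```r
library(detectseparation)

# Refit the same mean model, but plug the separation-detection "method"
# into glm(); instead of an ordinary fit, this reports which coefficient
# estimates diverge to +/- Inf.
sep.check <- glm(cbind(success, fail) ~ treatment,
                 family = binomial,
                 data = dataset,
                 method = "detect_separation")
sep.check  # treatmentb should be flagged as infinite if separation is real
```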
I have read about handling separation in this post, but I don't have a way to apply Firth's penalized likelihood to my proportion data. Instead, I turned to beta regression as an option. Is it reasonable to use the n value (number of observations) behind each of my proportions as the weighting value in the weights argument of betareg()? I transformed my proportions to fit the (0, 1) interval using equation (4) from Smithson & Verkuilen, "A Better Lemon Squeezer? Maximum-Likelihood Regression With Beta-Distributed Dependent Variables" (direct link to PDF).
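(For completeness: after writing the above I started to wonder whether Firth-type penalization is really off the table. The brglm2 package advertises bias-reduced binomial GLMs through glm()'s method argument, and I believe it takes the usual cbind(success, fail) response. A sketch of what I have in mind, untested on my end:

```r
library(brglm2)

# Same mean model as the quasibinomial fit above, but with Firth-type
# mean-bias reduction, which should keep the treatment-b estimate finite.
# (Note: this is a plain binomial fit, so it does not account for the
# overdispersion that the quasibinomial family was handling.)
firth.glm <- glm(cbind(success, fail) ~ treatment,
                 family = binomial,
                 data = dataset,
                 method = "brglmFit",
                 type = "AS_mean")
summary(firth.glm)
```

If that works, it would avoid transforming the response at all; but since I'm not sure it applies to grouped proportion data, I pressed on with beta regression.)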
Beta regression:
# Smithson & Verkuilen (2006), eq. (4): squeeze proportions from [0, 1]
# into the open interval (0, 1) that the beta distribution requires.
lemonsqueeze.fun <- function(df, success, fail) {
  new <- df %>%
    mutate(prop.naught = {{success}} / ({{success}} + {{fail}}),
           n = {{success}} + {{fail}},
           # rescale to [0, 1] -- trivial here, since proportions already are
           y.prime = (prop.naught - 0) / (1 - 0),
           # shrink toward 0.5 by 1/(2N), where N = number of observations
           y.doubleprime = (y.prime * (length(y.prime) - 1) + 0.5) / length(y.prime))
  return(new)
}
transformed.data <- lemonsqueeze.fun(dataset, success, fail)
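As a sanity check that the squeeze does what I think it does, here is equation (4) by hand in base R for a few raw proportions, including the boundary cases, with N = 20 observations as in my data set:

```r
# Lemon-squeeze by hand: y'' = (y' * (N - 1) + 0.5) / N
p <- c(0, 0.6, 1)   # raw proportions, including both boundaries
N <- 20             # total number of observations in the data set
y.doubleprime <- (p * (N - 1) + 0.5) / N
y.doubleprime       # 0.025 0.595 0.975 -- strictly inside (0, 1)
```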
beta.mod <- betareg(y.doubleprime ~ treatment, data = transformed.data, weights = n)
summary(beta.mod)
> summary(beta.mod)

Call:
betareg(formula = y.doubleprime ~ treatment, data = transformed.data,
    weights = n)

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max
-6.8894 -1.9274  0.0000  0.5899  8.4217

Coefficients (mean model with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.63875    0.07291   8.761   <2e-16 ***
treatmentb   2.30484    0.14846  15.525   <2e-16 ***
treatmentc   1.04830    0.11600   9.037   <2e-16 ***
treatmentd   0.21245    0.10408   2.041   0.0412 *

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)
(phi)   15.765      1.602   9.841   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type of estimator: ML (maximum likelihood)
Log-likelihood: 229.5 on 5 Df
Pseudo R-squared: 0.7562
Number of iterations: 21 (BFGS) + 2 (Fisher scoring)
The standard errors are much smaller across the board, and we no longer get a wild estimate for treatment b. Is there anything problematic about this approach? Are there assumptions I should be checking? A better way to skin this cat? Alas, I am not a statistician, just a biologist who knows enough statistics to be dangerous.