Cross Validated - Count explanatory variable, proportion dependent variable

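I'm having some trouble coming up with a way to analyze my data. If there's a short answer (i.e., "use logistic regression, dummy"), you can just post that and I'll do some digging on my own. I only need to be pointed in the right direction...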

My independent variable is a count, and my dependent variable is a proportion. Here is the data:

success <- c(322, 358, 323, 277)
total.trials <- c(540, 533, 507, 540)
count <- c(23, 13, 21, 39)
ratio <- success / total.trials

IIRC, running a simple linear regression of ratio ~ count is wrong... so what method should I use here? Thanks for your help.
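For reference, the "logistic regression" route mentioned above could be sketched as a binomial GLM on the success and failure counts rather than on the ratio itself. This is only a minimal sketch using the four data points from the question; the object name `fit` is mine, and with so few observations any inference from it should be treated with caution.

```r
# Data from the question
success      <- c(322, 358, 323, 277)
total.trials <- c(540, 533, 507, 540)
count        <- c(23, 13, 21, 39)

# glm() accepts a two-column binomial response: cbind(successes, failures)
fit <- glm(cbind(success, total.trials - success) ~ count,
           family = binomial)

summary(fit)

# Fitted proportions next to the observed ones
cbind(observed = success / total.trials, fitted = fitted(fit))
```

Modelling the counts this way keeps the number of trials behind each proportion in the likelihood, which a plain linear regression on `ratio` would ignore. If the residual deviance is large relative to its degrees of freedom, `family = quasibinomial` is one possible alternative to look into.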

OK, here is some code I ran after following gung's suggestion to use GEE:

library(gee)  # provides gee()

# One row per subject
subject <- c(1, 2, 3, 4)
success <- c(322, 358, 323, 277)
total <- c(540, 533, 507, 540)
count <- c(23, 13, 21, 39)
data <- cbind(success, total)

gee.model <- gee(data ~ count, id = subject, family = 'binomial')

summary(gee.model)

GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998) 

Model:
Link:                      Logit 
Variance to Mean Relation: Binomial 
Correlation Structure:     Independent 

Call:
gee(formula = data ~ count, id = subject, family = "binomial")

Summary of Residuals:
     Min       1Q   Median       3Q      Max 
  276.6608 310.3817 322.1195 331.3620 357.5969 


Coefficients:
               Estimate  Naive S.E.   Naive z  Robust S.E.  Robust z
(Intercept) -0.25516680 0.031437649 -8.116599 0.0134033383 -19.03756
count       -0.01055972 0.001244121 -8.487698 0.0002616798 -40.35360

Estimated Scale Parameter:  0.1066564
Number of Iterations:  1

Working Correlation
     [,1]
[1,]    1

Does this look right? Also, if I'm interpreting it correctly, count has a significant effect on the proportion.
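One quick way to check whether the fit "looks right" is to turn the reported coefficients back into probabilities and compare them with the observed proportions. The sketch below does that with the estimates printed in the output above, assuming they are on the logit scale (the output says `Link: Logit`). The third column is a guess being tested, not something the output states: it asks whether the two-column response was read as `cbind(successes, failures)`, which is the convention `glm()` documents. The names `b0`, `b1`, `implied`, and `check` are mine.

```r
# Coefficients as reported in the gee output
b0 <- -0.25516680   # (Intercept)
b1 <- -0.01055972   # count

success <- c(322, 358, 323, 277)
total   <- c(540, 533, 507, 540)
count   <- c(23, 13, 21, 39)

# Probabilities implied by the fitted model (inverse logit)
implied <- plogis(b0 + b1 * count)

check <- cbind(
  implied            = implied,
  success_over_total = success / total,             # the intended proportion
  success_over_sum   = success / (success + total)  # if total were read as failures
)
round(check, 3)
```

If the implied values track `success / (success + total)` rather than `success / total`, the response was probably meant to be `cbind(success, total - success)`. The residual summary being on the scale of the raw counts (roughly 277 to 358) is another hint that something is off in how the response was specified.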