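The plm() function in the plm library for R is giving me grief over duplicate time-ID pairs, even when I'm running a model that I don't think should need a time variable at all (see the reproducible example below).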
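I can think of three possibilities:
1. My understanding of fixed-effects regression is wrong, and it really does require unique time indices (or time indices at all!).
2. plm() is being overly picky here and should drop this requirement.
3. The particular estimation technique that plm() uses (the within transformation) requires time indices, even though their order doesn't seem to matter and the less computationally efficient version (including dummy variables in a straight OLS model) doesn't need them.
Any thoughts?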
set.seed(1)
n <- 1000
test <- data.frame( grp = as.factor(rep( letters, (n/length(letters))+1 ))[seq(n)], x = runif(n), z = runif(n) )
test$y <- with( test, 2*x + 3*z + rnorm(n) )
lm( y ~ x + z, data = test )
lm( y ~ x + z + grp, data = test )
library(plm)
# Model fails if I don't specify a time index, despite effect = "individual"
plm( y ~ x + z, data = test, model = "within", effect="individual", index = "grp" )
# Create a time variable and add it to the index, but still request individual FE only (no time FE)
library(plyr)
test <- ddply( test, .(grp), function(dat) transform( dat, t = seq(nrow(dat)) ) )
# Now plm() works; note coefficients clearly include the fixed effects, as they match the lm() version above
plm( y ~ x + z, data = test, model = "within", effect="individual", index = c("grp","t") )
# Scramble time variables and show they don't matter as long as they're unique within a cluster
test <- ddply( test, .(grp), function(dat) transform( dat, t = sample(t) ) )
plm( y ~ x + z, data = test, model = "within", effect="individual", index = c("grp","t") )
# Add a duplicate time entry and show that it causes plm() to fail
test[ 2, "t" ] <- test[ 1, "t" ]
plm( y ~ x + z, data = test, model = "within", effect="individual", index = c("grp","t") )
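As a cross-check on the third possibility, the within transformation can be sketched by hand in base R. This is only an illustration under stated assumptions, not a description of what plm() does internally: the data frame `d` and the helper `demean()` are hypothetical names, and the data are simulated in the same spirit as the example above. Each variable is demeaned within its group, with no time index anywhere, and the resulting slopes are compared with those from the dummy-variable OLS fit.

```r
# Hypothetical cross-check: within (fixed-effects) estimator by hand, base R only.
set.seed(1)
n <- 1000
d <- data.frame(
  grp = factor(rep(letters, length.out = n)),
  x   = runif(n),
  z   = runif(n)
)
d$y <- with(d, 2 * x + 3 * z + rnorm(n))

# Demean a variable within its group; ave() returns the group means by default.
# No time index is involved at any point.
demean <- function(v, g) v - ave(v, g)
d_within <- data.frame(
  y = demean(d$y, d$grp),
  x = demean(d$x, d$grp),
  z = demean(d$z, d$grp)
)

# OLS on the demeaned data, without an intercept.
b_within <- coef(lm(y ~ x + z - 1, data = d_within))

# OLS with explicit group dummies (least-squares dummy variables).
b_lsdv <- coef(lm(y ~ x + z + grp, data = d))[c("x", "z")]

# The two sets of slopes should agree up to numerical tolerance.
all.equal(b_within, b_lsdv)
```

If the slopes agree, the point estimates at least do not depend on any time ordering, which is consistent with the scrambled-time result above.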
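Why this matters
I'm trying to bootstrap my model, and having to keep the index-time pairs unique causes a headache that seems unnecessary if possibility (2) above is true.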
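One possible workaround, sketched here under assumptions: since the example above suggests the time values only need to be unique within each group, a bootstrap draw could have a placeholder time index rebuilt after resampling. The resampling scheme (plain row resampling), the data frame `d`, and the name `boot` are hypothetical choices for illustration; whether this is the right bootstrap for a given model is a separate question.

```r
# Hypothetical sketch: rebuild a unique within-group time index after resampling.
set.seed(1)
n <- 1000
d <- data.frame(
  grp = factor(rep(letters, length.out = n)),
  x   = runif(n),
  z   = runif(n)
)
d$y <- with(d, 2 * x + 3 * z + rnorm(n))

# One bootstrap draw: resample rows with replacement (assumed scheme).
boot <- d[sample(nrow(d), replace = TRUE), ]

# Number the rows 1, 2, ... within each group so that every (grp, t) pair
# is unique, even when the same original row was drawn more than once.
boot$t <- ave(seq_len(nrow(boot)), boot$grp, FUN = seq_along)

# Returns 0 when no (grp, t) pair is duplicated.
anyDuplicated(boot[, c("grp", "t")])
```

A data frame built this way could then be passed to the same call used above, with index = c("grp","t").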