比较重复测量 FE 模型中的组,具有嵌套误差分量,使用 plm 估计

机器算法验证 r 假设检验 重复测量 嵌套数据 plm
2022-03-07 23:11:16

我已经估计了一些重复测量的固定效应模型,具有嵌套的误差分量,基于分组变量,即非嵌套模型,使用. 我现在有兴趣

  1. 测试完整模型是否显着不同,即H_o其中是完整模型是完整模型
    Ho:βFemale=βMale
    βFemaleFemalesβMaleMales
  2. 随后测试两组之间选择的回归系数,即其中是女性的回归系数在, 和是男性在 的回归系数
    Ho:βFemale==year1.5=βMale==year1.5
    βFemale==year1.5year1.5βMale==year1.5year1.5

我将使用以下工作示例来说明情况,

首先,需要一些包,

# install.packages(c("plm","texreg","tidyverse","lmtest"), dependencies = TRUE)
library(plm); library(lmtest); require(tidyverse)

二、一些资料准备,

data(egsingle, package = "mlmRev")
dta <-  egsingle %>% mutate(Female = recode(female,.default = 0L,`Female` = 1L))

三、我为数据中的每个性别估计了一组模型

MoSpc <- as.formula(math ~ Female + size + year)
dfMo = dta %>% group_by(female) %>%
    do(fitMo = plm(update(MoSpc, . ~ . -Female), 
       data = ., index = c("childid", "year", "schoolid"), model="within") )

第四,让我们看一下两个估计模型,

texreg::screenreg(dfMo[[2]], custom.model.names = paste0('FE: ', dfMo[[1]]))
#> ===================================
#>            FE: Female   FE: Male   
#> -----------------------------------
#> year-1.5      0.79 ***     0.88 ***
#>              (0.07)       (0.10)   
#> year-0.5      1.80 ***     1.88 ***
#>              (0.07)       (0.10)   
#> year0.5       2.51 ***     2.56 ***
#>              (0.08)       (0.10)   
#> year1.5       3.04 ***     3.17 ***
#>              (0.08)       (0.10)   
#> year2.5       3.84 ***     3.98 ***
#>              (0.08)       (0.10)   
#> -----------------------------------
#> R^2           0.77         0.79    
#> Adj. R^2      0.70         0.72    
#> Num. obs.  3545         3685       
#> ===================================
#> *** p < 0.001, ** p < 0.01, * p < 0.05    #> 

现在,我想测试这两个(线性 OLS)模型是否显着不同,参见。上面的第 1 点。我环顾了 SO和互联网,有些人建议我需要使用plm::pFtest(),也建议在这里,我已经尝试过,但我不相信。我会想象一些非嵌套模型的测试,可能的 Cox 测试,lmtest::coxtest但我完全不确定。如果这里有人可以帮助我。

我试过,

plm::pFtest(dfMo[[1,2]], dfMo[[2,2]])
# >
# > F test for individual effects
# >
# >data:  update(MoSpc, . ~ . - Female)
# >F = -0.30494, df1 = 113, df2 = 2693, p-value = 1
# >alternative hypothesis: significant effects

和,

lmtest::coxtest(dfMo[[1,2]], dfMo[[2,2]])
# > Cox test
# > 
# > Model 1: math ~ size + year
# > Model 2: math ~ size + year
# >                 Estimate Std. Error    z value Pr(>|z|)    
# > fitted(M1) ~ M2     0.32    1.66695     0.1898   0.8494    
# > fitted(M2) ~ M1 -1222.87    0.13616 -8981.1963   <2e-16 ***
# > ---
# > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# > Warning messages:
# > 1: In lmtest::coxtest(dfMo[[1, 2]], dfMo[[2, 2]]) :
# >   models fitted on different subsets
# > 2: In lmtest::coxtest(dfMo[[1, 2]], dfMo[[2, 2]]) :
# >   different dependent variables specified

其次,我有兴趣比较两组之间的回归系数。比如说,year1.53.04 的估计值是否与 3.17 显着不同?参照。上述第 2 点。

请询问以上是否有任何不清楚的地方,我很乐意详细说明。任何帮助将不胜感激!

我意识到这个问题有点像编程,但我最初是在 SO 中发布的。但是,DWin非常友好地指出该问题属于 CrossValidated 并将其迁移到此处。

1个回答

下面的代码实现了在Femaledummy 和 year 之间放置交互的做法。底部的 F 测试测试你的 nullβFemale=βMale. 输出中的 t 统计plm量测试您的空值βFemale:year=1.5=βMale:year=1.5. 特别是,对于year=1.5,p 值为 0.32。

library(plm)  # Use plm
library(car)  # Use F-test in command linearHypothesis
library(tidyverse)
data(egsingle, package = 'mlmRev')
dta <- egsingle %>% mutate(Female = recode(female, .default = 0L, `Female` = 1L))
plm1 <- plm(math ~ Female * (year), data = dta, index = c('childid', 'year', 'schoolid'), model = 'within')

# Output from `summary(plm1)` --- I deleted a few lines to save space.
# Coefficients:
#                 Estimate Std. Error t-value Pr(>|t|)    
# year-1.5          0.8842     0.1008    8.77   <2e-16 ***
# year-0.5          1.8821     0.1007   18.70   <2e-16 ***
# year0.5           2.5626     0.1011   25.36   <2e-16 ***
# year1.5           3.1680     0.1016   31.18   <2e-16 ***
# year2.5           3.9841     0.1022   38.98   <2e-16 ***
# Female:year-1.5  -0.0918     0.1248   -0.74     0.46    
# Female:year-0.5  -0.0773     0.1246   -0.62     0.53    
# Female:year0.5   -0.0517     0.1255   -0.41     0.68    
# Female:year1.5   -0.1265     0.1265   -1.00     0.32    
# Female:year2.5   -0.1465     0.1275   -1.15     0.25    
# ---

xnames <- names(coef(plm1)) # a vector of all independent variables' names in 'plm1'
# Use 'grepl' to construct a vector of logic value that is TRUE if the variable
# name starts with 'Female:' at the beginning. This is generic, to pick up
# every variable that starts with 'year' at the beginning, just write
# 'grepl('^year+', xnames)'.
picked <- grepl('^Female:+', xnames)
linearHypothesis(plm1, xnames[picked])

# Hypothesis:
# Female:year - 1.5 = 0
# Female:year - 0.5 = 0
# Female:year0.5 = 0
# Female:year1.5 = 0
# Female:year2.5 = 0
# 
# Model 1: restricted model
# Model 2: math ~ Female * (year)
# 
#   Res.Df Df Chisq Pr(>Chisq)
# 1   5504                    
# 2   5499  5  6.15       0.29