Interpreting plm output in R: number of observations used for a very unbalanced panel

Cross Validated: r, interpretation, panel-data

2022-04-18 13:13:37

I am running a fixed effects model with the plm function and am looking for help interpreting one aspect of the output. Suppose the output reads:

Call:
plm(formula = dependentvariable ~ independentvars, data = data, model = "within", 
    type="time")

Unbalanced Panel: n=176, T=1-2, N=211

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max. 
-0.0654  0.0000  0.0000  0.0000  0.0654 

Coefficients :
    Estimate Std. Error t-value Pr(>|t|)   
x1  -0.4219101  0.1662230 -2.5382 0.020054 * 
x2  -0.0072536  0.0069678 -1.0410 0.310933   
x3  -0.2221514  0.0574869 -3.8644 0.001044 **
x4   0.1118861  0.1247960  0.8966 0.381177   

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    0.18087
Residual Sum of Squares: 0.045841
R-Squared      :  0.74656 
      Adj. R-Squared :  0.067226 
F-statistic: 5.08807 on 11 and 19 DF, p-value: 0.00099507

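How do I know how many observations were used here, given that my panel is very unbalanced? Because the output reports both a lowercase n and an uppercase N, I am confused about how to find the number of observations included.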

1 Answer

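Uppercase N gives the total number of rows in your data, which corresponds to the number of observations in the pooled model (option model="pooling" in the plm function).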

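Lowercase n gives the number of unique observational units (e.g. groups or individuals). This corresponds to the number of dummies you would add if you used the least squares dummy variable estimator instead of the equivalent within estimator.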

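Uppercase $T$ shows how often the units (individuals) are observed, i.e. it refers to the time dimension. $T=1-2$ means that some individuals are observed only once and others are observed twice.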
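As a rough illustration, the three quantities can also be pulled out programmatically rather than read off the printed header. The sketch below assumes the plm package is installed and uses a small made-up panel (`toy`, with hypothetical columns `id`, `year`, `x`, `y`); it is not the asker's data. It relies on plm's `pdim()` and on the `nobs()` method for fitted plm models.

```r
library(plm)

# Hypothetical toy panel: 4 individuals, each observed in 1 or 2 periods
toy <- data.frame(
  id   = c("a", "a", "b", "c", "c", "d"),
  year = c(1, 2, 1, 1, 2, 2),
  x    = c(0.5, 1.2, 0.3, 2.0, 2.9, 1.1),
  y    = c(1.0, 2.1, 0.7, 3.8, 5.9, 2.0)
)
toy_panel <- pdata.frame(toy, index = c("id", "year"))

# Panel dimensions of the data
dims <- pdim(toy_panel)
dims$nT$n            # lowercase n: number of individuals (4)
dims$nT$N            # uppercase N: total number of rows (6)
table(dims$Tint$Ti)  # how many individuals are observed 1, 2, ... times

# Observations actually used by a fitted model (rows with NAs are dropped)
fit <- plm(y ~ x, data = toy_panel, model = "within")
nobs(fit)
```

Calling `pdim(fit)` on the fitted model instead of on the data should report the dimensions of the estimation sample, which is the safer choice when some rows contain missing values.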

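Since you have at most two periods, you can easily work out that 35 individuals (211-176) appear twice, which leaves 141 individuals (176-35) for whom information is available in only one period. In general, this calculation is not possible if T>2, and you would need more information.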
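The arithmetic for the two-period case can be written out as a short check. This is only a sketch using the numbers from the header in the question; the variable names are made up for illustration.

```r
# Values read off the header "Unbalanced Panel: n=176, T=1-2, N=211"
n_units <- 176  # lowercase n: distinct individuals
n_rows  <- 211  # uppercase N: total observations

# With T equal to 1 or 2 for every individual:
#   once + twice     = n
#   once + 2 * twice = N
observed_twice <- n_rows - n_units          # 35
observed_once  <- n_units - observed_twice  # 141

# Consistency check: the two groups together account for all rows
stopifnot(observed_once + 2 * observed_twice == n_rows)
```

With three or more possible periods there are more unknown group sizes than equations, which is why n and N alone no longer pin down the distribution.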

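Here is a data set of football matches from three seasons of the German Bundesliga. In Germany, 18 teams each play 34 matches per season. Every match appears twice here, once from each team's perspective, and "Home" marks the home team. We use a fixed effects within model to estimate the home advantage over time (relative to the first season in our sample), controlling for possible time/season effects.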

rm(list=ls(all=TRUE))
library(plm)
bundesliga<-read.csv("https://drive.google.com/uc?export=download&id=0B70aDwYo0zuGRGxVV1p2MTlqaUk")
head(bundesliga)

> head(bundesliga)
   Season Round                Team             Opponent Home Goals_Diff
1 2013/14     1      Bayern München Bor. Mönchengladbach    H          2
2 2013/14     1     1899 Hoffenheim       1. FC Nürnberg    H          0
3 2013/14     1 Bayer 04 Leverkusen          SC Freiburg    H          2
4 2013/14     1         Hannover 96        VfL Wolfsburg    H          2
5 2013/14     1         FC Augsburg    Borussia Dortmund    H         -4
6 2013/14     1          Hertha BSC  Eintracht Frankfurt    H          5

# Create time index
bundesliga$Index<-as.numeric(as.factor(bundesliga$Season))*100+bundesliga$Round
# Declare panel data
bl_panel<-pdata.frame(bundesliga,c("Team","Index"))
# Run regression
summary(plm(Goals_Diff~Home*Season,data=bl_panel,model = "within"))

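Here are the results. The home advantage is strong, corresponding to a goal difference of 0.67 in the first season. In the two subsequent seasons the home advantage is not statistically different from the first season (and there are no significant season effects):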

> summary(plm(Goals_Diff~Home*Season,data=bl_panel,model = "within"))
Oneway (individual) effect Within Model

Call:
plm(formula = Goals_Diff ~ Home * Season, data = bl_panel, model = "within")

Unbalanced Panel: n=22, T=34-102, N=1836

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max. 
-7.0400 -1.1700 -0.0185  1.1300  5.8300 

Coefficients :
                     Estimate Std. Error t-value  Pr(>|t|)    
HomeH                0.673203   0.143947  4.6768 3.131e-06 ***
Season2014/15       -0.132377   0.147800 -0.8957    0.3706    
Season2015/16       -0.057980   0.149638 -0.3875    0.6985    
HomeH:Season2014/15  0.169935   0.203571  0.8348    0.4040    
HomeH:Season2015/16 -0.071895   0.203571 -0.3532    0.7240    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    5970.7
Residual Sum of Squares: 5735
R-Squared      :  0.039484 
      Adj. R-Squared :  0.038904 
F-statistic: 14.8726 on 5 and 1809 DF, p-value: 2.5075e-14

In total there are N=1,836 observations, 612 for each of the three seasons (18 teams times 34 matches). We observe n=22 unique teams.
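One way to cross-check N and n against the rest of the printed summary is the residual degrees of freedom of the within model, which equal N minus the n team effects minus the number of estimated slope coefficients. A small sketch with the numbers from the output above (variable names are illustrative only):

```r
# Numbers taken from the Bundesliga summary output
N_obs   <- 1836  # uppercase N
n_teams <- 22    # lowercase n
k_coef  <- 5     # rows in the coefficient table

# Per-season count: 18 teams with 34 matches each
stopifnot(18 * 34 * 3 == N_obs)

# Residual degrees of freedom of the within estimator
resid_df <- N_obs - n_teams - k_coef
resid_df  # 1809, the second DF value reported next to the F-statistic
```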

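T=34-102 means that some teams are observed for only one season, or 34 matches, while others appear in all three seasons (or 102 matches). To assess how balanced your sample is, you can look at the number of seasons per team, which shows that most teams are observed in all three seasons. You can verify that $N=1836=(15 \cdot 3+2 \cdot 2+5 \cdot 1) \cdot 34$ and $n=15+2+5=22$.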

> table(table(bl_panel$Team)/34)
 1  2  3 
 5  2 15
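The bookkeeping behind this table can be reproduced without downloading the data. The sketch below simply re-enters the distribution printed above (seasons observed versus number of teams) and recomputes n and N from it; the variable names are made up for illustration.

```r
# Distribution from the table above: seasons observed -> number of teams
seasons_observed   <- c(1, 2, 3)
teams              <- c(5, 2, 15)
matches_per_season <- 34

# Lowercase n: number of distinct teams
n_teams <- sum(teams)

# Uppercase N: team-seasons times matches per season
n_obs <- sum(seasons_observed * teams) * matches_per_season

# Range of T: fewest and most matches observed for any team
t_range <- range(seasons_observed) * matches_per_season

n_teams  # 22
n_obs    # 1836
t_range  # 34 102
```

The same idea carries over to any unbalanced panel: once the per-unit counts are known, n is the number of units, N is the sum of the counts, and the T range is their minimum and maximum.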
