Implausibly small standard errors

Tags: regression, stata, standard-error, clustered-standard-errors
2022-04-10 19:49:19

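I have data on surgical success for a large number of doctors. Using Stata, I estimated a regression with fixed effects for individual doctors. I first ran the regression with the robust option; the estimates for individual doctors had t-values between 2.17 and 6.14. I then reran it with the vce(cluster doctor) option, expecting the standard errors to get larger. Instead I got smaller standard errors, and much smaller at that, e.g. 1.04e-14. This is too good to be true. Why does it happen? What are the possible causes?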

2 Answers

You have corrected for the individual doctor effects twice, using two methods that simply cannot work together.

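If your model is regress outcome i.doctor, vce(cluster doctor), then Stata should have complained that you have exhausted your degrees of freedom. xtreg may be less smart and may have missed the fact that the fixed effects are perfectly determined. Those 1e-14 standard errors should have been exactly zero; they are non-zero in practice only because of rounding somewhere in the internals of the fixed-effects estimation. What happens here is this: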

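  1. The cluster variance estimator works by summing the per-cluster contributions over the clusters. However,
  2. by specifying doctors as fixed effects, you force the residuals for any given doctor to sum to 0.
  3. regress knows how to detect this at the algebraic level. xtreg, however, may not know enough computational linear algebra to do so, and simply adds up the (numerically) zero contributions to produce the implausibly small standard errors you see here.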
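
The three steps above can be written out as a short derivation. This is only a sketch in standard notation: the symbols X_g, e_g and 1_{n_g} (regressor rows, residuals, and a vector of ones for doctor g) are labels introduced here, not taken from the answer, and finite-sample correction factors are left out.

```latex
% Cluster-robust (sandwich) variance; finite-sample multipliers omitted
\widehat{V}(\hat\beta) = (X'X)^{-1}
  \Big( \sum_{g=1}^{G} X_g' \hat e_g \hat e_g' X_g \Big) (X'X)^{-1}

% With doctor dummies in the model, the normal equations force, for every doctor g,
\mathbf{1}_{n_g}' \hat e_g = \sum_{i \in g} \hat e_{ig} = 0

% If the dummies are the only regressors, every row of X_g is the same vector x_g, so
X_g' \hat e_g = x_g \, (\mathbf{1}_{n_g}' \hat e_g) = 0 \quad \text{for all } g
```

Every term of the middle sum is then zero in exact arithmetic, so the whole variance matrix is zero, and floating-point rounding leaves values on the order of 1e-14.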

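If I understand your problem correctly, this can happen when the intra-cluster correlations are negative. See the Stata FAQ for an intuitive look at the therapist version of this problem.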
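
One rough way to see why negative within-cluster correlation can shrink clustered standard errors is the textbook design-effect approximation. The sketch below assumes equal cluster size m, a regressor that is constant within clusters, and an intraclass error correlation ρ; these symbols are introduced here and do not come from the answer.

```latex
\frac{\operatorname{Var}_{\text{cluster}}(\hat\beta)}{\operatorname{Var}_{\text{iid}}(\hat\beta)}
  \approx 1 + (m - 1)\,\rho ,
\qquad \rho \ge -\frac{1}{m-1}
```

With ρ < 0 the ratio drops below 1, and at the lower bound ρ = -1/(m-1), which is the correlation of values constrained to sum to zero within a cluster, it reaches 0.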


Edit:

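I think Stas is right about the deeper problem; I was too hasty. Here is my attempt to replicate this using a dataset on pharmacy visits by 27,766 Vietnamese villagers, nested in 5,740 households in 194 villages (data from Cameron and Trivedi). I could not find a public dataset in which the clustered errors are smaller, but I think this illustrates the main points. I will treat pharmacy visits as continuous, although they clearly are not.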

First, we set up the data:

. use "http://cameron.econ.ucdavis.edu/mmabook/vietnam_ex2.dta", clear

. egen hh=group(lnhhinc)
(1 missing value generated)

. bys hh: gen person = _n

. xtset hh person
       panel variable:  hh (unbalanced)
        time variable:  person, 1 to 19
                delta:  1 unit

. xtdes

      hh:  1, 2, ..., 5740                                   n =       5740
  person:  1, 2, ..., 19                                     T =         19
           Delta(person) = 1 unit
           Span(person)  = 19 periods
           (hh*person uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       2       4         5         6       8      19

(snip)

Now for the FE regression of pharmacy visits on sick days:

. xtreg PHARVIS ILLDAYS, fe

Fixed-effects (within) regression               Number of obs      =     27765
Group variable: hh                              Number of groups   =      5740

R-sq:  within  = 0.1145                         Obs per group: min =         1
       between = 0.1390                                        avg =       4.8
       overall = 0.1257                                        max =        19

                                                F(1,22024)         =   2848.23
corr(u_i, Xb)  = 0.0465                         Prob > F           =    0.0000

------------------------------------------------------------------------------
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ILLDAYS |   .0788618   .0014777    53.37   0.000     .0759654    .0817581
       _cons |   .2906284   .0077221    37.64   0.000     .2754925    .3057643
-------------+----------------------------------------------------------------
     sigma_u |  .85814688
     sigma_e |   1.085808
         rho |  .38447214   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(5739, 22024) =     2.35         Prob > F = 0.0000

Clustering on the panel variable inflates the errors:

. xtreg PHARVIS ILLDAYS, fe vce(cluster hh)

Fixed-effects (within) regression               Number of obs      =     27765
Group variable: hh                              Number of groups   =      5740

R-sq:  within  = 0.1145                         Obs per group: min =         1
       between = 0.1390                                        avg =       4.8
       overall = 0.1257                                        max =        19

                                                F(1,5739)          =    464.54
corr(u_i, Xb)  = 0.0465                         Prob > F           =    0.0000

                                  (Std. Err. adjusted for 5740 clusters in hh)
------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ILLDAYS |   .0788618   .0036589    21.55   0.000     .0716889    .0860346
       _cons |   .2906284   .0102597    28.33   0.000     .2705154    .3107413
-------------+----------------------------------------------------------------
     sigma_u |  .85814688
     sigma_e |   1.085808
         rho |  .38447214   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Now I try the non-panel approach. I am using areg because Stata will not let me put in ~6K dummies.

. areg PHARVIS ILLDAYS, absorb(hh) vce(cluster hh)

Linear regression, absorbing indicators           Number of obs   =      27765
                                                  F(   1,   5739) =     368.52
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.4579
                                                  Adj R-squared   =     0.3166
                                                  Root MSE        =     1.0858

                                  (Std. Err. adjusted for 5740 clusters in hh)
------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ILLDAYS |   .0788618   .0041081    19.20   0.000     .0708084    .0869151
       _cons |   .2906284   .0115192    25.23   0.000     .2680464    .3132103
-------------+----------------------------------------------------------------
          hh |   absorbed                                    (5740 categories)

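Unfortunately, areg hides the coefficients you are interested in. If you use regress and restrict the sample so that the number of households is reasonable, you get the tiny standard errors for the clusters with only 1 villager. This makes sense, since the residuals for such observations will be exactly zero. Here is an example: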

. reg PHARVIS ILLDAYS i.hh if inrange(hh,1,100), cluster(hh)

Linear regression                                      Number of obs =     219
                                                       F(  0,    99) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.6473
                                                       Root MSE      =  .88177

                                   (Std. Err. adjusted for 100 clusters in hh)
------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ILLDAYS |   .0518095   .0314707     1.65   0.103    -.0106352    .1142542
             |
          hh |
          2  |         -1   1.84e-14 -5.4e+13   0.000           -1          -1
          3  |   .2590475   .1573536     1.65   0.103    -.0531762    .5712712
          4  |   .4662855   .2832365     1.65   0.103    -.0957171    1.028288
          5  |   2.129524   .0786768    27.07   0.000     1.973412    2.285636
          6  |          1   1.84e-14  5.4e+13   0.000            1           1
          7  |   -.585524   .2517657    -2.33   0.022    -1.085082   -.0859662
        (snip)....
        100  |  -.8359366   .0996573    -8.39   0.000    -1.033678   -.6381949
             |
       _cons |    .481905   .3147072     1.53   0.129    -.1425423    1.106352
------------------------------------------------------------------------------
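
The table above is also consistent with a slightly more general reading, offered here as a sketch rather than as the author's argument. The notation is introduced for illustration: household 1 is the omitted base category, bars denote household means, x is ILLDAYS and y is PHARVIS.

```latex
% \hat\delta_j: coefficient on the dummy for household j
\hat\delta_j = (\bar y_j - \bar y_1) - \hat\beta \, (\bar x_j - \bar x_1)

% The household dummies add nothing to the clustered middle matrix,
% so only \hat\beta carries sampling variance:
\widehat{\operatorname{se}}(\hat\delta_j)
  = |\bar x_j - \bar x_1| \cdot \widehat{\operatorname{se}}(\hat\beta),
\qquad
\widehat{\operatorname{se}}(\widehat{\text{\_cons}})
  = |\bar x_1| \cdot \widehat{\operatorname{se}}(\hat\beta)
```

The reported numbers fit this pattern: dividing by the ILLDAYS standard error (.0314707) gives about 5.0 for household 3, 9.0 for household 4, 2.5 for household 5, 8.0 for household 7 and 10.0 for _cons, and the t-statistics for households 3 and 4 equal the 1.65 reported for ILLDAYS. On this reading, a dummy's standard error collapses to about 1e-14 whenever its household's mean ILLDAYS equals the base household's, which a single-member household may or may not satisfy.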

Now I will cluster on village (commune), which inflates them somewhat, as expected, but the results still look fine:

. reg PHARVIS ILLDAYS i.commune, cluster(commune)

Linear regression                                      Number of obs =   27765
                                                       F(  0,   193) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.1814
                                                       Root MSE      =  1.1925

                              (Std. Err. adjusted for 194 clusters in commune)
------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ILLDAYS |   .0840634   .0056375    14.91   0.000     .0729444    .0951823
             |
     commune |
          2  |  -.1885549    .012027   -15.68   0.000    -.2122761   -.1648337
        (snip) ....
        191  |   .4646775   .0014571   318.91   0.000     .4618037    .4675514
        192  |  -.0020317   .0065782    -0.31   0.758    -.0150061    .0109427
        193  |  -.2444578   .0115522   -21.16   0.000    -.2672426   -.2216731
        194  |   .1917803   .0002288   838.33   0.000     .1913291    .1922315
             |
       _cons |   .4371527   .0200739    21.78   0.000     .3975602    .4767452
------------------------------------------------------------------------------

If I drop all the other regressors and estimate something like what Stas suggested, I get zero standard errors on the commune dummies:

. reg PHARVIS i.commune, cluster(commune)

Linear regression                                      Number of obs =   27765
                                                       F(  0,   193) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.0656
                                                       Root MSE      =   1.274

                              (Std. Err. adjusted for 194 clusters in commune)
------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     commune |
          2  |  -.0092138   1.72e-14 -5.4e+11   0.000    -.0092138   -.0092138
          3  |  -.2910319   1.72e-14 -1.7e+13   0.000    -.2910319   -.2910319
          4  |  -.3957457   1.72e-14 -2.3e+13   0.000    -.3957457   -.3957457
          5  |  -.4244865   1.72e-14 -2.5e+13   0.000    -.4244865   -.4244865
        (snip) ....
        191  |   .4864051   1.72e-14  2.8e+13   0.000     .4864051    .4864051
        192  |  -.1001229   1.72e-14 -5.8e+12   0.000    -.1001229   -.1001229
        193  |   -.416719   1.72e-14 -2.4e+13   0.000     -.416719    -.416719
        194  |    .188369   1.72e-14  1.1e+13   0.000      .188369     .188369
             |
       _cons |   .7364865   1.72e-14  4.3e+13   0.000     .7364865    .7364865
------------------------------------------------------------------------------
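
The same pattern should be reproducible without downloading anything. The following is a minimal sketch, assuming the auto dataset that ships with Stata, with rep78 standing in for the doctor identifier; the dataset choice and the variable names ehat and esum are illustrative and not part of the original thread.

```stata
* Assumption: the shipped auto.dta; rep78 plays the role of the doctor ID
sysuse auto, clear
drop if missing(rep78)

* Only cluster dummies on the right-hand side, clustered on the same variable
regress price i.rep78, vce(cluster rep78)

* Within-cluster residual sums: each should be zero up to rounding
predict double ehat, residuals
bysort rep78: egen double esum = total(ehat)
tabulate rep78, summarize(esum)
```

If the reasoning in the first answer holds, the standard errors on the rep78 dummies should come out numerically zero, as in the commune regression above, and the table of residual sums shows why: every cluster contributes nothing to the clustered variance.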