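I have a lot of data on surgical success by doctor. I used Stata to estimate a regression with fixed effects for individual doctors. I first ran the regression with the robust option. The t-values on the individual doctor estimates ranged from 2.17 to 6.14. I then reran it with the vce(cluster doctor) option. I expected the standard errors to get larger. Instead I got smaller standard errors, much smaller, e.g. 1.04e-14. This is too good to be true. Why does this happen? What are the possible reasons?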
Implausibly small standard errors
regression
stata
standard-error
clustered-standard-errors
2022-04-10 19:49:19
2 answers
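You have corrected for the individual doctor effects twice, using methods that simply cannot work together.
If your model is regress outcome i.doctor, vce(cluster doctor), then Stata should have complained that you have exhausted your degrees of freedom. xtreg may not be as smart, and may have missed the fact that the fixed effects are perfectly determined. These 1e-14 standard errors should be exact zeros; they are non-zero in practice because of rounding somewhere in the guts of the fixed-effects estimation. What is going on here is this:
The cluster variance estimator works by summing the per-cluster contributions over the clusters. However, by specifying the doctors as fixed effects, you force the residuals for any given doctor to sum to 0.
regress knows how to detect this at the algebraic level. xtreg, however, may not know enough computational linear algebra to do so, and simply adds up the (numerically) zero contributions to produce the implausibly small standard errors you are seeing.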
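The cancellation described above can be written out. What follows is a sketch in notation that does not appear in the original answer (clusters $c$, design rows $X_c$, OLS residuals $\hat e_c$), and it ignores finite-sample correction factors, which only rescale the result.

```latex
% Cluster-robust ("sandwich") variance, up to a finite-sample factor:
\[
\widehat{V} = (X'X)^{-1} \Big( \sum_{c} X_c' \hat e_c \hat e_c' X_c \Big) (X'X)^{-1}
\]

% Entry of the cluster score X_c' \hat e_c that belongs to the dummy for doctor d:
\[
\sum_{i \in c} 1\{\text{doctor}_i = d\}\, \hat e_i =
\begin{cases}
  \sum_{i \in d} \hat e_i & \text{if } c = d \\
  0                       & \text{if } c \neq d
\end{cases}
\]

% The OLS normal equation for that dummy column forces
\[
\sum_{i \in d} \hat e_i = 0 .
\]
```

So when the clusters are the doctors, every cluster score is zero in every dummy coordinate. With an intercept and an omitted base doctor the same holds, because the normal equation for the constant makes the base doctor's residuals sum to zero as well. If the doctor dummies are the only regressors, the whole middle matrix vanishes, and whatever is printed is rounding error.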
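If I understand your problem, this can happen when the intra-cluster correlation is negative. For the intuition, take a look at the therapist version in the Stata FAQ.
Edit:
I think Stas is right about the deeper problem. I was too hasty. Here is my attempt to replicate this using a dataset of pharmacy visits for 27,766 Vietnamese villagers, nested in 5,740 households in 194 villages (the data come from Cameron and Trivedi). I could not find a public dataset where the clustered errors come out smaller, but I think this illustrates the main points. I will treat pharmacy visits as continuous, although they clearly are not.
First, we set up the data: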
. use "http://cameron.econ.ucdavis.edu/mmabook/vietnam_ex2.dta", clear
. egen hh=group(lnhhinc)
(1 missing value generated)
. bys hh: gen person = _n
. xtset hh person
       panel variable:  hh (unbalanced)
        time variable:  person, 1 to 19
                delta:  1 unit
. xtdes
      hh:  1, 2, ..., 5740                                   n =       5740
  person:  1, 2, ..., 19                                     T =         19
           Delta(person) = 1 unit
           Span(person)  = 19 periods
           (hh*person uniquely identifies each observation)
Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       2       4         5         6       8      19
(snip)
Now for the FE regression of pharmacy visits on sick days:
. xtreg PHARVIS ILLDAYS, fe
Fixed-effects (within) regression               Number of obs      =     27765
Group variable: hh                              Number of groups   =      5740
R-sq:  within  = 0.1145                         Obs per group: min =         1
       between = 0.1390                                        avg =       4.8
       overall = 0.1257                                        max =        19
                                                F(1,22024)         =   2848.23
corr(u_i, Xb)  = 0.0465                         Prob > F           =    0.0000
------------------------------------------------------------------------------
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ILLDAYS |   .0788618   .0014777    53.37   0.000     .0759654    .0817581
       _cons |   .2906284   .0077221    37.64   0.000     .2754925    .3057643
-------------+----------------------------------------------------------------
     sigma_u |  .85814688
     sigma_e |   1.085808
         rho |  .38447214   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(5739, 22024) =     2.35         Prob > F = 0.0000
Clustering on the panel variable inflates the standard errors:
. xtreg PHARVIS ILLDAYS, fe vce(cluster hh)
Fixed-effects (within) regression               Number of obs      =     27765
Group variable: hh                              Number of groups   =      5740
R-sq:  within  = 0.1145                         Obs per group: min =         1
       between = 0.1390                                        avg =       4.8
       overall = 0.1257                                        max =        19
                                                F(1,5739)          =    464.54
corr(u_i, Xb)  = 0.0465                         Prob > F           =    0.0000
                                  (Std. Err. adjusted for 5740 clusters in hh)
------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ILLDAYS |   .0788618   .0036589    21.55   0.000     .0716889    .0860346
       _cons |   .2906284   .0102597    28.33   0.000     .2705154    .3107413
-------------+----------------------------------------------------------------
     sigma_u |  .85814688
     sigma_e |   1.085808
         rho |  .38447214   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Now I try the non-panel approach. I am using areg because Stata will not let me put in ~6K dummies.
. areg PHARVIS ILLDAYS, absorb(hh) vce(cluster hh)
Linear regression, absorbing indicators           Number of obs   =      27765
                                                  F(   1,   5739) =     368.52
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.4579
                                                  Adj R-squared   =     0.3166
                                                  Root MSE        =     1.0858
                                  (Std. Err. adjusted for 5740 clusters in hh)
------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ILLDAYS |   .0788618   .0041081    19.20   0.000     .0708084    .0869151
       _cons |   .2906284   .0115192    25.23   0.000     .2680464    .3132103
-------------+----------------------------------------------------------------
          hh |   absorbed                                    (5740 categories)
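Unfortunately, areg hides the part you are interested in. If you use regress and restrict the sample so that the number of households is manageable, you will get tiny standard errors for clusters with only 1 villager. This makes sense, since the residuals for such observations will be exactly zero. Here is an example: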
. reg PHARVIS ILLDAYS i.hh if inrange(hh,1,100), cluster(hh)
Linear regression                                      Number of obs =     219
                                                       F(  0,    99) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.6473
                                                       Root MSE      =  .88177
                                   (Std. Err. adjusted for 100 clusters in hh)
------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ILLDAYS |   .0518095   .0314707     1.65   0.103    -.0106352    .1142542
             |
          hh |
          2  |         -1   1.84e-14 -5.4e+13   0.000           -1          -1
          3  |   .2590475   .1573536     1.65   0.103    -.0531762    .5712712
          4  |   .4662855   .2832365     1.65   0.103    -.0957171    1.028288
          5  |   2.129524   .0786768    27.07   0.000     1.973412    2.285636
          6  |          1   1.84e-14  5.4e+13   0.000            1           1
          7  |   -.585524   .2517657    -2.33   0.022    -1.085082   -.0859662
(snip)....
        100  |  -.8359366   .0996573    -8.39   0.000    -1.033678   -.6381949
             |
       _cons |    .481905   .3147072     1.53   0.129    -.1425423    1.106352
------------------------------------------------------------------------------
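One way to read the pattern in the table above is through the sandwich formula itself. This is a sketch, not part of the original answer. The symbols $\bar x_g$ (mean of ILLDAYS in household $g$), $\hat b$ (the ILLDAYS coefficient), $\hat\delta_g$ (the coefficient on the dummy for household $g$), $e_x$ (the unit vector picking out the ILLDAYS column) and $m$ are notation introduced here, and finite-sample factors are ignored.

```latex
% The household dummies force the residuals to sum to zero within each household,
% so each cluster's score is nonzero only in the ILLDAYS coordinate. The middle
% matrix of the sandwich then has a single nonzero entry and the variance has rank one:
\[
\widehat{V} = m \, a a', \qquad a = (X'X)^{-1} e_x, \qquad
m = \sum_{c} \Big( \sum_{i \in c} x_i \hat e_i \Big)^{2}
\]

% Partitioned-inverse formula, with household 1 as the omitted base category:
\[
a_{\delta_g} = -(\bar x_g - \bar x_1)\, a_{b}, \qquad a_{\mathrm{cons}} = -\bar x_1 \, a_{b}
\]

% Hence every reported standard error is a multiple of the one on ILLDAYS:
\[
\operatorname{se}(\hat\delta_g) = |\bar x_g - \bar x_1| \, \operatorname{se}(\hat b), \qquad
\operatorname{se}(\widehat{\mathrm{cons}}) = |\bar x_1| \, \operatorname{se}(\hat b)
\]
```

The printed output is consistent with this: .1573536/.0314707 ≈ 5, .2832365/.0314707 ≈ 9, .0786768/.0314707 ≈ 2.5 and .3147072/.0314707 ≈ 10. On this reading, a dummy gets a standard error like 1.84e-14 when that household's mean of ILLDAYS equals the base household's, so that the multiplier is a numerical zero.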
Now I will cluster on the village (commune), which inflates the standard errors somewhat, as expected, but they still look fine:
. reg PHARVIS ILLDAYS i.commune, cluster(commune)
Linear regression                                      Number of obs =   27765
                                                       F(  0,   193) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.1814
                                                       Root MSE      =  1.1925
                              (Std. Err. adjusted for 194 clusters in commune)
------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ILLDAYS |   .0840634   .0056375    14.91   0.000     .0729444    .0951823
             |
     commune |
          2  |  -.1885549    .012027   -15.68   0.000    -.2122761   -.1648337
(snip) ....
        191  |   .4646775   .0014571   318.91   0.000     .4618037    .4675514
        192  |  -.0020317   .0065782    -0.31   0.758    -.0150061    .0109427
        193  |  -.2444578   .0115522   -21.16   0.000    -.2672426   -.2216731
        194  |   .1917803   .0002288   838.33   0.000     .1913291    .1922315
             |
       _cons |   .4371527   .0200739    21.78   0.000     .3975602    .4767452
------------------------------------------------------------------------------
If I drop all the other regressors and estimate something like what Stas suggested, I get zero standard errors on the commune dummies:
. reg PHARVIS i.commune, cluster(commune)
Linear regression                                      Number of obs =   27765
                                                       F(  0,   193) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.0656
                                                       Root MSE      =   1.274
                              (Std. Err. adjusted for 194 clusters in commune)
------------------------------------------------------------------------------
             |               Robust
     PHARVIS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     commune |
          2  |  -.0092138   1.72e-14 -5.4e+11   0.000    -.0092138   -.0092138
          3  |  -.2910319   1.72e-14 -1.7e+13   0.000    -.2910319   -.2910319
          4  |  -.3957457   1.72e-14 -2.3e+13   0.000    -.3957457   -.3957457
          5  |  -.4244865   1.72e-14 -2.5e+13   0.000    -.4244865   -.4244865
(snip) ....
        191  |   .4864051   1.72e-14  2.8e+13   0.000     .4864051    .4864051
        192  |  -.1001229   1.72e-14 -5.8e+12   0.000    -.1001229   -.1001229
        193  |   -.416719   1.72e-14 -2.4e+13   0.000     -.416719    -.416719
        194  |    .188369   1.72e-14  1.1e+13   0.000      .188369     .188369
             |
       _cons |   .7364865   1.72e-14  4.3e+13   0.000     .7364865    .7364865
------------------------------------------------------------------------------
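The last regression can be mimicked outside Stata to confirm that the tiny standard errors are rounding noise and not information. The sketch below is an illustration on simulated data, not a replication of the Vietnam dataset and not Stata's implementation. It uses the cell-means form (one dummy per cluster, no intercept), which gives the same fit as an intercept plus dummies, and the plain sandwich with no finite-sample factor. The function name group_mean_fit and the simulated values are made up for the example.

```python
import math
import random

def group_mean_fit(y, cluster):
    """Regress y on a full set of cluster dummies (cell-means form, no intercept).

    Returns {cluster: (coef, conventional_se, cluster_robust_se)}.
    The cluster-robust SE is the plain sandwich with no finite-sample factor.
    """
    groups = {}
    for yi, c in zip(y, cluster):
        groups.setdefault(c, []).append(yi)

    n_obs, n_groups = len(y), len(groups)
    means = {c: sum(v) / len(v) for c, v in groups.items()}  # OLS coefficients

    # Pooled residual variance for the conventional (homoskedastic) SE.
    rss = sum((yi - means[c]) ** 2 for yi, c in zip(y, cluster))
    s2 = rss / (n_obs - n_groups)

    out = {}
    for c, vals in groups.items():
        n = len(vals)
        # Cluster score for dummy c: the sum of that cluster's residuals,
        # which is zero in exact arithmetic.
        score = sum(v - means[c] for v in vals)
        # X'X is diagonal with entry n, so the sandwich variance is score^2 / n^2.
        out[c] = (means[c], math.sqrt(s2 / n), abs(score) / n)
    return out

random.seed(0)
cluster = [g for g in range(5) for _ in range(40)]   # 5 clusters of 40 observations
y = [random.gauss(g, 1.0) for g in cluster]          # hypothetical outcome

fit = group_mean_fit(y, cluster)
for c, (coef, se_ols, se_cl) in fit.items():
    print(f"{c}: coef={coef:.4f}  conventional se={se_ols:.4f}  clustered se={se_cl:.2e}")
```

The conventional standard errors come out at an ordinary size, while the clustered ones are at the level of floating-point rounding, which matches the 1.72e-14 entries above in spirit if not in exact magnitude.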