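The fuzzy RD design can be conceptualized as a local IV model, that is, an instrumental variables regression with weights that decline as observations move away from the cutoff. You will want to instrument for the treated indicator with a dummy for being above the cutoff, while controlling for the running variable Z and the interaction of the above-the-cutoff dummy with Z. This is covered on p. 958 of the 2nd edition of "grown-up" Wooldridge (Econometric Analysis of Cross Section and Panel Data). You don't have the weights, and you are missing these interactions in both of your models.
Here is a simulation in Stata that demonstrates this equivalence. We first install two RD commands and generate some fake data: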
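Written out, the local IV model described above looks roughly like this. The notation is my own shorthand and does not come from the question: T is the treated indicator, A the above-the-cutoff dummy, c the cutoff, h the bandwidth, and a triangle kernel is assumed for the weights.

```latex
\begin{align*}
A_i &= \mathbf{1}\{Z_i > c\}, \qquad
w_i = \max\!\left(0,\; 1 - \frac{|Z_i - c|}{h}\right) \\
T_i &= \gamma_0 + \pi A_i + \gamma_1 (Z_i - c) + \gamma_2 A_i (Z_i - c) + \nu_i
    && \text{(first stage)} \\
Y_i &= \beta_0 + \tau T_i + \beta_1 (Z_i - c) + \beta_2 A_i (Z_i - c) + \varepsilon_i
    && \text{(structural equation)}
\end{align*}
```

Both equations are fit with the weights w_i, and A_i is the only excluded instrument for T_i, so the 2SLS estimate of tau is the fuzzy RD effect. With c = 0 and h = 1 the weight reduces to max(0, 1 - |Z_i|).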
. clear
. /* install two commands that do fuzzy RD */
. capture net install rdrobust, from(http://www-personal.umich.edu/~cattaneo/rdrobust/stata) replace
. capture ssc install rd
. /* Generate Fake Data and Weights */
. mat c=(1,.5\.5,1)
. set seed 10011979
. drawnorm e pretest, n(1000) corr(c) clear
(obs 1000)
. gen z=pretest-0 // z is running variable
. gen above=z>0 // above is above-the-cutoff indicator
. gen treated=cond(uniform()<.8,above,1-above) // treated indicator
. gen y=z-z^3+treated+e // define outcome y
. gen w=max(0,1-abs(z)) // define triangle kernel weight
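Here is the IV estimation. Note how the interaction is created on the fly with factor-variable notation. I am not using powers of Z in my model, just a simple linear term: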
. /* IV Version */
. ivregress 2sls y (treated=i.above) z c.z#i.above [pw=w]
(sum of wgt is   3.8525e+02)
Instrumental variables (2SLS) regression               Number of obs =     703
                                                       Wald chi2(3)  =  464.46
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.3792
                                                       Root MSE      =   .9368
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     treated |   1.641632   .3639524     4.51   0.000     .9282981    2.354965
           z |   .9815473   .2558127     3.84   0.000     .4801637    1.482931
             |
   above#c.z |
          1  |  -.4895004   .3488331    -1.40   0.161    -1.173201       .1942
             |
       _cons |  -.2436488   .1800297    -1.35   0.176    -.5965006     .109203
------------------------------------------------------------------------------
Instrumented:  treated
Instruments:   z 1.above#c.z 1.above
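Because this model is just identified (one excluded instrument, one endogenous regressor), the 2SLS coefficient on treated equals the ratio of two weighted regression jumps: the reduced-form jump in y divided by the first-stage jump in treated. A minimal sketch of that check, assuming the simulated data and the weight w are still in memory (the scalar names rf and fs are mine):

```stata
* Reduced form: jump in the outcome at the cutoff
regress y i.above z c.z#i.above [pw=w]
scalar rf = _b[1.above]

* First stage: jump in the probability of treatment at the cutoff
regress treated i.above z c.z#i.above [pw=w]
scalar fs = _b[1.above]

* Local Wald ratio; algebraically equal to the 2SLS coefficient on treated
display "reduced form = " rf "   first stage = " fs "   ratio = " rf/fs
```

The ratio should reproduce the coefficient on treated from ivregress. Its standard error is not available from the two separate regressions directly; that needs the delta method or the IV routine itself.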
There are two user-written commands that estimate fuzzy RD models:
. /* FRD Versions */
. rd y treated z, bw(1) z0(0) kernel(triangle)
Three variables specified; jump in treatment
at Z=0 will be estimated. Local Wald Estimate
is the ratio of jump in outcome to jump in treatment.
Assignment variable Z is z
Treatment variable X_T is treated
Outcome variable y is y
Estimating for bandwidth 1
Estimating for bandwidth .5
Estimating for bandwidth 2
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       numer |   .7219002   .1669301     4.32   0.000     .3947233    1.049077
       denom |   .4397455   .0724289     6.07   0.000     .2977876    .5817035
       lwald |   1.641632   .3642115     4.51   0.000     .9277902    2.355473
     numer50 |   .6432259    .240981     2.67   0.008     .1709118     1.11554
     denom50 |   .3158332   .1027168     3.07   0.002     .1145121    .5171544
     lwald50 |     2.0366   .7731264     2.63   0.008     .5212996      3.5519
    numer200 |   1.518334   .1283546    11.83   0.000     1.266764    1.769905
    denom200 |   .4920938   .0529886     9.29   0.000     .3882381    .5959496
    lwald200 |   3.085456   .3284832     9.39   0.000     2.441641    3.729272
------------------------------------------------------------------------------
The first lwald coefficient is the FRD treatment effect. Here is another command that does FRD:
. rdrobust y z, fuzzy(treated) kernel(triangular) h(1) bwselect(IK)
Preparing data.
Computing variance-covariance matrix.
Computing RD estimates.
Estimation completed.
Sharp RD estimates using local polynomial regression.
         Cutoff c = 0 | Left of c  Right of c            Number of obs =       1000
----------------------+----------------------            NN matches    =          3
        Number of obs |       373         330            BW type       =     Manual
 Order loc. poly. (p) |         1           1            Kernel type   = Triangular
       Order bias (q) |         2           2
    BW loc. poly. (h) |     1.000       1.000
          BW bias (b) |     1.000       1.000
            rho (h/b) |     1.000       1.000
Structural Estimates. Outcome: y. Running variable: z. Instrument: treated.
--------------------------------------------------------------------------------------
               Method |      Coef.  Std. Err.         z    P>|z|  [95% Conf. Interval]
----------------------+---------------------------------------------------------------
         Conventional |     1.6416     .36912    4.4474    0.000    .918162     2.3651
               Robust |          -          -    2.9955    0.003    .584165    2.79549
--------------------------------------------------------------------------------------
First-Stage Estimates. Outcome: treated. Running variable: z.
--------------------------------------------------------------------------------------
               Method |      Coef.  Std. Err.         z    P>|z|  [95% Conf. Interval]
----------------------+---------------------------------------------------------------
         Conventional |     .43975     .07148    6.1523    0.000    .299654    .579837
               Robust |          -          -    2.7248    0.006    .083139    .509228
--------------------------------------------------------------------------------------
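The Conventional coefficient above is the FRD treatment effect. Both FRD estimates match the locally weighted IV (LWIV) result: all three point estimates are 1.6416, and the standard errors (.3640 for ivregress, .3642 for rd, .3691 for rdrobust) are close to one another.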
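Now for your second question. I am less familiar with this literature, so I may be on shakier ground here. I assume you want to estimate a single model for men and women to get a single estimate of the effect. There are two ways to do this. One is to estimate two models and reweight the estimates. It is prudent not to weight by the gender-specific overall sample size or treated sample size. Personally, I like to make the weights proportional to the number of units within some range of each group's discontinuity, so that observations too far from the cutoff do not matter in determining the weights. You can use the bandwidth for this. Note that the estimates from the two separate discontinuities are independent.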
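A rough sketch of the first option, run on the simulated data from earlier. The variable female is invented and assigned at random purely so that the code runs; it is not part of the original simulation. Both groups share the cutoff at 0 and the bandwidth of 1 here, and the weights are treated as fixed when computing the standard error.

```stata
* ASSUMPTION: the simulated data above are in memory. female is a made-up
* 0/1 indicator, assigned at random for illustration only.
capture drop female
gen byte female = runiform() < .5

forvalues g = 0/1 {
    * Locally weighted IV within gender g (bandwidth 1 via w)
    ivregress 2sls y (treated = i.above) z c.z#i.above if female == `g' [pw=w]
    scalar b`g'  = _b[treated]
    scalar se`g' = _se[treated]
    * Units inside the bandwidth around this group's cutoff
    count if female == `g' & abs(z) < 1
    scalar n`g' = r(N)
}

* Weights proportional to the within-bandwidth counts
scalar wt0 = n0/(n0 + n1)
scalar wt1 = n1/(n0 + n1)

* Weighted average; the variances add because the two samples are disjoint
scalar b_avg  = wt0*b0 + wt1*b1
scalar se_avg = sqrt(wt0^2*se0^2 + wt1^2*se1^2)
display "weighted FRD = " b_avg "   s.e. = " se_avg
```

With real data, each group would get its own cutoff and possibly its own bandwidth, so the weight variable and the count condition would be built separately for each gender.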
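The other option is to recenter all observations by gender, pool them, and apply one estimator to the single discontinuity, where the running variable is now relative rather than absolute. The resulting estimate implicitly weights the separate discontinuity estimates by the number of observations at the discontinuity in each case.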
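A sketch of the recentering approach, again on the simulated data. The variable female and the raw cutoffs of 50 (men) and 60 (women) are invented for illustration. The raw score is constructed so that recentering simply recovers z.

```stata
* ASSUMPTION: the simulated data above are in memory. female and the raw
* cutoffs (50 for men, 60 for women) are made up for illustration.
foreach v in female score zc abovec wc {
    capture drop `v'
}
gen byte   female = runiform() < .5
gen double score  = z + cond(female, 60, 50)   // raw score, group-specific cutoff

* Recenter: distance from each observation's own cutoff
gen double zc     = score - cond(female, 60, 50)
gen byte   abovec = zc > 0
gen double wc     = max(0, 1 - abs(zc))        // triangle kernel, bandwidth 1

* One pooled locally weighted IV on the recentered running variable
ivregress 2sls y (treated = i.abovec) zc c.zc#i.abovec [pw=wc]
```

Since recentering undoes the shift that was added, the pooled estimate here should be essentially the same as the earlier ivregress result. With real data the groups would differ, and the single bandwidth is what distinguishes this option from estimating the two discontinuities separately.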
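I think I prefer the former approach, since it allows the bandwidth to vary by gender, and any heterogeneity in the treatment effect will be visible.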