Theory
See the paper for a full treatment of the theory.
Setup
We consider the case of \(\mathcal{T}\) periods. Let \(D_{t} \in \{0, 1\}\) denote treatment status and \(Z_{t} \in \{0, 1\}\) the instrument status. Moreover, let \(D = (D_{1}, D_{2}, \ldots, D_{\mathcal{T}})\) and \(Z = (Z_{1}, Z_{2}, \ldots, Z_{\mathcal{T}})\) denote the treatment and instrument paths.
We assume staggered adoption of the instrument: \(Z_{1} = 0\), and for \(t = 2, \ldots, \mathcal{T}\):
\[Z_{t-1} = 1 \implies Z_{t} = 1,\]
i.e. no units are exposed to the instrument in the first period and all units that are exposed in some period stay exposed.
This implies that the period in which the instrument switches on characterizes the instrument path \(Z\) completely. We therefore define the cohort exposure variable \(E := \min \{t \mid Z_{t} = 1\}\) and the cohort indicator \(E_{e} := \mathbf{1}\{E = e\}\).
[!NOTE]
The staggered adoption of the instrument is analogous to the staggered adoption of the treatment variable \(D_t\) in DiD, for instance in Callaway and Sant’Anna (2021) (CSA). Likewise, the cohort variables are analogous to \(G := \min \{t \mid D_{t} = 1\}\) and \(G_{g} := \mathbf{1}\{G = g\}\) of CSA.
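The definition of \(E\) can be illustrated with a short, standard-library Python sketch. The function name `exposure_cohort` is hypothetical, and coding never-exposed paths as 0 is an assumption made for illustration (mathematically the minimum over an empty set is undefined); neither is taken from the package.

```python
def exposure_cohort(z_path):
    """Return E = min{t : Z_t = 1} for one 0/1 instrument path.

    Periods are numbered 1..T. Never-exposed paths return 0, which is
    an assumed coding convention for this sketch only.
    """
    # Staggered adoption: Z_1 = 0 and exposure never switches back off.
    if z_path[0] != 0:
        raise ValueError("Z_1 must be 0")
    if any(prev > cur for prev, cur in zip(z_path, z_path[1:])):
        raise ValueError("exposure must be absorbing")
    for t, z in enumerate(z_path, start=1):
        if z == 1:
            return t
    return 0


paths = [
    [0, 0, 0, 0],  # never exposed
    [0, 1, 1, 1],  # exposed from period 2
    [0, 0, 0, 1],  # exposed from period 4
]
E = [exposure_cohort(z) for z in paths]
E_2 = [int(e == 2) for e in E]  # cohort indicator E_e for e = 2
print(E, E_2)
```

Because the path is monotone, \(E\) alone is enough to rebuild \(Z\): \(Z_t = \mathbf{1}\{E \leq t\}\) for exposed units.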
Causal estimand
The main causal estimand is the cohort-specific time-varying local average treatment effect on the treated, denoted \(LATT(e,t)\); see the paper for its formal definition.
The paper develops the theory for two types of controls, never-exposed and not-yet-exposed, analogous to the never-treated and not-yet-treated groups of CSA; the paper gives their formal definitions.
Estimable estimands
Panel data
Under the identifying assumptions of the paper, a doubly robust panel-data estimand for the \(LATT(e,t)\) parameter is \(\tau^{dr,p}_{e,t}\); its explicit form is given in the paper.
This is essentially a ratio of two \(ATT_{dr}(g, t; 0)\) estimands of CSA in which the treatment variable is replaced by the exposure variable \(E_{e}\) and, in the denominator, the outcome is replaced by the treatment variable \(D_{t}\).
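To make the ratio structure concrete, here is a minimal covariate-free sketch in standard-library Python. It is not the doubly robust estimand: it swaps the doubly robust components for plain differences in means, uses never-exposed units as controls, and takes \(e - 1\) as the base period. The function name `wald_did`, the data layout, and the coding of never-exposed units as 0 are all assumptions made for illustration.

```python
from statistics import mean


def wald_did(rows, e, t, never=0):
    """Ratio of two difference-in-differences for cohort e in period t.

    rows: one dict per unit with keys "E" (cohort) and "Y", "D"
    (dicts mapping period -> value). Hypothetical layout.
    """
    base = e - 1  # last period before cohort e is exposed

    def did(var):
        # Change from the base period to t, exposed cohort minus controls.
        exposed = [r[var][t] - r[var][base] for r in rows if r["E"] == e]
        control = [r[var][t] - r[var][base] for r in rows if r["E"] == never]
        return mean(exposed) - mean(control)

    # Numerator: effect of exposure on the outcome.
    # Denominator: effect of exposure on treatment take-up.
    return did("Y") / did("D")


units = [
    {"E": 2, "D": {1: 0, 2: 1}, "Y": {1: 1.0, 2: 4.0}},
    {"E": 2, "D": {1: 0, 2: 0}, "Y": {1: 1.0, 2: 2.0}},
    {"E": 0, "D": {1: 0, 2: 0}, "Y": {1: 0.0, 2: 1.0}},
    {"E": 0, "D": {1: 0, 2: 0}, "Y": {1: 2.0, 2: 3.0}},
]
ratio = wald_did(units, e=2, t=2)
print(ratio)
```

In this toy data exposure raises the outcome change by 1.0 and treatment take-up by 0.5, so the ratio attributes an effect of 2.0 to each unit of induced treatment.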
Repeated cross-sections
Likewise, a doubly robust repeated cross-section estimand for \(LATT(e,t)\) is \(\tau^{dr,rc}_{e,t}\); see the paper for the definitions. Again, this is essentially a ratio of two \(ATT_{dr,rc}(g, t; 0)\) estimands of CSA in which the treatment variable is replaced by the exposure variable \(E_{e}\) and, in the denominator, the outcome is replaced by the treatment variable \(D\).
Estimators
The estimators are plug-in versions of the doubly robust estimands
above. The main public entry point is idid.estimate(). Worked
examples are given on the Quickstart page.
Panel data
Doubly robust (method="dr")
For panel data, the doubly robust estimator plugs in estimators of the nuisance functions in \(\tau^{dr,p}_{e,t}\). Operationally, the package computes:
a numerator estimator for the outcome change
a denominator estimator for the treatment change
the ratio of the two
and repeats for all pairs \((e, t)\), \(e \in \mathcal{E}\), \(t \geq e\).
This corresponds to:
import idid


res = idid.estimate(
    idid.sim_stag_panel(n=10_000, T=6, E_cohorts=[0, 2, 3, 4, 5]),
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dr",
    balanced=True,
    verbose=False,
)
res.summary()
Cohort-Time Local Average Treatment Effects on the Treated:
E t AET(e, t) LATT(e, t) Std. Error [95% Pointwise. Conf. Band]
2 2 0.2531 0.9366 0.2070 0.5309 1.3422 *
2 3 0.2346 1.2491 0.2239 0.8103 1.6879 *
2 4 0.2083 1.2095 0.2473 0.7248 1.6942 *
2 5 0.2050 1.2481 0.2618 0.7350 1.7612 *
2 6 0.2102 1.3714 0.2497 0.8819 1.8609 *
3 3 0.2200 1.2780 0.2443 0.7991 1.7568 *
3 4 0.2354 1.0497 0.2242 0.6102 1.4891 *
3 5 0.1912 1.2113 0.2774 0.6675 1.7550 *
3 6 0.2358 1.0995 0.2244 0.6597 1.5394 *
4 4 0.2221 0.8343 0.2356 0.3726 1.2960 *
4 5 0.2139 1.0805 0.2467 0.5969 1.5640 *
4 6 0.2509 1.0037 0.2120 0.5881 1.4193 *
5 5 0.1992 1.0818 0.2718 0.5491 1.6144 *
5 6 0.2543 1.1341 0.2112 0.7201 1.5481 *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust
The panel DR estimator supports custom nuisance choices through
num_kwargs and den_kwargs; see the idid.estimate() API.
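The rows of the summary above correspond to the pairs \((e, t)\) with \(t \geq e\). A small standard-library sketch of that enumeration follows; the helper name `cohort_time_pairs` is hypothetical, and treating cohort 0 as the never-exposed group is an assumption inferred from the simulated example, where cohort 0 is passed to the simulator but does not appear in the output.

```python
def cohort_time_pairs(cohorts, T, never=0):
    """List all (e, t) with e an exposed cohort and e <= t <= T."""
    return [
        (e, t)
        for e in sorted(cohorts)
        if e != never  # the never-exposed group only serves as a control
        for t in range(e, T + 1)
    ]


pairs = cohort_time_pairs([0, 2, 3, 4, 5], T=6)
print(len(pairs), pairs[0], pairs[-1])
```

With cohorts 2 to 5 and six periods this gives 5 + 4 + 3 + 2 = 14 pairs, matching the number of rows in the table.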
Double machine learning (method="dml")
For panel data, the DML estimator targets the same \(LATT(e,t)\) parameter, but estimates the nuisance functions by cross-fitting user-supplied machine learning models.
This corresponds to:
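# `data`: a panel dataset such as the one returned by idid.sim_stag_panel(...)
res = idid.estimate(
    data,
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dml",
    dml_kwargs={
        "nfolds": 5,  # number of cross-fitting folds
        # user-supplied machine learning models for the nuisance
        # functions, left unspecified here
        "m_m": ...,
        "g_m": ...,
        "p_m": ...,
    },
    balanced=True,
)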
See the panel DML examples in Quickstart.
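Cross-fitting itself can be sketched generically in standard-library Python: every observation is predicted by a model that was fitted without it. This is only an illustration of the idea, not the package's implementation; `cross_fit` and the toy learner `fit_group_mean` are hypothetical names, and the learner stands in for whatever machine learning model a user would supply.

```python
import random


def cross_fit(xs, ys, fit, nfolds=5, seed=0):
    """Out-of-fold predictions: train on nfolds - 1 folds, predict the rest."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::nfolds] for k in range(nfolds)]  # disjoint, cover all rows
    preds = [None] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])  # prediction never uses observation i
    return preds


def fit_group_mean(xs, ys):
    """Toy learner: predict the training mean of y within each value of x."""
    sums = {}
    for x, y in zip(xs, ys):
        total = sums.setdefault(x, [0.0, 0])
        total[0] += y
        total[1] += 1
    overall = sum(ys) / len(ys)
    return lambda x: sums[x][0] / sums[x][1] if x in sums else overall


xs = [0, 1] * 10
ys = [float(x) for x in xs]  # y is fully determined by x in this toy data
preds = cross_fit(xs, ys, fit_group_mean, nfolds=5)
print(preds[:4])
```

Keeping the fitting and prediction samples disjoint is what lets flexible learners be used for the nuisance functions without their overfitting feeding directly into the final estimate.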
Repeated cross-sections
Doubly robust (method="dr")
For repeated cross-sections, the doubly robust estimator plugs in nuisance estimators in \(\tau^{dr,rc}_{e,t}\) and again forms a ratio between the outcome and treatment components.
This corresponds to:
res = idid.estimate(
    idid.sim_stag_rc(n=20_000, T=6, E_cohorts=[0, 2, 3, 4, 5]),
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dr",
    balanced=False,
    verbose=False,
)
res.summary()
Cohort-Time Local Average Treatment Effects on the Treated:
E t AET(e, t) LATT(e, t) Std. Error [95% Pointwise. Conf. Band]
2 2 0.2622 1.4060 0.4561 0.5121 2.3000 *
2 3 0.2804 1.0196 0.4091 0.2177 1.8215 *
2 4 0.2289 1.0392 0.5145 0.0308 2.0477 *
2 5 0.3032 0.8324 0.3873 0.0733 1.5914 *
2 6 0.2528 0.5638 0.4717 -0.3608 1.4884
3 3 0.2025 0.1605 0.6104 -1.0358 1.3568
3 4 0.1496 -0.8528 0.9244 -2.6646 0.9590
3 5 0.2432 -0.2704 0.5246 -1.2986 0.7579
3 6 0.2378 0.0119 0.5337 -1.0342 1.0579
4 4 0.1907 1.3235 0.6397 0.0696 2.5773 *
4 5 0.2415 0.3751 0.4976 -0.6003 1.3505
4 6 0.2423 0.8244 0.4979 -0.1514 1.8003
5 5 0.2248 0.5506 0.5454 -0.5185 1.6197
5 6 0.2449 1.0097 0.5099 0.0102 2.0091 *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust
The repeated-cross-section DR estimator uses the same
idid.estimate() interface, with balanced=False.
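With repeated cross-sections the same units are not observed in every period, so within-unit changes are unavailable and the ratio has to be built from cell means by cohort and period. The following covariate-free sketch shows that structure only; it is not the doubly robust estimator, and the function name `wald_did_rc`, the row layout, and the coding of never-exposed observations as 0 are assumptions made for illustration.

```python
from statistics import mean


def wald_did_rc(rows, e, t, never=0):
    """Ratio of two difference-in-differences built from cell means.

    rows: one dict per sampled observation with keys "E", "t", "Y", "D".
    """
    base = e - 1  # last period before cohort e is exposed

    def cell_mean(var, cohort, period):
        return mean(
            r[var] for r in rows if r["E"] == cohort and r["t"] == period
        )

    def did(var):
        exposed = cell_mean(var, e, t) - cell_mean(var, e, base)
        control = cell_mean(var, never, t) - cell_mean(var, never, base)
        return exposed - control

    return did("Y") / did("D")


sample = [
    # cohort 2, sampled before and after exposure (different individuals)
    {"E": 2, "t": 1, "D": 0, "Y": 1.0},
    {"E": 2, "t": 1, "D": 0, "Y": 1.0},
    {"E": 2, "t": 2, "D": 1, "Y": 4.0},
    {"E": 2, "t": 2, "D": 0, "Y": 2.0},
    # never-exposed observations
    {"E": 0, "t": 1, "D": 0, "Y": 1.0},
    {"E": 0, "t": 2, "D": 0, "Y": 2.0},
]
ratio_rc = wald_did_rc(sample, e=2, t=2)
print(ratio_rc)
```

Each of the four cell means is estimated from a separate sample, which is one reason the repeated cross-section standard errors in the output above are larger than in the panel example.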
Double machine learning (method="dml")
For repeated cross-sections, the DML estimator cross-fits nuisance models for the outcome, treatment, and exposure propensity components in the repeated cross-section score.
This corresponds to:
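# `data`: a repeated cross-section dataset such as the one returned by
# idid.sim_stag_rc(...)
res = idid.estimate(
    data,
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dml",
    dml_kwargs={
        "nfolds": 5,  # number of cross-fitting folds
        # user-supplied machine learning models for the nuisance
        # functions, left unspecified here
        "m_m": ...,
        "g_m": ...,
        "p_m": ...,
    },
    balanced=False,
)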
The DML kwargs are documented in the idid.estimate() API.