Quickstart¶

This page shows how to use the package on a simulated example. We first generate panel data and inspect its basic evolution, then estimate the cohort-time \(LATT(e, t)\) parameters, and finally aggregate them into event-study effects. We also show how to use the DML estimator with custom nuisance models. The repeated cross-sections case uses the same syntax; apart from the simulated data, the only change is setting balanced=False.

Installation¶

Install the package from PyPI with:

uv pip install idid-py

or from GitHub:

uv pip install git+https://github.com/jsr-p/idid

or install the local development version with:

git clone https://github.com/jsr-p/idid && cd idid
uv sync

Panel data¶

Data¶

import numpy as np
import polars as pl

import idid

np.random.seed(40)
n = 10_000
E_cohorts = [0, 2, 3, 4, 5]
T = max(E_cohorts)

data = idid.sim_stag_panel(n=n, T=T, E_cohorts=E_cohorts)

with pl.Config(tbl_rows=10, tbl_cols=10):
    print(data.head(10))

shape: (10, 6)
┌─────┬─────┬─────┬───────────┬─────┬───────────┐
│ id  ┆ E   ┆ t   ┆ X         ┆ D_t ┆ Y_t       │
│ --- ┆ --- ┆ --- ┆ ---       ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ i64 ┆ f64       ┆ i64 ┆ f64       │
╞═════╪═════╪═════╪═══════════╪═════╪═══════════╡
│ 0   ┆ 0   ┆ 1   ┆ -0.607548 ┆ 0   ┆ -0.682584 │
│ 0   ┆ 0   ┆ 2   ┆ -0.607548 ┆ 0   ┆ 0.089033  │
│ 0   ┆ 0   ┆ 3   ┆ -0.607548 ┆ 1   ┆ 3.061392  │
│ 0   ┆ 0   ┆ 4   ┆ -0.607548 ┆ 0   ┆ 0.186256  │
│ 0   ┆ 0   ┆ 5   ┆ -0.607548 ┆ 0   ┆ -0.989965 │
│ 1   ┆ 0   ┆ 1   ┆ -0.126136 ┆ 0   ┆ -0.09219  │
│ 1   ┆ 0   ┆ 2   ┆ -0.126136 ┆ 0   ┆ -0.620078 │
│ 1   ┆ 0   ┆ 3   ┆ -0.126136 ┆ 0   ┆ -1.533602 │
│ 1   ┆ 0   ┆ 4   ┆ -0.126136 ┆ 0   ┆ -0.117459 │
│ 1   ┆ 0   ┆ 5   ┆ -0.126136 ┆ 0   ┆ 1.390381  │
└─────┴─────┴─────┴───────────┴─────┴───────────┘

Plotting the evolution of the treatment and outcome gives:

from idid.plotting import plot_evolution, summarize_evolution

fig, ax = plot_evolution(
    summarize_evolution(data),
    include_bands=True,
)
fig

The true simulated effects are \(LATT(e, t) = 1\) for all \(t \geq e\).

Estimating all LATT(e, t)s¶

res = idid.estimate(
    data,
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dr",
    balanced=True,
    verbose=False,
)
print(res)

IDidResult(n=10000, dp=IDidParams(e_col='E', t_col='t', d_col='D_t', y_col='Y_t'), periods=Periods(...), estimates=DataFrame[10x9; E, t, latt, se, ...], IFs=ndarray(10000, 10), IFs_aet=ndarray(10000, 10))

We can inspect the estimates as a DataFrame:

with pl.Config(tbl_rows=10, tbl_cols=10):
    print(res.estimates)

shape: (10, 9)
┌─────┬─────┬──────────┬──────────┬──────────┬──────────┬──────┬──────────┬──────────┐
│ E   ┆ t   ┆ latt     ┆ se       ┆ num      ┆ denom    ┆ ns   ┆ lower    ┆ upper    │
│ --- ┆ --- ┆ ---      ┆ ---      ┆ ---      ┆ ---      ┆ ---  ┆ ---      ┆ ---      │
│ i64 ┆ i64 ┆ f64      ┆ f64      ┆ f64      ┆ f64      ┆ i64  ┆ f64      ┆ f64      │
╞═════╪═════╪══════════╪══════════╪══════════╪══════════╪══════╪══════════╪══════════╡
│ 2   ┆ 2   ┆ 1.115836 ┆ 0.230915 ┆ 0.258145 ┆ 0.231346 ┆ 4022 ┆ 0.663243 ┆ 1.568428 │
│ 2   ┆ 3   ┆ 1.225298 ┆ 0.244773 ┆ 0.268993 ┆ 0.219533 ┆ 4022 ┆ 0.745543 ┆ 1.705053 │
│ 2   ┆ 4   ┆ 0.966331 ┆ 0.216551 ┆ 0.237252 ┆ 0.245519 ┆ 4022 ┆ 0.541892 ┆ 1.39077  │
│ 2   ┆ 5   ┆ 0.86206  ┆ 0.24246  ┆ 0.188788 ┆ 0.218997 ┆ 4022 ┆ 0.386838 ┆ 1.337281 │
│ 3   ┆ 3   ┆ 0.91629  ┆ 0.234961 ┆ 0.20478  ┆ 0.223488 ┆ 4046 ┆ 0.455767 ┆ 1.376814 │
│ 3   ┆ 4   ┆ 1.062971 ┆ 0.258129 ┆ 0.21958  ┆ 0.206572 ┆ 4046 ┆ 0.557037 ┆ 1.568904 │
│ 3   ┆ 5   ┆ 0.535162 ┆ 0.251652 ┆ 0.114376 ┆ 0.213722 ┆ 4046 ┆ 0.041925 ┆ 1.028399 │
│ 4   ┆ 4   ┆ 1.288046 ┆ 0.248529 ┆ 0.27902  ┆ 0.216622 ┆ 3976 ┆ 0.800929 ┆ 1.775164 │
│ 4   ┆ 5   ┆ 0.749072 ┆ 0.248445 ┆ 0.16084  ┆ 0.214718 ┆ 3976 ┆ 0.26212  ┆ 1.236023 │
│ 5   ┆ 5   ┆ 0.461413 ┆ 0.223437 ┆ 0.114358 ┆ 0.247844 ┆ 3995 ┆ 0.023476 ┆ 0.899349 │
└─────┴─────┴──────────┴──────────┴──────────┴──────────┴──────┴──────────┴──────────┘

or print out a summary:

res.summary()

Cohort-Time Local Average Treatment Effects on the Treated:
 E   t    AET(e, t)   LATT(e, t)   Std. Error   [95% Pointwise.   Conf. Band]
 2   2       0.2313       1.1158       0.2309            0.6632        1.5684  *
 2   3       0.2195       1.2253       0.2448            0.7455        1.7051  *
 2   4       0.2455       0.9663       0.2166            0.5419        1.3908  *
 2   5       0.2190       0.8621       0.2425            0.3868        1.3373  *
 3   3       0.2235       0.9163       0.2350            0.4558        1.3768  *
 3   4       0.2066       1.0630       0.2581            0.5570        1.5689  *
 3   5       0.2137       0.5352       0.2517            0.0419        1.0284  *
 4   4       0.2166       1.2880       0.2485            0.8009        1.7752  *
 4   5       0.2147       0.7491       0.2484            0.2621        1.2360  *
 5   5       0.2478       0.4614       0.2234            0.0235        0.8993  *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust

The IDidResult result object is documented in IDidResult.
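The columns of the estimates table are linked by simple arithmetic that can be checked by hand. In the printed output, latt matches num / denom up to rounding, denom matches the AET(e, t) column of the summary, and lower/upper match latt ± 1.96 · se. The sketch below copies three rows from the table above and verifies these relations. The 1.96 critical value is inferred from the printed numbers, not taken from the package documentation.

```python
# Rows copied from the printed table: (E, t, latt, se, num, denom, lower, upper).
rows = [
    (2, 2, 1.115836, 0.230915, 0.258145, 0.231346, 0.663243, 1.568428),
    (2, 3, 1.225298, 0.244773, 0.268993, 0.219533, 0.745543, 1.705053),
    (5, 5, 0.461413, 0.223437, 0.114358, 0.247844, 0.023476, 0.899349),
]

Z = 1.96  # assumed two-sided 95% normal critical value (inferred from the output)

checks = []
for e, t, latt, se, num, denom, lower, upper in rows:
    # The point estimate is the ratio of the two printed components.
    ratio_ok = abs(num / denom - latt) < 1e-4
    # The pointwise band is the estimate plus or minus Z standard errors.
    band_ok = (
        abs(latt - Z * se - lower) < 1e-4
        and abs(latt + Z * se - upper) < 1e-4
    )
    checks.append((e, t, ratio_ok, band_ok))
    print(
        f"LATT({e}, {t}): num/denom = {num / denom:.4f}, "
        f"band = [{latt - Z * se:.4f}, {latt + Z * se:.4f}]"
    )
```

Since denom is roughly 0.2 to 0.25 in every cell, small changes in num translate into changes four to five times larger in latt, which is consistent with the fairly wide bands.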

Aggregated Effects¶

agg = idid.agg_latt(res, method="dynamic")
print(agg)

AggLattResult(method='dynamic', overall_latt=0.893323526028491, overall_se=0.1249504706156244, estimates=DataFrame[4x5; l, latt, se, lower, ...], IF=ndarray(10000, 4), IF_overall=ndarray(10000, 1))

These correspond to estimates of the \(\{\theta^{IV}_{es}(l) : l \in \{0,1,\ldots,h\}\}\) parameters of the paper.

Again, we can inspect the estimates as a DataFrame:

print(agg.estimates)

shape: (4, 5)
┌─────┬──────────┬──────────┬──────────┬──────────┐
│ l   ┆ latt     ┆ se       ┆ lower    ┆ upper    │
│ --- ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ i64 ┆ f64      ┆ f64      ┆ f64      ┆ f64      │
╞═════╪══════════╪══════════╪══════════╪══════════╡
│ 0   ┆ 0.931205 ┆ 0.091954 ┆ 0.750976 ┆ 1.111434 │
│ 1   ┆ 1.015631 ┆ 0.13086  ┆ 0.759145 ┆ 1.272117 │
│ 2   ┆ 0.764398 ┆ 0.162559 ┆ 0.445783 ┆ 1.083013 │
│ 3   ┆ 0.86206  ┆ 0.24246  ┆ 0.386838 ┆ 1.337281 │
└─────┴──────────┴──────────┴──────────┴──────────┘

or print out a summary:

agg.summary()

Overall summary of ATT's based on event-study/dynamic aggregation:
  LATT    Std. Error    [95%      Conf. Band]
0.8933        0.1250    0.6484         1.1382  *

Dynamic effects:
  Event time    Estimate    Std. Error    [95% Pointwise    Conf. Band]
           0      0.9312        0.0920            0.7510         1.1114  *
           1      1.0156        0.1309            0.7591         1.2721  *
           2      0.7644        0.1626            0.4458         1.0830  *
           3      0.8621        0.2425            0.3868         1.3373  *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust

The AggLattResult result object is documented in AggLattResult. See also the extended guide in Aggregation.
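Under the usual event-study convention, event time is l = t - e, so each dynamic estimate pools the cohort-time cells that share the same distance from exposure. The sketch below groups the ten (e, t) cells from the cohort-time table by event time and checks two relations visible in the printed output: the l = 3 estimate equals the estimate for cell (2, 5), since cohort 2 is the only cohort observed three periods after exposure, and overall_latt coincides with the unweighted mean of the four event-time estimates. The weights used within each event time are not reproduced here.

```python
from collections import defaultdict

# (e, t) cells and LATT estimates copied from the cohort-time table.
cells = {
    (2, 2): 1.115836, (2, 3): 1.225298, (2, 4): 0.966331, (2, 5): 0.86206,
    (3, 3): 0.91629, (3, 4): 1.062971, (3, 5): 0.535162,
    (4, 4): 1.288046, (4, 5): 0.749072,
    (5, 5): 0.461413,
}
# Event-study estimates copied from agg.estimates.
dynamic = {0: 0.931205, 1: 1.015631, 2: 0.764398, 3: 0.86206}

# Group the cells by event time l = t - e.
by_event_time = defaultdict(list)
for (e, t) in cells:
    by_event_time[t - e].append((e, t))

# Unweighted mean of the event-time estimates.
overall = sum(dynamic.values()) / len(dynamic)

for l in sorted(by_event_time):
    print(f"l={l}: cells {by_event_time[l]}")
print(f"mean of dynamic estimates: {overall:.6f}")
```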

Multiplier Bootstrap and Simultaneous Confidence Bands¶

We can run a multiplier bootstrap to obtain simultaneous confidence bands for all event-study parameters:

agg_b = idid.agg_latt(res, method="dynamic", boot=True)
agg_b.summary()

Overall summary of ATT's based on event-study/dynamic aggregation:
  LATT    Std. Error    [95% Simult.    Conf. Band]
0.8933        0.1246          0.6448         1.1419  *

Dynamic effects:
  Event time    Estimate    Std. Error    [95% Simult.    Conf. Band]
           0      0.9312        0.0926          0.7036         1.1588  *
           1      1.0156        0.1302          0.6956         1.3356  *
           2      0.7644        0.1640          0.3614         1.1674  *
           3      0.8621        0.2337          0.2876         1.4366  *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust
Multiplier bootstrap: B=1000, c=2.4580, overall c=1.9942
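The last line of the summary reports the bootstrap critical values. In the printed output, the simultaneous band is the estimate plus or minus c times the bootstrap standard error, with c = 2.4580 in place of the pointwise 1.96, which is why the bands are wider than before. A quick check against the rounded values in the summary (agreement holds only up to rounding):

```python
C_SIMULT = 2.4580  # critical value reported in the summary line

# (event time, estimate, se, lower, upper) copied from the bootstrap summary.
rows = [
    (0, 0.9312, 0.0926, 0.7036, 1.1588),
    (1, 1.0156, 0.1302, 0.6956, 1.3356),
    (2, 0.7644, 0.1640, 0.3614, 1.1674),
    (3, 0.8621, 0.2337, 0.2876, 1.4366),
]

max_gap = 0.0
for l, est, se, lower, upper in rows:
    # Rebuild the band from the reported critical value.
    lo, hi = est - C_SIMULT * se, est + C_SIMULT * se
    max_gap = max(max_gap, abs(lo - lower), abs(hi - upper))
    print(f"l={l}: [{lo:.4f}, {hi:.4f}]")

print(f"largest deviation from the printed band: {max_gap:.4f}")
```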

Comparing the two in a figure:

from idid.plotting import plot_agg
from matplotlib import pyplot as plt

fig, ax = plot_agg(
    [agg, agg_b],
    labels=[
        "Dynamic",
        "Dynamic (Simultaneous)",
    ],
)
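For intuition on where a simultaneous critical value comes from, here is a generic sketch of a sup-t multiplier bootstrap on synthetic influence functions. It is not the package's implementation: the Rademacher multipliers, the sample sizes, and the quantile rule are assumptions chosen for illustration, and the influence-function matrix is pure simulated noise.

```python
import math
import random
import statistics

random.seed(0)
n, k, B = 400, 3, 500

# Synthetic stand-in for an (n x k) influence-function matrix; not real output.
IF = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]

# Plug-in standard errors: column standard deviation divided by sqrt(n).
se = [statistics.pstdev(col) / math.sqrt(n) for col in zip(*IF)]

sup_t = []
for _ in range(B):
    # One Rademacher multiplier per unit, shared across all k parameters.
    v = [random.choice((-1.0, 1.0)) for _ in range(n)]
    # Perturbed sample mean of each influence-function column.
    draws = [sum(vi * row[j] for vi, row in zip(v, IF)) / n for j in range(k)]
    # Largest absolute t-statistic across the k parameters in this draw.
    sup_t.append(max(abs(d) / s for d, s in zip(draws, se)))

# Empirical 95% quantile of the sup-t statistic: the simultaneous critical value.
sup_t.sort()
c_simult = sup_t[math.ceil(0.95 * B) - 1]
print(f"simultaneous critical value: {c_simult:.3f}")
```

Because the quantile is taken over the maximum across parameters, the resulting critical value exceeds the pointwise 1.96, and the gap grows with the number of parameters covered jointly.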

DML¶

Using linear regression for the outcome nuisance model and FastLogit for the treatment nuisance models:

from idid.nuisance_estimators import OLS, FastLogit


res_dml = idid.estimate(
    data,
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dml",
    dml_kwargs={
        "nfolds": 5,
        "m_m": OLS(),
        "g_m": FastLogit(),
        "p_m": FastLogit(),
    },
    balanced=True,
    verbose=False,
)

res_dml.summary()

Cohort-Time Local Average Treatment Effects on the Treated:
 E   t    AET(e, t)   LATT(e, t)   Std. Error   [95% Pointwise.   Conf. Band]
 2   2       0.2311       1.1174       0.2313            0.6641        1.5707  *
 2   3       0.2190       1.2323       0.2456            0.7510        1.7137  *
 2   4       0.2453       0.9707       0.2169            0.5455        1.3959  *
 2   5       0.2187       0.8720       0.2431            0.3955        1.3485  *
 3   3       0.2237       0.9163       0.2348            0.4561        1.3766  *
 3   4       0.2073       1.0554       0.2573            0.5510        1.5598  *
 3   5       0.2139       0.5332       0.2517            0.0399        1.0265  *
 4   4       0.2166       1.2848       0.2487            0.7973        1.7723  *
 4   5       0.2141       0.7480       0.2495            0.2590        1.2370  *
 5   5       0.2474       0.4659       0.2236            0.0276        0.9041  *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Double Machine Learning
DML nuisance models: m_m=OLS, g_m=FastLogit, p_m=FastLogit
DML cross-fitting folds: 5

FastLogit comes from fastlr.

Using sklearn objects for the nuisance models:

from sklearn.linear_model import LinearRegression, LogisticRegression


res_dml_sk = idid.estimate(
    data,
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dml",
    dml_kwargs={
        "nfolds": 5,
        "m_m": LinearRegression(),
        "g_m": LogisticRegression(),
        "p_m": LogisticRegression(),
    },
    balanced=True,
    verbose=False,
)

res_dml_sk.summary()

Cohort-Time Local Average Treatment Effects on the Treated:
 E   t    AET(e, t)   LATT(e, t)   Std. Error   [95% Pointwise.   Conf. Band]
 2   2       0.2311       1.1174       0.2313            0.6641        1.5707  *
 2   3       0.2190       1.2323       0.2456            0.7510        1.7137  *
 2   4       0.2453       0.9707       0.2169            0.5455        1.3959  *
 2   5       0.2187       0.8720       0.2431            0.3955        1.3485  *
 3   3       0.2237       0.9163       0.2348            0.4561        1.3765  *
 3   4       0.2072       1.0554       0.2573            0.5510        1.5598  *
 3   5       0.2139       0.5332       0.2517            0.0399        1.0265  *
 4   4       0.2166       1.2848       0.2487            0.7974        1.7723  *
 4   5       0.2141       0.7480       0.2495            0.2590        1.2370  *
 5   5       0.2474       0.4659       0.2236            0.0277        0.9041  *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Double Machine Learning
DML nuisance models: m_m=LinearRegression, g_m=LogisticRegression, p_m=LogisticRegression
DML cross-fitting folds: 5

The dml_kwargs argument is documented in DMLKwargs.
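The nfolds entry sets the number of cross-fitting folds reported in the summary. As a generic illustration of cross-fitting (not the package's internal code), the sketch below splits a synthetic sample into five folds, fits a one-covariate least-squares model on four of them, and predicts on the held-out fold, so that every prediction comes from a model that never saw that observation. The data-generating process and the helper fit_ols are hypothetical.

```python
import random

random.seed(0)
n, nfolds = 200, 5

# Synthetic data: y = 1 + 2x + noise (illustrative only).
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5) for xi in x]


def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Randomly partition the indices into nfolds disjoint folds.
idx = list(range(n))
random.shuffle(idx)
folds = [idx[f::nfolds] for f in range(nfolds)]

pred = [None] * n
for held_out in folds:
    held = set(held_out)
    train = [i for i in idx if i not in held]
    # Fit on the training folds only.
    a, b = fit_ols([x[i] for i in train], [y[i] for i in train])
    # Predict on the held-out fold only.
    for i in held_out:
        pred[i] = a + b * x[i]

mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / n
print(f"out-of-fold MSE: {mse:.3f}")
```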

Repeated cross-sections¶

The repeated cross-section case is handled analogously to the panel case: the data are simulated with sim_stag_rc, and the only change to the estimate call is balanced=False.

import numpy as np
import polars as pl

import idid

np.random.seed(40)
n = 10_000
E_cohorts = [0, 2, 3, 4, 5]
T = max(E_cohorts)

data = idid.sim_stag_rc(n=n, T=T, E_cohorts=E_cohorts)

with pl.Config(tbl_rows=10, tbl_cols=10):
    print(data.head(10))

shape: (10, 6)
┌─────┬─────┬─────┬───────────┬─────┬───────────┐
│ id  ┆ E   ┆ t   ┆ X         ┆ D_t ┆ Y_t       │
│ --- ┆ --- ┆ --- ┆ ---       ┆ --- ┆ ---       │
│ u32 ┆ i64 ┆ i64 ┆ f64       ┆ i64 ┆ f64       │
╞═════╪═════╪═════╪═══════════╪═════╪═══════════╡
│ 1   ┆ 5   ┆ 1   ┆ -0.607548 ┆ 0   ┆ 0.01554   │
│ 2   ┆ 3   ┆ 2   ┆ 0.545059  ┆ 1   ┆ 0.92165   │
│ 3   ┆ 5   ┆ 3   ┆ 0.188836  ┆ 0   ┆ 1.154266  │
│ 4   ┆ 0   ┆ 4   ┆ -0.982146 ┆ 0   ┆ 0.417061  │
│ 5   ┆ 5   ┆ 5   ┆ 0.639967  ┆ 1   ┆ 3.541751  │
│ 6   ┆ 3   ┆ 1   ┆ 0.842739  ┆ 0   ┆ 2.511837  │
│ 7   ┆ 2   ┆ 2   ┆ -0.604616 ┆ 1   ┆ 1.386531  │
│ 8   ┆ 0   ┆ 3   ┆ -0.770702 ┆ 0   ┆ -1.107861 │
│ 9   ┆ 4   ┆ 4   ┆ 1.446865  ┆ 1   ┆ 4.404883  │
│ 10  ┆ 5   ┆ 5   ┆ 1.991825  ┆ 1   ┆ 2.465375  │
└─────┴─────┴─────┴───────────┴─────┴───────────┘
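The structural difference from the panel data is visible in the id column: in the panel printout each unit appears once per period, whereas here every row is a different unit. A small check on the first ten (id, t) pairs copied from the two printed tables:

```python
from collections import Counter

# (id, t) pairs copied from the first ten rows of each printed data set.
panel_rows = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5),
              (1, 1), (1, 2), (1, 3), (1, 4), (1, 5)]
rc_rows = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5),
           (6, 1), (7, 2), (8, 3), (9, 4), (10, 5)]


def obs_per_unit(rows):
    """Count how many rows each unit id contributes."""
    return Counter(unit for unit, _ in rows)


panel_counts = obs_per_unit(panel_rows)
rc_counts = obs_per_unit(rc_rows)

print("panel:", dict(panel_counts))
print("repeated cross-sections:", dict(rc_counts))
```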

res = idid.estimate(
    data,
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dr",
    balanced=False,
    verbose=False,
)
res.summary()

Cohort-Time Local Average Treatment Effects on the Treated:
 E   t    AET(e, t)   LATT(e, t)   Std. Error   [95% Pointwise.   Conf. Band]
 2   2       0.2438       2.2645       0.6899            0.9124        3.6166  *
 2   3       0.1709       2.6230       0.9937            0.6753        4.5707  *
 2   4       0.1572       2.3952       1.0712            0.2957        4.4947  *
 2   5       0.2008       2.0019       0.8194            0.3959        3.6079  *
 3   3       0.1715       1.0047       0.9070           -0.7730        2.7824
 3   4       0.1882       1.2923       0.8183           -0.3116        2.8962
 3   5       0.2526       1.9867       0.6433            0.7259        3.2475  *
 4   4       0.2911       0.9887       0.5361           -0.0621        2.0394
 4   5       0.3185       1.0436       0.4999            0.0638        2.0234  *
 5   5       0.2761       1.7817       0.5973            0.6109        2.9525  *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust

agg = idid.agg_latt(res, method="dynamic")
print(agg.estimates)

shape: (4, 5)
┌─────┬──────────┬──────────┬──────────┬──────────┐
│ l   ┆ latt     ┆ se       ┆ lower    ┆ upper    │
│ --- ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
│ i64 ┆ f64      ┆ f64      ┆ f64      ┆ f64      │
╞═════╪══════════╪══════════╪══════════╪══════════╡
│ 0   ┆ 1.538493 ┆ 0.262079 ┆ 1.024819 ┆ 2.052167 │
│ 1   ┆ 1.521017 ┆ 0.369766 ┆ 0.796275 ┆ 2.245759 │
│ 2   ┆ 2.144508 ┆ 0.568804 ┆ 1.029653 ┆ 3.259363 │
│ 3   ┆ 2.001893 ┆ 0.819383 ┆ 0.395903 ┆ 3.607884 │
└─────┴──────────┴──────────┴──────────┴──────────┘