## Quickstart

This page shows how to use the package on a simulated example. We first
generate panel data, inspect its basic evolution, then estimate the
cohort-time $LATT(e, t)$ parameters, and finally aggregate them into
event-study effects. We also show how to specify a DML estimator in
[DML](#dml). The repeated cross-sections case has the exact same syntax;
only the simulated data differs.

### Installation

Install the package from PyPI with:

``` bash
uv pip install idid-py
```

or from Github

``` bash
uv pip install git+https://github.com/jsr-p/idid
```

or install the local development version with:

``` bash
git clone https://github.com/jsr-p/idid && cd idid
uv sync
```

### Panel data

#### Data

``` python
import numpy as np
import polars as pl

import idid

np.random.seed(40)
n = 10_000
E_cohorts = [0, 2, 3, 4, 5]
T = max(E_cohorts)

data = idid.sim_stag_panel(n=n, T=T, E_cohorts=E_cohorts)

with pl.Config(tbl_rows=10, tbl_cols=10):
    print(data.head(10))
```

    shape: (10, 6)
    ┌─────┬─────┬─────┬───────────┬─────┬───────────┐
    │ id  ┆ E   ┆ t   ┆ X         ┆ D_t ┆ Y_t       │
    │ --- ┆ --- ┆ --- ┆ ---       ┆ --- ┆ ---       │
    │ i64 ┆ i64 ┆ i64 ┆ f64       ┆ i64 ┆ f64       │
    ╞═════╪═════╪═════╪═══════════╪═════╪═══════════╡
    │ 0   ┆ 0   ┆ 1   ┆ -0.607548 ┆ 0   ┆ -0.682584 │
    │ 0   ┆ 0   ┆ 2   ┆ -0.607548 ┆ 0   ┆ 0.089033  │
    │ 0   ┆ 0   ┆ 3   ┆ -0.607548 ┆ 1   ┆ 3.061392  │
    │ 0   ┆ 0   ┆ 4   ┆ -0.607548 ┆ 0   ┆ 0.186256  │
    │ 0   ┆ 0   ┆ 5   ┆ -0.607548 ┆ 0   ┆ -0.989965 │
    │ 1   ┆ 0   ┆ 1   ┆ -0.126136 ┆ 0   ┆ -0.09219  │
    │ 1   ┆ 0   ┆ 2   ┆ -0.126136 ┆ 0   ┆ -0.620078 │
    │ 1   ┆ 0   ┆ 3   ┆ -0.126136 ┆ 0   ┆ -1.533602 │
    │ 1   ┆ 0   ┆ 4   ┆ -0.126136 ┆ 0   ┆ -0.117459 │
    │ 1   ┆ 0   ┆ 5   ┆ -0.126136 ┆ 0   ┆ 1.390381  │
    └─────┴─────┴─────┴───────────┴─────┴───────────┘

Plotting the evolution of the treatment and outcome gives:

``` python
from idid.plotting import plot_evolution, summarize_evolution

fig, ax = plot_evolution(
    summarize_evolution(data),
    include_bands=True,
)
fig
```

![](quickstart_files/figure-commonmark/cell-3-output-1.png)

The true simulated effects are $LATT(e, t) = 1$ for all $e \geq t$.

#### Estimating all LATT(e, t)s

``` python
res = idid.estimate(
    data,
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dr",
    balanced=True,
    verbose=False,
)
print(res)
```

    IDidResult(n=10000, dp=IDidParams(e_col='E', t_col='t', d_col='D_t', y_col='Y_t'), periods=Periods(...), estimates=DataFrame[10x9; E, t, latt, se, ...], IFs=ndarray(10000, 10), IFs_aet=ndarray(10000, 10))

We can inspect the estimates as a DataFrame:

``` python
with pl.Config(tbl_rows=10, tbl_cols=10):
  print(res.estimates)
```

    shape: (10, 9)
    ┌─────┬─────┬──────────┬──────────┬──────────┬──────────┬──────┬──────────┬──────────┐
    │ E   ┆ t   ┆ latt     ┆ se       ┆ num      ┆ denom    ┆ ns   ┆ lower    ┆ upper    │
    │ --- ┆ --- ┆ ---      ┆ ---      ┆ ---      ┆ ---      ┆ ---  ┆ ---      ┆ ---      │
    │ i64 ┆ i64 ┆ f64      ┆ f64      ┆ f64      ┆ f64      ┆ i64  ┆ f64      ┆ f64      │
    ╞═════╪═════╪══════════╪══════════╪══════════╪══════════╪══════╪══════════╪══════════╡
    │ 2   ┆ 2   ┆ 1.115836 ┆ 0.230915 ┆ 0.258145 ┆ 0.231346 ┆ 4022 ┆ 0.663243 ┆ 1.568428 │
    │ 2   ┆ 3   ┆ 1.225298 ┆ 0.244773 ┆ 0.268993 ┆ 0.219533 ┆ 4022 ┆ 0.745543 ┆ 1.705053 │
    │ 2   ┆ 4   ┆ 0.966331 ┆ 0.216551 ┆ 0.237252 ┆ 0.245519 ┆ 4022 ┆ 0.541892 ┆ 1.39077  │
    │ 2   ┆ 5   ┆ 0.86206  ┆ 0.24246  ┆ 0.188788 ┆ 0.218997 ┆ 4022 ┆ 0.386838 ┆ 1.337281 │
    │ 3   ┆ 3   ┆ 0.91629  ┆ 0.234961 ┆ 0.20478  ┆ 0.223488 ┆ 4046 ┆ 0.455767 ┆ 1.376814 │
    │ 3   ┆ 4   ┆ 1.062971 ┆ 0.258129 ┆ 0.21958  ┆ 0.206572 ┆ 4046 ┆ 0.557037 ┆ 1.568904 │
    │ 3   ┆ 5   ┆ 0.535162 ┆ 0.251652 ┆ 0.114376 ┆ 0.213722 ┆ 4046 ┆ 0.041925 ┆ 1.028399 │
    │ 4   ┆ 4   ┆ 1.288046 ┆ 0.248529 ┆ 0.27902  ┆ 0.216622 ┆ 3976 ┆ 0.800929 ┆ 1.775164 │
    │ 4   ┆ 5   ┆ 0.749072 ┆ 0.248445 ┆ 0.16084  ┆ 0.214718 ┆ 3976 ┆ 0.26212  ┆ 1.236023 │
    │ 5   ┆ 5   ┆ 0.461413 ┆ 0.223437 ┆ 0.114358 ┆ 0.247844 ┆ 3995 ┆ 0.023476 ┆ 0.899349 │
    └─────┴─────┴──────────┴──────────┴──────────┴──────────┴──────┴──────────┴──────────┘

or print out a summary:

``` python
res.summary()
```

    Cohort-Time Local Average Treatment Effects on the Treated:
     E   t    AET(e, t)   LATT(e, t)   Std. Error   [95% Pointwise.   Conf. Band]
     2   2       0.2313       1.1158       0.2309            0.6632        1.5684  *
     2   3       0.2195       1.2253       0.2448            0.7455        1.7051  *
     2   4       0.2455       0.9663       0.2166            0.5419        1.3908  *
     2   5       0.2190       0.8621       0.2425            0.3868        1.3373  *
     3   3       0.2235       0.9163       0.2350            0.4558        1.3768  *
     3   4       0.2066       1.0630       0.2581            0.5570        1.5689  *
     3   5       0.2137       0.5352       0.2517            0.0419        1.0284  *
     4   4       0.2166       1.2880       0.2485            0.8009        1.7752  *
     4   5       0.2147       0.7491       0.2484            0.2621        1.2360  *
     5   5       0.2478       0.4614       0.2234            0.0235        0.8993  *
    ---
    Signif. codes: `*' confidence band does not cover 0
    Control group: Never treated
    Estimation Method: Doubly Robust

The `IDidResult` result object is documented in
{py:class}`IDidResult <idid.estimators.IDidResult>`.

#### Aggregated Effects

``` python
agg = idid.agg_latt(res, method="dynamic")
print(agg)
```

    AggLattResult(method='dynamic', overall_latt=0.893323526028491, overall_se=0.1249504706156244, estimates=DataFrame[4x5; l, latt, se, lower, ...], IF=ndarray(10000, 4), IF_overall=ndarray(10000, 1))

These correspond to estimates of the $\{\theta^{IV}_{es}(l) : l \in
\{0,1,\ldots,h\}\}$ parameters of the paper.

Again, we can inspect the estimates as a DataFrame:

``` python
print(agg.estimates)
```

    shape: (4, 5)
    ┌─────┬──────────┬──────────┬──────────┬──────────┐
    │ l   ┆ latt     ┆ se       ┆ lower    ┆ upper    │
    │ --- ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
    │ i64 ┆ f64      ┆ f64      ┆ f64      ┆ f64      │
    ╞═════╪══════════╪══════════╪══════════╪══════════╡
    │ 0   ┆ 0.931205 ┆ 0.091954 ┆ 0.750976 ┆ 1.111434 │
    │ 1   ┆ 1.015631 ┆ 0.13086  ┆ 0.759145 ┆ 1.272117 │
    │ 2   ┆ 0.764398 ┆ 0.162559 ┆ 0.445783 ┆ 1.083013 │
    │ 3   ┆ 0.86206  ┆ 0.24246  ┆ 0.386838 ┆ 1.337281 │
    └─────┴──────────┴──────────┴──────────┴──────────┘

or print out a summary:

``` python
agg.summary()
```

    Overall summary of ATT's based on event-study/dynamic aggregation:
      LATT    Std. Error    [95%      Conf. Band]
    0.8933        0.1250    0.6484         1.1382  *

    Dynamic effects:
      Event time    Estimate    Std. Error    [95% Pointwise    Conf. Band]
               0      0.9312        0.0920            0.7510         1.1114  *
               1      1.0156        0.1309            0.7591         1.2721  *
               2      0.7644        0.1626            0.4458         1.0830  *
               3      0.8621        0.2425            0.3868         1.3373  *
    ---
    Signif. codes: `*' confidence band does not cover 0
    Control group: Never treated
    Estimation Method: Doubly Robust

The `AggLattResult` result object is documented in
{py:class}`AggLattResult <idid.aggregate.AggLattResult>`. See also
extended guide in [Aggregation](aggregation).

##### Multiplier Bootstrap and Simultaneous Confidence Bands

We can conduct multiplier bootstrap to obtain simultaneous confidence
bands on all the event study parameters:

``` python
agg_b = idid.agg_latt(res, method="dynamic", boot=True)
agg_b.summary()
```

    Overall summary of ATT's based on event-study/dynamic aggregation:
      LATT    Std. Error    [95% Simult.    Conf. Band]
    0.8933        0.1246          0.6448         1.1419  *

    Dynamic effects:
      Event time    Estimate    Std. Error    [95% Simult.    Conf. Band]
               0      0.9312        0.0926          0.7036         1.1588  *
               1      1.0156        0.1302          0.6956         1.3356  *
               2      0.7644        0.1640          0.3614         1.1674  *
               3      0.8621        0.2337          0.2876         1.4366  *
    ---
    Signif. codes: `*' confidence band does not cover 0
    Control group: Never treated
    Estimation Method: Doubly Robust
    Multiplier bootstrap: B=1000, c=2.4580, overall c=1.9942

Comparing the two in a figure:

``` python
from idid.plotting import plot_agg
from matplotlib import pyplot as plt

fig, ax = plot_agg(
    [agg, agg_b],
    labels=[
        "Dynamic",
        "Dynamic (Simultaneous)",
    ],
)
```

![](quickstart_files/figure-commonmark/cell-11-output-1.png)

------------------------------------------------------------------------

#### DML

Using linear regression for the outcome nuisance model and `FastLogit`
for the treatment nuisance models:

``` python
from idid.nuisance_estimators import OLS, FastLogit


res_dml = idid.estimate(
    data,
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dml",
    dml_kwargs={
        "nfolds": 5,
        "m_m": OLS(),
        "g_m": FastLogit(),
        "p_m": FastLogit(),
    },
    balanced=True,
    verbose=False,
)

res_dml.summary()
```

    Cohort-Time Local Average Treatment Effects on the Treated:
     E   t    AET(e, t)   LATT(e, t)   Std. Error   [95% Pointwise.   Conf. Band]
     2   2       0.2311       1.1174       0.2313            0.6641        1.5707  *
     2   3       0.2190       1.2323       0.2456            0.7510        1.7137  *
     2   4       0.2453       0.9707       0.2169            0.5455        1.3959  *
     2   5       0.2187       0.8720       0.2431            0.3955        1.3485  *
     3   3       0.2237       0.9163       0.2348            0.4561        1.3766  *
     3   4       0.2073       1.0554       0.2573            0.5510        1.5598  *
     3   5       0.2139       0.5332       0.2517            0.0399        1.0265  *
     4   4       0.2166       1.2848       0.2487            0.7973        1.7723  *
     4   5       0.2141       0.7480       0.2495            0.2590        1.2370  *
     5   5       0.2474       0.4659       0.2236            0.0276        0.9041  *
    ---
    Signif. codes: `*' confidence band does not cover 0
    Control group: Never treated
    Estimation Method: Double Machine Learning
    DML nuisance models: m_m=OLS, g_m=FastLogit, p_m=FastLogit
    DML cross-fitting folds: 5

`FastLogit` comes from
[`fastlr`](https://github.com/jsr-p/idid/blob/main/src/idid/logreg.py).

Using sklearn objects for the nuisance models:

``` python
from sklearn.linear_model import LinearRegression, LogisticRegression


res_dml_sk = idid.estimate(
    data,
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dml",
    dml_kwargs={
        "nfolds": 5,
        "m_m": LinearRegression(),
        "g_m": LogisticRegression(),
        "p_m": LogisticRegression(),
    },
    balanced=True,
    verbose=False,
)

res_dml_sk.summary()
```

    Cohort-Time Local Average Treatment Effects on the Treated:
     E   t    AET(e, t)   LATT(e, t)   Std. Error   [95% Pointwise.   Conf. Band]
     2   2       0.2311       1.1174       0.2313            0.6641        1.5707  *
     2   3       0.2190       1.2323       0.2456            0.7510        1.7137  *
     2   4       0.2453       0.9707       0.2169            0.5455        1.3959  *
     2   5       0.2187       0.8720       0.2431            0.3955        1.3485  *
     3   3       0.2237       0.9163       0.2348            0.4561        1.3765  *
     3   4       0.2072       1.0554       0.2573            0.5510        1.5598  *
     3   5       0.2139       0.5332       0.2517            0.0399        1.0265  *
     4   4       0.2166       1.2848       0.2487            0.7974        1.7723  *
     4   5       0.2141       0.7480       0.2495            0.2590        1.2370  *
     5   5       0.2474       0.4659       0.2236            0.0277        0.9041  *
    ---
    Signif. codes: `*' confidence band does not cover 0
    Control group: Never treated
    Estimation Method: Double Machine Learning
    DML nuisance models: m_m=LinearRegression, g_m=LogisticRegression, p_m=LogisticRegression
    DML cross-fitting folds: 5

The `dml_kwargs` argument is documented in
{py:class}`DMLKwargs <idid._types.DMLKwargs>`.

------------------------------------------------------------------------

### Repeated cross-sections

The repeated cross-section case is handled analogously to the panel
case; the only difference is setting `balanced=False` in the `estimate`
function.

``` python
import numpy as np
import polars as pl

import idid

np.random.seed(40)
n = 10_000
E_cohorts = [0, 2, 3, 4, 5]
T = max(E_cohorts)

data = idid.sim_stag_rc(n=n, T=T, E_cohorts=E_cohorts)

with pl.Config(tbl_rows=10, tbl_cols=10):
    print(data.head(10))
```

    shape: (10, 6)
    ┌─────┬─────┬─────┬───────────┬─────┬───────────┐
    │ id  ┆ E   ┆ t   ┆ X         ┆ D_t ┆ Y_t       │
    │ --- ┆ --- ┆ --- ┆ ---       ┆ --- ┆ ---       │
    │ u32 ┆ i64 ┆ i64 ┆ f64       ┆ i64 ┆ f64       │
    ╞═════╪═════╪═════╪═══════════╪═════╪═══════════╡
    │ 1   ┆ 5   ┆ 1   ┆ -0.607548 ┆ 0   ┆ 0.01554   │
    │ 2   ┆ 3   ┆ 2   ┆ 0.545059  ┆ 1   ┆ 0.92165   │
    │ 3   ┆ 5   ┆ 3   ┆ 0.188836  ┆ 0   ┆ 1.154266  │
    │ 4   ┆ 0   ┆ 4   ┆ -0.982146 ┆ 0   ┆ 0.417061  │
    │ 5   ┆ 5   ┆ 5   ┆ 0.639967  ┆ 1   ┆ 3.541751  │
    │ 6   ┆ 3   ┆ 1   ┆ 0.842739  ┆ 0   ┆ 2.511837  │
    │ 7   ┆ 2   ┆ 2   ┆ -0.604616 ┆ 1   ┆ 1.386531  │
    │ 8   ┆ 0   ┆ 3   ┆ -0.770702 ┆ 0   ┆ -1.107861 │
    │ 9   ┆ 4   ┆ 4   ┆ 1.446865  ┆ 1   ┆ 4.404883  │
    │ 10  ┆ 5   ┆ 5   ┆ 1.991825  ┆ 1   ┆ 2.465375  │
    └─────┴─────┴─────┴───────────┴─────┴───────────┘

``` python
res = idid.estimate(
    data,
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dr",
    balanced=False,
    verbose=False,
)
res.summary()
```

    Cohort-Time Local Average Treatment Effects on the Treated:
     E   t    AET(e, t)   LATT(e, t)   Std. Error   [95% Pointwise.   Conf. Band]
     2   2       0.2438       2.2645       0.6899            0.9124        3.6166  *
     2   3       0.1709       2.6230       0.9937            0.6753        4.5707  *
     2   4       0.1572       2.3952       1.0712            0.2957        4.4947  *
     2   5       0.2008       2.0019       0.8194            0.3959        3.6079  *
     3   3       0.1715       1.0047       0.9070           -0.7730        2.7824
     3   4       0.1882       1.2923       0.8183           -0.3116        2.8962
     3   5       0.2526       1.9867       0.6433            0.7259        3.2475  *
     4   4       0.2911       0.9887       0.5361           -0.0621        2.0394
     4   5       0.3185       1.0436       0.4999            0.0638        2.0234  *
     5   5       0.2761       1.7817       0.5973            0.6109        2.9525  *
    ---
    Signif. codes: `*' confidence band does not cover 0
    Control group: Never treated
    Estimation Method: Doubly Robust

``` python
agg = idid.agg_latt(res, method="dynamic")
print(agg.estimates)
```

    shape: (4, 5)
    ┌─────┬──────────┬──────────┬──────────┬──────────┐
    │ l   ┆ latt     ┆ se       ┆ lower    ┆ upper    │
    │ --- ┆ ---      ┆ ---      ┆ ---      ┆ ---      │
    │ i64 ┆ f64      ┆ f64      ┆ f64      ┆ f64      │
    ╞═════╪══════════╪══════════╪══════════╪══════════╡
    │ 0   ┆ 1.538493 ┆ 0.262079 ┆ 1.024819 ┆ 2.052167 │
    │ 1   ┆ 1.521017 ┆ 0.369766 ┆ 0.796275 ┆ 2.245759 │
    │ 2   ┆ 2.144508 ┆ 0.568804 ┆ 1.029653 ┆ 3.259363 │
    │ 3   ┆ 2.001893 ┆ 0.819383 ┆ 0.395903 ┆ 3.607884 │
    └─────┴──────────┴──────────┴──────────┴──────────┘