Examples

This page collects a few less common use cases that build on the main workflow from the quickstart and aggregation pages.

Custom Group Aggregation

You can aggregate cohorts into custom groups by calling agg_latt(..., method="cohort_custom") and passing a custom_map, a dictionary from cohort to group label, through agg_kwargs.

import idid

df = idid.sim_stag_panel(
    n=10_000,
    T=5,
    E_cohorts=[0, 2, 3, 4, 5],
)

res = idid.estimate(
    df,
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="never",
    method="dr",
    balanced=True,
    verbose=False,
)

custom_map = {
    2: "2-3",
    3: "2-3",
    4: "4-5",
    5: "4-5",
}

agg = idid.agg_latt(
    res,
    method="cohort_custom",
    agg_kwargs={"custom_map": custom_map},
)
agg.summary()

Overall summary of ATT's based on custom cohort aggregation:
  LATT    Std. Error    [95%      Conf. Band]
1.1750        0.1013    0.9765         1.3734  *

Custom cohort effects:
Cohort      Estimate    Std. Error    [95% Pointwise    Conf. Band]
2-3           0.9925        0.1214            0.7545         1.2304  *
4-5           1.3506        0.1584            1.0402         1.6610  *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust

Compare that to the standard cohort-level aggregation:

agg = idid.agg_latt(res, method="cohort")
agg.summary()

Overall summary of ATT's based on cohort aggregation:
  LATT    Std. Error    [95%      Conf. Band]
1.1522        0.0992    0.9577         1.3466  *

Cohort effects:
  Cohort    Estimate    Std. Error    [95% Pointwise    Conf. Band]
       2      0.9999        0.1625            0.6813         1.3185  *
       3      0.9818        0.1793            0.6304         1.3331  *
       4      1.5078        0.2244            1.0680         1.9477  *
       5      1.0982        0.2087            0.6891         1.5072  *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust
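
With many cohorts, writing the mapping by hand gets tedious. Below is a minimal sketch in plain Python that builds it from label-to-cohorts groups. build_custom_map is a hypothetical helper, not part of idid; the only thing assumed about the library is the cohort-to-label dictionary shape used for custom_map above.

```python
def build_custom_map(groups):
    """Invert {label: [cohorts]} into the {cohort: label} shape used above.

    Hypothetical helper for illustration; not part of idid.
    """
    custom_map = {}
    for label, cohorts in groups.items():
        for cohort in cohorts:
            if cohort in custom_map:
                # A cohort can only belong to one custom group.
                raise ValueError(f"cohort {cohort} assigned to more than one group")
            custom_map[cohort] = label
    return custom_map


custom_map = build_custom_map({"2-3": [2, 3], "4-5": [4, 5]})
print(custom_map)  # {2: '2-3', 3: '2-3', 4: '4-5', 5: '4-5'}
```

The resulting dictionary is identical to the hand-written custom_map and can be passed through agg_kwargs in the same way.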

Group Difference Aggregation

To estimate differences between two groups, first estimate the cohort-time LATTs within each group, then pass the two results to group_diff_idid. The true difference between the groups equals \(1/2\) for all \((e, t)\).

import polars as pl

from idid.aggregate import group_diff_idid
from idid.plotting import plot_agg, plot_evolution_groups, summarize_evolution

df = idid.sim_stag_panel(
    n=20_000,
    T=4,
    E_cohorts=[0, 2, 3, 4],
    confounded=True,
    with_group=True,
)
print(df.head())

shape: (5, 7)
┌─────┬─────┬─────┬───────────┬─────┬──────────┬─────┐
│ id  ┆ E   ┆ t   ┆ X         ┆ D_t ┆ Y_t      ┆ F   │
│ --- ┆ --- ┆ --- ┆ ---       ┆ --- ┆ ---      ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ f64       ┆ i64 ┆ f64      ┆ i64 │
╞═════╪═════╪═════╪═══════════╪═════╪══════════╪═════╡
│ 0   ┆ 4   ┆ 1   ┆ 1.285605  ┆ 1   ┆ 7.275993 ┆ 1   │
│ 0   ┆ 4   ┆ 2   ┆ 1.285605  ┆ 0   ┆ 2.930226 ┆ 1   │
│ 0   ┆ 4   ┆ 3   ┆ 1.285605  ┆ 0   ┆ 3.64175  ┆ 1   │
│ 0   ┆ 4   ┆ 4   ┆ 1.285605  ┆ 1   ┆ 7.260314 ┆ 1   │
│ 1   ┆ 4   ┆ 1   ┆ -0.303553 ┆ 1   ┆ 3.297906 ┆ 0   │
└─────┴─────┴─────┴───────────┴─────┴──────────┴─────┘

gp0 = summarize_evolution(df.filter(pl.col("F").eq(0)))
gp1 = summarize_evolution(df.filter(pl.col("F").eq(1)))
fig, ax = plot_evolution_groups(
    gp0,
    gp1,
    include_bands=True,
)

ests = {}
for f in [0, 1]:
    res = idid.estimate(
        df.filter(pl.col("F").eq(f)).with_columns(
            id=pl.lit("M" if f == 0 else "F") + pl.col("id").cast(pl.Utf8),
        ),
        cohort="E",
        time="t",
        outcome="Y_t",
        treatment="D_t",
        unit="id",
        covariates=["X"],
        control="never",
        method="dr",
        balanced=True,
        verbose=False,
    )
    ests[f] = res

diff = group_diff_idid(ests[1], ests[0])

print(diff.estimates)
diff.summary()

shape: (6, 7)
┌─────┬─────┬───────────┬──────────┬──────────┬───────────┬───────────┐
│ E   ┆ t   ┆ latt      ┆ denom    ┆ se       ┆ lower     ┆ upper     │
│ --- ┆ --- ┆ ---       ┆ ---      ┆ ---      ┆ ---       ┆ ---       │
│ i64 ┆ i64 ┆ f64       ┆ f64      ┆ f64      ┆ f64       ┆ f64       │
╞═════╪═════╪═══════════╪══════════╪══════════╪═══════════╪═══════════╡
│ 2   ┆ 2   ┆ -0.713377 ┆ 0.188419 ┆ 0.486029 ┆ -1.665993 ┆ 0.239239  │
│ 2   ┆ 3   ┆ -0.721623 ┆ 0.19596  ┆ 0.452582 ┆ -1.608683 ┆ 0.165437  │
│ 2   ┆ 4   ┆ -0.831864 ┆ 0.211267 ┆ 0.415799 ┆ -1.64683  ┆ -0.016897 │
│ 3   ┆ 3   ┆ -0.747214 ┆ 0.214008 ┆ 0.41707  ┆ -1.564672 ┆ 0.070243  │
│ 3   ┆ 4   ┆ -0.991648 ┆ 0.225592 ┆ 0.391804 ┆ -1.759584 ┆ -0.223713 │
│ 4   ┆ 4   ┆ -0.591208 ┆ 0.211862 ┆ 0.417223 ┆ -1.408965 ┆ 0.226549  │
└─────┴─────┴───────────┴──────────┴──────────┴───────────┴───────────┘
Cohort-Time Local Average Treatment Effects on the Treated:
 E   t    AET(e, t)   LATT(e, t)   Std. Error   [95% Pointwise.   Conf. Band]
 2   2       0.1884      -0.7134       0.4860           -1.6660        0.2392
 2   3       0.1960      -0.7216       0.4526           -1.6087        0.1654
 2   4       0.2113      -0.8319       0.4158           -1.6468       -0.0169  *
 3   3       0.2140      -0.7472       0.4171           -1.5647        0.0702
 3   4       0.2256      -0.9916       0.3918           -1.7596       -0.2237  *
 4   4       0.2119      -0.5912       0.4172           -1.4090        0.2265
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust

Aggregated dynamic effects:

agg = idid.agg_latt(diff, method="dynamic")
agg.summary()

Overall summary of ATT's based on event-study/dynamic aggregation:
   LATT    Std. Error    [95%      Conf. Band]
-0.7932        0.2551   -1.2933        -0.2932  *

Dynamic effects:
  Event time    Estimate    Std. Error    [95% Pointwise    Conf. Band]
           0     -0.6834        0.2080           -1.0910        -0.2758  *
           1     -0.8644        0.2980           -1.4485        -0.2802  *
           2     -0.8319        0.4158           -1.6468        -0.0169  *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust

fig, ax = plot_agg([agg])

agg_diff = idid.agg_latt(diff, method="dynamic", boot=True)
agg_diff.summary()

Overall summary of ATT's based on event-study/dynamic aggregation:
   LATT    Std. Error    [95% Simult.    Conf. Band]
-0.7932        0.2619         -1.3063        -0.2802  *

Dynamic effects:
  Event time    Estimate    Std. Error    [95% Simult.    Conf. Band]
           0     -0.6834        0.2119         -1.1694        -0.1975  *
           1     -0.8644        0.3044         -1.5624        -0.1663  *
           2     -0.8319        0.4086         -1.7690         0.1053
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust
Multiplier bootstrap: B=1000, c=2.2934, overall c=1.9588
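
The value c in the last line is the critical value used for the simultaneous band. As a check on the printed numbers (not a description of idid internals), both sets of bands above appear to be estimate +/- critical value x standard error, with 1.96 for the pointwise bands and the bootstrap value c = 2.2934 for the simultaneous ones. The figures below are copied from the two summaries; band is a hypothetical helper.

```python
def band(estimate, se, crit):
    """Symmetric confidence band: estimate +/- crit * se."""
    return estimate - crit * se, estimate + crit * se


# (event time, estimate, std. error) copied from the two summaries above.
pointwise_rows = [(0, -0.6834, 0.2080), (1, -0.8644, 0.2980), (2, -0.8319, 0.4158)]
simult_rows = [(0, -0.6834, 0.2119), (1, -0.8644, 0.3044), (2, -0.8319, 0.4086)]

# Assumption: 1.96 for pointwise bands, the reported c for simultaneous bands.
pointwise = {k: band(est, se, 1.96) for k, est, se in pointwise_rows}
simult = {k: band(est, se, 2.2934) for k, est, se in simult_rows}

for k in (0, 1, 2):
    lo, hi = simult[k]
    print(f"event time {k}: simultaneous band [{lo:.4f}, {hi:.4f}], covers 0: {lo <= 0 <= hi}")
```

The larger critical value is why event time 2 is flagged in the pointwise summary but not in the simultaneous one.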

Compare that to the dynamic aggregation within each group:

agg0 = idid.agg_latt(ests[0], method="dynamic", boot=True)
agg0.summary()

Overall summary of ATT's based on event-study/dynamic aggregation:
  LATT    Std. Error    [95% Simult.    Conf. Band]
1.1506        0.1841          0.7958         1.5054  *

Dynamic effects:
  Event time    Estimate    Std. Error    [95% Simult.    Conf. Band]
           0      1.0224        0.1550          0.6773         1.3675  *
           1      1.2295        0.2259          0.7265         1.7325  *
           2      1.2000        0.2980          0.5366         1.8633  *
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust
Multiplier bootstrap: B=1000, c=2.2264, overall c=1.9274

agg1 = idid.agg_latt(ests[1], method="dynamic", boot=True)
agg1.summary()

Overall summary of ATT's based on event-study/dynamic aggregation:
  LATT    Std. Error    [95% Simult.    Conf. Band]
0.3494        0.1739         -0.0236         0.7224

Dynamic effects:
  Event time    Estimate    Std. Error    [95% Simult.    Conf. Band]
           0      0.3214        0.1408          0.0042         0.6387  *
           1      0.3587        0.2134         -0.1220         0.8394
           2      0.3681        0.2922         -0.2901         1.0263
---
Signif. codes: `*' confidence band does not cover 0
Control group: Never treated
Estimation Method: Doubly Robust
Multiplier bootstrap: B=1000, c=2.2528, overall c=2.1445
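
The three summaries can be cross-checked by hand. Event time 2 is only observed for cohort 2 at t = 4, so no cross-cohort weighting is involved and the group difference should equal the F = 1 estimate minus the F = 0 estimate. The sketch below uses figures copied from the output above; the square-root formula for the standard error assumes the two groups are independent samples and is only an approximation to what idid reports.

```python
import math

# (estimate, std. error) at event time 2, copied from the summaries above.
est0, se0 = 1.2000, 0.2980  # group F = 0
est1, se1 = 0.3681, 0.2922  # group F = 1
diff_reported, se_reported = -0.8319, 0.4086  # group difference, bootstrap summary

diff_by_hand = est1 - est0
# Assumption: the groups are independent, so their variances add.
se_by_hand = math.sqrt(se0**2 + se1**2)

print(f"difference: {diff_by_hand:.4f} (reported {diff_reported:.4f})")
print(f"std. error: {se_by_hand:.4f} (reported {se_reported:.4f})")
```

The point estimates agree to the printed precision, which also indicates that group_diff_idid(ests[1], ests[0]) reports the first argument minus the second. The hand-computed standard error is close to the reported one but not identical. At event times 0 and 1 several cohorts are averaged, so the same identity need not hold exactly there.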

We can plot the three aggregated objects:

fig, ax = plot_agg(
    [agg0, agg1, agg_diff],
    labels=[
        "F = 0",
        "F = 1",
        "Group Difference",
    ],
)

Introspection of Estimates

You can inspect individual \(LATT(e, t)\) estimates. After calling idid.estimate, the cell-level results are stored in the latts dictionary of the returned object, keyed by the tuple (e, t). This is useful when you want to examine one particular effect and see which treated and control observations entered that comparison.

from idid._types import FailedLATT
from idid.estimators import get_controls

res = idid.estimate(
    idid.sim_stag_panel(
        n=10_000,
        T=5,
        E_cohorts=[2, 3, 4, 5],
    ),
    cohort="E",
    time="t",
    outcome="Y_t",
    treatment="D_t",
    unit="id",
    covariates=["X"],
    control="notyet",
    method="dr",
    balanced=True,
    verbose=False,
)

e = 2
t = 3
latt = res.latts[(e, t)]
if isinstance(latt, FailedLATT):
    raise ValueError("Failed LATT estimation")

print(latt)

LATT(g=2, t=3, latt=float64(), num=float64(), denom=float64(), ns=7417, ids=DataFrame[7417x2; id, E], IF=ndarray(7417, 1), IF_aet=ndarray(7417, 1), extra=dict[num, den])

Each LATT object stores the underlying unit-period ids used for that comparison. In the ids data frame, E is a treatment indicator rather than the cohort: E = 1 denotes treated observations and E = 0 denotes controls.

print(latt.ids.head())
print(latt.ids["E"].value_counts())

shape: (5, 2)
┌─────┬─────┐
│ id  ┆ E   │
│ --- ┆ --- │
│ i64 ┆ i8  │
╞═════╪═════╡
│ 1   ┆ 1   │
│ 2   ┆ 0   │
│ 3   ┆ 0   │
│ 4   ┆ 0   │
│ 5   ┆ 0   │
└─────┴─────┘
shape: (2, 2)
┌─────┬───────┐
│ E   ┆ count │
│ --- ┆ ---   │
│ i8  ┆ u32   │
╞═════╪═══════╡
│ 0   ┆ 4945  │
│ 1   ┆ 2472  │
└─────┴───────┘

The helper get_controls returns the control units for a given LATT object and IDidResult. In its output, E is again the cohort of each control unit.

controls = get_controls(latt, res).sort("id")
print(controls.head())

shape: (5, 2)
┌─────┬─────┐
│ id  ┆ E   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 2   ┆ 5   │
│ 3   ┆ 4   │
│ 4   ┆ 5   │
│ 5   ┆ 4   │
│ 6   ┆ 5   │
└─────┴─────┘

print(f"#Controls = {controls.shape[0]}")
print(controls[res.dp.e_col].value_counts().sort(res.dp.e_col))

#Controls = 4945
shape: (2, 2)
┌─────┬───────┐
│ E   ┆ count │
│ --- ┆ ---   │
│ i64 ┆ u32   │
╞═════╪═══════╡
│ 4   ┆ 2453  │
│ 5   ┆ 2492  │
└─────┴───────┘

That is, the not-yet-exposed controls for \(\widehat{LATT}(2, 3)\) are split roughly evenly between the cohorts \(E \in \{4, 5\}\).
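
The cohort composition can also be worked out by hand. The sketch below assumes a cohort qualifies as a not-yet-treated control at time t exactly when its first treatment period is later than t. This rule matches the output above but is not taken from the idid source, and not_yet_treated_cohorts is a hypothetical helper.

```python
def not_yet_treated_cohorts(cohorts, t):
    """Cohorts whose first treatment period is strictly after t (assumed rule)."""
    return [e for e in cohorts if e > t]


cohorts = [2, 3, 4, 5]  # E_cohorts from the simulation above
control_cohorts = not_yet_treated_cohorts(cohorts, t=3)
print(control_cohorts)  # [4, 5]

# Counts copied from the printed output above.
n_controls_by_cohort = {4: 2453, 5: 2492}
n_treated = 2472
n_controls = sum(n_controls_by_cohort.values())
print(n_controls)              # 4945
print(n_controls + n_treated)  # 7417, the ns field of the LATT object
```

Under this rule cohort 3 is excluded because it is already treated at t = 3, and the control and treated counts add up to the ns value printed for the LATT object.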