Re-weighting the randomized controlled trial for generalization: finite-sample error and variable selection

Gael Varoquaux

Researcher at Inria, Co-Founder at scikit-learn & Probabl

Published May 27, 2024


Just published in JRSSA: an asymptotic and finite-sample analysis of external generalization from an RCT

A short explainer

Setting: one trial sample R, and data from the target population T. The trial may not represent the target population well.

Estimator: reweighting the trial with propensity-score-style weights can adjust for the difference between the two populations (the IPSW estimator).
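To make the reweighting idea concrete, here is a minimal sketch in Python. It assumes a single binary covariate, a known treatment probability of 0.5 in the trial, and weights estimated as the ratio of empirical frequencies in the target and trial samples. The function name `ipsw_estimate` and the simulated data are illustrative assumptions, not code from the paper.

```python
import numpy as np


def ipsw_estimate(x_trial, a_trial, y_trial, x_target, p_treat=0.5):
    """Reweighted estimate of the average treatment effect in the target population.

    x_trial, x_target: integer labels of a categorical covariate (or stratum).
    a_trial, y_trial: treatment indicator and outcome observed in the trial.
    p_treat: known probability of treatment in the trial (assumed here).
    """
    cats, inverse, counts = np.unique(
        x_trial, return_inverse=True, return_counts=True
    )
    p_trial = counts / len(x_trial)  # empirical frequency of each category in the trial
    p_target = np.array([np.mean(x_target == c) for c in cats])  # same, in the target
    weights = (p_target / p_trial)[inverse]  # one weight per trial individual
    # Horvitz-Thompson contrast: its mean within a category is the effect there
    contrast = a_trial * y_trial / p_treat - (1 - a_trial) * y_trial / (1 - p_treat)
    return np.mean(weights * contrast)


# Toy data (assumption): x modifies the effect and is rarer in the trial than in the target
rng = np.random.default_rng(0)
n, m = 200_000, 200_000
x_trial = rng.binomial(1, 0.2, size=n)
x_target = rng.binomial(1, 0.7, size=m)
a_trial = rng.binomial(1, 0.5, size=n)
y_trial = a_trial * (1 + 2 * x_trial) + rng.normal(size=n)

true_target_effect = 1 + 2 * 0.7  # average effect in the target population
naive = y_trial[a_trial == 1].mean() - y_trial[a_trial == 0].mean()
reweighted = ipsw_estimate(x_trial, a_trial, y_trial, x_target)
print(f"target effect: {true_target_effect:.2f}")
print(f"trial-only difference in means: {naive:.2f}")
print(f"reweighted estimate: {reweighted:.2f}")
```

In this toy setting the trial-only difference in means recovers the average effect in the trial population, while the reweighted estimate is pulled toward the effect in the target population.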

Question: which covariates to include in the IPSW model? Shifted covariates? Treatment-effect modifiers?

The covariates necessary for identifiability are those that are both treatment-effect modifiers and shifted between the study population and the target population.

But a detailed analysis of the error (bias & variance) of the IPSW reveals the best choice of covariates: for reduced variance, use all treatment-effect modifiers.

Detailed results

The error of the IPSW is a function of n (# in trial) and m (# in target sample).

(expression of the error on the generalized average effect, details in the manuscript)

Studying the formula above shows that adding shifted variables increases dimensionality in the estimation of the propensity scores, thus inflating variance of the estimated transported effect.

Conversely, adding modifiers that are not shifted better explains the outcome as a function of the population traits, and thus decreases the variance of the estimated transported effect.
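These two effects can be illustrated with a small Monte Carlo sketch in Python. Everything in it is an assumption chosen for illustration rather than the paper's setup: three binary covariates (X1 is a shifted modifier, X2 is shifted but not a modifier, X3 is a modifier but not shifted), a known treatment probability of 0.5, and weights estimated from empirical stratum frequencies. The helper names `ipsw_estimate`, `strata` and `simulate_once` are hypothetical.

```python
import numpy as np


def ipsw_estimate(s_trial, a_trial, y_trial, s_target, p_treat=0.5):
    """Reweighted effect estimate, with weights from empirical stratum frequencies."""
    cats, inverse, counts = np.unique(
        s_trial, return_inverse=True, return_counts=True
    )
    p_trial = counts / len(s_trial)
    p_target = np.array([np.mean(s_target == c) for c in cats])
    weights = (p_target / p_trial)[inverse]
    contrast = a_trial * y_trial / p_treat - (1 - a_trial) * y_trial / (1 - p_treat)
    return np.mean(weights * contrast)


def strata(X, cols):
    """Encode the selected binary columns as a single integer stratum label."""
    return X[:, cols] @ (2 ** np.arange(len(cols)))


def simulate_once(rng, n=500, m=5000):
    # Columns: X1 (shifted modifier), X2 (shifted, not a modifier),
    # X3 (modifier, not shifted)
    X_trial = np.column_stack([rng.binomial(1, p, size=n) for p in (0.3, 0.2, 0.5)])
    X_target = np.column_stack([rng.binomial(1, p, size=m) for p in (0.6, 0.8, 0.5)])
    a = rng.binomial(1, 0.5, size=n)
    effect = 2 * X_trial[:, 0] + 4 * (2 * X_trial[:, 2] - 1)  # depends on X1 and X3 only
    y = a * effect + rng.normal(size=n)
    return X_trial, a, y, X_target


covariate_sets = {"X1": [0], "X1+X2": [0, 1], "X1+X3": [0, 2]}
rng = np.random.default_rng(0)
estimates = {name: [] for name in covariate_sets}
for _ in range(1000):
    X_trial, a, y, X_target = simulate_once(rng)
    for name, cols in covariate_sets.items():
        estimates[name].append(
            ipsw_estimate(strata(X_trial, cols), a, y, strata(X_target, cols))
        )

true_effect = 2 * 0.6 + 4 * (2 * 0.5 - 1)  # average effect in the target population
means = {name: float(np.mean(vals)) for name, vals in estimates.items()}
variances = {name: float(np.var(vals)) for name, vals in estimates.items()}
for name in covariate_sets:
    print(f"{name:6s} mean={means[name]:.3f} variance={variances[name]:.4f}")
```

All three covariate sets contain X1, the only shifted modifier, so each estimate should be centered near the target effect. What changes is the spread across repetitions: in this toy design it grows when the shifted non-modifier X2 is added and shrinks when the unshifted modifier X3 is added.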

This is a technical paper (it shows consistency of IPSW, which had not been established), but such mathematical work is useful to go beyond hand-waving, and leads to very practical recommendations.

Preprint here

Alessandro C.

Research Group Lead

6mo

Great insight. Always subtle
