
Biostatistics - Causal Inference

Because causal evidence is central to informing operational and clinical decisions, most research activities conducted by investigators in the Division of Research aim to draw causal inferences from experimental or nonexperimental data. Recent decades have seen considerable advances in the development of formal causal-inference methodologies.

In collaboration with academic partners, DOR researchers are developing software to streamline the application of such advanced causal inference methods in the routine conduct of research. They also contribute to the development of these methods through evaluation of their applicability and practical performance with real-world data and by tackling methodologic challenges and new opportunities in drawing causal inferences from complex electronic health record data.

Software for Causal Inference Research


The stremr R package automates the implementation of various estimators of the effects of time-varying static, dynamic, and stochastic treatment and monitoring interventions on time-to-event outcomes (e.g., counterfactual discrete-time survival curves or coefficients of marginal structural models). To adjust for both observed time-dependent confounding and informative right-censoring, the following estimation approaches are automated: inverse probability weighting, g-computation, and targeted minimum loss-based estimation. Nuisance parameters can be estimated using user-specified generalized linear models or H2O machine learning algorithms (including an H2O ensemble learning approach, also known as Super Learning). Analytic results can be automatically exported as standard HTML, MS Word, or PDF reports.
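To make two of the estimation approaches named above concrete, the following Python sketch contrasts inverse probability weighting and g-computation in a deliberately simplified setting: a single treatment decision, one binary confounder, and a binary event indicator, rather than the time-varying, right-censored setting the package addresses. It does not use or reproduce the package's interface; the data-generating values and all function names (simulate_cohort, ipw_mean, gcomp_mean) are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def simulate_cohort(n, seed=1):
    """Toy cohort: W confounds treatment A and the event indicator Y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = int(rng.random() < 0.5)                      # baseline confounder
        a = int(rng.random() < 0.3 + 0.4 * w)            # treatment depends on W
        y = int(rng.random() < 0.2 + 0.1 * a + 0.3 * w)  # event depends on A and W
        data.append((w, a, y))
    return data

def ipw_mean(data, a_star):
    """Stabilized (Hajek) IPW estimate of the mean outcome under A = a_star.

    The propensity score P(A = a_star | W) is estimated by the empirical
    proportion within each stratum of W.
    """
    n_w, n_aw = defaultdict(int), defaultdict(int)
    for w, a, _ in data:
        n_w[w] += 1
        if a == a_star:
            n_aw[w] += 1
    num = den = 0.0
    for w, a, y in data:
        if a == a_star:
            weight = n_w[w] / n_aw[w]  # 1 / estimated propensity score
            num += weight * y
            den += weight
    return num / den

def gcomp_mean(data, a_star):
    """G-computation: average the stratum-specific outcome means over W."""
    sums, counts, n_w = defaultdict(float), defaultdict(int), defaultdict(int)
    for w, a, y in data:
        n_w[w] += 1
        if a == a_star:
            sums[w] += y
            counts[w] += 1
    n = len(data)
    return sum(n_w[w] / n * sums[w] / counts[w] for w in n_w)

cohort = simulate_cohort(50000)
rd_ipw = ipw_mean(cohort, 1) - ipw_mean(cohort, 0)
rd_gcomp = gcomp_mean(cohort, 1) - gcomp_mean(cohort, 0)
print(f"IPW risk difference:           {rd_ipw:.3f}")
print(f"g-computation risk difference: {rd_gcomp:.3f}")
```

Under the assumed data-generating values the true risk difference is 0.10. Because both nuisance models here are saturated in the single binary confounder, the two estimators coincide; with parametric or machine-learned nuisance models they generally differ.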


The simcausal R package is a flexible tool for simulating complex longitudinal data using structural equations, with emphasis on problems in causal inference. The user interface is designed to facilitate the conduct of transparent and reproducible simulation studies, and allows concise expression of complex functional dependencies for a large number of time-varying nodes. In particular, the software facilitates the following steps of a standard data simulation workflow: specifying interventions, simulating from the intervened data-generating distributions, and defining and evaluating treatment-specific means, average treatment effects, and coefficients from working marginal structural models.
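The workflow described above can be sketched in a few lines of Python. The example is a minimal illustration and not the package's interface: the three structural equations, their coefficients, and the function names (simulate, mean_outcome) are hypothetical. It simulates observed data, replaces the treatment equation with a constant to simulate from intervened distributions, and evaluates the treatment-specific means and the average treatment effect.

```python
import random

def simulate(n, intervention=None, seed=0):
    """Draw n units from a toy structural equation model.

    W is a baseline confounder, A a binary treatment, Y a continuous outcome.
    If `intervention` is 0 or 1, the structural equation for A is replaced
    by that constant (a static intervention).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Exogenous errors are always drawn, so the same seed yields the
        # same units under every intervention.
        u_w, u_a, u_y = rng.random(), rng.random(), rng.gauss(0.0, 1.0)
        w = 1 if u_w < 0.5 else 0
        if intervention is None:
            a = 1 if u_a < 0.3 + 0.4 * w else 0
        else:
            a = intervention
        y = 1.0 + 2.0 * a + 3.0 * w + u_y
        rows.append((w, a, y))
    return rows

def mean_outcome(rows):
    return sum(y for _, _, y in rows) / len(rows)

observed = simulate(20000)
mean_y1 = mean_outcome(simulate(20000, intervention=1))  # treatment-specific mean, A = 1
mean_y0 = mean_outcome(simulate(20000, intervention=0))  # treatment-specific mean, A = 0
ate = mean_y1 - mean_y0

# Unadjusted contrast in the observed data, biased by confounding through W
treated = [y for _, a, y in observed if a == 1]
untreated = [y for _, a, y in observed if a == 0]
naive = sum(treated) / len(treated) - sum(untreated) / len(untreated)

print(f"average treatment effect: {ate:.2f}")
print(f"unadjusted contrast:      {naive:.2f}")
```

Reusing the seed across interventions means each simulated unit keeps the same exogenous errors, so the contrast between the two intervened datasets recovers the coefficient on A, while the unadjusted contrast in the observed data does not.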

MSMstructure (Right click to save zip file)


The MSMstructure SAS macro and the LtAtStructuR R package automate the processing of longitudinal electronic health record data from an observational cohort study into a structured analytic dataset suitable for evaluating the effects of time-varying treatment and monitoring interventions on a survival outcome using, for example, inverse probability weighting or targeted minimum loss-based estimation. In particular, output from both of these software products can be used with the MSM macro, the ltmle R package, or the stremr R package described above. The R routine f_Long_to_Wide may be used to convert the output data in long format (generated by either the MSMstructure macro or the LtAtStructuR R package) into the wide format used by the ltmle R package.
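The long-to-wide conversion mentioned above can be illustrated with a small Python sketch. It is not the f_Long_to_Wide routine, and the column names (id, t, A, L, Y) and the suffix naming of the wide columns are assumptions chosen for the example, not the layout produced by the software described here.

```python
def long_to_wide(rows, id_col="id", time_col="t"):
    """Reshape one-row-per-person-interval data into one row per person.

    Each time-varying column `name` observed at time k becomes `name_k`.
    """
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_col], {id_col: row[id_col]})
        for key, value in row.items():
            if key not in (id_col, time_col):
                record[f"{key}_{row[time_col]}"] = value
    return [wide[key] for key in sorted(wide)]

# Hypothetical long-format input: treatment A, covariate L, outcome Y
long_rows = [
    {"id": 1, "t": 0, "A": 1, "L": 7.2, "Y": 0},
    {"id": 1, "t": 1, "A": 1, "L": 6.9, "Y": 0},
    {"id": 2, "t": 0, "A": 0, "L": 8.1, "Y": 0},
    {"id": 2, "t": 1, "A": 1, "L": 7.8, "Y": 1},
]

wide_rows = long_to_wide(long_rows)
for record in wide_rows:
    print(record)
```

Each person contributes a single row in the wide output, with one column per variable and time point.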

DSA: Data-Adaptive Estimation with Cross-Validation and the D/S/A Algorithm

DSA_3.1.4.tar.gz (Right click to save tar file) (Right click to save zip file)

modelUtils_3.1.4.tar.gz (Right click to save tar file) (Right click to save zip file)

This combination of two R packages (modelUtils must be installed and loaded first) performs data-adaptive estimation through estimator selection based on cross-validation and the L2 loss function. Candidate estimators are defined with polynomial generalized linear models generated with the Deletion/Substitution/Addition (D/S/A) algorithm under user-specified constraints. This software may be used for prediction or for data-adaptive estimation of the nuisance parameters (e.g., propensity scores) involved in the estimation of causal estimands.
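Estimator selection by cross-validation under the L2 loss can be sketched as follows. This Python example is a simplified illustration and not the DSA package: the candidate estimators are polynomial least-squares fits of fixed degree 0 through 4, whereas the D/S/A algorithm searches over candidates with deletion, substitution, and addition moves. The simulated data and all function names (fit_poly, predict, cv_risk) are assumptions.

```python
import random

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (Gauss-Jordan)."""
    k = degree + 1
    # Augmented system [X'X | X'y] for the monomial basis 1, x, ..., x^degree
    m = [[sum(x ** (i + j) for x in xs) for j in range(k)]
         + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

def predict(coefs, x):
    return sum(c * x ** i for i, c in enumerate(coefs))

def cv_risk(xs, ys, degree, folds=5):
    """Cross-validated mean squared error (L2 loss) for one candidate."""
    n = len(xs)
    total = 0.0
    for fold in range(folds):
        train = [i for i in range(n) if i % folds != fold]
        valid = [i for i in range(n) if i % folds == fold]
        coefs = fit_poly([xs[i] for i in train], [ys[i] for i in train], degree)
        total += sum((ys[i] - predict(coefs, xs[i])) ** 2 for i in valid)
    return total / n

# Simulated data with a quadratic regression function
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
ys = [1.0 - 2.0 * x + 3.0 * x ** 2 + rng.gauss(0.0, 0.3) for x in xs]

risks = {degree: cv_risk(xs, ys, degree) for degree in range(5)}
best_degree = min(risks, key=risks.get)
for degree, risk in risks.items():
    print(f"degree {degree}: cross-validated risk {risk:.3f}")
print("selected degree:", best_degree)
```

The candidate with the smallest cross-validated risk is selected. Candidates of degree below 2 cannot represent the simulated curvature and therefore incur a clearly larger risk.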

For more information, contact Romain Neugebauer, PhD.
