Propensity Score Weighting Analysis

Unlike randomized clinical trials, observational studies must adjust for confounding so that patient characteristics are comparable across treatment groups. This is frequently addressed through propensity scores (PS), which summarize differences in patient characteristics between treatment groups. The propensity score is the probability that an individual is assigned to receive the treatment of interest given their measured covariates. Matching or weighting on the PS is used to adjust comparisons between the two groups, thus reducing the potential bias in the estimated effects from observational studies.

The following use cases assume a binary treatment or exposure for the purpose of causal inference. Given a treatment group and a control group with one outcome observed per unit, can we estimate the treatment effect? Note that we can only estimate the treatment effect; causality cannot be established through observational studies alone.

Estimation of Propensity Scores

Propensity scores are most commonly estimated using binomial regression models (logistic regression, probit, etc.). Other propensity score methods include:

Classification Trees
Bagging/Boosting
Neural Networks
Recursive Partitioning

In principle, any model that yields a predicted probability of treatment can be used, provided that all the covariates related to treatment and outcome that were measured before treatment are included in the propensity score estimation model. The SAS example below estimates propensity scores for the treatment variable group from the covariates var1, var2, and var3 using logistic regression:

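PROC LOGISTIC data=ps_est;
  title 'Propensity Score Estimation';
  /* Note: by default PROC LOGISTIC models the lowest-ordered level of group; */
  /* use the event= response option or DESCENDING if that is not the treated level */
  model group = var1-var3 / lackfit outroc = ps_r;
  /* Output the propensity score and the logit of the propensity score */
  output out = ps_p pred = ps xbeta = logit_ps;
run;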

Once the propensity scores are estimated, we must check that the measured covariates are balanced between the groups in order to reduce bias. There are several ways to assess this:

Graphic of the propensity score distribution - The distributions of the propensity score in the two groups should overlap. Non-overlapping distributions suggest that one or more covariates are strongly predictive of treatment, and variable selection or stratification should be reconsidered.

Standardized differences of each covariate between treatment groups - The magnitude of the difference in baseline characteristics between the groups can be calculated, with the exact calculation depending on the propensity score method used. One limitation of this approach is the lack of consensus on what the threshold should be, though researchers have suggested that a standardized difference of 0.1 or more denotes meaningful imbalance in a baseline covariate.

Stratify by deciles or quintiles - After stratifying the propensity score into deciles or quintiles, a boxplot can be drawn for each stratum.
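The three checks above can be sketched in SAS as follows. This is a minimal illustration rather than a definitive implementation. It assumes that the data set ps_p produced by the PROC LOGISTIC step contains group coded as numeric 0/1, the covariates var1-var3, and the propensity score ps. The data set names stats, std_diff, and ps_q are hypothetical. The standardized difference is computed here, before any matching or weighting, as (mean in group 1 - mean in group 0) / sqrt((variance in group 1 + variance in group 0) / 2), the usual form for continuous covariates.

```sas
/* Sketch only: assumes ps_p holds group (numeric 0/1), var1-var3 and ps */

/* 1. Overlap: stacked histograms and kernel densities of ps by group */
proc sgpanel data=ps_p;
  title 'Propensity Score Distribution by Group';
  panelby group / columns=1;
  histogram ps;
  density ps / type=kernel;
  colaxis label='Estimated propensity score';
run;

/* 2. Standardized differences of var1-var3 between the groups */
proc means data=ps_p noprint nway;
  class group;
  var var1-var3;
  output out=stats mean=m1-m3 var=v1-v3;   /* one row per group */
run;

data std_diff(keep=covariate std_diff);
  set stats;                                /* group=0 row is read first */
  array m[3] m1-m3;
  array v[3] v1-v3;
  array m0[3] _temporary_;                  /* retained group-0 means */
  array v0[3] _temporary_;                  /* retained group-0 variances */
  array name[3] $32 _temporary_ ('var1' 'var2' 'var3');
  length covariate $32;
  if group = 0 then do i = 1 to 3;
    m0[i] = m[i];
    v0[i] = v[i];
  end;
  else if group = 1 then do i = 1 to 3;
    covariate = name[i];
    std_diff = (m[i] - m0[i]) / sqrt((v[i] + v0[i]) / 2);
    output;
  end;
run;

proc print data=std_diff noobs;
  title 'Standardized Differences (absolute values of 0.1 or more suggest imbalance)';
run;

/* 3. Quintiles of ps, then one boxplot per quintile and group */
proc rank data=ps_p out=ps_q groups=5;
  var ps;
  ranks ps_quintile;                        /* 0 = lowest fifth, 4 = highest fifth */
run;

proc sgplot data=ps_q;
  title 'Propensity Score by Quintile and Group';
  vbox ps / category=ps_quintile group=group;
run;
```

For deciles, groups=10 can be used in PROC RANK instead. After matching or weighting, the same standardized differences would be recomputed in the matched or weighted sample.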

When balance is not achieved, the covariates in the propensity score model should be adjusted. This could mean adding or removing covariates, adding interactions, or substituting a non-linear term for a continuous one.
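As an illustration of such an adjustment, the sketch below refits the propensity score model with one interaction and one squared term. The particular terms (var1*var2 and var3*var3) and the output data set name ps_p2 are hypothetical choices made only for this example. In practice the terms would be chosen from the balance diagnostics and subject-matter knowledge.

```sas
/* Sketch only: the added terms are illustrative, not recommended choices */
proc logistic data=ps_est;
  title 'Propensity Score Estimation (re-specified model)';
  model group = var1 var2 var3
                var1*var2      /* interaction between two covariates */
                var3*var3;     /* quadratic term for a continuous covariate */
  output out = ps_p2 pred = ps xbeta = logit_ps;
run;
```

The balance checks would then be repeated on the new scores, and the cycle continued until balance is acceptable.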

Treatment Effects

After we obtain the propensity score, the next step is to estimate the treatment effect. It is important to determine the estimand of interest, whether it be:

the Average Treatment Effect on the Treated (ATT)

the Average Treatment Effect in the population (ATE)
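In a weighting analysis, the choice of estimand determines the weights. A minimal sketch of the commonly used inverse probability weights follows. For the ATE, treated units are weighted by 1/ps and controls by 1/(1 - ps). For the ATT, treated units receive weight 1 and controls receive ps/(1 - ps). The sketch assumes that ps is the estimated probability of group = 1, that group = 1 denotes the treated group, and that the data set contains an outcome variable, here called y, which is a hypothetical name not given in the text above.

```sas
/* Sketch only: assumes ps = P(group = 1) and group = 1 is the treated group */
data ps_w;
  set ps_p;
  if group = 1 then do;
    w_ate = 1 / ps;            /* ATE weight for treated units */
    w_att = 1;                 /* ATT weight for treated units */
  end;
  else do;
    w_ate = 1 / (1 - ps);      /* ATE weight for control units */
    w_att = ps / (1 - ps);     /* ATT weight for control units */
  end;
run;

/* Weighted group means of the hypothetical outcome y under ATE weights; */
/* the difference between the two means is the point estimate of the effect */
proc means data=ps_w mean;
  title 'Weighted Outcome Means by Group (ATE weights)';
  class group;
  var y;
  weight w_ate;
run;
```

Replacing w_ate with w_att in the WEIGHT statement gives the corresponding ATT comparison. The default standard errors from a weighted procedure do not account for the weights having been estimated, so a robust or bootstrap variance estimate is generally preferred for inference.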

References

Overlap Weighting: A Propensity Score Method That Mimics Attributes of a Randomized Clinical Trial (Thomas, Li, Pencina; 2020)

Propensity Score Analysis and Assessment of Propensity Score Approaches Using SAS Procedures