Propensity Score Weighting Analysis

Unlike randomized clinical trials, observational studies do not assign treatment at random, so patient characteristics may differ across treatment groups and confound the comparison. This is frequently addressed through propensity scores (PS), which summarize differences in patient characteristics between treatment groups. The propensity score is the probability that an individual is assigned to the treatment of interest given their measured covariates. Matching or weighting on the PS is used to adjust comparisons between the two groups, reducing the potential bias in effects estimated from observational studies.
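As a notational sketch only, the definition above can be written in symbols. The symbols are assumptions introduced here for illustration and are not taken from the sources: $Z$ is the binary treatment indicator and $X$ is the vector of measured covariates.

$$ e(x) = \Pr(Z = 1 \mid X = x) $$

A unit with $e(x) = 0.7$, for example, has an estimated 70% chance of receiving the treatment given its covariates.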

The following use cases assume a binary treatment or exposure. Given a treatment group and a control group with one outcome observed per unit, can we estimate the treatment effect? Note that we can only estimate the treatment effect; an observational study cannot by itself establish causality, because unmeasured confounders may remain.

Estimation of Propensity Scores

Propensity scores are most commonly estimated using binomial regression models (logistic regression, probit, etc.). Other propensity score methods include:

Classification Trees
Bagging/Boosting
Neural Networks
Recursive Partitioning

In principle, any model that yields a predicted probability of treatment can be used, provided that all measured pre-treatment covariates related to both treatment and outcome are included in the propensity score model. The SAS example below estimates propensity scores for the treatment variable group from the covariates var1, var2, and var3 using logistic regression:

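PROC LOGISTIC data=ps_est;
  title 'Propensity Score Estimation';
  /* By default PROC LOGISTIC models the lower-ordered value of group;
     check the "Probability modeled" note in the output */
  model group = var1-var3 / lackfit outroc=ps_r;
  /* Output the propensity score (ps) and the logit of the propensity score (logit_ps) */
  output out=ps_p pred=ps xbeta=logit_ps;
run;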
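To make the estimation step concrete outside SAS, here is a minimal Python sketch of the same idea using only the standard library. It is not the PROC LOGISTIC algorithm: it fits the logistic model by plain gradient ascent, and the function names (`fit_logistic`, `propensity_scores`), the learning rate, and the iteration count are all assumptions chosen for illustration. Treatment is assumed to be coded 1 for treated and 0 for control.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_predictor(beta, row):
    """Intercept plus the weighted sum of the covariates (the logit of the PS)."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def fit_logistic(X, t, lr=0.1, n_iter=5000):
    """Fit P(t = 1 | x) by gradient ascent on the mean log-likelihood.

    X: list of covariate rows, t: list of 0/1 treatment indicators.
    Returns [intercept, b1, ..., bk]. lr and n_iter are illustrative defaults.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, ti in zip(X, t):
            err = ti - sigmoid(linear_predictor(beta, row))
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of treatment for each row of X."""
    return [sigmoid(linear_predictor(beta, row)) for row in X]

# Hypothetical toy data: one covariate, treatment more likely at higher values
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]]
t = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_logistic(X, t)
ps = propensity_scores(X, beta)
```

For real analyses a tested statistical package is the better choice; the sketch only shows that the propensity score is the fitted probability of treatment for each unit.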

Once the propensity scores are estimated, we must check that the measured covariates are balanced between the treatment groups in order to reduce bias. There are several ways to check this:

Graphic of the propensity score distribution - The distributions of the propensity score in the two groups should overlap. Non-overlapping distributions suggest that one or more covariates are strongly predictive of treatment, and variable selection or stratification should be reconsidered.
Standardized differences of each covariate between treatment groups - The standardized difference expresses the difference in a baseline characteristic between the groups in units of its standard deviation; the exact calculation depends on the propensity score method used. One limitation of this method is the lack of consensus on a threshold, though researchers have suggested that a standardized difference of 0.1 or more denotes meaningful imbalance in a baseline covariate.
Stratify by deciles or quintiles - After stratifying on the propensity score by deciles or quintiles, a boxplot can be drawn for each stratum to compare the two groups.
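The standardized difference check can be sketched in a few lines of Python. This is one common unweighted form for a continuous covariate (difference in means divided by the square root of the average of the two group variances); the function name and the toy values are hypothetical, and weighted or matched analyses would need the corresponding weighted means and variances instead.

```python
import math
from statistics import mean, variance

def standardized_difference(treated, control):
    """Absolute standardized mean difference for one continuous covariate.

    Uses the square root of the average of the two sample variances
    as the denominator (an assumption; other conventions exist).
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2.0)
    return abs(mean(treated) - mean(control)) / pooled_sd

# Hypothetical covariate values for the two groups
age_treated = [1.0, 2.0, 3.0, 4.0, 5.0]
age_control = [2.0, 3.0, 4.0, 5.0, 6.0]
d = standardized_difference(age_treated, age_control)
imbalanced = d >= 0.1  # threshold suggested in the text above
```

The function assumes at least two observations per group and a nonzero pooled standard deviation.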

When the covariates are not balanced, the propensity score model should be respecified. This could mean adding or removing covariates, adding interactions, or replacing the linear term for a continuous covariate with a non-linear one.

Estimating Treatment Effects

After we obtain the propensity score, the next step is to estimate the average treatment effect (ATE) in the population. Using multiple estimation methods can help strengthen conclusions, while discrepancies between methods can indicate residual confounding or sensitivity to the analysis approach.
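For reference, one common way to write the ATE uses potential outcomes. The notation is an assumption added here for illustration: $Y(1)$ and $Y(0)$ denote the outcomes a unit would have under treatment and under control.

$$ \mathrm{ATE} = E[Y(1) - Y(0)] $$

Only one of the two potential outcomes is observed for each unit, which is why the group comparison has to be adjusted, here through the propensity score.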

Stratification

Stratification divides individuals into several strata on the basis of their propensity score values. The optimal number of strata depends on the sample size and the amount of overlap between the treatment and control group propensity scores; however, researchers have suggested that five subclasses are sufficient to remove about 90% of the bias in the majority of PS studies.

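$$ \hat\mu_1 - \hat\mu_2 = {{\sum_i w_i (\bar X_{1i} - \bar X_{2i}) } \over {\sum_i w_i}} $$

where $\bar X_{1i}$ and $\bar X_{2i}$ are the mean outcomes of the two groups in stratum $i$ and $w_i$ is the weight given to stratum $i$. The SAS code below uses inverse-variance weights, $w_i = 1/SE_i^2$, where $SE_i$ is the standard error of the within-stratum difference.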

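/* Stratify: rank the propensity score (ps, from ps_p) into quintiles */
PROC RANK data=ps_p out=ps_strata groups=5;
  var ps;
  ranks ps_pred_rank;
run;

/* Sort by stratum so that BY-group processing can be used */
proc sort data=ps_strata;
  by ps_pred_rank;
run;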

/* Compute the difference between group means in each stratum,
   as well as the standard error of this within-stratum difference */
proc ttest data=ps_strata;
  by ps_pred_rank;
  class group;
  var outcome;
  ods output statistics=strata_out;
run;

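/* Find stratum-specific weights and multiply by the mean difference.
   'Diff (1-2)' is the first class level of group minus the second.
   If your release lists separate 'Diff (1-2)' rows by Method,
   also filter on Method so each stratum contributes one row. */
data weights;
  set strata_out;
  if class = 'Diff (1-2)';
  wt_i = 1/(StdErr**2);   /* inverse-variance weight */
  wt_diff = wt_i*Mean;
run;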
 
/* Find the mean weighted difference and its standard error */
proc means noprint data=weights;
  var wt_i wt_diff;
  output out=total sum=sum_wt sum_diff;
run;

data total2;
  set total;
  Mean_diff = sum_diff/sum_wt;
  SE_Diff = SQRT(1/sum_wt);
run;

proc print data=total2;
run;
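The arithmetic of the stratified estimate can be cross-checked with a small Python sketch. It is an illustration under stated assumptions, not a port of the SAS procedures: strata are formed by splitting the sorted scores into equal-sized groups (PROC RANK handles ties and group boundaries differently), the within-stratum standard error uses the pooled-variance formula, group is assumed to be coded 1 for treated and 0 for control, and the difference is taken as treated minus control. All function names and the toy data are hypothetical.

```python
import math
from statistics import mean, variance

def assign_strata(ps, n_strata=5):
    """Return a stratum index (0..n_strata-1) per unit from the rank of its score."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    strata = [0] * len(ps)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(ps)
    return strata

def pooled_diff(y1, y2):
    """Mean difference (y1 minus y2) and its pooled-variance standard error."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return mean(y1) - mean(y2), math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

def stratified_effect(ps, group, outcome, n_strata=5):
    """Inverse-variance weighted mean of within-stratum differences and its SE."""
    strata = assign_strata(ps, n_strata)
    sum_wt = sum_diff = 0.0
    for s in range(n_strata):
        y1 = [y for y, g, k in zip(outcome, group, strata) if k == s and g == 1]
        y2 = [y for y, g, k in zip(outcome, group, strata) if k == s and g == 0]
        diff, se = pooled_diff(y1, y2)
        wt = 1.0 / se ** 2          # w_i = 1 / SE_i^2
        sum_wt += wt
        sum_diff += wt * diff
    return sum_diff / sum_wt, math.sqrt(1.0 / sum_wt)

# Hypothetical data: 20 units, 5 strata with 2 treated and 2 controls each,
# and a treated-minus-control difference of 2 in every stratum
ps = [(i + 1) / 21 for i in range(20)]
group = [1, 0, 1, 0] * 5
outcome = []
for s in range(5):
    outcome += [12.0 + s, 10.0 + s, 14.0 + s, 12.0 + s]
estimate, se = stratified_effect(ps, group, outcome)
```

Each stratum needs at least two units per group with some outcome variation, otherwise the within-stratum standard error is undefined.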

References

Overlap Weighting: A Propensity Score Method That Mimics Attributes of a Randomized Clinical Trial

Propensity Score Analysis and Assessment of Propensity Score Approaches Using SAS Procedures