Interactive Stata Output Lab - Decomposition Workshop

Learn to Read Decomposition Results

Values shown are for demonstration purposes only.


Command History

Oaxaca-Blinder Commands

A. Three-fold decomposition (default): oaxaca SBP BMI age, by(pop)

B. Two-fold with pooled coefficients (RECOMMENDED): oaxaca SBP BMI age, by(pop) pooled

C. With bootstrap standard errors: oaxaca ... vce(bootstrap, reps(100))

D. Grouped variables: oaxaca ... (functional: adl iadl)

E. Survey-weighted decomposition: oaxaca ... pooled svy

Fairlie Commands

A. Basic Fairlie decomposition: fairlie hypertension BMI age, by(pop)

B. Preferred (pooled + RO + groups): fairlie ... pooled(pop) ro reps(300)

C. Bootstrap SEs for publication: bootstrap, reps(100) seed(12345): fairlie ...

Stata Results Window

. oaxaca SBP BMI age, by(population)

Blinder-Oaxaca decomposition                    Number of obs     =      2,000
                                                  Model           =     linear
Group 1: population = 0                           N of obs 1      =      1,000
Group 2: population = 1                           N of obs 2      =      1,000

    endowments: (X1 - X2) * b2
  coefficients: X2 * (b1 - b2)
   interaction: (X1 - X2) * (b1 - b2)

------------------------------------------------------------------------------
         SBP | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   175.6579   .3630455   483.85   0.000     174.9464    176.3695
     group_2 |   148.3537   .3165138   468.71   0.000     147.7334    148.9741
  difference |    27.3042   .4816461    56.69   0.000      26.3602    28.24821
  endowments |   11.45145   .6372081    17.97   0.000     10.20255    12.70036
coefficients |   14.32706   .6640514    21.58   0.000     13.02554    15.62858
 interaction |   1.525692   .7903181     1.93   0.054    -.0233032    3.074687
-------------+----------------------------------------------------------------
endowments   |
         BMI |   7.825849   .5644088    13.87   0.000     6.719628     8.93207
         age |   3.625604   .2893155    12.53   0.000     3.058556    4.192653
-------------+----------------------------------------------------------------
coefficients |
         BMI |   4.497143    2.22776     2.02   0.044     .1308142    8.863471
         age |   .2580753    1.60092     0.16   0.872     -2.87967    3.395821
       _cons |   9.571841   3.155704     3.03   0.002     3.386775    15.75691
-------------+----------------------------------------------------------------
interaction  |
         BMI |   1.477729   .7328098     2.02   0.044      .041448     2.91401
         age |   .0479629   .2975408     0.16   0.872    -.5352063     .631132
------------------------------------------------------------------------------
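One way to read the three-fold output is to confirm that its pieces add up. The short Python sketch below copies the demonstration values from the output above and checks two identities: the three overall components sum to the gap, and each detailed block sums to its overall row. The variable names are illustrative only; nothing here calls Stata.

```python
# Values copied from the demonstration oaxaca output (not real data).
group_1, group_2 = 175.6579, 148.3537
overall = {"endowments": 11.45145, "coefficients": 14.32706, "interaction": 1.525692}
detail = {
    "endowments": {"BMI": 7.825849, "age": 3.625604},
    "coefficients": {"BMI": 4.497143, "age": 0.2580753, "_cons": 9.571841},
    "interaction": {"BMI": 1.477729, "age": 0.0479629},
}

# Identity 1: gap = endowments + coefficients + interaction.
gap = group_1 - group_2
total = sum(overall.values())
print(f"gap = {gap:.4f}, sum of components = {total:.4f}")

# Identity 2: per-variable rows add up to the matching overall row.
detail_sums = {name: sum(parts.values()) for name, parts in detail.items()}
for name, value in detail_sums.items():
    print(f"{name}: detail sum = {value:.5f}, overall row = {overall[name]:.5f}")

# Share of the gap attributed to each component, in percent.
shares = {name: 100 * value / gap for name, value in overall.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of the gap")
```

Small mismatches in the last printed digit come from the rounding in the displayed output. In this demonstration the shares come to roughly 42% (endowments), 52% (coefficients), and 6% (interaction).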


Model Info

Formula

Overall Gap

Explained

Unexplained

Variables

Confidence Intervals

Result Text

Decomposition Results

. oaxaca Y X, by(group) weight(0.5)

Blinder-Oaxaca decomposition                      Model       =  linear
Group 1: group = A                                N of obs 1  =   1,000
Group 2: group = B                                N of obs 2  =   1,000

Reference coefficients: b* = 0.5*b1 + 0.5*b2

    explained: (X1 - X2) * b*
  unexplained: X1*(b1-b*) + X2*(b*-b2) + (a1-a2)

------------------------------------------------------------------
          Y|      Coef. Std.err.        z   P>|z|   [95% Conf.Int]
-----------+------------------------------------------------------
overall    |
    group_A|   175.0000    0.363   482.27   0.000   174.29  175.71
    group_B|   140.0000    0.317   442.34   0.000   139.38  140.62
 difference|    35.0000    0.482    72.68   0.000    34.06   35.94
  explained|    11.2500    0.492    22.86   0.000    10.29   12.21
unexplained|    23.7500    0.550    43.17   0.000    22.67   24.83
-----------+------------------------------------------------------
explained  |
          X|    11.2500    0.424    26.55   0.000    10.42   12.08
------------------------------------------------------------------

The mean outcome difference between groups was 0.0 units (95% CI, -0.8 to 0.8), with equal means in Group A (90.0) and Group B (90.0). Of this gap, 0.0 units (95% CI, -0.4 to 0.4; 0%) was statistically explained by differences in the measured covariate (X), whereas 0.0 units (95% CI, -0.3 to 0.3; 0%) remained unexplained. When present, an unexplained component may reflect unmeasured confounding, model misspecification, or true group differences in how X affects Y.

Oaxaca-Blinder Decomposition

Gap (Ȳ_A − Ȳ_B) = Explained (ΔX̄ × β*) + Unexplained

β* = (β_A + β_B) / 2 = (2.5 + 2.0) / 2 = 2.25 (Stata: weight(0.5))

175.0 − 140.0 = 35.0 = (30 − 25) × 2.25 + 23.75 = 11.25 + 23.75
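The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch for the single-covariate demonstration only: the group means of X (30 and 25), the slopes (2.5 and 2.0), and the outcome means (175 and 140) are taken from the formulas above, while the intercepts are not shown in the lab and are backed out here under the assumption that each fitted line passes through its group means.

```python
# Demonstration inputs taken from the formulas above.
x_a, x_b = 30.0, 25.0        # group means of X
b_a, b_b = 2.5, 2.0          # group-specific slopes
y_a, y_b = 175.0, 140.0      # group means of Y
w = 0.5                      # corresponds to weight(0.5)

# Assumption: intercepts implied by y = a + b * x at the group means.
a_a = y_a - b_a * x_a
a_b = y_b - b_b * x_b

# Reference slope and the two-fold split.
b_star = w * b_a + (1 - w) * b_b
explained = (x_a - x_b) * b_star
unexplained = x_a * (b_a - b_star) + x_b * (b_star - b_b) + (a_a - a_b)
gap = y_a - y_b

print(f"b* = {b_star}")
print(f"explained = {explained}, unexplained = {unexplained}, gap = {gap}")
print(f"explained share = {100 * explained / gap:.1f}%")
```

Changing `w` to 1 or 0 uses Group A's or Group B's slope as the reference, which moves part of the gap between the two components without changing their sum.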

Gap Decomposition Total: 0.0

Explained: 0%

Unexplained: 0%

GROUP A

X̄ 20.0

α 80

β 0.5

GROUP B

X̄ 20.0

α 80

β 0.5

[Figure: Regression Lines, one line each for Group A and Group B]

Fairlie Decomposition Results

. fairlie hypertension age, by(population)

Logistic regression (reference model: population == 0)

                                      Number of obs   =  1,000
                                      LR chi2(1)      = 436.22
                                      Prob > chi2     = 0.0000
Log likelihood = -321.05              Pseudo R2       = 0.4041

--------------------------------------------------------------
hypertension |    Coef. Std.err.       z   P>|z|      [95% CI]
-------------+------------------------------------------------
         age |    0.120    0.011   10.91   0.000  0.099  0.141
       _cons |   -5.750    0.616   -9.34   0.000  -6.96  -4.54
--------------------------------------------------------------

Non-linear decomposition by population (G)

                                      Number of obs   =  2,000
                                      N of obs G=0    =  1,000
                                      N of obs G=1    =  1,000
                                      Pr(Y!=0|G=0)    =   .934
                                      Pr(Y!=0|G=1)    =   .148
                                      Difference      =   .786
                                      Total expl.     =   .372

--------------------------------------------------------------
hypertension |    Coef. Std.err.       z   P>|z|      [95% CI]
-------------+------------------------------------------------
         age |     .372     .032   11.63   0.000   .310   .434
--------------------------------------------------------------

The outcome prevalence difference between groups was 78.6 percentage points (95% CI, 73.0–84.2), with higher prevalence in Group 0 (93.4%) compared to Group 1 (14.8%). Using Fairlie nonlinear decomposition (Stata default, reference(0)), 37.2 pp (95% CI, 31.0–43.4; 47%) was statistically explained by differences in age, reflecting higher average age in Group 0. The remaining gap may reflect unmeasured confounding or true group differences in how age affects hypertension risk.

FAIRLIE DECOMPOSITION

Gap = Explained + Unexplained

β* = β₀ (reference: Group 0, Stata default)

78.6 = 37.2 + 41.4

(47% explained, 53% unexplained)
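The split above follows directly from three numbers in the Fairlie output. The sketch below recomputes it in Python; the inputs are the demonstration values (Pr(Y!=0|G=0), Pr(Y!=0|G=1), and Total expl.), and the variable names are illustrative.

```python
# Demonstration values read from the fairlie output (probability scale).
p_g0, p_g1 = 0.934, 0.148
explained = 0.372

gap = p_g0 - p_g1                      # prevalence difference
unexplained = gap - explained          # remainder after the explained part
share_explained = 100 * explained / gap
share_unexplained = 100 * unexplained / gap

# Report in percentage points (pp), as in the result text.
print(f"gap = {100 * gap:.1f} pp")
print(f"explained = {100 * explained:.1f} pp ({share_explained:.0f}%)")
print(f"unexplained = {100 * unexplained:.1f} pp ({share_unexplained:.0f}%)")
```

Percentage points describe the absolute difference in prevalence, while the percentages in parentheses describe shares of that difference; the two are easy to confuse when writing up results.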

Gap Decomposition Total: 50.0 pp

Explained Unexplained

Group 0 (Reference)

X̄ (Age) 55

β(Age) 0.080

Group 1

X̄ (Age) 55

β(Age) 0.080

Key Concepts (Fairlie 2005)

• Binary outcome decomposition via logistic regression
• Results in percentage points (pp)
• Variable ordering affects individual contributions; use ro to randomize
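The ordering point can be illustrated with a toy calculation. The Python sketch below is not the algorithm inside the `fairlie` command: it assumes fixed, made-up logistic coefficients, eight made-up observations, and a simple pairing of rows by position. It swaps Group 0 covariate values for Group 1 values one variable at a time and records how much the mean predicted probability falls at each step.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical reference-model coefficients (illustrative, not estimated).
INTERCEPT, B_AGE, B_BMI = -9.0, 0.08, 0.15

# Made-up (age, BMI) rows, paired by position; not workshop data.
group0 = [(65, 31), (58, 29), (72, 33), (60, 27)]
group1 = [(45, 24), (50, 26), (40, 22), (55, 25)]
COLUMN = {"age": 0, "bmi": 1}

def mean_probability(rows):
    probs = [logistic(INTERCEPT + B_AGE * age + B_BMI * bmi) for age, bmi in rows]
    return sum(probs) / len(probs)

def sequential_contributions(order):
    """Replace group0 covariates with group1 values one variable at a time."""
    current = [list(row) for row in group0]
    previous = mean_probability(current)
    contributions = {}
    for name in order:
        col = COLUMN[name]
        for row, partner in zip(current, group1):
            row[col] = partner[col]
        updated = mean_probability(current)
        contributions[name] = previous - updated
        previous = updated
    return contributions

age_first = sequential_contributions(["age", "bmi"])
bmi_first = sequential_contributions(["bmi", "age"])
print("age switched first:", age_first)
print("bmi switched first:", bmi_first)
```

Both orderings give the same total, but the amount credited to age (and to BMI) differs, because the logistic curve has a different slope depending on where the other covariate currently sits. Averaging over many random orderings is one way to reduce this path dependence.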

[Figure: Logistic Probability Curves for G=0 and G=1]

Baseline p	After adding β = 0.50 to the log-odds	Actual increase
0.10	0.15	+0.05
0.50	0.62	+0.12
0.90	0.94	+0.04
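The calculation behind this table can be sketched in Python, assuming a standard logistic link: convert each baseline probability to log-odds, add β = 0.50, and convert back. The function names are illustrative.

```python
import math

def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-z))

BETA = 0.50
rows = []
print("Baseline p\tAfter adding beta\tActual increase")
for p in (0.10, 0.50, 0.90):
    new_p = inv_logit(logit(p) + BETA)
    rows.append((p, new_p, new_p - p))
    print(f"{p:.2f}\t{new_p:.2f}\t{new_p - p:+.2f}")
```

The same shift on the log-odds scale produces its largest change in probability near p = 0.50 and smaller changes near 0 or 1, which is why contributions do not add up neatly on the probability scale.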

How Oaxaca-Blinder Splits the Gap — Step by Step

Start with the raw gap

Add a clever zero

Regroup the pieces

Two meaningful components
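The four steps can be written out for the single-covariate case. This is a sketch under two assumptions: each group's regression includes an intercept, so its fitted line passes through the group means, and β* is any chosen reference slope, such as the 50/50 average used with weight(0.5). The symbols follow the lab's own notation (α for intercepts, β for slopes, X̄ and Ȳ for group means).

```latex
\begin{align*}
% Step 1: start with the raw gap
\bar{Y}_A - \bar{Y}_B
  &= (\alpha_A + \beta_A \bar{X}_A) - (\alpha_B + \beta_B \bar{X}_B) \\
% Step 2: add a clever zero
  &= (\alpha_A - \alpha_B) + \beta_A \bar{X}_A - \beta_B \bar{X}_B
     + \underbrace{(\beta^* \bar{X}_A - \beta^* \bar{X}_A)
     + (\beta^* \bar{X}_B - \beta^* \bar{X}_B)}_{=\,0} \\
% Step 3: regroup the pieces
  &= (\bar{X}_A - \bar{X}_B)\,\beta^*
     + \bar{X}_A(\beta_A - \beta^*) + \bar{X}_B(\beta^* - \beta_B)
     + (\alpha_A - \alpha_B) \\
% Step 4: two meaningful components
  &= \underbrace{(\bar{X}_A - \bar{X}_B)\,\beta^*}_{\text{explained}}
     + \underbrace{\bar{X}_A(\beta_A - \beta^*) + \bar{X}_B(\beta^* - \beta_B)
     + (\alpha_A - \alpha_B)}_{\text{unexplained}}
\end{align*}
```

The last line matches the explained and unexplained formulas printed in the weight(0.5) output, with X1, X2, b1, b2, a1, a2 standing for the Group A and Group B quantities.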

Functional Form & Model Misspecification

Common Support & Extrapolation

Non-Additivity on the Probability Scale

Linear predictor (log-odds scale)

Predicted probability (logistic)

Why Linear Models Fail for Binary Outcomes

Linear Probability Model (LPM)

Logistic Regression

How Splines Work: One Curve, Many Lines

Whole Curve