Cox Regression Applied to Recidivism Data
We return to the recidivism data that we analyzed using parametric models. Let us start by reading the data and setting them up for survival analysis
. use http://www.stata.com/data/jwooldridge/eacsap/recid, clear
. gen fail = 1 - cens
. stset durat, failure(fail)
failure event: fail != 0 & fail < .
obs. time interval: (0, durat]
exit on or before: failure
------------------------------------------------------------------------------
1445 total obs.
0 exclusions
------------------------------------------------------------------------------
1445 obs. remaining, representing
552 failures in single record/single failure data
80013 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 81
A Proportional Hazards Model
We fit a proportional hazards model using, as I recommend, the
efron method for handling ties
. local model workprg priors tserved felon drugs black married educ age
. stcox `model', efron
failure _d: fail
analysis time _t: durat
Iteration 0: log likelihood = -3891.7741
Iteration 1: log likelihood = -3841.8477
Iteration 2: log likelihood = -3822.6663
Iteration 3: log likelihood = -3820.9451
Iteration 4: log likelihood = -3820.9142
Iteration 5: log likelihood = -3820.9142
Refining estimates:
Iteration 0: log likelihood = -3820.9142
Cox regression -- Efron method for ties
No. of subjects = 1445 Number of obs = 1445
No. of failures = 552
Time at risk = 80013
LR chi2(9) = 141.72
Log likelihood = -3820.9142 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
workprg | 1.054341 .0952997 0.59 0.558 .8831665 1.258691
priors | 1.100222 .014745 7.13 0.000 1.071699 1.129505
tserved | 1.012483 .0016826 7.47 0.000 1.009191 1.015786
felon | .7361351 .077207 -2.92 0.003 .5993523 .904134
drugs | 1.30457 .1278366 2.71 0.007 1.076607 1.580803
black | 1.485848 .1301791 4.52 0.000 1.251406 1.76421
married | .8741874 .0953226 -1.23 0.218 .7059735 1.082482
educ | .9804111 .0190581 -1.02 0.309 .9437606 1.018485
age | .9966611 .0005162 -6.46 0.000 .9956499 .9976733
------------------------------------------------------------------------------
. estimates store efron // for later comparisons
The estimated coefficients are similar to those obtained using a Weibull model. Here's a side by side comparison:
. quietly streg `model', distribution(weibull)
. estimates store weibull
. estimates table efron weibull, equation(1:1) // option needed to match equations
----------------------------------------
Variable | efron weibull
-------------+--------------------------
#1 |
workprg | .05291557 .05775252
priors | .0955123 .09672029
tserved | .01240588 .01285382
felon | -.30634168 -.3201125
drugs | .26587357 .27186642
black | .39598547 .41429122
married | -.13446046 -.12972325
educ | -.01978331 -.02192692
age | -.0033445 -.00345801
_cons | -3.3512677
-------------+--------------------------
ln_p |
_cons | -.22080169
----------------------------------------
For example the Cox coefficient for blacks indicates that African Americans face a 48.6% higher risk of recidivism at any duration since release from prison than non-blacks with the same observed characteristics. The Weibull analysis yielded an estimate of 51.3% higher hazard.
The Treatment of Ties
Let us compare all four methods of handling ties. Note if you run this code
that the exactp option takes sustantially longer than the others
. local model workprg priors tserved felon drugs black married educ age
. quietly stcox `model', breslow // default
. estimates store breslow
. quietly stcox `model', exactm
. estimates store exactm
. quietly stcox `model', exactp
. estimates store exactp
. estimates table breslow efron exactm exactp
------------------------------------------------------------------
Variable | breslow efron exactm exactp
-------------+----------------------------------------------------
workprg | .05284355 .05291557 .05291409 .05315276
priors | .09503255 .0955123 .09551509 .09645497
tserved | .01229776 .01240588 .01240713 .01254959
felon | -.30525144 -.30634168 -.3063558 -.30993357
drugs | .26393017 .26587357 .26588358 .26686928
black | .39366079 .39598547 .39600249 .398402
married | -.13422414 -.13446046 -.13446367 -.13577505
educ | -.01971757 -.01978331 -.01978529 -.02012032
age | -.0033222 -.0033445 -.00334456 -.00335309
------------------------------------------------------------------
As is often the case, the Efron method comes closer to the exact marginal and partial likelihood approaches with substantial less computational effort, although in this application the four sets of results are quite similar.
The Baseline Hazard
As noted in class, after fitting a Cox model we can obtain estimates of the baseline survival or cumulative hazard using extensions of the Kaplan-Meier and Nelson-Aalen estimators.
Estimates of the hazard itself are usually too "spiky" to be useful, but
Stata provides a smoothed hazard through an option of the
stcurve command.
(Although this is a "post-estimation" command, in Stata 10 and earlier
one needs to request calculation of the hazard when the model is fitted.)
By default stcurve computes the baseline hazard or survival
by setting all covariates to their means (not zero).
You can request computation at other values by using the option
at(varname=value). Below I will plot the hazards for
blacks and non-blacks setting all other variables to their means:
. stcurve , hazard at(black=0) at(black=1) . graph export recidhaz.png, width(500) replace (file recidhaz.png written in PNG format)

The estimates suggests that the hazard raises a bit in the first few weeks after release and then declines with duration. This result is consistent with the observation that a log-normal model fits better than a Weibull.
Schoenfeld Residuals and Proportionality of Hazards
Stata can compute the Schoenfeld residuals, which are basically the differences between the values of an individual's covariates at failure and the risk-weighted mean covariates for those at risk, and can use these to construct a test of proportionality of hazards.
There are in fact two tests. Stata can compute a global test, which
requires ordinary Schoenfeld residuals. Stata can also compute a more
detailed variable-by-variable test, but this requires `scaled' Schoenfeld
residuals. Using the
estat phtest command with the detail option
In Stata 10 and earlier both types of residuals must be requested when the model is fit. Stata 11 and later can do the required calculations at the post-estimation stage.
. estat phtest, detail
Test of proportional-hazards assumption
Time: Time
----------------------------------------------------------------
| rho chi2 df Prob>chi2
------------+---------------------------------------------------
workprg | 0.04790 1.27 1 0.2595
priors | -0.08288 3.06 1 0.0802
tserved | -0.09808 3.59 1 0.0580
felon | 0.00891 0.04 1 0.8423
drugs | -0.00486 0.01 1 0.9090
black | 0.04834 1.27 1 0.2591
married | 0.04923 1.38 1 0.2399
educ | -0.06091 1.83 1 0.1766
age | 0.04229 1.31 1 0.2531
------------+---------------------------------------------------
global test | 12.76 9 0.1740
----------------------------------------------------------------
We have no evidence against the assumption of proportionality of hazards. The only variable that might deserve further inspection is time served, which has the largest chi-squared, although it is not significant. Just out of curiosity we can plot the scaled Schoenfeld residuals against time:
. stphtest, plot(tserved) . graph export recidpht.png, width(500) replace (file recidpht.png written in PNG format)

We see no evidence of a trend in the effect of time served, so we have no evidence against the proportionality assumption.
More detailed exploration of can be pursued by introducting interactions with durations, as we demonstrated using Cox's example.
