Home | GLMs | Multilevel | Survival | Demography | Stata | R

Cox Regression Applied to Recidivism Data

We return to the recidivism data that we analyzed using parametric models. Let us start by reading the data and setting them up for survival analysis

. use http://www.stata.com/data/jwooldridge/eacsap/recid, clear

. gen fail = 1 - cens

. stset durat, failure(fail)

     failure event:  fail != 0 & fail < .
obs. time interval:  (0, durat]
 exit on or before:  failure

------------------------------------------------------------------------------
     1445  total obs.
        0  exclusions
------------------------------------------------------------------------------
     1445  obs. remaining, representing
      552  failures in single record/single failure data
    80013  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        81

A Proportional Hazards Model

We fit a proportional hazards model using, as I recommend, the efron method for handling ties

. local model workprg priors tserved felon drugs black married educ age

. stcox `model', efron

         failure _d:  fail
   analysis time _t:  durat

Iteration 0:   log likelihood = -3891.7741
Iteration 1:   log likelihood = -3841.8477
Iteration 2:   log likelihood = -3822.6663
Iteration 3:   log likelihood = -3820.9451
Iteration 4:   log likelihood = -3820.9142
Iteration 5:   log likelihood = -3820.9142
Refining estimates:
Iteration 0:   log likelihood = -3820.9142

Cox regression -- Efron method for ties

No. of subjects =         1445                     Number of obs   =      1445
No. of failures =          552
Time at risk    =        80013
                                                   LR chi2(9)      =    141.72
Log likelihood  =   -3820.9142                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     workprg |   1.054341   .0952997     0.59   0.558     .8831665    1.258691
      priors |   1.100222    .014745     7.13   0.000     1.071699    1.129505
     tserved |   1.012483   .0016826     7.47   0.000     1.009191    1.015786
       felon |   .7361351    .077207    -2.92   0.003     .5993523     .904134
       drugs |    1.30457   .1278366     2.71   0.007     1.076607    1.580803
       black |   1.485848   .1301791     4.52   0.000     1.251406     1.76421
     married |   .8741874   .0953226    -1.23   0.218     .7059735    1.082482
        educ |   .9804111   .0190581    -1.02   0.309     .9437606    1.018485
         age |   .9966611   .0005162    -6.46   0.000     .9956499    .9976733
------------------------------------------------------------------------------

. estimates store efron // for later comparisons

The estimated coefficients are similar to those obtained using a Weibull model. Here's a side by side comparison:

. quietly streg `model', distribution(weibull)

. estimates store weibull

. estimates table efron weibull, equation(1:1) // option needed to match equations

----------------------------------------
    Variable |   efron       weibull    
-------------+--------------------------
#1           |
     workprg |  .05291557    .05775252  
      priors |   .0955123    .09672029  
     tserved |  .01240588    .01285382  
       felon | -.30634168    -.3201125  
       drugs |  .26587357    .27186642  
       black |  .39598547    .41429122  
     married | -.13446046   -.12972325  
        educ | -.01978331   -.02192692  
         age |  -.0033445   -.00345801  
       _cons |              -3.3512677  
-------------+--------------------------
ln_p         |
       _cons |              -.22080169  
----------------------------------------

For example the Cox coefficient for blacks indicates that African Americans face a 48.6% higher risk of recidivism at any duration since release from prison than non-blacks with the same observed characteristics. The Weibull analysis yielded an estimate of 51.3% higher hazard.

The Treatment of Ties

Let us compare all four methods of handling ties. Note if you run this code that the exactp option takes sustantially longer than the others

. local model workprg priors tserved felon drugs black married educ age

. quietly stcox `model', breslow // default

. estimates store breslow

. quietly stcox `model', exactm

. estimates store exactm

. quietly stcox `model', exactp

. estimates store exactp

. estimates table breslow efron exactm exactp

------------------------------------------------------------------
    Variable |  breslow       efron        exactm       exactp    
-------------+----------------------------------------------------
     workprg |  .05284355    .05291557    .05291409    .05315276  
      priors |  .09503255     .0955123    .09551509    .09645497  
     tserved |  .01229776    .01240588    .01240713    .01254959  
       felon | -.30525144   -.30634168    -.3063558   -.30993357  
       drugs |  .26393017    .26587357    .26588358    .26686928  
       black |  .39366079    .39598547    .39600249      .398402  
     married | -.13422414   -.13446046   -.13446367   -.13577505  
        educ | -.01971757   -.01978331   -.01978529   -.02012032  
         age |  -.0033222    -.0033445   -.00334456   -.00335309  
------------------------------------------------------------------

As is often the case, the Efron method comes closer to the exact marginal and partial likelihood approaches with substantial less computational effort, although in this application the four sets of results are quite similar.

The Baseline Hazard

As noted in class, after fitting a Cox model we can obtain estimates of the baseline survival or cumulative hazard using extensions of the Kaplan-Meier and Nelson-Aalen estimators.

Estimates of the hazard itself are usually too "spiky" to be useful, but Stata provides a smoothed hazard through an option of the stcurve command. Although this is a "post-estimation" command, one needs to request calculation of the hazard when the model is fitted.

By default the baseline hazard or survival is computed by setting all covariates to their means (not zero). You can request computation at other values by using the option at(varname=value). Below I will plot the hazards for blacks and non-blacks setting all other variables to their means:

. local model workprg priors tserved felon drugs black married educ age

. quietly stcox `model', efron basehc(h0)

. stcurve , hazard at(black=0) at(black=1)

. graph export recidhaz.png, width(500) replace
(file recidhaz.png written in PNG format)

The estimates suggests that the hazard raises a bit in the first few weeks after release and then declines with duration. This result is consistent with the observation that a log-normal model fits better than a Weibull.

Schoenfeld Residuals and Proportionality of Hazards

Stata can compute the Schoenfeld residuals, which are basically the differences between the values of an individual's covariates at failure and the risk-weighted mean covariates for those at risk, and can use these to construct a test of proportionality of hazards.

There are in fact two tests. Stata can compute a global test, which requires ordinary Schoenfeld residuals. Stata can also compute a more detailed variable-by-variable test, but this requires `scaled' Schoenfeld residuals. Both types of residuals must be requested when the model is fit.

In order to obtain both the lgobal and detailed test I decided to compute both types of residuals, using an `r' prefix for the ordinary and an `rs' prefix for the scaled residuals when fitting the model. Using the estat phtest command with the detail option produces both tests.

. local model workprg priors tserved felon drugs black married educ age

. local residuals rworkprg rpriors rtserved rfelon rdrugs rblack rmarried reduc rage

. local scaledr srworkprg srpriors srtserved srfelon srdrugs srblack srmarried sreduc srage

. quietly stcox `model', efron schoenfeld(`residuals') scaledsch(`scaledr')

. estat phtest, detail

      Test of proportional-hazards assumption

      Time:  Time
      ----------------------------------------------------------------
                  |       rho            chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      workprg     |      0.04790         1.27        1         0.2595
      priors      |     -0.08288         3.06        1         0.0802
      tserved     |     -0.09808         3.59        1         0.0580
      felon       |      0.00891         0.04        1         0.8423
      drugs       |     -0.00486         0.01        1         0.9090
      black       |      0.04834         1.27        1         0.2591
      married     |      0.04923         1.38        1         0.2399
      educ        |     -0.06091         1.83        1         0.1766
      age         |      0.04229         1.31        1         0.2531
      ------------+---------------------------------------------------
      global test |                     12.76        9         0.1740
      ----------------------------------------------------------------

We have no evidence against the assumption of proportionality of hazards. The only variable that might deserve further inspection is time served, which has the largest chi-squared, although it is not significant. Just out of curiosity we can plot the scaled Schoenfeld residuals against time:

. stphtest, plot(tserved)

. graph export recidpht.png, width(500) replace
(file recidpht.png written in PNG format)

We see no evidence of a trend in the effect of time served, so we have no evidence against the proportionality assumption.

More detailed exploration of can be pursued by introducting interactions with durations, as we demonstrated using Cox's example.