Comparing Continuous and Discrete Models

Let's have another look at the recidivism data. We will split duration into single years with an open-ended category at 5+ and fit a piecewise exponential model with the same covariates as Wooldridge.

We will then treat the data as discrete, assuming all we know is that recidivism occurred sometime during the year. We will fit binary data models using a logit link, which corresponds to the discrete-time model, and a complementary log-log link, which corresponds to a grouped continuous-time model.
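Here is a short derivation of why the complementary log-log link corresponds to grouped continuous time. It assumes proportional hazards with a hazard that is constant within each one-year interval, which is the piecewise exponential assumption. The symbols are notation introduced here for the sketch. The monthly hazard in year j is written λ_j e^{x'β}, so each interval lasts 12 months, and π_j(x) is the conditional probability of failing in year j given survival to its start.

```latex
\begin{aligned}
\pi_j(x) &= 1 - \exp\{-12\,\lambda_j e^{x'\beta}\} \\
\log\{-\log[1-\pi_j(x)]\} &= \log(12\lambda_j) + x'\beta
  \qquad\text{(complementary log-log)} \\
\log\frac{\pi_j(x)}{1-\pi_j(x)} &= \alpha_j + x'\beta^{*}
  \qquad\text{(logit: a different model for the same } \pi_j)
\end{aligned}
```

The complementary log-log coefficients β are therefore the same log relative risks that appear in the continuous-time model. The logit coefficients β* are log odds ratios for the discrete-time conditional probabilities and need not coincide with them.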

A Piecewise Exponential Model

We read the data, stset it, split it into one-year episodes, and then fit our model.

. use http://www.stata.com/data/jwooldridge/eacsap/recid, clear

. gen fail = 1-cens

. gen id = _n

. stset durat, fail(fail) id(id)

                id:  id
     failure event:  fail != 0 & fail < .
obs. time interval:  (durat[_n-1], durat]
 exit on or before:  failure

------------------------------------------------------------------------------
     1445  total obs.
        0  exclusions
------------------------------------------------------------------------------
     1445  obs. remaining, representing
     1445  subjects
      552  failures in single failure-per-subject data
    80013  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        81

. stsplit year, at(12 24 36 48 60 100) // max is 81
(5273 observations (episodes) created)

. replace year = year/12 
(5273 real changes made)

. local x workprg priors tserved felon alcohol drugs ///
>     black married educ age

. streg i.year `x', distribution(exponential) nohr

         failure _d:  fail
   analysis time _t:  durat
                 id:  id

Iteration 0:   log likelihood = -1739.8944  
Iteration 1:   log likelihood = -1653.6678  
Iteration 2:   log likelihood = -1587.2575  
Iteration 3:   log likelihood = -1583.7542  
Iteration 4:   log likelihood = -1583.7129  
Iteration 5:   log likelihood = -1583.7129  

Exponential regression -- log relative-hazard form 

No. of subjects =         1445                     Number of obs   =      6718
No. of failures =          552
Time at risk    =        80013
                                                   LR chi2(15)     =    312.36
Log likelihood  =   -1583.7129                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |
          1  |    .036532   .1093659     0.33   0.738    -.1778212    .2508851
          2  |  -.3738156   .1296172    -2.88   0.004    -.6278607   -.1197706
          3  |  -.8115436   .1564067    -5.19   0.000    -1.118095   -.5049921
          4  |  -.9382311   .1683272    -5.57   0.000    -1.268146   -.6083159
          5  |  -1.547178   .2033594    -7.61   0.000    -1.945755   -1.148601
             |
     workprg |   .0838291   .0907983     0.92   0.356    -.0941324    .2617906
      priors |   .0872458   .0134763     6.47   0.000     .0608327     .113659
     tserved |   .0130089   .0016863     7.71   0.000     .0097039    .0163139
       felon |  -.2839252   .1061534    -2.67   0.007     -.491982   -.0758685
     alcohol |   .4324425   .1057254     4.09   0.000     .2252245    .6396605
       drugs |   .2747141   .0978667     2.81   0.005     .0828989    .4665293
       black |   .4335559   .0883658     4.91   0.000     .2603622    .6067497
     married |  -.1540477   .1092154    -1.41   0.158    -.3681059    .0600104
        educ |  -.0214162   .0194453    -1.10   0.271    -.0595283     .016696
         age |    -.00358   .0005223    -6.85   0.000    -.0046037   -.0025563
       _cons |  -3.830127    .280282   -13.67   0.000     -4.37947   -3.280785
------------------------------------------------------------------------------

. estimates store phaz
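For reference, the model just fitted can be written as follows. This is a sketch in notation introduced here. The θ_j are the `year` coefficients, with θ_0 = 0 for the omitted first year, and α is `_cons`. t_ij is the exposure in months of subject i in year j, and d_ij = 1 if subject i fails in that year. Because `durat` is measured in months, exp(α) is a monthly hazard for the baseline category, which is the first year with all covariates at zero.

```latex
\begin{aligned}
\lambda_{ij} &= \exp\{\alpha + \theta_j + x_i'\beta\}, \qquad j = 0,1,\dots,5 \\
\log L &= \sum_i \sum_j \bigl[\, d_{ij}\log\lambda_{ij} - t_{ij}\,\lambda_{ij} \,\bigr]
\end{aligned}
```

The second line is the usual exponential log-likelihood summed over the split episodes. Here j = 5 is the open-ended 5+ category.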

A Logit Model

For a discrete-time survival analysis we must include only intervals with complete exposure, where the outcome can be classified as failure or survival. The convicts were released between July 1, 1977 and June 30, 1978, and the data were collected in April 1984, so the length of observation ranges from 70 to 81 months. We therefore restrict attention to the first 5 years, or 60 months, which drops the open-ended 5+ episodes. (We could go up to 6 years, or 72 months, for some convicts, but unfortunately we don't have the date of release, so we can't identify these cases and must censor everyone at 60 months.)
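The observation window can be checked by counting whole calendar months from release to the April 1984 data collection:

```latex
\begin{aligned}
\text{released July 1977:}&\quad 6\times 12 + 9 = 81 \text{ months} \\
\text{released June 1978:}&\quad 5\times 12 + 10 = 70 \text{ months} \\
\text{complete years for everyone:}&\quad \lfloor 70/12 \rfloor = 5 \;\Rightarrow\; 60 \text{ months}
\end{aligned}
```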

. drop if _t0 >= 60
(921 observations deleted)

. logit _d i.year `x'

Iteration 0:   log likelihood = -1759.0583  
Iteration 1:   log likelihood = -1654.9242  
Iteration 2:   log likelihood = -1637.1916  
Iteration 3:   log likelihood = -1637.1267  
Iteration 4:   log likelihood = -1637.1267  

Logistic regression                               Number of obs   =       5797
                                                  LR chi2(14)     =     243.86
                                                  Prob > chi2     =     0.0000
Log likelihood = -1637.1267                       Pseudo R2       =     0.0693

------------------------------------------------------------------------------
          _d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |
          1  |   .0305282   .1193583     0.26   0.798    -.2034098    .2644661
          2  |  -.4131403   .1384065    -2.98   0.003    -.6844119   -.1418686
          3  |  -.8641487   .1639958    -5.27   0.000    -1.185575   -.5427229
          4  |  -.9936625   .1756322    -5.66   0.000    -1.337895   -.6494297
             |
     workprg |   .1109887   .1003087     1.11   0.269    -.0856129    .3075902
      priors |   .0992921   .0164654     6.03   0.000     .0670205    .1315636
     tserved |   .0149221   .0021429     6.96   0.000     .0107221    .0191222
       felon |  -.3196621   .1178117    -2.71   0.007    -.5505687   -.0887555
     alcohol |   .4724998   .1184177     3.99   0.000     .2404055    .7045941
       drugs |    .316729   .1086092     2.92   0.004     .1038589    .5295992
       black |   .4580275   .0973977     4.70   0.000     .2671315    .6489235
     married |  -.2048073   .1204593    -1.70   0.089    -.4409032    .0312885
        educ |  -.0267259   .0215052    -1.24   0.214    -.0688754    .0154235
         age |  -.0040231    .000584    -6.89   0.000    -.0051678   -.0028784
       _cons |  -1.140803   .3084159    -3.70   0.000    -1.745287   -.5363185
------------------------------------------------------------------------------

. estimates store logit

A Complementary Log-Log Model

Finally, we fit the same binary model using a complementary log-log link.

. glm _d i.year `x', family(binomial) link(cloglog)

Iteration 0:   log likelihood = -1818.8831  
Iteration 1:   log likelihood = -1640.7899  
Iteration 2:   log likelihood = -1637.5235  
Iteration 3:   log likelihood = -1637.5083  
Iteration 4:   log likelihood = -1637.5083  

Generalized linear models                          No. of obs      =      5797
Optimization     : ML                              Residual df     =      5782
                                                   Scale parameter =         1
Deviance         =  3275.016541                    (1/df) Deviance =  .5664159
Pearson          =  5908.117581                    (1/df) Pearson  =  1.021812

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(-ln(1-u))             [Complementary log-log]

                                                   AIC             =  .5701253
Log likelihood   =  -1637.50827                    BIC             = -46826.57

------------------------------------------------------------------------------
             |                 OIM
          _d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |
          1  |   .0216127   .1095455     0.20   0.844    -.1930925    .2363179
          2  |  -.3926148   .1296978    -3.03   0.002    -.6468179   -.1384117
          3  |  -.8249973   .1564479    -5.27   0.000    -1.131629   -.5183651
          4  |  -.9483385   .1683997    -5.63   0.000    -1.278396   -.6182811
             |
     workprg |   .1044651   .0932517     1.12   0.263    -.0783049    .2872351
      priors |   .0887063   .0139849     6.34   0.000     .0612964    .1161163
     tserved |    .013267   .0017417     7.62   0.000     .0098534    .0166806
       felon |  -.2885449   .1091355    -2.64   0.008    -.5024465   -.0746433
     alcohol |   .4397795   .1090665     4.03   0.000      .226013    .6535459
       drugs |   .2991025   .1002774     2.98   0.003     .1025625    .4956425
       black |   .4272096   .0909458     4.70   0.000      .248959    .6054602
     married |  -.1830403   .1137539    -1.61   0.108     -.405994    .0399133
        educ |  -.0233346   .0201545    -1.16   0.247    -.0628367    .0161674
         age |   -.003851   .0005466    -7.04   0.000    -.0049224   -.0027796
       _cons |  -1.238797   .2893845    -4.28   0.000     -1.80598   -.6716138
------------------------------------------------------------------------------

. estimates store cloglog

Comparison of Estimates

All that remains is to compare the estimates.

. estimates table phaz cloglog logit, eq(1:1:1)

-----------------------------------------------------
    Variable |    phaz       cloglog       logit     
-------------+---------------------------------------
        year |
          1  |  .03653199    .02161269    .03052816  
          2  | -.37381564   -.39261481   -.41314026  
          3  | -.81154363   -.82499726    -.8641487  
          4  | -.93823111   -.94833849   -.99366251  
          5  | -1.5471779                            
             |
     workprg |   .0838291     .1044651    .11098865  
      priors |  .08724582    .08870634    .09929206  
     tserved |  .01300886    .01326703    .01492214  
       felon | -.28392523   -.28854488    -.3196621  
     alcohol |  .43244249    .43977945    .47249981  
       drugs |  .27471411    .29910246    .31672903  
       black |  .43355595    .42720963     .4580275  
     married | -.15404774   -.18304032   -.20480734  
        educ | -.02141618   -.02333462   -.02672593  
         age |    -.00358   -.00385099   -.00402309  
       _cons | -3.8301275    -1.238797   -1.1408026  
-----------------------------------------------------

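As one would expect, the estimates based on the complementary log-log link are closer to the continuous-time estimates than those based on the logit link. This holds for every covariate and for years 2 to 4. The only exception is the small, nonsignificant year 1 coefficient. The logit coefficients are also uniformly somewhat larger in absolute value than the complementary log-log coefficients.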
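To read the table as relative risks, exponentiate the coefficients. For example, for `black`, using the estimates in the table above:

```latex
e^{0.4336} \approx 1.543 \;(\text{phaz}), \qquad
e^{0.4272} \approx 1.533 \;(\text{cloglog}), \qquad
e^{0.4580} \approx 1.581 \;(\text{logit})
```

The two hazard-based models therefore put the excess risk for black releasees at about 53 to 54 percent. The logit model puts the excess odds at about 58 percent.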

This result makes sense because the piecewise exponential and complementary log-log models estimate the same continuous-time hazard, one from continuous data and one from grouped data, while the logit model estimates a discrete-time hazard.
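The constants are not comparable as printed. The piecewise exponential constant is a log monthly hazard, while the complementary log-log constant refers to a one-year interval. Under the grouped-data argument, the latter should be roughly log 12 plus the former. This is a rough check based on the table above:

```latex
\log 12 + \hat\alpha_{\text{phaz}} = 2.4849 - 3.8301 = -1.3452
\qquad\text{vs.}\qquad
\hat\alpha_{\text{cloglog}} = -1.2388
```

The two values are in the same neighborhood. One would not expect an exact match, because the two models are fit to different information: exact months of failure versus the year of failure.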

Recall that in a continuous time model the relative risk multiplies the hazard or instantaneous failure rate, whereas in a discrete time model it multiplies the conditional odds of failure at a given time (or in a given time interval) given survival to that time (or interval).
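In symbols, the two interpretations are shown below. The notation is introduced here: h_0(t) is a baseline hazard, and π_0 is a baseline conditional probability of failure in the interval.

```latex
\begin{aligned}
\text{continuous time:}\quad & h(t \mid x) = h_0(t)\, e^{x'\beta} \\
\text{discrete time:}\quad & \frac{\pi(x)}{1-\pi(x)} = \frac{\pi_0}{1-\pi_0}\, e^{x'\beta^{*}}
\end{aligned}
```

When the conditional failure probabilities are small, the odds π/(1 − π) are close to π. In addition, π = 1 − e^{−H} is close to H, the cumulative hazard over the interval, so β* and β nearly coincide. The larger the interval failure probabilities, the more the two sets of coefficients can differ.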

All three approaches, however, lead to similar predicted survival probabilities.
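For a given covariate vector x, the predicted probability of surviving without recidivism through five years can be written for each model as follows. This is a sketch in notation introduced here: θ_j are the year coefficients with θ_0 = 0, and α is the constant. In each line, the hatted estimates come from that line's own model.

```latex
\begin{aligned}
\text{piecewise exponential:}\quad & \hat S(60 \mid x) = \exp\Bigl\{-12\sum_{j=0}^{4} e^{\hat\alpha+\hat\theta_j+x'\hat\beta}\Bigr\} \\
\text{complementary log-log:}\quad & \hat S(60 \mid x) = \prod_{j=0}^{4}\bigl[1-\hat\pi_j(x)\bigr]
  = \exp\Bigl\{-\sum_{j=0}^{4} e^{\hat\alpha+\hat\theta_j+x'\hat\beta}\Bigr\} \\
\text{logit:}\quad & \hat S(60 \mid x) = \prod_{j=0}^{4}\Bigl[1+e^{\hat\alpha+\hat\theta_j+x'\hat\beta}\Bigr]^{-1}
\end{aligned}
```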