Home | GLMs | Multilevel | Survival | Demography | Stata | R
Home Lecture Notes Stata Logs Datasets Problem Sets

3.8 Regression Diagnostics for Binary Data

We now consider regression diagnostics for binary data, focusing on logistic regression models. We will work with the additive model of contraceptive use by age, education, and desire for more children, which we know to be inadequate.

Covariate Patterns

Stata offers several tools as part of the predict and estat post-estimation commands. These are available after issuing a logit or logistic command, with more restricted choices (essentially just fitted values) after blogit.

When working with individual data Stata relies strongly on the concept of covariate patterns, grouping together all observations that share the same values of the covariates. In particular, it defines as saturated a model that has a separate parameter for each covariate pattern, not for each observation.

In terms of our data on contraceptive use by age, education and desire for more children, we could work with blogit and 16 groups, but would have to compute most diagnostics by hand. Instead we will continue to work with the weighted dataset simulating individual data and use the logit command. As it happens we get the same answers because the groups coincide with the covariate patterns.

To make this point clear we will fit: (1) a model with a separate parameter for each covariate pattern and (2) the additive model

. quietly logit cuse i.age##c.educ##c.nomore [fw=n]

. predict pobs, pr

. scalar sat = e(ll)

. logit cuse i.age educ nomore [fw=n]

Iteration 0:   log likelihood = -1001.8468  
Iteration 1:   log likelihood = -934.92579  
Iteration 2:   log likelihood = -933.92045  
Iteration 3:   log likelihood =  -933.9192  
Iteration 4:   log likelihood =  -933.9192  

Logistic regression                               Number of obs   =       1607
                                                  LR chi2(5)      =     135.86
                                                  Prob > chi2     =     0.0000
Log likelihood =  -933.9192                       Pseudo R2       =     0.0678

------------------------------------------------------------------------------
        cuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
          2  |   .3893816   .1758501     2.21   0.027     .0447219    .7340414
          3  |   .9086135   .1646211     5.52   0.000     .5859621    1.231265
          4  |   1.189239     .21443     5.55   0.000     .7689639    1.609514
             |
        educ |   .3249947   .1240355     2.62   0.009     .0818894    .5680999
      nomore |   .8329548   .1174705     7.09   0.000     .6027169    1.063193
       _cons |  -1.966169   .1720307   -11.43   0.000    -2.303343   -1.628995
------------------------------------------------------------------------------

. di -2*e(ll)
1867.8384

. di 2*(sat-e(ll))
29.917222

The additive model has a deviance of 1867.8 when we define the saturated model in terms of individual observations, and a deviance of 29.92 when we define the saturated model in terms of covariate patterns or, equivalently, the 16 groups of women.

Deviance and Pearson Residuals

The predict command can be used to obtain predicted probabilities, deviance residuals and Pearson residuals, with the last two defined as the square root of the contribution of a given covariate pattern to the model deviance or Pearson chi-squared statistic

. predict pfit, pr    // probability

. predict dr, dev     // deviance residual

. predict pr, res     // Pearson residual

We will verify that if we square these residuals and sum them over the covariate patterns we get the deviance and Pearson chi-squared statistics. (Recall that we have two rows for each covariate pattern, one with users and one with non-users. The residuals for both rows in each pair are identical and we need to sum just one of them.)

. gen drsq = dr^2

. quietly sum drsq if cuse==1

. di r(sum)
29.917221

. gen prsq = pr^2

. quietly sum prsq if cuse==1

. di r(sum)
28.288336

So the deviance is 29.9 as noted at the outset, and Pearson's chi-squared is 28.3. We now list all cells with squared deviance residuals above 3.84 (same as absolute values above 1.96).

. list age educ nomore pobs pfit pr dr if cuse==1 & pr^2 > 3.84

     +---------------------------------------------------------------------+
     |   age   educ   nomore       pobs       pfit          pr          dr |
     |---------------------------------------------------------------------|
  8. |   <25   Some        1   .1666667   .3082699   -2.375281   -2.514795 |
 16. | 25-29   Some        1   .2934783   .3967947   -2.025574    -2.06526 |
 26. | 40-49   None        0   .1463415   .3149818   -2.324661   -2.491414 |
     +---------------------------------------------------------------------+

We see that a substantial part of the deviance of 29.92 comes from three groups, where the model overestimates the probability of using contraception: women < 25 and 25-29 with some education who want no more children, and women in their forties with lower primary or less who want more children.

Leverage and Influence

Pregibon extended regression diagnostics to GLMs and introduced a weighted hat matrix. The diagonal elements are leverages and can be calculated with the lev option of the predict command.

. predict lev, hat

. sum lev if cuse==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         lev |        16        .375    .1788555   .0902827   .6696332

. gsort -lev

. list n age educ nomore lev dr if cuse==1 in 1/6

     +---------------------------------------------------+
     |  n     age   educ   nomore        lev          dr |
     |---------------------------------------------------|
  1. | 52     <25   Some        0   .6696332    1.487477 |
  3. | 54   25-29   Some        0   .5774811     1.22864 |
  6. | 48   40-49   None        1   .5601446   -.0652542 |
     +---------------------------------------------------+

The three cells with potentially the largest influence in the fit are young women with some education who want more children and older women with no education who want no more children.

The elements of the hat matrix can be used to standardize Pearson (or deviance) residuals and to compute influence statistics.

The rs option of the predict command computes standardized Pearson residuals. Standardized deviance residuals can be computed 'by hand':

. predict ps, rs           // standardized Pearson residual

. gen ds = dr/sqrt(1-lev)  // standardized Deviance residual

. gen sc = ps^2

. gsort -sc

. list age educ nomore ps ds if cuse==1 in 1/6

     +-----------------------------------------------+
     |   age   educ   nomore          ps          ds |
     |-----------------------------------------------|
  1. |   <25   Some        1   -2.887789   -3.057405 |
  3. | 40-49   None        0   -2.720763    -2.91593 |
  5. | 25-29   Some        1   -2.687572   -2.740228 |
     +-----------------------------------------------+

We identify the same three observations picked up by the unstandardized residuals but the absolute values are now closer to three, highlighting the lack of fit to these groups.

The db option of the predict command computes the one-step approximation of Cook's distance. (This statistic is called Pregibon's influence statistic in the Stata documentation, and their calculation differs from the formula on page 49 of the notes in that it leaves out the number of parameters p.)

. predict cook, db

. sum cook if cuse==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        cook |        16    2.827582    3.841448   .0123296    14.3152

. gsort - cook

. list n age educ nomore lev dr cook if cuse==1 in 1/6

     +-------------------------------------------------------------+
     |  n     age   educ   nomore        lev         dr       cook |
     |-------------------------------------------------------------|
  1. | 52     <25   Some        0   .6696332   1.487477    14.3152 |
  3. | 80   30-39   None        1   .5422943   1.645289   7.056133 |
  6. | 27   25-29   Some        1   .4319641   -2.06526   5.492778 |
     +-------------------------------------------------------------+

The covaraite pattern with the highest leverage turned out to be also the most influential: women under 25 with upper primary or more who want more children.

Goodness of Fit

The estat gof command implements Hosmer and Lemeshow's goodnes of fit test. By default the command works with covariate patterns, which in this case is a good idea because we have only 16 groups. The test compares observed and fitted frequencies in each group using Pearson's formula and, therefore, yields the same value as the Pearson chi-squared computed earlier:

. estat gof

Logistic model for cuse, goodness-of-fit test

       number of observations =      1607
 number of covariate patterns =        16
             Pearson chi2(10) =        28.29
                  Prob > chi2 =         0.0016

The test has 10 d.f. because we have 16 covariate patterns and the model has 6 parameters. You can get more detailed output using the table option.

With truly individual data you will find too many covariate patterns, particularly if you have a few continuous predictors, and the chi-squared approximation will not be valid. In these cases you should group the data using the group(#) option, which relies on the linear predictor to cluster observations with similar covariate values. Usually about 10 groups are used. The groups are based on quantiles so they have the same size. A test with g groups is treated as (approximately) chi-squared with g-2 d.f.

The Classification Table

The estat classification command produces a crostabulation of observed and predicted outcomes, where one predicts a positive outcome if the probability is 0.5 or more and a negative outcome otherwise.

. estat classif

Logistic model for cuse

              -------- True --------
Classified |         D            ~D  |      Total
-----------+--------------------------+-----------
     +     |       157           126  |        283
     -     |       350           974  |       1324
-----------+--------------------------+-----------
   Total   |       507          1100  |       1607

Classified + if predicted Pr(D) >= .5
True D defined as cuse != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   30.97%
Specificity                     Pr( -|~D)   88.55%
Positive predictive value       Pr( D| +)   55.48%
Negative predictive value       Pr(~D| -)   73.56%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   11.45%
False - rate for true D         Pr( -| D)   69.03%
False + rate for classified +   Pr(~D| +)   44.52%
False - rate for classified -   Pr( D| -)   26.44%
--------------------------------------------------
Correctly classified                        70.38%
--------------------------------------------------

In our application we predict correctly 70.4% of the cases. To put this in perspective, note that only 31.5% use contraception, so if we simply predicted everyone to be a non-user we would get 68.5% correct.

Thus, taking age, education and desire for more children into account reduced classification errors from 0.315 to 0.296, a six percent reduction in error. (We can do a bit better adding the interaction between age and desire.) It is not unusual to find large misclassification probabilities even when the covariates in the model have significant effects.

Other indices of interest, particularly in epidemiological research, are

Stata can plot the sensitivity and specificity against the cutoff point using the lsens command. Another graph of interest in classification problems is the receiver operating characteristic (ROC) curve, implemented in the lroc command. This is a plot of specificity versus 1-sensitivity generated by varying the cutpoint. A model with no predictive power has a 45-degree ROC line. Better models have concave ROC curves that lie above the diagonal. A summary measure of predictive power is the area under the ROC line, which ranges between 0.5 and 1.0. Our model has an area of 0.67.

Exercises:

1. Compute diagnostic statistics for one of the models with an age by desire interaction and compare results for the three covaraite patterns that had large residuals.

2. Recast the contraceptive use data as 16 groups (for example using the file cuse.dat in the datasets section), fit the additive model used here, and compute the deviance and Pearson residuals from first principles.


Continue with 4 Poisson Models for Count Data