3.8 Regression Diagnostics for Binary Data
We now consider regression diagnostics for binary data, focusing on logistic regression models. We will work with the additive model of contraceptive use by age, education, and desire for more children, which we know to be inadequate.
Covariate Patterns
Stata offers several tools as part of the predict
and estat post-estimation commands.
These are available after issuing a logit or
logistic command,
with more restricted choices (essentially just fitted
values) after blogit.
When working with individual data Stata relies strongly on the concept of covariate patterns, grouping together all observations that share the same values of the covariates. In particular, it defines as saturated a model that has a separate parameter for each covariate pattern, not for each observation.
In terms of our data on contraceptive use by age, education and
desire for more children, we could work with blogit
and 16 groups, but would have to compute most diagnostics by hand.
Instead we will continue to work with the weighted dataset simulating
individual data and use the logit command.
As it happens we get the same answers because the groups coincide
with the covariate patterns.
To make this point clear we will fit: (1) a model with a separate parameter for each covariate pattern and (2) the additive model
. quietly logit cuse i.age##c.educ##c.nomore [fw=n]
. predict pobs, pr
. scalar sat = e(ll)
. logit cuse i.age educ nomore [fw=n]
Iteration 0: log likelihood = -1001.8468
Iteration 1: log likelihood = -934.92579
Iteration 2: log likelihood = -933.92045
Iteration 3: log likelihood = -933.9192
Iteration 4: log likelihood = -933.9192
Logistic regression Number of obs = 1607
LR chi2(5) = 135.86
Prob > chi2 = 0.0000
Log likelihood = -933.9192 Pseudo R2 = 0.0678
------------------------------------------------------------------------------
cuse | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age |
2 | .3893816 .1758501 2.21 0.027 .0447219 .7340414
3 | .9086135 .1646211 5.52 0.000 .5859621 1.231265
4 | 1.189239 .21443 5.55 0.000 .7689639 1.609514
|
educ | .3249947 .1240355 2.62 0.009 .0818894 .5680999
nomore | .8329548 .1174705 7.09 0.000 .6027169 1.063193
_cons | -1.966169 .1720307 -11.43 0.000 -2.303343 -1.628995
------------------------------------------------------------------------------
. di -2*e(ll)
1867.8384
. di 2*(sat-e(ll))
29.917222
The additive model has a deviance of 1867.8 when we define the saturated model in terms of individual observations, and a deviance of 29.92 when we define the saturated model in terms of covariate patterns or, equivalently, the 16 groups of women.
Deviance and Pearson Residuals
The predict command can be used to obtain predicted
probabilities, deviance residuals and Pearson residuals, with the
last two defined as the square root of the contribution of a given
covariate pattern to the model deviance or Pearson chi-squared
statistic
. predict pfit, pr // probability . predict dr, dev // deviance residual . predict pr, res // Pearson residual
We will verify that if we square these residuals and sum them over the covariate patterns we get the deviance and Pearson chi-squared statistics. (Recall that we have two rows for each covariate pattern, one with users and one with non-users. The residuals for both rows in each pair are identical and we need to sum just one of them.)
. gen drsq = dr^2 . quietly sum drsq if cuse==1 . di r(sum) 29.917221 . gen prsq = pr^2 . quietly sum prsq if cuse==1 . di r(sum) 28.288336
So the deviance is 29.9 as noted at the outset, and Pearson's chi-squared is 28.3. We now list all cells with squared deviance residuals above 3.84 (same as absolute values above 1.96).
. list age educ nomore pobs pfit pr dr if cuse==1 & pr^2 > 3.84
+---------------------------------------------------------------------+
| age educ nomore pobs pfit pr dr |
|---------------------------------------------------------------------|
8. | <25 Some 1 .1666667 .3082699 -2.375281 -2.514795 |
16. | 25-29 Some 1 .2934783 .3967947 -2.025574 -2.06526 |
26. | 40-49 None 0 .1463415 .3149818 -2.324661 -2.491414 |
+---------------------------------------------------------------------+
We see that a substantial part of the deviance of 29.92 comes from three groups, where the model overestimates the probability of using contraception: women < 25 and 25-29 with some education who want no more children, and women in their forties with lower primary or less who want more children.
Leverage and Influence
Pregibon extended regression diagnostics to GLMs and introduced
a weighted hat matrix. The diagonal elements are leverages and
can be calculated with the lev option of the
predict command.
. predict lev, hat
. sum lev if cuse==1
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
lev | 16 .375 .1788555 .0902827 .6696332
. gsort -lev
. list n age educ nomore lev dr if cuse==1 in 1/6
+---------------------------------------------------+
| n age educ nomore lev dr |
|---------------------------------------------------|
1. | 52 <25 Some 0 .6696332 1.487477 |
3. | 54 25-29 Some 0 .5774811 1.22864 |
6. | 48 40-49 None 1 .5601446 -.0652542 |
+---------------------------------------------------+
The three cells with potentially the largest influence in the fit are young women with some education who want more children and older women with no education who want no more children.
The elements of the hat matrix can be used to standardize Pearson (or deviance) residuals and to compute influence statistics.
The rs option of the
predict command computes standardized Pearson
residuals. Standardized deviance residuals can be computed
'by hand':
. predict ps, rs // standardized Pearson residual
. gen ds = dr/sqrt(1-lev) // standardized Deviance residual
. gen sc = ps^2
. gsort -sc
. list age educ nomore ps ds if cuse==1 in 1/6
+-----------------------------------------------+
| age educ nomore ps ds |
|-----------------------------------------------|
1. | <25 Some 1 -2.887789 -3.057405 |
3. | 40-49 None 0 -2.720763 -2.91593 |
5. | 25-29 Some 1 -2.687572 -2.740228 |
+-----------------------------------------------+
We identify the same three observations picked up by the unstandardized residuals but the absolute values are now closer to three, highlighting the lack of fit to these groups.
The db option of the
predict command computes the one-step approximation of
Cook's distance. (This statistic is called Pregibon's influence
statistic in the Stata documentation, and their calculation
differs from the formula on page 49 of the notes in that it leaves
out the number of parameters p.)
. predict cook, db
. sum cook if cuse==1
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cook | 16 2.827582 3.841448 .0123296 14.3152
. gsort - cook
. list n age educ nomore lev dr cook if cuse==1 in 1/6
+-------------------------------------------------------------+
| n age educ nomore lev dr cook |
|-------------------------------------------------------------|
1. | 52 <25 Some 0 .6696332 1.487477 14.3152 |
3. | 80 30-39 None 1 .5422943 1.645289 7.056133 |
6. | 27 25-29 Some 1 .4319641 -2.06526 5.492778 |
+-------------------------------------------------------------+
The covaraite pattern with the highest leverage turned out to be also the most influential: women under 25 with upper primary or more who want more children.
Goodness of Fit
The estat gof command implements Hosmer and Lemeshow's
goodnes of fit test. By default the command works with covariate
patterns, which in this case is a good idea because we have only 16
groups.
The test compares observed and fitted frequencies in each group
using Pearson's formula and, therefore, yields the same value as the
Pearson chi-squared computed earlier:
. estat gof
Logistic model for cuse, goodness-of-fit test
number of observations = 1607
number of covariate patterns = 16
Pearson chi2(10) = 28.29
Prob > chi2 = 0.0016
The test has 10 d.f. because we have 16 covariate patterns and the model
has 6 parameters.
You can get more detailed output using the table option.
With truly individual data you will find too many covariate patterns,
particularly if you have a few continuous predictors, and the
chi-squared approximation will not be valid. In these cases you
should group the data using the group(#) option,
which relies on the linear predictor to cluster observations with
similar covariate values. Usually about 10 groups are used.
The groups are based on quantiles so they have the same size.
A test with g groups is treated as (approximately) chi-squared
with g-2 d.f.
The Classification Table
The estat classification command produces
a crostabulation of observed and predicted outcomes, where one
predicts a positive outcome if the probability is 0.5 or more
and a negative outcome otherwise.
. estat classif
Logistic model for cuse
-------- True --------
Classified | D ~D | Total
-----------+--------------------------+-----------
+ | 157 126 | 283
- | 350 974 | 1324
-----------+--------------------------+-----------
Total | 507 1100 | 1607
Classified + if predicted Pr(D) >= .5
True D defined as cuse != 0
--------------------------------------------------
Sensitivity Pr( +| D) 30.97%
Specificity Pr( -|~D) 88.55%
Positive predictive value Pr( D| +) 55.48%
Negative predictive value Pr(~D| -) 73.56%
--------------------------------------------------
False + rate for true ~D Pr( +|~D) 11.45%
False - rate for true D Pr( -| D) 69.03%
False + rate for classified + Pr(~D| +) 44.52%
False - rate for classified - Pr( D| -) 26.44%
--------------------------------------------------
Correctly classified 70.38%
--------------------------------------------------
In our application we predict correctly 70.4% of the cases. To put this in perspective, note that only 31.5% use contraception, so if we simply predicted everyone to be a non-user we would get 68.5% correct.
Thus, taking age, education and desire for more children into account reduced classification errors from 0.315 to 0.296, a six percent reduction in error. (We can do a bit better adding the interaction between age and desire.) It is not unusual to find large misclassification probabilities even when the covariates in the model have significant effects.
Other indices of interest, particularly in epidemiological research, are
- the sensitivity, or probability of predicting use among users, here 31%. Think of this as the probability of detecting a disease among people who have it. The complement of this probability is the false negative rate.
- the specificity or probability of predicting non-use among non-users, here 89%. Think of it as the probability of giving a clean bill of health to people who do not have a disease. The complement of this probability is the false positive rate.
Stata can plot the sensitivity and specificity against the cutoff
point using the lsens command. Another graph of interest
in classification problems is the
receiver operating characteristic (ROC) curve,
implemented in the lroc command.
This is a plot of specificity versus 1-sensitivity generated by varying
the cutpoint. A model with no predictive power has a 45-degree ROC line.
Better models have concave ROC curves that lie above the diagonal.
A summary measure of predictive power is the area under the ROC line,
which ranges between 0.5 and 1.0. Our model has an area of 0.67.
Exercises:
1. Compute diagnostic statistics for one of the models with an age by desire interaction and compare results for the three covaraite patterns that had large residuals.
2. Recast the contraceptive use data as 16 groups
(for example using the file cuse.dat in the datasets
section), fit the additive model used here, and compute the
deviance and Pearson residuals from first principles.
Continue with 4 Poisson Models for Count Data

