Let us take a more general look at logistic regression models with a single predictor by considering the comparison of k groups. This will help us illustrate the logit analogues of one-way analysis of variance and simple linear regression models.
Consider a cross-tabulation of contraceptive use by age, as summarized in Table 3.4. The structure of the data is the same as in the previous section, except that we now have four groups rather than two.
Table 3.4: Contraceptive use by age

| Age group | Using ($y_i$) | Not using ($n_i - y_i$) | Total ($n_i$) |
|---|---|---|---|
| < 25 | 72 | 325 | 397 |
| 25-29 | 105 | 299 | 404 |
| 30-39 | 237 | 375 | 612 |
| 40-49 | 93 | 101 | 194 |
| Total | 507 | 1100 | 1607 |
The analysis of this table proceeds along the same lines as in the two-by-two case. The null model yields exactly the same estimate of the overall logit and its standard error as before. The deviance, however, is now 79.2 on three d.f. This value is highly significant, indicating that the assumption of a common probability of using contraception for the four age groups is not tenable.
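The null-model deviance can be recomputed from Table 3.4 as a check on the arithmetic. This is a minimal pure-Python sketch. It assumes the usual binomial deviance $2\sum_i [y_i \log(y_i/\hat\mu_i) + (n_i - y_i)\log((n_i - y_i)/(n_i - \hat\mu_i))]$ with $\hat\mu_i = n_i \hat\pi$ and a common $\hat\pi = 507/1607$. The helper names `binomial_deviance` and `chi2_sf_3df` are illustrative only. The three-d.f. tail probability uses the closed form $\operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$.

```python
import math

# Table 3.4: (age group, users y_i, total n_i)
groups = [("<25", 72, 397), ("25-29", 105, 404), ("30-39", 237, 612), ("40-49", 93, 194)]


def binomial_deviance(y, n, p):
    """Deviance contribution of one group with y users out of n and fitted probability p."""
    mu = n * p
    return 2 * (y * math.log(y / mu) + (n - y) * math.log((n - y) / (n - mu)))


def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-squared variate with 3 d.f."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Null model: a single probability of use shared by all four age groups.
p_common = sum(y for _, y, _ in groups) / sum(n for _, _, n in groups)
deviance = sum(binomial_deviance(y, n, p_common) for _, y, n in groups)
df = len(groups) - 1  # four groups minus one estimated parameter
print(f"null deviance = {deviance:.1f} on {df} d.f., p = {chi2_sf_3df(deviance):.1e}")
```

The tiny p-value is what "highly significant" means here. Any reasonable significance level rejects a common probability of use.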
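Consider now a one-factor model, where we allow each group or level of the discrete factor to have its own logit. We write the model as

$$\operatorname{logit}(\pi_i) = \eta + \alpha_i,$$

where $\pi_i$ is the probability of using contraception in age group $i$, $\eta$ plays the role of the constant, and $\alpha_i$ represents the effect of level $i$ of the factor. As written, the model has five parameters for four groups. We therefore impose the reference-cell restriction $\alpha_1 = 0$. Under this restriction, $\eta$ is the logit for women under age 25, and $\alpha_2$, $\alpha_3$ and $\alpha_4$ are differences in logits relative to that group.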
Fitting this model to Table 3.4 leads to the parameter estimates and standard errors in Table 3.5. The deviance for this model is of course zero because the model is saturated: it uses four parameters to model four groups.
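Table 3.5: Estimates and standard errors for the one-factor logit model of contraceptive use by age

| Parameter | Symbol | Estimate | Std. Error | z-ratio |
|---|---|---|---|---|
| Constant | $\eta$ | -1.507 | 0.130 | -11.57 |
| Age 25-29 | $\alpha_2$ | 0.461 | 0.173 | 2.67 |
| Age 30-39 | $\alpha_3$ | 1.048 | 0.154 | 6.79 |
| Age 40-49 | $\alpha_4$ | 1.425 | 0.194 | 7.35 |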
The baseline logit of -1.51 for women under age 25 corresponds to odds of 0.22. Exponentiating the age coefficients we obtain odds ratios of 1.59, 2.85 and 4.16. Thus, the odds of using contraception increase by 59% and 185% as we move to ages 25-29 and 30-39, and are quadrupled for ages 40-49, all compared to women under age 25.
All of these estimates can be obtained directly from the frequencies in Table 3.4 in terms of the logits of the observed proportions. For example, the constant is logit(72/397) = -1.507, and the effect for women aged 25-29 is logit(105/404) minus the constant.
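The calculation fits in a few lines. This sketch assumes the standard large-sample variance $1/y_i + 1/(n_i - y_i)$ of an observed logit. Because the groups are independent, the variance of a difference in logits is the sum of the two variances. The function and variable names are illustrative.

```python
import math

# Table 3.4: (age group, users y_i, non-users n_i - y_i)
table = [("<25", 72, 325), ("25-29", 105, 299), ("30-39", 237, 375), ("40-49", 93, 101)]


def observed_logit(users, nonusers):
    """Return the observed logit log(y/(n-y)) and its large-sample variance 1/y + 1/(n-y)."""
    return math.log(users / nonusers), 1 / users + 1 / nonusers


# Reference cell: women under 25 supply the constant.
base_logit, base_var = observed_logit(table[0][1], table[0][2])
rows = [("Constant", base_logit, math.sqrt(base_var))]
for label, users, nonusers in table[1:]:
    logit, var = observed_logit(users, nonusers)
    # Effect = difference from the reference logit; independent groups, so variances add.
    rows.append((f"Age {label}", logit - base_logit, math.sqrt(var + base_var)))

for name, est, se in rows:
    # exp(est) is the baseline odds for the constant and an odds ratio for the age effects
    print(f"{name:10s} est={est:7.3f} se={se:6.3f} z={est / se:7.2f} exp={math.exp(est):5.2f}")
```

Up to rounding, the output reproduces Table 3.5, the baseline odds of 0.22, and the odds ratios 1.59, 2.85 and 4.16 quoted above.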
To test the hypothesis of no age effects, we can compare this model with the null model. Since the present model is saturated, the difference in deviances is exactly the same as the deviance of the null model, which was 79.2 on three d.f. and is highly significant. An alternative test of $H_0\colon \alpha_2 = \alpha_3 = \alpha_4 = 0$ …
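One alternative that can be computed directly from Table 3.4 is a Wald test of $H_0\colon \alpha_2 = \alpha_3 = \alpha_4 = 0$. This minimal sketch relies on two facts. First, in the saturated model the four group logits are independent estimates with variances $1/y_i + 1/(n_i - y_i)$. Second, the Wald statistic for a linear hypothesis does not depend on how the hypothesis is parameterized. Testing $\alpha_2 = \alpha_3 = \alpha_4 = 0$ is therefore the same as testing that all four logits are equal, and that reduces to a precision-weighted sum of squares.

```python
import math

# Table 3.4: (users y_i, non-users n_i - y_i) for the four age groups
counts = [(72, 325), (105, 299), (237, 375), (93, 101)]

# Each observed logit has weight 1/var = y (n - y) / n in the saturated model.
logits = [math.log(users / nonusers) for users, nonusers in counts]
weights = [users * nonusers / (users + nonusers) for users, nonusers in counts]

# Wald statistic for "all four logits equal", i.e. alpha_2 = alpha_3 = alpha_4 = 0:
# precision-weighted sum of squared deviations from the weighted mean logit.
mean_logit = sum(w * l for w, l in zip(weights, logits)) / sum(weights)
wald = sum(w * (l - mean_logit) ** 2 for w, l in zip(weights, logits))
df = len(counts) - 1
print(f"Wald chi-squared = {wald:.1f} on {df} d.f.")
```

The general matrix form $\hat\alpha^\top \hat V^{-1} \hat\alpha$, applied to $(\hat\alpha_2, \hat\alpha_3, \hat\alpha_4)$ and their covariance matrix, should give the same value. The weighted form simply avoids inverting a matrix.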
Note that the estimated logits in Table 3.5 (and therefore the odds and probabilities) increase monotonically with age. In fact, the logits seem to increase by approximately the same amount as we move from one age group to the next. This suggests that the effect of age may actually be linear in the logit scale.
To explore this idea we treat age as a variate rather than a factor. A thorough exploration would use the individual data with age in single years (or, equivalently, a 35 by two table of contraceptive use by age in single years from 15 to 49). However, we can get a quick idea of whether the model would be adequate by keeping age grouped into four categories and representing each category by the mid-point of its age group. We therefore consider a model analogous to simple linear regression, where

$$\operatorname{logit}(\pi_i) = \alpha + \beta x_i,$$

and $x_i$ is the mid-point of the $i$-th age group.
Fitting this model gives a deviance of 2.40 on two d.f., which indicates a very good fit. The parameter estimates and standard errors are shown in Table 3.6. Incidentally, there is no explicit formula for the estimates of the constant and slope in this model, so we must rely on iterative procedures to obtain them.
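Table 3.6: Estimates and standard errors for the logit model with a linear effect of age

| Parameter | Symbol | Estimate | Std. Error | z-ratio |
|---|---|---|---|---|
| Constant | $\alpha$ | -2.673 | 0.233 | -11.46 |
| Age (linear) | $\beta$ | 0.061 | 0.007 | 8.54 |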
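One such iterative procedure is Newton-Raphson on the grouped binomial log-likelihood. For the logit link it coincides with Fisher scoring and iteratively reweighted least squares. The sketch below makes one assumption of ours: the mid-points 20, 27.5, 35 and 45, which treat the youngest group as ages 15-24. The function names are hypothetical.

```python
import math

# Table 3.4 with each age group coded by an ASSUMED mid-point
# ("<25" treated as ages 15-24, so [15, 25) has mid-point 20).
x = [20.0, 27.5, 35.0, 45.0]  # age mid-points
y = [72, 105, 237, 93]        # users
n = [397, 404, 612, 194]      # totals


def fit_logit_line(x, y, n, iterations=25):
    """Newton-Raphson for logit(pi_i) = a + b * x_i with grouped binomial counts."""
    a = b = 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(a + b * xi))) for xi in x]
        w = [ni * pi * (1 - pi) for ni, pi in zip(n, p)]
        # Score (gradient of the log-likelihood) with respect to a and b.
        u0 = sum(yi - ni * pi for yi, ni, pi in zip(y, n, p))
        u1 = sum(xi * (yi - ni * pi) for xi, yi, ni, pi in zip(x, y, n, p))
        # Information matrix [[s0, s1], [s1, s2]]; solve the 2x2 system directly.
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = s0 * s2 - s1 * s1
        a += (s2 * u0 - s1 * u1) / det
        b += (s0 * u1 - s1 * u0) / det
    # Standard errors: square roots of the diagonal of the inverse information,
    # evaluated at the start of the final iteration (the estimates have settled by then).
    return a, b, math.sqrt(s2 / det), math.sqrt(s0 / det)


def binomial_deviance(x, y, n, a, b):
    """Deviance of the fitted line against the saturated (observed-proportion) model."""
    total = 0.0
    for xi, yi, ni in zip(x, y, n):
        mu = ni / (1 + math.exp(-(a + b * xi)))
        total += 2 * (yi * math.log(yi / mu) + (ni - yi) * math.log((ni - yi) / (ni - mu)))
    return total


a, b, se_a, se_b = fit_logit_line(x, y, n)
dev = binomial_deviance(x, y, n, a, b)
print(f"a = {a:.3f} (se {se_a:.3f}), b = {b:.3f} (se {se_b:.3f}), deviance = {dev:.2f}")
```

With these mid-points the iterations settle, up to rounding, at the estimates and standard errors of Table 3.6, with a deviance of about 2.40. Coding the open-ended youngest group differently would change the estimates somewhat.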
The slope indicates that the logit of the probability of using contraception increases 0.061 for every year of age. Exponentiating this value, we note that the odds of using contraception are multiplied by 1.063 (that is, they increase by 6.3%) for every year of age. Note, by the way, that $e^\beta \approx 1 + \beta$ for small $|\beta|$. Thus, when the logit coefficient is small in magnitude, $100\beta$ provides a quick approximation to the percent change in the odds associated with a unit change in the predictor. In this example the effect is 6.3% and the approximation is 6.1%.
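The quality of the $100\beta$ shortcut is easy to check numerically. This small sketch uses a hypothetical helper `percent_change` to compare the exact percent change $100(e^\beta - 1)$ with the approximation $100\beta$. It does so for the slope in Table 3.6 and for two larger, purely illustrative coefficients.

```python
import math


def percent_change(b):
    """Exact percent change in the odds for a one-unit increase in the predictor."""
    return 100 * (math.exp(b) - 1)


# 0.061 is the slope from Table 3.6; 0.2 and 0.5 are illustrative values only.
for b in (0.061, 0.2, 0.5):
    print(f"b = {b}: exact {percent_change(b):.1f}%, approximation 100b = {100 * b:.1f}%")
```

The gap is negligible at 0.061 but grows quickly as the coefficient moves away from zero.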
To test the significance of the slope we can use the Wald test, which gives a z statistic of 8.54 or, equivalently, a chi-squared of 72.9 (the square of 8.54) on one d.f. Alternatively, we can construct a likelihood ratio test by comparing this model with the null model. The difference in deviances is 76.8 on one d.f. Comparing these results with those in the previous subsection shows that we have captured most of the age effect using a single degree of freedom.
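Both test statistics and their p-values can be sketched in a few lines. The sketch uses two facts: the Wald chi-squared on one d.f. is the square of the z-ratio, and the upper-tail probability of a one-d.f. chi-squared is $\operatorname{erfc}(\sqrt{x/2})$. The inputs are the rounded values quoted in the text (z = 8.54, deviances 79.2 and 2.40). The name `chi2_sf_1df` is a hypothetical helper.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of chi-squared with 1 d.f., i.e. P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2))


z = 8.54                # Wald z-ratio for the slope (Table 3.6)
wald_chi2 = z ** 2      # Wald chi-squared on 1 d.f.
lr_chi2 = 79.2 - 2.40   # null-model deviance minus linear-model deviance
for name, stat in (("Wald", wald_chi2), ("LR", lr_chi2)):
    print(f"{name}: chi-squared = {stat:.1f} on 1 d.f., p = {chi2_sf_1df(stat):.1e}")
```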

Adding the estimated constant to the product of the slope and the mid-points of the age groups gives estimated logits at each age, and these may be compared with the logits of the observed proportions using contraception. The results of this exercise appear in Figure 3.2. The visual impression of the graph confirms that the fit is quite good. In this example the assumption of linear effects on the logit scale leads to a simple and parsimonious model. It would probably be worthwhile to re-estimate this model using the individual ages.
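The comparison behind Figure 3.2 can be tabulated directly. This minimal sketch uses the rounded estimates from Table 3.6. It assumes mid-points of 20, 27.5, 35 and 45, treating the youngest group as ages 15-24.

```python
import math

# (age group, ASSUMED mid-point, users y_i, total n_i); "<25" treated as ages 15-24
groups = [("<25", 20.0, 72, 397), ("25-29", 27.5, 105, 404),
          ("30-39", 35.0, 237, 612), ("40-49", 45.0, 93, 194)]
a, b = -2.673, 0.061  # rounded estimates from Table 3.6

comparison = []
for label, mid, users, total in groups:
    observed = math.log(users / (total - users))  # logit of the observed proportion using
    fitted = a + b * mid                           # estimated logit on the fitted line
    comparison.append((label, observed, fitted))
    print(f"{label:6s} observed {observed:6.2f}  fitted {fitted:6.2f}")
```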
Copyright © Germán Rodríguez, 1993-2000.
Please send feedback to grodri@princeton.edu
Conversion from LaTeX was done using TTH, version 2.34.