All the models considered so far use the logit transformation of the probabilities, but other choices are possible. In fact, any transformation that maps probabilities into the real line could be used to produce a generalized linear model, as long as the transformation is one-to-one, continuous and differentiable.
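In particular, suppose $F(\cdot)$ is the cumulative distribution function (c.d.f.) of a random variable defined on the real line, and write

$$\pi_i = F(\eta_i),$$

for $-\infty < \eta_i < \infty$. Then we could use the inverse transformation

$$\eta_i = F^{-1}(\pi_i),$$

for $0 < \pi_i < 1$, as the link function.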
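Because the link is simply the inverse of a c.d.f., it can be evaluated numerically even when $F^{-1}$ has no closed form. The sketch below shows one way to do this. It assumes only that $F$ is continuous and strictly increasing. The helper `link_from_cdf` and its bisection bounds are hypothetical choices made for illustration, and the standard normal c.d.f. from Python's `statistics.NormalDist` serves as the example.

```python
from statistics import NormalDist
from typing import Callable


def link_from_cdf(F: Callable[[float], float], lo: float = -40.0, hi: float = 40.0,
                  tol: float = 1e-12) -> Callable[[float], float]:
    """Return the link eta = F^{-1}(p), computed by bisection (hypothetical helper).

    Assumes F is a continuous, strictly increasing c.d.f. and that the
    solution lies in [lo, hi].
    """
    def link(p: float) -> float:
        if not 0.0 < p < 1.0:
            raise ValueError("p must lie strictly between 0 and 1")
        a, b = lo, hi
        while b - a > tol:
            mid = 0.5 * (a + b)
            if F(mid) < p:
                a = mid  # F(mid) too small: the solution lies to the right
            else:
                b = mid  # F(mid) large enough: the solution lies at or to the left
        return 0.5 * (a + b)
    return link


# Example: the probit link obtained from the standard normal c.d.f.
probit = link_from_cdf(NormalDist().cdf)
```

Bisection is slow compared with the specialized algorithms used in statistical software. Its advantage is that it works for any well-behaved c.d.f., which is exactly the generality this construction relies on.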
Popular choices of c.d.f.'s in this context are the normal, logistic and extreme value distributions. In this section we motivate this general approach by introducing models for binary data in terms of latent variables.
Let $Y_i$ denote a random variable representing a binary response coded zero and one, as usual. We will call $Y_i$ the manifest response. Suppose that there is an unobservable continuous random variable $Y_i^*$ which can take any value in the real line, and such that $Y_i$ takes the value one if and only if $Y_i^*$ exceeds a certain threshold $\theta$. We will call $Y_i^*$ the latent response. Figure 3.6 shows the relationship between the latent variable and the response when the threshold is zero.

The interpretation of $Y_i$ and $Y_i^*$ depends on the context. An economist, for example, may view $Y_i$ as a binary choice, such as purchasing or renting a home, and $Y_i^*$ as the difference in the utilities of purchasing and renting. A psychologist may view $Y_i$ as a response to an item in an attitude scale, such as agreeing or disagreeing with school vouchers, and $Y_i^*$ as the underlying attitude. Biometricians often view $Y_i^*$ as a dose and $Y_i$ as a response, hence the name dose-response models.
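Since a positive outcome occurs only when the latent response exceeds the threshold, we can write the probability $\pi_i$ of a positive outcome as

$$\pi_i = \Pr\{Y_i = 1\} = \Pr\{Y_i^* > \theta\}.$$

The location and scale of $Y_i^*$ are arbitrary. Adding the same constant to $Y_i^*$ and $\theta$, or multiplying both by the same positive constant, leaves this probability unchanged. To identify the model we therefore take the threshold to be zero and fix the standard deviation of $Y_i^*$, for example at one.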
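Suppose now that the outcome depends on a vector of covariates $\mathbf{x}$. To model this dependence we use an ordinary linear model for the latent variable, writing

$$Y_i^* = \mathbf{x}_i'\boldsymbol{\beta} + U_i, \tag{3.15}$$

where $\boldsymbol{\beta}$ is a vector of coefficients of the covariates $\mathbf{x}_i$ and $U_i$ is an error term with c.d.f. $F(u)$, not necessarily the normal distribution.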
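Under this model, and with the threshold set at zero, the probability $\pi_i$ of observing a positive outcome is

$$\pi_i = \Pr\{Y_i^* > 0\} = \Pr\{U_i > -\eta_i\} = 1 - F(-\eta_i),$$

where $\eta_i = \mathbf{x}_i'\boldsymbol{\beta}$ is the linear predictor. If the distribution of the error term is symmetric about zero, so that $F(u) = 1 - F(-u)$, this simplifies to $\pi_i = F(\eta_i)$. That expression defines a generalized linear model with Bernoulli response and link

$$\eta_i = F^{-1}(\pi_i). \tag{3.16}$$

In the more general case where the error distribution is not necessarily symmetric, we still have a generalized linear model, with link

$$\eta_i = -F^{-1}(1 - \pi_i). \tag{3.17}$$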
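A small simulation can make this derivation concrete. The sketch below assumes a single, fixed value of the linear predictor. It draws the errors from the asymmetric reverse extreme value distribution, $F(u) = e^{-e^{-u}}$, so the symmetric shortcut $\pi_i = F(\eta_i)$ does not apply. It then dichotomizes the simulated latent responses at zero and compares the share of ones with $1 - F(-\eta_i)$. The helper names, seed, sample size and value $\eta = 0.4$ are arbitrary illustrative choices.

```python
import math
import random


def F(u: float) -> float:
    """Reverse extreme value c.d.f., F(u) = exp(-exp(-u)) (asymmetric)."""
    return math.exp(-math.exp(-u))


def F_inv(p: float) -> float:
    """Inverse of F: F^{-1}(p) = -log(-log p), for 0 < p < 1."""
    return -math.log(-math.log(p))


def simulate_proportion(eta: float, n: int = 200_000, seed: int = 1) -> float:
    """Share of simulated latent responses Y* = eta + U that exceed zero."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n):
        v = rng.random()
        while v == 0.0:          # random() can return 0.0, and log(0) is undefined
            v = rng.random()
        u = F_inv(v)             # inverse-transform draw of the error U
        if eta + u > 0:          # manifest response Y = 1 iff Y* > 0
            ones += 1
    return ones / n


eta = 0.4
p_theory = 1.0 - F(-eta)            # general result, no symmetry assumed
p_sim = simulate_proportion(eta)
eta_back = -F_inv(1.0 - p_theory)   # the general link recovers eta
```

With these settings `p_theory` is about 0.775, and the simulated share should land close to it. The symmetric formula $F(\eta)$ would give only about 0.51. Inverting with $-F^{-1}(1-\pi)$ recovers $\eta$.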
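The obvious choice of an error distribution is the normal. Assuming that the error term has a standard normal distribution $U_i \sim N(0,1)$, the results above lead to

$$\pi_i = \Phi(\eta_i),$$

where $\Phi$ is the standard normal c.d.f. The inverse transformation gives the linear predictor as a function of the probability,

$$\eta_i = \Phi^{-1}(\pi_i),$$

and is called the probit.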
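It is instructive to consider the more general case where the error term $U_i \sim N(0, \sigma^2)$ has a normal distribution with variance $\sigma^2$. Following the same steps as before we find that

$$\begin{aligned}
\pi_i &= \Pr\{Y_i^* > 0\} = \Pr\{U_i > -\mathbf{x}_i'\boldsymbol{\beta}\} = \Pr\{U_i/\sigma > -\mathbf{x}_i'\boldsymbol{\beta}/\sigma\} \\
&= 1 - \Phi(-\mathbf{x}_i'\boldsymbol{\beta}/\sigma) = \Phi(\mathbf{x}_i'\boldsymbol{\beta}/\sigma),
\end{aligned}$$

where we have divided by $\sigma$ to obtain a standard normal variate and used the symmetry of the normal distribution in the last step.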
This development shows that we cannot identify $\boldsymbol{\beta}$ and $\sigma$ separately, because the probability depends on them only through their ratio $\boldsymbol{\beta}/\sigma$. This is another way of saying that the scale of the latent variable is not identified. We therefore take $\sigma = 1$, or equivalently interpret the $\beta$'s in units of standard deviation of the latent variable.
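The identification problem is easy to see numerically. Multiplying all coefficients and the error standard deviation by the same constant leaves every probability unchanged. The covariate values and coefficients in this sketch are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist
from typing import Sequence

PHI = NormalDist().cdf  # standard normal c.d.f.


def prob_positive(x: Sequence[float], beta: Sequence[float], sigma: float) -> float:
    """Pr{Y* > 0} when Y* = x'beta + U and U ~ N(0, sigma^2)."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return PHI(eta / sigma)


x = [1.0, 2.5]       # hypothetical covariates: constant and one predictor
beta = [-0.5, 0.3]   # hypothetical coefficients

p_unit = prob_positive(x, beta, sigma=1.0)
# Triple both the coefficients and the error standard deviation
p_scaled = prob_positive(x, [3.0 * b for b in beta], sigma=3.0)
```

`p_unit` and `p_scaled` agree up to floating-point rounding. Binary data alone therefore cannot tell the two parameterizations apart.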
As a simple example, consider fitting a probit model to the contraceptive use data by age and desire for more children. In view of the results in Section 3.5, we introduce a main effect of wanting no more children, a linear effect of age, and a linear age by desire interaction. Fitting this model gives a deviance of 8.91 on four d.f. Estimates of the parameters and standard errors appear in Table 3.16.
| Parameter | Symbol | Estimate | Std. Error | z-ratio |
|---|---|---:|---:|---:|
| Constant | $\alpha_1$ | -0.7297 | 0.0460 | -15.85 |
| Age | $\beta_1$ | 0.0129 | 0.0061 | 2.13 |
| Desire | $\alpha_2-\alpha_1$ | 0.4572 | 0.0731 | 6.26 |
| Age × Desire | $\beta_2-\beta_1$ | 0.0305 | 0.0092 | 3.32 |
To interpret these results we imagine a latent continuous variable representing the woman's motivation to use contraception (or the utility of using contraception, compared to not using). At the average age of 30.6, not wanting more children increases the motivation to use contraception by almost half a standard deviation. Each year of age is associated with an increase in motivation of 0.01 standard deviations if she wants more children and 0.03 standard deviations more (for a total of 0.04) if she does not. Later in this section we compare these results with logit estimates.
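The interpretation above can be reproduced from the estimates in Table 3.16. The sketch assumes, as the text implies, that age is centered at its mean of 30.6, so the desire coefficient is the effect at the average age. The function names are hypothetical.

```python
from statistics import NormalDist

# Probit estimates from Table 3.16 (age assumed centered at 30.6)
CONST, AGE, DESIRE, AGE_X_DESIRE = -0.7297, 0.0129, 0.4572, 0.0305
MEAN_AGE = 30.6


def latent_mean(age: float, wants_no_more: bool) -> float:
    """Linear predictor: expected latent motivation, in s.d. units."""
    a = age - MEAN_AGE                    # centered age (assumption)
    d = 1.0 if wants_no_more else 0.0     # desire indicator
    return CONST + AGE * a + DESIRE * d + AGE_X_DESIRE * a * d


def prob_use(age: float, wants_no_more: bool) -> float:
    """Implied probability of using contraception, Phi(eta)."""
    return NormalDist().cdf(latent_mean(age, wants_no_more))


desire_effect_at_mean = latent_mean(MEAN_AGE, True) - latent_mean(MEAN_AGE, False)
slope_wants_more = AGE                     # s.d. per year, wants more children
slope_wants_no_more = AGE + AGE_X_DESIRE   # s.d. per year, wants no more
```

At age 30.6 the difference between the two linear predictors is the 0.457 standard deviations discussed above. The per-year slopes are about 0.013 and 0.043 standard deviations. Passing the linear predictor through $\Phi$ converts it back to a probability of use.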
A slight disadvantage of using the normal distribution as a link for binary response models is that the c.d.f. does not have a closed form, although excellent numerical approximations and computer algorithms are available for computing both the normal probability integral and its inverse, the probit.
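One concrete route is to write the normal c.d.f. in terms of the complementary error function, $\Phi(z) = \tfrac12\operatorname{erfc}(-z/\sqrt2)$, which stays accurate far into the lower tail. Python's `statistics.NormalDist.inv_cdf` (Python 3.8 and later), which implements a rational approximation, then supplies the probit. This is a sketch of the idea and not the algorithm any particular statistical package uses.

```python
import math
from statistics import NormalDist


def normal_cdf(z: float) -> float:
    """Standard normal c.d.f. Phi(z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def probit(p: float) -> float:
    """Inverse of Phi, i.e. the probit link; requires 0 < p < 1."""
    return NormalDist().inv_cdf(p)


# Round trip: probability for a linear predictor, then back again
eta = 0.4572
p = normal_cdf(eta)
eta_again = probit(p)
```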
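An alternative to the normal distribution is the standard logistic distribution, whose shape is remarkably similar to the normal distribution but which has the advantage of a closed form expression

$$\pi_i = F(\eta_i) = \frac{e^{\eta_i}}{1 + e^{\eta_i}},$$

for $-\infty < \eta_i < \infty$. The standard logistic distribution is symmetric, has mean zero and variance $\pi^2/3$, and has somewhat heavier tails than the normal. The inverse transformation, obtained by solving for $\eta_i$ in the expression above, is

$$\eta_i = F^{-1}(\pi_i) = \log\frac{\pi_i}{1 - \pi_i},$$

the familiar logit.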
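Because the logistic c.d.f. has a closed form, both the inverse link and the link are one-liners. One practical wrinkle is that evaluating $e^{\eta}/(1+e^{\eta})$ literally overflows for large $\eta$: Python's `math.exp` raises `OverflowError` beyond roughly 709. The sketch below therefore switches to the algebraically equivalent form $1/(1+e^{-\eta})$ for non-negative $\eta$. The function names are illustrative.

```python
import math


def logistic_cdf(eta: float) -> float:
    """Standard logistic c.d.f. e^eta / (1 + e^eta), written to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))  # exp(-eta) <= 1 here
    z = math.exp(eta)                        # safe: eta < 0, so z < 1
    return z / (1.0 + z)


def logit(p: float) -> float:
    """Inverse of the logistic c.d.f.: log(p / (1 - p)), for 0 < p < 1."""
    return math.log(p / (1.0 - p))
```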
Thus, coefficients in a logit regression model can be interpreted not only in terms of log-odds, but also as effects of the covariates on a latent variable that follows a linear model with logistic errors.
The logit and probit transformations are almost linear functions of each other for values of $\pi_i$ in the range from 0.1 to 0.9, and therefore tend to give very similar results. Comparison of probit and logit coefficients should take into account the fact that the standard normal and the standard logistic distributions have different variances. Recall that with binary data we can only estimate the ratio $\boldsymbol{\beta}/\sigma$. In probit analysis we have implicitly set $\sigma = 1$. In a logit model, by using a standard logistic error term, we have effectively set $\sigma = \pi/\sqrt{3}$. Thus, coefficients in a logit model should be standardized by dividing by $\pi/\sqrt{3}$ before comparing them with probit coefficients.

Figure 3.7 compares the logit and probit links (and a third link discussed below) after standardizing the logits to unit variance. The solid line is the probit and the dotted line is the logit divided by $\pi/\sqrt{3}$. As you can see, they are barely distinguishable.
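The near-coincidence in Figure 3.7 can be checked numerically. The sketch below evaluates the probit inverse link and the logistic c.d.f. rescaled to unit variance, $F(\eta\,\pi/\sqrt3)$, on a grid of linear-predictor values from -3 to 3. It records the largest gap between the two probabilities. The grid is an arbitrary choice.

```python
import math
from statistics import NormalDist

SCALE = math.pi / math.sqrt(3.0)  # s.d. of the standard logistic distribution


def probit_prob(eta: float) -> float:
    """Inverse probit link: Phi(eta)."""
    return NormalDist().cdf(eta)


def std_logit_prob(eta: float) -> float:
    """Logistic c.d.f. on the unit-variance scale: F(eta * pi / sqrt(3))."""
    return 1.0 / (1.0 + math.exp(-eta * SCALE))


grid = [i / 100 for i in range(-300, 301)]  # eta from -3.00 to 3.00
max_gap = max(abs(probit_prob(e) - std_logit_prob(e)) for e in grid)
```

The largest discrepancy is only about 0.02 on the probability scale. That is why the two links are so hard to tell apart in practice.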
To illustrate the similarity of these links in practice, consider our models of contraceptive use by age and desire for more children in Tables 3.10 and 3.16. The deviance of 9.14 for the logit model is very similar to the deviance of 8.91 for the probit model, indicating an acceptable fit. The Wald tests of individual coefficients are also very similar; for example, the test for the effect of wanting no more children at age 30.6 is 6.22 in the logit model and 6.26 in the probit model. The coefficients themselves look somewhat different, but of course they are not standardized. The effect of wanting no more children at the average age is 0.758 in the logit scale. Dividing by $\pi/\sqrt{3}$, the standard deviation of the underlying logistic distribution, we find this effect equivalent to an increase in the latent variable of 0.417 standard deviations. The probit analysis estimates the effect as 0.457 standard deviations.
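The standardization just described is a single division. The helper below is hypothetical. It simply divides a logit coefficient by the standard deviation $\pi/\sqrt3 \approx 1.814$ of the standard logistic distribution.

```python
import math

LOGIT_SD = math.pi / math.sqrt(3.0)  # s.d. of the standard logistic, about 1.814


def logit_to_sd_units(coef: float) -> float:
    """Express a logit coefficient in s.d. units of the latent variable."""
    return coef / LOGIT_SD


effect = logit_to_sd_units(0.758)  # desire effect at mean age, logit scale
```

Applied to the logit effect of 0.758, it gives roughly 0.42 standard deviations. This is in line with the figure quoted above and a little below the probit estimate of 0.457.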
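A third choice of link is the complementary log-log transformation

$$\eta_i = \log(-\log(1 - \pi_i)),$$

which is the inverse of the c.d.f. of the extreme value distribution, namely

$$F(\eta_i) = 1 - e^{-e^{\eta_i}}.$$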
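Both directions of the complementary log-log transformation are easy to code. The sketch below uses `math.log1p` and `math.expm1` so that small probabilities, where the c-log-log is most often used, do not lose precision to cancellation. The function names are illustrative. Unlike the probit and logit, the link is not symmetric about $\pi_i = 1/2$: it maps 0.5 to $\log\log 2 \approx -0.367$ rather than to zero.

```python
import math


def cloglog(p: float) -> float:
    """Complementary log-log link: log(-log(1 - p)), for 0 < p < 1."""
    return math.log(-math.log1p(-p))  # log1p(-p) == log(1 - p), accurate for small p


def ev_cdf(eta: float) -> float:
    """Extreme value c.d.f. (the inverse link): 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))  # -expm1(x) == 1 - exp(x), accurate near 0
```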
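This particular choice of link function can also be obtained from our general latent variable formulation if we assume that $-U_i$ (note the minus sign) has a standard extreme value distribution, so the error term itself has a reverse extreme value distribution, with c.d.f.

$$F(u_i) = e^{-e^{-u_i}}.$$

The reverse extreme value distribution is asymmetric, with a long tail to the right. Its mean is Euler's constant, about 0.577, and its variance is $\pi^2/6 \approx 1.645$.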
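The moments of the reverse extreme value distribution (mean equal to Euler's constant, variance $\pi^2/6$) can be checked by Monte Carlo. The sketch below uses inverse-transform sampling, $U = -\log(-\log V)$ with $V$ uniform on $(0,1)$. The seed and sample size are arbitrary.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def draw_reverse_ev(rng: random.Random) -> float:
    """Inverse-transform draw from F(u) = exp(-exp(-u))."""
    v = rng.random()
    while v == 0.0:            # random() can return 0.0, and log(0) is undefined
        v = rng.random()
    return -math.log(-math.log(v))


rng = random.Random(2024)
draws = [draw_reverse_ev(rng) for _ in range(200_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
```

Both sample moments should fall close to 0.577 and 1.645. The positive mean, rather than zero, only shifts the constant term of the model.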
Inverting the reverse extreme value c.d.f. and applying Equation 3.17, which is valid for both symmetric and asymmetric distributions, we find that the link corresponding to this error distribution is the complementary log-log.
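Spelled out, the inversion takes two steps. Solving $p = e^{-e^{-u}}$ for $u$ gives the inverse c.d.f., and substituting it into Equation 3.17 gives the link:

$$F^{-1}(p) = -\log(-\log p), \qquad \eta_i = -F^{-1}(1 - \pi_i) = \log\{-\log(1 - \pi_i)\}.$$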
Thus, coefficients in a generalized linear model with binary response and a complementary log-log link can be interpreted as effects of the covariates on a latent variable which follows a linear model with reverse extreme value errors.
To compare these coefficients with estimates based on a probit analysis we should standardize them, dividing by $\pi/\sqrt{6}$, the standard deviation of the reverse extreme value distribution. To compare them with logit coefficients we can divide the logit coefficients by $\sqrt{2}$, or standardize both the c-log-log and the logit coefficients.
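The $\sqrt2$ factor comes from the ratio of the two error standard deviations, $\pi/\sqrt3$ for the standard logistic and $\pi/\sqrt6$ for the reverse extreme value. Write $\beta_L$ and $\beta_C$ for corresponding logit and c-log-log coefficients (notation introduced here only for convenience). Equal standardized effects then require

$$\frac{\beta_C}{\pi/\sqrt6} = \frac{\beta_L}{\pi/\sqrt3} \quad\Longrightarrow\quad \beta_C = \beta_L\,\frac{\sqrt3}{\sqrt6} = \frac{\beta_L}{\sqrt2}.$$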
Figure 3.7 compares the c-log-log link with the probit and logit after standardizing it to have mean zero and variance one. Although the c-log-log link differs from the other two, one would need extremely large sample sizes to be able to discriminate empirically between these links.
The complementary log-log transformation has a direct interpretation in terms of hazard ratios, and thus has practical applications in terms of hazard models, as we shall see later.
Copyright © Germán Rodríguez, 1993-2000.
Please send feedback to grodri@princeton.edu