We now consider models for the probabilities pij. In particular, we would like to consider models where these probabilities depend on a vector xi of covariates associated with the i-th individual or group. In terms of our example, we would like to model how the probabilities of being sterilized, using another method or using no method at all depend on the woman's age.
Perhaps the simplest approach to multinomial data is to nominate one of the response categories as a baseline or reference cell, calculate log-odds for all other categories relative to the baseline, and then let the log-odds be a linear function of the predictors.
Typically we pick the last category as a baseline and calculate the odds that a member of group i falls in category j as opposed to the baseline as pij/piJ. In our example we could look at the odds of being sterilized rather than using no method, and the odds of using another method rather than no method. For women aged 45-49 these odds are 91:183 (or roughly 1 to 2) and 10:183 (or roughly 1 to 18).
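The odds quoted for women aged 45-49 can be reproduced from the three counts given in the text (91 sterilized, 10 using another method, 183 using no method). The following is a minimal sketch; the dictionary keys and variable names are illustrative choices, not part of the original notes.

```python
import math

# Counts for women aged 45-49, taken from the odds quoted in the text.
counts = {"sterilization": 91, "other": 10, "none": 183}
baseline = "none"  # reference category

# Odds and log-odds of each category relative to the baseline.
odds = {k: v / counts[baseline] for k, v in counts.items() if k != baseline}
log_odds = {k: math.log(v) for k, v in odds.items()}

for k in odds:
    print(f"{k}: odds = {odds[k]:.3f}, log-odds = {log_odds[k]:.3f}")
```

Both log-odds are negative because, in this age group, each method is less common than using no method at all.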
Figure 6.1 shows the empirical log-odds of sterilization and other method (using no method as the reference category) plotted against the mid-points of the age groups. (Ignore for now the solid lines.) Note how the log-odds of sterilization increase rapidly with age to reach a maximum at 30-34 and then decline slightly. The log-odds of using other methods rise gently up to age 25-29 and then decline rapidly.
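In the multinomial logit model we assume that the log-odds of each response follow a linear model

$$ \eta_{ij} = \log\frac{p_{ij}}{p_{iJ}} = \alpha_j + x_i'\beta_j, \qquad (6.3) $$

where αj is a constant and βj is a vector of regression coefficients, for j = 1, 2, ..., J-1. This model is analogous to a logistic regression model, except that the probability distribution of the response is multinomial instead of binomial and we have J-1 equations instead of one. The J-1 multinomial logit equations contrast each of categories 1, 2, ..., J-1 with category J, whereas the single logistic regression equation is a contrast between successes and failures. If J = 2 the multinomial logit model reduces to the usual logistic regression model.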
Note that we need only J-1 equations to describe a variable with J response categories and that it really makes no difference which category we pick as the reference cell, because we can always convert from one formulation to another. In our example with J = 3 categories we contrast categories 1 versus 3 and 2 versus 3. The missing contrast between categories 1 and 2 can easily be obtained in terms of the other two, since log(pi1/pi2) = log(pi1/pi3) - log(pi2/pi3).
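The conversion between reference cells can be checked numerically. The sketch below uses the counts for women aged 45-49 quoted earlier (91, 10, 183) and compares the contrast between categories 1 and 2 obtained by subtraction with the one computed directly; the variable names are illustrative.

```python
import math

# Observed proportions for women aged 45-49 (counts 91, 10, 183 from the text).
counts = [91, 10, 183]  # sterilization, other method, no method
total = sum(counts)
p = [c / total for c in counts]

# Logits with category 3 (no method) as the reference cell.
logit_1_vs_3 = math.log(p[0] / p[2])
logit_2_vs_3 = math.log(p[1] / p[2])

# The "missing" contrast, obtained by subtraction and computed directly.
logit_1_vs_2_derived = logit_1_vs_3 - logit_2_vs_3
logit_1_vs_2_direct = math.log(p[0] / p[1])

print(logit_1_vs_2_derived, logit_1_vs_2_direct)
```

The two values agree up to floating-point rounding, which is why the choice of reference cell does not affect the fitted model.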
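Looking at Figure 6.1, it would appear that the logits are a quadratic function of age. We will therefore entertain the model

$$ \eta_{ij} = \log\frac{p_{ij}}{p_{iJ}} = \alpha_j + \beta_j a_i + \gamma_j a_i^2, \qquad (6.4) $$

where ai is the midpoint of the i-th age group and j = 1, 2 for sterilization and other method, respectively.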
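The multinomial logit model may also be written in terms of the original probabilities pij rather than the log-odds. Starting from Equation 6.3, writing ηij for the log-odds log(pij/piJ) and adopting the convention that ηiJ = 0, we can write

$$ p_{ij} = \frac{\exp\{\eta_{ij}\}}{\sum_{k=1}^{J} \exp\{\eta_{ik}\}}, \qquad (6.5) $$

for j = 1, ..., J.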
Note that Equation 6.5 will automatically yield probabilities that add up to one for each i.
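To illustrate how logits map back to probabilities that sum to one, here is a small sketch. The function name `probabilities_from_logits` is hypothetical, and the code assumes the reference category is the last one, with its logit fixed at zero.

```python
import math

def probabilities_from_logits(logits):
    """Map J-1 baseline-category logits to J probabilities.

    The reference category (taken here to be the last one) has its
    logit fixed at zero, so it contributes exp(0) = 1 to the denominator.
    """
    eta = list(logits) + [0.0]
    # Subtract the maximum before exponentiating for numerical stability;
    # the common factor cancels in the ratio.
    m = max(eta)
    expd = [math.exp(e - m) for e in eta]
    denom = sum(expd)
    return [e / denom for e in expd]

# Empirical logits for ages 45-49 (counts 91, 10, 183 from the text).
logits = [math.log(91 / 183), math.log(10 / 183)]
probs = probabilities_from_logits(logits)
print(probs, sum(probs))
```

Feeding in the empirical logits for one age group recovers the observed proportions, here 91/284, 10/284 and 183/284.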
Estimation of the parameters of this model by maximum likelihood proceeds by maximization of the multinomial likelihood (6.2) with the probabilities pij viewed as functions of the αj and βj parameters in Equation 6.3. This usually requires numerical procedures, and Fisher scoring or Newton-Raphson often work rather well. Most statistical packages include a multinomial logit procedure.
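As a rough illustration of the numerical procedure, the sketch below applies Newton-Raphson to a deliberately simplified case: an intercept-only multinomial logit with J = 3 fitted to a single group of counts (the 45-49 age group, 91, 10, 183). This is an assumption made for brevity and is not the model with covariates discussed in the text; it is convenient because the maximum likelihood estimates are known to be the empirical logits, so the iteration can be checked. The function names are hypothetical.

```python
import math

def probs(alpha):
    # Probabilities for categories 1..J-1 with the last category as reference.
    denom = 1.0 + sum(math.exp(a) for a in alpha)
    return [math.exp(a) / denom for a in alpha]

def newton_raphson(y, n_iter=25):
    """Fit an intercept-only multinomial logit with J = 3 to one group of counts."""
    n = sum(y)
    alpha = [0.0, 0.0]
    for _ in range(n_iter):
        p1, p2 = probs(alpha)
        # Score vector: observed minus expected counts.
        g1, g2 = y[0] - n * p1, y[1] - n * p2
        # Information matrix n * (diag(p) - p p'), a symmetric 2 x 2 matrix.
        a, b, d = n * p1 * (1 - p1), -n * p1 * p2, n * p2 * (1 - p2)
        det = a * d - b * b
        # Newton step: alpha + I^{-1} g, using the explicit 2 x 2 inverse.
        alpha[0] += (d * g1 - b * g2) / det
        alpha[1] += (a * g2 - b * g1) / det
    return alpha

alpha_hat = newton_raphson([91, 10, 183])
print(alpha_hat, [math.log(91 / 183), math.log(10 / 183)])
```

For this model the observed and expected information coincide, so Newton-Raphson and Fisher scoring take identical steps.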
In terms of our example, fitting the quadratic multinomial logit model of Equation 6.4 leads to a deviance of 20.5 on 8 d.f. The associated P-value is 0.009, so we have significant lack of fit.
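The P-value quoted for the deviance can be verified without a statistics package, because the chi-squared tail probability has a closed form when the degrees of freedom are even. The helper `chi2_sf_even_df` below is a hypothetical name introduced for this sketch.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even d.f.

    For df = 2m the tail probability equals the Poisson(x/2) probability
    of observing at most m - 1 events, which gives a closed form.
    """
    if df % 2 != 0 or df <= 0:
        raise ValueError("this closed form only covers positive even d.f.")
    half = x / 2.0
    m = df // 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(m))

deviance, df = 20.5, 8
p_value = chi2_sf_even_df(deviance, df)
print(f"P-value = {p_value:.4f}")
```

The result rounds to the 0.009 reported in the text.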
The quadratic age effect has an associated likelihood-ratio χ² of 500.6 on four d.f., obtained by comparing the null model, with a deviance of 521.1 on 12 d.f., with the quadratic model (521.1 - 20.5 = 500.6 and 12 - 8 = 4), and is highly significant. Note that we have accounted for 96% of the association between age and method choice (500.6/521.1 = 0.96) using only four parameters.
Table 6.2: Parameter estimates for the multinomial logit model (columns: Ster. vs. None, Other vs. None)
Table 6.2 shows the parameter estimates for the two multinomial logit equations. I used these values to calculate fitted logits for each age from 17.5 to 47.5, and plotted these together with the empirical logits in Figure 6.1. The figure suggests that the lack of fit, though significant, is not a serious problem, except possibly for the 15-19 age group, where we overestimate the probability of sterilization.
Under these circumstances, I would probably stick with the quadratic model because it does a reasonable job using very few parameters. However, I urge you to go the extra mile and try a cubic term. The model should pass the goodness of fit test. Are the fitted values reasonable?
Multinomial logit models may also be fit by maximum likelihood working with an equivalent log-linear model and the Poisson likelihood. (This section will only be of interest to readers interested in the equivalence between these models and may be omitted at first reading.) Specifically, we treat the counts as independent Poisson observations with means μij satisfying the log-linear model

$$ \log \mu_{ij} = \eta + \theta_i + \alpha^*_j + x_i'\beta^*_j, \qquad (6.6) $$

where the parameters satisfy the usual constraints for identifiability. There are three important features of this model.
First, the model includes a separate parameter θi for each multinomial observation, i.e. each individual or group. This ensures exact reproduction of the multinomial denominators ni. Note that these denominators are fixed known quantities in the multinomial likelihood, but are treated as random in the Poisson likelihood. Making sure we get them right makes the issue of conditioning moot.
Second, the model includes a separate parameter α*j for each response category. This allows the counts to vary by response category, permitting non-uniform margins.
Third, the model uses interaction terms xi'β*j to represent the effects of the covariates xi on the log-odds of response j. Once again we have a 'step-up' situation, where main effects in a logistic model become interactions in the equivalent log-linear model.
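The log-odds that observation i will fall in response category j relative to the last response category J can be calculated from Equation 6.6 as

$$ \log\frac{\mu_{ij}}{\mu_{iJ}} = (\alpha^*_j - \alpha^*_J) + x_i'(\beta^*_j - \beta^*_J), $$

since the terms that do not depend on j cancel out. This has the same form as the multinomial logit model of Equation 6.3, with αj = α*j - α*J and βj = β*j - β*J.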
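As a numerical check of this log-odds calculation, the sketch below assumes that Equation 6.6 has the log-linear form log μij = η + θi + α*j + xi β*j, which is an assumption consistent with the three features listed above. It uses a single covariate and made-up parameter values; every number and name in the code is hypothetical and serves only to illustrate the algebra.

```python
import math

# Hypothetical parameter values, chosen only to illustrate the algebra.
eta = 1.2                        # overall constant
theta = [0.0, 0.7, -0.4]         # one parameter per multinomial observation i
alpha_star = [0.5, -1.0, 0.3]    # one parameter per response category j
beta_star = [0.08, -0.05, 0.02]  # covariate slope for each category j
x = [20.0, 30.0, 40.0]           # a single covariate value per observation

J = len(alpha_star)

def log_mu(i, j):
    # Assumed log-linear form: constant + observation + category + interaction.
    return eta + theta[i] + alpha_star[j] + x[i] * beta_star[j]

def poisson_log_odds(i, j):
    # Log-odds of category j versus the last category from the Poisson means.
    return log_mu(i, j) - log_mu(i, J - 1)

def logit_form(i, j):
    # Multinomial logit form with alpha_j = alpha*_j - alpha*_J and
    # beta_j = beta*_j - beta*_J.
    a = alpha_star[j] - alpha_star[J - 1]
    b = beta_star[j] - beta_star[J - 1]
    return a + x[i] * b

max_gap = max(abs(poisson_log_odds(i, j) - logit_form(i, j))
              for i in range(len(x)) for j in range(J - 1))
print(max_gap)
```

The largest discrepancy is at the level of floating-point rounding: the constant and the observation-specific parameters drop out of the log-odds, leaving only differences between category-specific parameters.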
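In terms of our example, we can treat the counts in the original 7 × 3 table as 21 independent Poisson observations, and fit a log-linear model including the main effect of age (treated as a factor), the main effect of contraceptive use (treated as a factor) and the interactions between contraceptive use (a factor) and the linear and quadratic components of age:

$$ \log \mu_{ij} = \eta + \theta_i + \alpha^*_j + \beta^*_j a_i + \gamma^*_j a_i^2, $$

where θi is the effect of the i-th age group, α*j is the effect of the j-th method category and ai is the midpoint of the i-th age group.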