6.3 The Conditional Logit Model
In this section I will describe an extension of the multinomial logit model that is particularly appropriate in models of choice behavior, where the explanatory variables may include attributes of the choice alternatives (for example cost) as well as characteristics of the individuals making the choices (such as income). To motivate the extension I will first reintroduce the multinomial logit model in terms of an underlying latent variable.
6.3.1 A General Model of Choice
Suppose that Y_{i} represents a discrete choice among J alternatives. Let U_{ij} represent the value or utility of the jth choice to the ith individual. We will treat the U_{ij} as independent random variables with a systematic component h_{ij} and a random component e_{ij} such that
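 U_{ij} = h_{ij} + e_{ij}. (6.9)
We further assume that individuals act rationally and choose the alternative with the highest utility. The probability that individual i chooses alternative j is then
 p_{ij} = \Pr\{Y_{i} = j\} = \Pr\{\max(U_{i1}, \ldots, U_{iJ}) = U_{ij}\}. (6.10)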
It can be shown that if the error terms e_{ij} have standard Type I extreme value distributions with density
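 f(e) = \exp\{-e - \exp\{-e\}\}, (6.11)
then the choice probabilities p_{ij} = \Pr\{Y_{i} = j\} are given by
 p_{ij} = \frac{\exp\{h_{ij}\}}{\sum_{k=1}^{J} \exp\{h_{ik}\}}, (6.12)
which is the basic equation defining the multinomial logit model.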
In the special case where J = 2, individual i will choose the first alternative if U_{i1} - U_{i2} > 0. If the random utilities U_{ij} have independent extreme value distributions, their difference can be shown to have a logistic distribution, and we obtain the standard logistic regression model.
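The link between extreme value errors and Equation 6.12 can be checked by simulation. The sketch below draws standard Type I extreme value (Gumbel) errors with NumPy, lets each simulated individual pick the alternative with the largest utility, and compares the observed choice frequencies with the closed-form probabilities. The values in `h`, the sample size, and the function names are illustrative assumptions, not part of the text.

```python
import numpy as np

def logit_probs(h):
    """Choice probabilities exp(h_j) / sum_k exp(h_k) for one individual."""
    h = np.asarray(h, dtype=float)
    w = np.exp(h - h.max())  # subtracting the max avoids overflow
    return w / w.sum()

def simulate_choices(h, n, seed=0):
    """Choice frequencies when each individual maximizes U_j = h_j + e_j."""
    rng = np.random.default_rng(seed)
    h = np.asarray(h, dtype=float)
    # Standard Type I extreme value errors: density exp(-e - exp(-e)).
    e = rng.gumbel(loc=0.0, scale=1.0, size=(n, h.size))
    choice = (h + e).argmax(axis=1)  # index of the largest utility
    return np.bincount(choice, minlength=h.size) / n

h = [1.0, 0.5, 0.0]  # hypothetical systematic utilities for J = 3
closed_form = logit_probs(h)
simulated = simulate_choices(h, n=200_000)
print("closed form:", closed_form)
print("simulated:  ", simulated)
```

With a sample this large the two sets of probabilities should agree to about two decimal places. The agreement is only approximate because the simulated frequencies carry sampling error.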
Luce (1959) derived Equation 6.12 starting from a simple requirement that the odds of choosing alternative j over alternative k should be independent of the choice set for all pairs j,k. This property is often referred to as the axiom of independence from irrelevant alternatives. Whether or not this assumption is reasonable (and other alternatives are indeed irrelevant) depends very much on the nature of the choices.
A classical example where the multinomial logit model does not work well is the so-called "red/blue bus" problem. Suppose you have a choice of transportation between a train, a red bus and a blue bus. Suppose half the people take the train and half take the bus. Suppose further that people who take the bus are indifferent to the color, so they distribute themselves equally between the red and the blue buses. The choice probabilities of p = (.50, .25, .25) would be consistent with expected utilities of h = (log 2, 0, 0).
Suppose now the blue bus service is discontinued. You might expect that all the people who used to take the blue bus would take the red bus instead, leading to a 1:1 split between train and bus. On the basis of the expected utilities of log 2 and 0, however, the multinomial logit model would predict a 2:1 split, because the odds of train versus red bus remain exp(log 2) : exp(0) = 2 : 1 no matter which other alternatives are available.
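The arithmetic of the red/blue bus example can be reproduced in a few lines. This is a minimal sketch using the expected utilities given above; the helper `logit_probs` and the variable names are my own.

```python
import numpy as np

def logit_probs(h):
    """Multinomial logit probabilities exp(h_j) / sum_k exp(h_k)."""
    h = np.asarray(h, dtype=float)
    w = np.exp(h - h.max())
    return w / w.sum()

# Expected utilities for train, red bus, blue bus.
h_all = np.array([np.log(2.0), 0.0, 0.0])
p_all = logit_probs(h_all)          # approximately (0.50, 0.25, 0.25)

# Drop the blue bus and keep the same utilities for the other two.
h_no_blue = h_all[:2]
p_no_blue = logit_probs(h_no_blue)  # approximately (2/3, 1/3)

# The train : red bus odds are the same with or without the blue bus.
odds_before = p_all[0] / p_all[1]
odds_after = p_no_blue[0] / p_no_blue[1]
print(p_all, p_no_blue, odds_before, odds_after)
```

The odds of train versus red bus equal 2 in both choice sets, which is exactly the independence from irrelevant alternatives property, and is why the model cannot produce the intuitive 1:1 split.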
Keep this caveat in mind as we consider modeling the expected utilities.
6.3.2 Multinomial Logits
In the usual multinomial logit model, the expected utilities h_{ij} are modeled in terms of characteristics of the individuals, so that
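 h_{ij} = x_{i}' \beta_{j},
where \beta_{j} is a vector of regression coefficients specific to the jth alternative.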
A somewhat restrictive feature of the model is that the same attributes x_{i} are used to model the utilities of all J choices.
6.3.3 Conditional Logits
McFadden (1973) proposed modeling the expected utilities h_{ij} in terms of characteristics of the alternatives rather than attributes of the individuals. If z_{j} represents a vector of characteristics of the jth alternative, then he postulated the model
 h_{ij} = z_{j}' \gamma,
where \gamma is a vector of coefficients common to all alternatives.
Note that with J response categories the response margin may be reproduced exactly using any J - 1 linearly independent attributes of the choices. Generally one would want the dimensionality of z_{j} to be substantially less than J. Consequently, conditional logit models are often used when the number of possible choices is large.
6.3.4 Multinomial/Conditional Logits
A more general model may be obtained by combining the multinomial and conditional logit formulations, so the underlying utilities h_{ij} depend on characteristics of the individuals as well as attributes of the choices, or even variables defined for combinations of individuals and choices (such as an individual's perception of the value of a choice). The general model is usually written as
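 h_{ij} = x_{i}' \beta_{j} + z_{ij}' \gamma, (6.13)
where x_{i} represents characteristics of the individuals that are constant across choices, z_{ij} represents characteristics that vary across choices, and \beta_{j} and \gamma are the corresponding coefficient vectors.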
Some statistical packages have procedures for fitting conditional logit models to datasets where each combination of individual and possible choice is treated as a separate observation. These models may also be fit using any package that does Poisson regression. If the last response category is used as the baseline or reference cell, so that h_{iJ} = 0 for all i, then the z_{ij} should be entered in the model as differences from the last category. In other words, you should use z^{*}_{ij} = z_{ij} - z_{iJ} as the predictor.
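To see why differencing from the last category is harmless, the sketch below evaluates the choice probabilities of the general model for one individual, once with the raw z_{ij} and once with z^{*}_{ij} = z_{ij} - z_{iJ}. All numbers (the covariates, the coefficients, and the interpretation of the columns as cost and time) are hypothetical, and the coefficient row for the last alternative is set to zero so that it acts as the baseline.

```python
import numpy as np

def choice_probs(x, beta, z, gamma):
    """Probabilities for one individual with h_j = x' beta_j + z_j' gamma.

    x: (p,) individual characteristics; beta: (J, p), one row per alternative;
    z: (J, q) attributes of each alternative; gamma: (q,) common coefficients.
    """
    h = beta @ x + z @ gamma
    w = np.exp(h - h.max())  # stabilized exponentials
    return w / w.sum()

# Hypothetical inputs for J = 3 alternatives.
x = np.array([1.0, 2.5])                 # constant and, say, income
beta = np.array([[0.4, -0.1],
                 [-0.2, 0.3],
                 [0.0, 0.0]])            # last row zero: baseline category
z = np.array([[3.0, 20.0],
              [1.5, 35.0],
              [2.0, 30.0]])              # say, cost and travel time
gamma = np.array([-0.5, -0.04])

p_raw = choice_probs(x, beta, z, gamma)

# Differences from the last category, as recommended in the text.
z_star = z - z[-1]
p_diff = choice_probs(x, beta, z_star, gamma)
h_star = beta @ x + z_star @ gamma       # last element is zero by construction
print(p_raw, p_diff, h_star)
```

Subtracting z_{iJ}' \gamma shifts every utility for the individual by the same amount, so the probabilities are unchanged while the baseline utility becomes exactly zero.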
6.3.5 Multinomial/Conditional Probits
Changing the distribution of the error term in Equation 6.9 leads to alternative models. A popular alternative to the logit models considered so far is to assume that the e_{ij} have independent standard normal distributions for all i and j. The resulting model is called the multinomial/conditional probit model, and produces results very similar to the multinomial/conditional logit model after standardization.
A more attractive alternative is to retain independence across subjects but allow dependence across alternatives, assuming that the vector e_{i} = (e_{i1}, ..., e_{iJ}) has a multivariate normal distribution with mean vector 0 and arbitrary correlation matrix R. (As usual with latent variable formulations of binary or discrete response models, the variance of the error term cannot be separated from the regression coefficients. Setting the variances to one means that we work with a correlation matrix rather than a covariance matrix.)
The main advantage of this model is that it allows correlation between the utilities that an individual assigns to the various alternatives. The main difficulty is that fitting the model requires evaluating probabilities given by multidimensional normal integrals, a limitation that effectively restricts routine practical application of the model to problems involving no more than three or four alternatives.
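The value of correlated errors can be illustrated with the red/blue bus example from Section 6.3.1. The sketch below approximates probit choice probabilities by brute-force simulation rather than by evaluating the normal integrals, so it is an illustration and not a fitting procedure. The equal systematic utilities, the correlation of 0.999 between the two bus errors, and the function name `probit_shares` are all assumptions made for this example.

```python
import numpy as np

def probit_shares(h, corr, n=200_000, seed=0):
    """Simulated choice shares with multivariate normal errors."""
    rng = np.random.default_rng(seed)
    h = np.asarray(h, dtype=float)
    e = rng.multivariate_normal(np.zeros(h.size), corr, size=n)
    choice = (h + e).argmax(axis=1)  # utility-maximizing alternative
    return np.bincount(choice, minlength=h.size) / n

rho = 0.999  # assumed near-perfect correlation between the two buses
corr_buses = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, rho],
                       [0.0, rho, 1.0]])

# Train, red bus, blue bus with equal systematic utilities.
shares_correlated = probit_shares([0.0, 0.0, 0.0], corr_buses)
shares_independent = probit_shares([0.0, 0.0, 0.0], np.eye(3))

# Blue bus discontinued: train versus red bus only.
shares_two = probit_shares([0.0, 0.0], np.eye(2))
print(shares_correlated, shares_independent, shares_two)
```

With highly correlated bus errors the train share stays close to one half both before and after the blue bus is removed, which matches the intuitive 1:1 split. With independent errors the three alternatives each attract roughly a third of the choices.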
For further details on discrete choice models see Chapter 3 in Maddala (1983).