We now consider in more detail linear models for three-way contingency tables, focusing on testing various forms of complete and partial independence using the equivalent Poisson models.
Table 5.2 classifies 4991 Wisconsin male high school seniors according to socio-economic status (low, lower middle, upper middle, and high), the degree of parental encouragement they receive (low and high) and whether or not they have plans to attend college (no, yes). This is part of a larger table found in Fienberg (1977, p. 101).
| Social stratum | Parental encouragement | College plans: No | College plans: Yes | Total |
| Lower | Low | 749 | 35 | 784 |
| Lower | High | 233 | 133 | 366 |
| Lower Middle | Low | 627 | 38 | 665 |
| Lower Middle | High | 330 | 303 | 633 |
| Upper Middle | Low | 420 | 37 | 457 |
| Upper Middle | High | 374 | 467 | 841 |
| Higher | Low | 153 | 26 | 179 |
| Higher | High | 266 | 800 | 1066 |
| Total | | 3152 | 1839 | 4991 |
In our analysis of these data we will view all three variables as responses, and we will study the extent to which they are associated. In this process we will test various hypotheses of complete and partial independence.
Let us first introduce some notation. We will use three subscripts to identify the cells in an I × J × K table, with i indexing the I rows, j indexing the J columns and k indexing the K layers. In our example I = 4, J = 2, and K = 2 for a total of 16 cells.
Let pijk denote the probability that an observation falls in cell (i,j,k). In our example, this cell represents category i of socio-economic status (S), category j of parental encouragement (E) and category k of college plans (P). These probabilities define the joint distribution of the three variables.
We also let yijk denote the observed count in cell (i,j,k), which we treat as a realization of a random variable Yijk having a multinomial or Poisson distribution.
We will also use the dot convention to indicate summing over a subscript, so pi.. is the marginal probability that an observation falls in row i and yi.. is the number of observations in row i. The notation extends to two dimensions, so pij. is the marginal probability that an observation falls in row i and column j and yij. is the corresponding count.
In practice we will treat the Yijk as independent Poisson random variables with means mijk = n pijk, and we will fit log-linear models to the expected counts.
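As a concrete illustration of the dot convention, here is a minimal Python sketch that stores the observed counts from the table above and sums over subscripts. The names `rows`, `y` and `margin` are illustrative choices, not part of the original notes, and the code uses zero-based indices where the text uses i, j, k starting at one.

```python
# Observed counts from the table above, one (no, yes) pair per row of the table.
rows = [
    (749, 35), (233, 133),    # lower stratum: low, high encouragement
    (627, 38), (330, 303),    # lower middle
    (420, 37), (374, 467),    # upper middle
    (153, 26), (266, 800),    # higher
]
# Cell (i, j, k): stratum i in 0..3, encouragement j in 0..1, plans k in 0..1.
y = {(r // 2, r % 2, k): rows[r][k] for r in range(8) for k in range(2)}

def margin(table, keep):
    """Sum counts over every subscript not listed in `keep` (the 'dots')."""
    out = {}
    for cell, count in table.items():
        key = tuple(cell[d] for d in keep)
        out[key] = out.get(key, 0) + count
    return out

n = sum(y.values())          # grand total
y_i = margin(y, (0,))        # y_i.. : totals by social stratum
y_ij = margin(y, (0, 1))     # y_ij. : totals by stratum and encouragement
y_k = margin(y, (2,))        # y_..k : totals by college plans

print(n, y_i[(0,)], y_ij[(3, 1)], y_k[(0,)], y_k[(1,)])
```

Dividing any of these totals by `n` gives the corresponding marginal proportion.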
Table 5.3 lists all possible models of interest in the Poisson context that include all three variables, starting with the three-factor additive model S+E+P on status, encouragement and plans, and moving up towards the saturated model SEP. For each model we list the abbreviated model formula, the deviance and the degrees of freedom.
| Model | Deviance | d.f. |
| S+E+P | 2714.0 | 10 |
| SE + P | 1877.4 | 7 |
| SP + E | 1920.4 | 7 |
| S + EP | 1092.0 | 9 |
| SE + SP | 1083.8 | 4 |
| SE + EP | 255.5 | 6 |
| SP + EP | 298.5 | 6 |
| SE+SP+EP | 1.575 | 3 |
We now switch to a multinomial context, where we focus on the joint distribution of the three variables S, E and P. We consider four different types of models that may be of interest in this case, and discuss their equivalence to one of the above Poisson models.
The simplest possible model of interest in the multinomial context is the model of complete independence, where the joint distribution of the three variables is the product of the marginals. The corresponding hypothesis is
$$H_0: p_{ijk} = p_{i..}\, p_{.j.}\, p_{..k} \tag{5.9}$$
Under this model the logarithms of the expected cell counts are given by
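$$\log m_{ijk} = \log n + \log p_{i..} + \log p_{.j.} + \log p_{..k},$$

which has the same structure as the Poisson additive model S+E+P: the terms depend on i, j and k separately, but on no combination of them.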
The m.l.e.'s of the probabilities under the model of complete independence turn out to be, as you might expect, the products of the marginal proportions. Therefore, the m.l.e.'s of the expected counts under complete independence are
$$\hat{m}_{ijk} = \frac{y_{i..}\, y_{.j.}\, y_{..k}}{n^2}.$$
To test the hypothesis of complete independence we compare the maximized multinomial log-likelihoods under the model of independence and under the saturated model. Because of the equivalence between multinomial and Poisson models, however, the resulting likelihood ratio statistic is exactly the same as the deviance for the Poisson additive model.
In our example the deviance of the additive model is 2714 with 10 d.f., and is highly significant. We therefore conclude that the hypothesis that social status, parental encouragement and college plans are completely independent is clearly untenable.
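The test of complete independence can be reproduced directly from the margins, without fitting a Poisson regression. The sketch below assumes the closed-form fitted counts implied by the text (the product of the three one-way marginal totals divided by n squared) and the usual likelihood ratio statistic; the helper names `pick`, `margin` and `deviance` are illustrative.

```python
from math import log

# Observed counts (no, yes) for the 8 rows of Table 5.2, in table order.
rows = [(749, 35), (233, 133), (627, 38), (330, 303),
        (420, 37), (374, 467), (153, 26), (266, 800)]
y = {(r // 2, r % 2, k): rows[r][k] for r in range(8) for k in range(2)}
n = sum(y.values())

def pick(cell, keep):
    """Project a cell (i, j, k) onto the subscripts listed in `keep`."""
    return tuple(cell[d] for d in keep)

def margin(table, keep):
    """Marginal totals, summing over the subscripts not listed in `keep`."""
    out = {}
    for cell, count in table.items():
        out[pick(cell, keep)] = out.get(pick(cell, keep), 0) + count
    return out

def deviance(y, m):
    """Likelihood ratio statistic 2 * sum of y * log(y / m); all counts here are positive."""
    return 2 * sum(y[c] * log(y[c] / m[c]) for c in y)

s, e, p = margin(y, (0,)), margin(y, (1,)), margin(y, (2,))
# Fitted counts under complete independence: y_i.. * y_.j. * y_..k / n^2.
m = {c: s[pick(c, (0,))] * e[pick(c, (1,))] * p[pick(c, (2,))] / n**2 for c in y}

dev_indep = deviance(y, m)
df_indep = len(y) - (1 + 3 + 1 + 1)   # 16 cells minus the constant and main effects
print(round(dev_indep, 1), df_indep)
```

Up to rounding, the result should agree with the deviance and degrees of freedom quoted for the additive model S+E+P.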
The next three log-linear models in Table 5.3 involve one of the two-factor interaction terms. As you might expect from our analysis of a two-by-two table, the presence of an interaction term indicates the existence of association between those two variables.
For example the model SE+P indicates that S and E are associated, but are jointly independent of P. In terms of our example this hypothesis would state that social status and parental encouragement are associated with each other, and are jointly independent of college plans.
Under this hypothesis the joint distribution of the three variables factors into the product of two blocks, representing S and E on one hand and P on the other. Specifically, the hypothesis of block independence is
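$$H_0: p_{ijk} = p_{ij.}\, p_{..k} \tag{5.10}$$

Under this hypothesis the m.l.e.'s of the expected counts are the product of the SE and P marginal totals divided by the sample size,

$$\hat{m}_{ijk} = \frac{y_{ij.}\, y_{..k}}{n}.$$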
To test the hypothesis of block independence we compare the maximized multinomial log-likelihood under the restrictions imposed by Equation 5.10 with the maximized log-likelihood for the saturated model. Because of the equivalence between multinomial and Poisson models, however, the test statistic would be exactly the same as the deviance for the model SE+P.
In our example the deviance for the model with the SE interaction and a main effect of P is 1877.4 on 7 d.f., and is highly significant. We therefore reject the hypothesis that college plans are independent of social status and parental encouragement.
There are two other models with one interaction term. The model SP+E has a deviance of 1920.4 on 7 d.f., so we reject the hypothesis that parental encouragement is independent of social status and college plans. The model EP+S is the best fitting of this lot, but the deviance of 1092.0 on 9 d.f. is highly significant, so we reject the hypothesis that parental encouragement and college plans are associated but are jointly independent of social status.
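The three block-independence tests can be sketched the same way. The code assumes the standard closed form for these models, in which the fitted count is the two-way marginal total for the associated pair times the one-way total for the remaining variable, divided by n. The function `block_fit` and the other helper names are illustrative.

```python
from math import log

# Observed counts (no, yes) for the 8 rows of Table 5.2, in table order.
rows = [(749, 35), (233, 133), (627, 38), (330, 303),
        (420, 37), (374, 467), (153, 26), (266, 800)]
y = {(r // 2, r % 2, k): rows[r][k] for r in range(8) for k in range(2)}
n = sum(y.values())

def pick(cell, keep):
    """Project a cell (i, j, k) onto the subscripts listed in `keep`."""
    return tuple(cell[d] for d in keep)

def margin(table, keep):
    """Marginal totals, summing over the subscripts not listed in `keep`."""
    out = {}
    for cell, count in table.items():
        out[pick(cell, keep)] = out.get(pick(cell, keep), 0) + count
    return out

def deviance(y, m):
    """Likelihood ratio statistic 2 * sum of y * log(y / m)."""
    return 2 * sum(y[c] * log(y[c] / m[c]) for c in y)

def block_fit(y, pair, single):
    """Fitted counts when the variables in `pair` are jointly independent of `single`."""
    joint, alone = margin(y, pair), margin(y, (single,))
    return {c: joint[pick(c, pair)] * alone[pick(c, (single,))] / n for c in y}

# Subscript 0 is S, 1 is E and 2 is P.
block_models = {"SE+P": ((0, 1), 2), "SP+E": ((0, 2), 1), "S+EP": ((1, 2), 0)}
block_dev = {name: deviance(y, block_fit(y, pair, single))
             for name, (pair, single) in block_models.items()}
for name, dev in block_dev.items():
    print(name, round(dev, 1))
```

Each deviance should match the corresponding row of Table 5.3 up to rounding.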
The next three log-linear models in Table 5.3 involve two of the three possible two-factor interactions, and thus correspond to cases where two pairs of categorical variables are associated. For example the log-linear model SE+SP corresponds to the case where S and E are associated and so are S and P. In terms of our example we would assume that social status affects both parental encouragement and college plans. The figure below shows this model in path diagram form.

[Path diagram: S → E and S → P, with no arrow between E and P.]
Note that we have assumed no direct link between E and P, that is, the model assumes that parental encouragement has no direct effect on college plans. In a two-way crosstabulation these two variables would appear to be associated because of their common dependency on social status S. However, conditional on social status S, parental encouragement E and college plans P would be independent.
Thus, the model assumes a form of partial or conditional independence, where the joint conditional distribution of EP given S is the product of the marginal conditional distributions of E given S and P given S. In symbols,
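$$\Pr\{E=j, P=k \mid S=i\} = \Pr\{E=j \mid S=i\}\,\Pr\{P=k \mid S=i\}.$$

In terms of the multinomial probabilities the hypothesis can be written as

$$H_0: p_{ijk} = \frac{p_{ij.}\, p_{i.k}}{p_{i..}} \tag{5.11}$$

The m.l.e.'s of the expected counts have the same structure and depend only on the SE and SP margins:

$$\hat{m}_{ijk} = \frac{y_{ij.}\, y_{i.k}}{y_{i..}}.$$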
In terms of our example, the deviance of the model with SE and SP interactions is 1083.8 on 4 d.f., and is highly significant. We therefore reject the hypothesis that parental encouragement and college plans are independent within each social stratum.
There are two other models with two interaction terms. Although both of them have smaller deviances than any of the models considered so far, they still show significant lack of fit. The model SP+EP has a deviance of 298.5 on 6 d.f., so we reject the hypothesis that, given college plans P, social status S and parental encouragement E are independent. The best way to view this model in causal terms is by assuming that S and E are unrelated and both have effects on P, as shown in the path diagram below.

[Path diagram: S → P and E → P, with no arrow between S and E.]
The model SE+EP has a deviance of 255.5 on 6 d.f., and leads us to reject the hypothesis that given parental encouragement E, social class S and college plans P are independent. In causal terms one might interpret this model as postulating that social class affects parental encouragement which in turn affects college plans, with no direct effect of social class on college plans.

[Path diagram: S → E → P, with no direct arrow from S to P.]
Note that all models considered so far have had explicit formulas for the m.l.e.'s, so no iteration has been necessary and we could have calculated all test statistics using the multinomial likelihood directly. An interesting property of the iterative proportional fitting algorithm mentioned earlier, and which is used by software specializing in contingency tables, is that it converges in one cycle in all these cases. The same is not true of the iteratively re-weighted least squares algorithm used in Poisson regression, which will usually require a few iterations.
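To illustrate the explicit formulas for the three conditional independence models, the sketch below assumes the standard closed form: the fitted count is the product of the two two-way marginal totals that involve the conditioning variable, divided by the one-way total for that variable. The function `cond_fit` and the other helper names are illustrative.

```python
from math import log

# Observed counts (no, yes) for the 8 rows of Table 5.2, in table order.
rows = [(749, 35), (233, 133), (627, 38), (330, 303),
        (420, 37), (374, 467), (153, 26), (266, 800)]
y = {(r // 2, r % 2, k): rows[r][k] for r in range(8) for k in range(2)}

def pick(cell, keep):
    """Project a cell (i, j, k) onto the subscripts listed in `keep`."""
    return tuple(cell[d] for d in keep)

def margin(table, keep):
    """Marginal totals, summing over the subscripts not listed in `keep`."""
    out = {}
    for cell, count in table.items():
        out[pick(cell, keep)] = out.get(pick(cell, keep), 0) + count
    return out

def deviance(y, m):
    """Likelihood ratio statistic 2 * sum of y * log(y / m)."""
    return 2 * sum(y[c] * log(y[c] / m[c]) for c in y)

def cond_fit(y, given):
    """Fitted counts when the other two variables are independent given `given`."""
    a, b = [d for d in range(3) if d != given]
    ya, yb = margin(y, (a, given)), margin(y, (b, given))
    yg = margin(y, (given,))
    return {c: ya[pick(c, (a, given))] * yb[pick(c, (b, given))] / yg[pick(c, (given,))]
            for c in y}

# Subscript 0 is S, 1 is E and 2 is P; each model conditions on the shared variable.
cond_dev = {name: deviance(y, cond_fit(y, given))
            for name, given in [("SE+SP", 0), ("SE+EP", 1), ("SP+EP", 2)]}
for name, dev in cond_dev.items():
    print(name, round(dev, 1))
```

No iteration is involved, which is the point made in the text: each fitted table is computed in a single pass over the margins.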
The only log-linear model remaining in Table 5.3 short of the saturated model is the model involving all three two-factor interactions. In this model we have a form of association between all pairs of variables, S and E, S and P, as well as E and P. Thus, social class is associated with parental encouragement and with college plans, and in addition parental encouragement has a direct effect on college plans.
How do we interpret the lack of a three-factor interaction? To answer this question we start from what we know about interaction effects in general and adapt it to the present context, where interaction terms in models for counts represent association between the underlying classification criteria. The conclusion is that in this model the association between any two of the variables is the same at all levels of the third.
This model has no simple interpretation in terms of independence, and as a result we cannot write the structure of the joint probabilities in terms of the two-way margins. In particular
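$$p_{ijk} \neq \frac{p_{ij.}\, p_{i.k}\, p_{.jk}}{p_{i..}\, p_{.j.}\, p_{..k}},$$

nor is it any other simple function of the marginal probabilities.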
A consequence of this fact is that the m.l.e.'s cannot be written in closed form and must be calculated using an iterative procedure. They do, however, depend only on the three two-way margins SE, SP and EP.
In terms of our example, the model SP+SE+EP has a deviance of 1.6 on three d.f., and therefore fits the data quite well. We conclude that we have no evidence against the hypothesis that all three variables are associated, but the association between any two is the same at all levels of the third. In particular, we may conclude that the association between parental encouragement E and college plans P is the same in all social strata.
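Because this model has no closed-form solution, the fitted counts have to be found iteratively. The following is a minimal sketch of iterative proportional fitting, which repeatedly rescales the fitted table so that it matches each observed two-way margin in turn. The function `ipf`, the starting table of ones and the fixed number of cycles are illustrative choices, not a description of any particular software.

```python
from math import log

# Observed counts (no, yes) for the 8 rows of Table 5.2, in table order.
rows = [(749, 35), (233, 133), (627, 38), (330, 303),
        (420, 37), (374, 467), (153, 26), (266, 800)]
y = {(r // 2, r % 2, k): rows[r][k] for r in range(8) for k in range(2)}

def pick(cell, keep):
    """Project a cell (i, j, k) onto the subscripts listed in `keep`."""
    return tuple(cell[d] for d in keep)

def margin(table, keep):
    """Marginal totals, summing over the subscripts not listed in `keep`."""
    out = {}
    for cell, count in table.items():
        out[pick(cell, keep)] = out.get(pick(cell, keep), 0) + count
    return out

def deviance(y, m):
    """Likelihood ratio statistic 2 * sum of y * log(y / m)."""
    return 2 * sum(y[c] * log(y[c] / m[c]) for c in y)

def ipf(y, margins, cycles=100):
    """Iterative proportional fitting: match each observed margin in turn."""
    m = {c: 1.0 for c in y}
    for _ in range(cycles):
        for keep in margins:
            observed, fitted = margin(y, keep), margin(m, keep)
            m = {c: m[c] * observed[pick(c, keep)] / fitted[pick(c, keep)] for c in m}
    return m

# Fit SE+SP+EP by matching the SE, SP and EP margins (subscripts 0=S, 1=E, 2=P).
m_fit = ipf(y, [(0, 1), (0, 2), (1, 2)])
dev_fit = deviance(y, m_fit)
print(round(dev_fit, 3))
print(round(m_fit[(0, 0, 0)], 1), round(m_fit[(3, 1, 1)], 1))
```

The fitted table depends on the data only through the three two-way margins, as stated above.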
To further appreciate the nature of this model, we give the fitted values in Table 5.4. Comparison of the estimated expected counts in this table with the observed counts in Table 5.2 highlights the goodness of fit of the model.
| Social stratum | Parental encouragement | College plans: No | College plans: Yes |
| Lower | Low | 753.1 | 30.9 |
| Lower | High | 228.9 | 137.1 |
| Lower Middle | Low | 626.0 | 39.0 |
| Lower Middle | High | 331.0 | 302.0 |
| Upper Middle | Low | 420.9 | 36.1 |
| Upper Middle | High | 373.1 | 467.9 |
| Higher | Low | 149.0 | 30.0 |
| Higher | High | 270.0 | 796.0 |
We can also use the fitted values to calculate measures of association between parental encouragement E and college plans P for each social stratum. For the lowest group, the odds of making college plans are barely one to 24.4 with low parental encouragement, but increase to one to 1.67 with high encouragement, giving an odds ratio of 14.6. If you repeat the calculation for any of the other three social classes you will find exactly the same ratio of 14.6.
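The calculation is easy to repeat for every stratum. This short sketch uses the fitted counts as printed in the table of fitted values, so the ratios agree only up to the rounding of those counts; the variable names are illustrative.

```python
# Fitted counts for each stratum: ((low: no, yes), (high: no, yes)) encouragement.
fitted = {
    "lower":        ((753.1, 30.9), (228.9, 137.1)),
    "lower middle": ((626.0, 39.0), (331.0, 302.0)),
    "upper middle": ((420.9, 36.1), (373.1, 467.9)),
    "higher":       ((149.0, 30.0), (270.0, 796.0)),
}

odds_ratios = {}
for stratum, ((low_no, low_yes), (high_no, high_yes)) in fitted.items():
    odds_low = low_yes / low_no        # odds of college plans, low encouragement
    odds_high = high_yes / high_no     # odds of college plans, high encouragement
    odds_ratios[stratum] = odds_high / odds_low

for stratum, ratio in odds_ratios.items():
    print(stratum, round(ratio, 1))
```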
We can verify that this result follows directly from the lack of a three-factor interaction in the model. The logs of the expected counts in this model are
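$$\log m_{ijk} = \eta + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk},$$

where the Greek letters denote the constant, the main effects and the two-factor interactions of S, E and P. The log-odds of making college plans in social stratum i and parental encouragement category j are the difference between the logs of the expected counts for k = 2 and k = 1,

$$\log\frac{m_{ij2}}{m_{ij1}} = \gamma_2 - \gamma_1 + (\alpha\gamma)_{i2} - (\alpha\gamma)_{i1} + (\beta\gamma)_{j2} - (\beta\gamma)_{j1},$$

because all terms involving only i, j or ij cancel out. The difference in log-odds between high (j = 2) and low (j = 1) encouragement is then

$$\log\frac{m_{i22}/m_{i21}}{m_{i12}/m_{i11}} = (\beta\gamma)_{22} - (\beta\gamma)_{21} - (\beta\gamma)_{12} + (\beta\gamma)_{11},$$

which does not depend on i, so the log of the odds ratio is the same at all levels of S. If the parameters are identified by setting to zero every term that involves the first level of a factor, the log of the odds ratio is simply (βγ)22; its estimate of 2.683 exponentiates to the common odds ratio of 14.6.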
Our analysis so far has treated the three classification criteria as responses, and has focused on their correlation structure. An alternative approach would treat one of the variables as a response and the other two as predictors in a regression framework. We now compare these two approaches in terms of our example on educational aspirations, treating college plans as a dichotomous response and socio-economic status and parental encouragement as discrete predictors.
To this end, we treat each of the 8 rows in Table 5.2 as a group. Let Yij denote the number of high school seniors who plan to attend college out of the nij seniors in category i of socio-economic status and category j of parental encouragement. We assume that these 8 counts are independent and have binomial distributions with Yij ~ B(nij, pij), where pij is the probability of making college plans. We can then fit logistic regression models to study how the probabilities depend on social stratum and parental encouragement.
| Model | Deviance | d.f. |
| Null | 1877.4 | 7 |
| S | 1083.8 | 4 |
| E | 255.5 | 6 |
| S+E | 1.575 | 3 |
Table 5.5 shows the results of fitting four possible logit models of interest, ranging from the null model to the additive model on socio-economic status (S) and parental encouragement (E). It is clear from these results that both social class and encouragement have significant gross and net effects on the probability of making college plans. The best fitting model is the two-factor additive model, with a deviance of 1.6 on three d.f. Table 5.6 shows parameter estimates for the additive model.
| Variable | Category | Estimate | Std. Err. |
| Constant | | -3.195 | 0.119 |
| Socio-economic status | low | - | - |
| Socio-economic status | lower middle | 0.420 | 0.118 |
| Socio-economic status | upper middle | 0.739 | 0.114 |
| Socio-economic status | high | 1.593 | 0.115 |
| Parental encouragement | low | - | - |
| Parental encouragement | high | 2.683 | 0.099 |
Exponentiating the estimates we see that the odds of making college plans increase five-fold as we move from low to high socio-economic status. Furthermore, in each social stratum, the odds of making college plans among high school seniors with high parental encouragement are 14.6 times the odds among seniors with low parental encouragement.
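For readers who want to reproduce the additive logit fit, here is a minimal sketch that maximizes the grouped binomial likelihood by Newton-Raphson using only the standard library. The reference-cell coding in `design`, the zero starting values, the fixed iteration count and all function names are assumptions made for illustration; in practice one would use a statistical package.

```python
from math import exp, log

# Grouped data from Table 5.2: (stratum i, encouragement j, number planning college, group size).
groups = [(0, 0, 35, 784), (0, 1, 133, 366), (1, 0, 38, 665), (1, 1, 303, 633),
          (2, 0, 37, 457), (2, 1, 467, 841), (3, 0, 26, 179), (3, 1, 800, 1066)]

def design(i, j):
    """Reference-cell coding: constant, three status dummies, one encouragement dummy."""
    return [1.0] + [1.0 if i == s else 0.0 for s in (1, 2, 3)] + [float(j)]

def prob(beta, i, j):
    """Fitted probability of college plans under the additive logit model."""
    return 1 / (1 + exp(-sum(b * x for b, x in zip(beta, design(i, j)))))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, size):
            f = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        x[r] = (aug[r][size] - sum(aug[r][c] * x[c] for c in range(r + 1, size))) / aug[r][r]
    return x

def fit_logit(groups, iterations=25):
    """Newton-Raphson: beta <- beta + information^{-1} * score."""
    beta = [0.0] * 5
    for _ in range(iterations):
        score = [0.0] * 5
        info = [[0.0] * 5 for _ in range(5)]
        for i, j, yes, size in groups:
            x, p = design(i, j), prob(beta, i, j)
            for r in range(5):
                score[r] += x[r] * (yes - size * p)
                for c in range(5):
                    info[r][c] += x[r] * x[c] * size * p * (1 - p)
        beta = [b + d for b, d in zip(beta, solve(info, score))]
    return beta

def binomial_deviance(groups, beta):
    """Deviance 2 * sum of observed * log(observed / fitted) over successes and failures."""
    total = 0.0
    for i, j, yes, size in groups:
        p = prob(beta, i, j)
        total += yes * log(yes / (size * p)) + (size - yes) * log((size - yes) / (size * (1 - p)))
    return 2 * total

beta = fit_logit(groups)
dev_additive = binomial_deviance(groups, beta)
print([round(b, 3) for b in beta], round(dev_additive, 3))
print(round(exp(beta[3]), 2), round(exp(beta[4]), 1))   # odds ratios: high status, high encouragement
```

The last line exponentiates the estimates for high socio-economic status and high parental encouragement, which is the calculation described in the paragraph above.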
The conclusions of this analysis are consistent with those from the previous subsection, except that this time we do not study the association between social stratification and parental encouragement, but focus on their effect on making college plans. In fact it is not just the conclusions, but all estimates and tests of significance, that agree. A comparison of the binomial deviances in Table 5.5 with the Poisson deviances in Table 5.3 shows the following 'coincidences':
| log-linear model | logit model |
| SE+P | Null |
| SE+SP | S |
| SE+EP | E |
| SE+SP+EP | S+E |
The models listed as equivalent have similar interpretations if you translate from the language of correlation analysis to the language of regression analysis. Note that all the log-linear models include the SE interaction, so they allow for association between the two predictors. Also, all of them include a main effect of the response P, allowing it to have a non-uniform distribution. The log-linear model with just these two terms assumes no association between P and either S or E, and is thus equivalent to the null logit model.
The log-linear model with an SP interaction allows for an association between S and P, and is therefore equivalent to the logit model where the response depends only on S. A similar remark applies to the log-linear model with an EP interaction. Finally, the log-linear model with all three two-factor interactions allows for associations between S and P, and between E and P, and assumes that in each case the strength of association does not depend on the other variable. But this is exactly what the additive logit model assumes: the response depends on both S and E, and the effect of each factor is the same at all levels of the other predictor.
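The first two equivalences can be checked numerically, because both members of each pair have closed-form fitted values. The sketch below assumes the usual binomial and Poisson deviance formulas; the function and variable names are illustrative.

```python
from math import log

# (stratum i, encouragement j, no, yes) for the 8 groups of Table 5.2.
groups = [(0, 0, 749, 35), (0, 1, 233, 133), (1, 0, 627, 38), (1, 1, 330, 303),
          (2, 0, 420, 37), (2, 1, 374, 467), (3, 0, 153, 26), (3, 1, 266, 800)]

def binomial_deviance(prob):
    """Deviance of a logit model with fitted probability prob(i, j) of college plans."""
    total = 0.0
    for i, j, no, yes in groups:
        size, p = no + yes, prob(i, j)
        total += yes * log(yes / (size * p)) + no * log(no / (size * (1 - p)))
    return 2 * total

def poisson_deviance(fitted):
    """Deviance of a log-linear model with fitted counts fitted(i, j, k); k = 0 no, 1 yes."""
    total = 0.0
    for i, j, no, yes in groups:
        total += no * log(no / fitted(i, j, 0)) + yes * log(yes / fitted(i, j, 1))
    return 2 * total

n = sum(g[2] + g[3] for g in groups)
size = {(g[0], g[1]): g[2] + g[3] for g in groups}                 # y_ij.
plans = {k: sum(g[2 + k] for g in groups) for k in (0, 1)}          # y_..k
status_plans = {(i, k): sum(g[2 + k] for g in groups if g[0] == i)
                for i in range(4) for k in (0, 1)}                  # y_i.k
status = {i: status_plans[(i, 0)] + status_plans[(i, 1)] for i in range(4)}   # y_i..

# Null logit model versus log-linear model SE+P.
null_logit = binomial_deviance(lambda i, j: plans[1] / n)
se_p = poisson_deviance(lambda i, j, k: size[(i, j)] * plans[k] / n)

# Logit model S versus log-linear model SE+SP.
logit_s = binomial_deviance(lambda i, j: status_plans[(i, 1)] / status[i])
se_sp = poisson_deviance(lambda i, j, k: size[(i, j)] * status_plans[(i, k)] / status[i])

print(round(null_logit, 1), round(se_p, 1))
print(round(logit_s, 1), round(se_sp, 1))
```

In each pair the fitted number of seniors planning college is the same, and the log-linear model reproduces the group sizes exactly, which is why the two deviances coincide.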
In general, log-linear and logit models are equivalent as long as the log-linear model
- is saturated on the factors treated as predictors in the logit model, including all main effects and interactions among them (in our example SE),
- includes a main effect for the factor treated as response (in our example P), and
- includes a two-factor (or higher order) interaction between a predictor and the response for each main effect (or interaction) included in the logit model (in our example it includes SP for the main effect of S, and so on).
This equivalence extends to parameter estimates as well as tests of significance. For example, multiplying the fitted probabilities based on the additive logit model S+E by the sample sizes in each category of social status and parental encouragement leads to the same expected counts that we obtained earlier from the log-linear model SE+SP+EP. An interesting consequence of this fact is that one can use parameter estimates based on a log-linear model to calculate logits, as we did in Section 5.2.6, and obtain the same results as in logistic regression. For example the log of the odds ratio summarizing the effect of parental encouragement on college plans within each social stratum was estimated as 2.683 in the previous subsection, and this value agrees exactly with the estimate in Table 5.6.
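As a quick check of the first claim, this sketch converts the additive logit estimates from Table 5.6 into fitted probabilities, multiplies them by the group sizes from Table 5.2 and compares the result with the fitted counts in the "Yes" column of Table 5.4. Agreement is only up to the rounding of the published estimates; the variable names are illustrative.

```python
from math import exp

# Estimates from Table 5.6 (reference cells coded as zero).
constant = -3.195
status = [0.0, 0.420, 0.739, 1.593]      # low, lower middle, upper middle, high
encouragement = [0.0, 2.683]             # low, high

# Group sizes n_ij from Table 5.2 and fitted "Yes" counts from Table 5.4.
sizes = {(0, 0): 784, (0, 1): 366, (1, 0): 665, (1, 1): 633,
         (2, 0): 457, (2, 1): 841, (3, 0): 179, (3, 1): 1066}
loglinear_yes = {(0, 0): 30.9, (0, 1): 137.1, (1, 0): 39.0, (1, 1): 302.0,
                 (2, 0): 36.1, (2, 1): 467.9, (3, 0): 30.0, (3, 1): 796.0}

expected_yes = {}
for (i, j), size in sizes.items():
    logit = constant + status[i] + encouragement[j]
    probability = 1 / (1 + exp(-logit))          # inverse logit
    expected_yes[(i, j)] = size * probability

for cell in sizes:
    print(cell, round(expected_yes[cell], 1), loglinear_yes[cell])
```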
In our example the equivalence depends crucially on the fact that the log-linear models include the SE interaction, and therefore reproduce exactly the binomial denominators used in the logistic regression. But what would have happened if the SE interaction had turned out to be not significant? There appear to be two schools of thought on this matter.
Bishop et al. (1975), in a classic book on the multivariate analysis of qualitative data, emphasize log-linear models because they provide a richer analysis of the structure of association among all factors, not just between the predictors and the response. If the SE interaction had turned out to be not significant they would probably leave it out of the model. They would still be able to translate their parameter estimates into fitted logits, but the results would not coincide exactly with the logistic regression analysis (although they would be rather similar if the omitted interaction is small).
Cox (1972), in a classic book on the analysis of binary data, emphasizes logit models. He argues that if your main interest is in the effects of two variables, say S and E, on a third factor, say P, then you should condition on the SE margin. This means that if you are fitting log-linear models with the intention of understanding effects on P, you would include the SE interaction even if it is not significant. In that case you would get exactly the same results as a logistic regression analysis, which is probably what you should have done in the first place if you wanted to study specifically how the response depends on the predictors.
Copyright © Germán Rodríguez, 1993-2000.