
5.2  Models for Three-Dimensional Tables

We now consider in more detail log-linear models for three-way contingency tables, focusing on testing various forms of complete and partial independence using the equivalent Poisson models.

5.2.1  Educational Aspirations in Wisconsin

Table 5.2 classifies 4991 Wisconsin male high school seniors according to socio-economic status (low, lower middle, upper middle, and high), the degree of parental encouragement they receive (low and high) and whether or not they have plans to attend college (no, yes). This is part of a larger table found in Fienberg (1977, p. 101).

Table 5.2: Socio-economic Status, Parental Encouragement and
Educational Aspirations of High School Seniors

Social          Parental          College plans
stratum         encouragement       No     Yes    Total
Lower           Low                749      35      784
                High               233     133      366
Lower middle    Low                627      38      665
                High               330     303      633
Upper middle    Low                420      37      457
                High               374     467      841
Higher          Low                153      26      179
                High               266     800     1066
Total                             3152    1839     4991

In our analysis of these data we will view all three variables as responses, and we will study the extent to which they are associated. In this process we will test various hypotheses of complete and partial independence.

Let us first introduce some notation. We will use three subscripts to identify the cells in an I ×J ×K table, with i indexing the I rows, j indexing the J columns and k indexing the K layers. In our example I = 4, J = 2, and K = 2 for a total of 16 cells.

Let $p_{ijk}$ denote the probability that an observation falls in cell $(i,j,k)$. In our example, this cell represents category $i$ of socio-economic status (S), category $j$ of parental encouragement (E) and category $k$ of college plans (P). These probabilities define the joint distribution of the three variables.

We also let $y_{ijk}$ denote the observed count in cell $(i,j,k)$, which we treat as a realization of a random variable $Y_{ijk}$ having a multinomial or Poisson distribution.

We will also use the dot convention to indicate summing over a subscript, so $p_{i..}$ is the marginal probability that an observation falls in row $i$ and $y_{i..}$ is the number of observations in row $i$. The notation extends to two dimensions, so $p_{ij.}$ is the marginal probability that an observation falls in row $i$ and column $j$, and $y_{ij.}$ is the corresponding count.
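In code, the dot convention is simply a sum over the dotted indices. The following minimal Python sketch (standard library only; `Y` and the other names are our own, chosen for illustration) stores the counts of Table 5.2 as a nested list indexed `Y[i][j][k]` and computes the one-way margins, one two-way margin, and the sample size $n = y_{...}$:

```python
# Observed counts from Table 5.2, indexed Y[i][j][k]:
# i = social stratum S (0 = lower, ..., 3 = higher),
# j = parental encouragement E (0 = low, 1 = high),
# k = college plans P (0 = no, 1 = yes).
Y = [
    [[749, 35], [233, 133]],
    [[627, 38], [330, 303]],
    [[420, 37], [374, 467]],
    [[153, 26], [266, 800]],
]
I, J, K = 4, 2, 2

# Each dot in a subscript marks an index that is summed over.
y_i = [sum(Y[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]  # y_{i..}
y_j = [sum(Y[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]  # y_{.j.}
y_k = [sum(Y[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]  # y_{..k}
y_ij = [[sum(Y[i][j]) for j in range(J)] for i in range(I)]                    # y_{ij.}
n = sum(y_k)                                                                   # y_{...}

# Marginal proportions estimate the marginal probabilities p_{i..}.
p_i = [count / n for count in y_i]

print(y_i)  # [1150, 1298, 1298, 1245]
print(y_j)  # [2085, 2906]
print(y_k)  # [3152, 1839]
print(n)    # 4991
```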

5.2.2  Deviances for Poisson Models

In practice we will treat the $Y_{ijk}$ as independent Poisson random variables with means $m_{ijk} = n p_{ijk}$, and we will fit log-linear models to the expected counts.

Table 5.3 lists all possible models of interest in the Poisson context that include all three variables, starting with the three-factor additive model S+E+P on status, encouragement and plans, and moving up towards the saturated model SEP. For each model we list the abbreviated model formula, the deviance and the degrees of freedom.

Table 5.3: Deviances for Log-linear Models
Fitted to Educational Aspirations Data

Model        Deviance   d.f.
S+E+P          2714.0     10
SE+P           1877.4      7
SP+E           1920.4      7
S+EP           1092.0      9
SE+SP          1083.8      4
SE+EP           255.5      6
SP+EP           298.5      6
SE+SP+EP        1.575      3

We now switch to a multinomial context, where we focus on the joint distribution of the three variables S, E and P. We consider four different types of models that may be of interest in this case, and discuss their equivalence to one of the above Poisson models.

5.2.3  Complete Independence

The simplest possible model of interest in the multinomial context is the model of complete independence, where the joint distribution of the three variables is the product of the marginals. The corresponding hypothesis is

$$H_0:\; p_{ijk} = p_{i..}\, p_{.j.}\, p_{..k}, \tag{5.9}$$
where $p_{i..}$ is the marginal probability that an observation falls in row $i$, and $p_{.j.}$ and $p_{..k}$ are the corresponding column and layer margins.

Under this model the logarithms of the expected cell counts are given by

$$\log m_{ijk} = \log n + \log p_{i..} + \log p_{.j.} + \log p_{..k},$$
and can be seen to depend only on quantities indexed by $i$, $j$ or $k$ separately, and on none of the combinations (such as $ij$, $jk$ or $ik$). The notation is reminiscent of the Poisson additive model, where
$$\log m_{ijk} = \eta + \alpha_i + \beta_j + \gamma_k,$$
and in fact the two formulations can be shown to be equivalent, differing only in the choice of constraints: the marginal probabilities add up to one, whereas the main effects in the log-linear model satisfy the reference cell restrictions.

The m.l.e.'s of the probabilities under the model of complete independence turn out to be, as you might expect, the products of the marginal proportions. Therefore, the m.l.e.'s of the expected counts under complete independence are

$$\hat{m}_{ijk} = \frac{y_{i..}\, y_{.j.}\, y_{..k}}{n^2}.$$
Note that the estimates depend only on row, column and layer totals, as one would expect from considerations of marginal sufficiency.

To test the hypothesis of complete independence we compare the maximized multinomial log-likelihoods under the model of independence and under the saturated model. Because of the equivalence between multinomial and Poisson models, however, the resulting likelihood ratio statistic is exactly the same as the deviance for the Poisson additive model.

In our example the deviance of the additive model is 2714 with 10 d.f., and is highly significant. We therefore conclude that the hypothesis that social status, parental encouragement and college plans are completely independent is clearly untenable.
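The closed-form estimates make this test easy to reproduce. The Python sketch below is illustrative rather than a definitive implementation: the helpers `margin` and `deviance` are hypothetical names, and the deviance is computed as $2\sum y_{ijk}\log(y_{ijk}/\hat{m}_{ijk})$, which is valid here because the fitted counts add up to $n$. It should reproduce the S+E+P row of Table 5.3 up to rounding.

```python
import math

# Observed counts from Table 5.2, Y[i][j][k]: i = stratum S (0 = lower .. 3 = higher),
# j = encouragement E (0 = low, 1 = high), k = college plans P (0 = no, 1 = yes).
Y = [
    [[749, 35], [233, 133]],
    [[627, 38], [330, 303]],
    [[420, 37], [374, 467]],
    [[153, 26], [266, 800]],
]
I, J, K = 4, 2, 2
CELLS = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]
OBS = {(i, j, k): Y[i][j][k] for (i, j, k) in CELLS}


def margin(table, axes):
    """Totals of `table` (dict cell -> count) over the kept axes (0 = S, 1 = E, 2 = P)."""
    totals = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        totals[key] = totals.get(key, 0) + value
    return totals


def deviance(fitted):
    """Poisson deviance 2 * sum y log(y / m).

    The -(y - m) terms of the general formula are dropped: they sum to zero
    because the fitted counts reproduce the grand total n.
    """
    return 2 * sum(y * math.log(y / fitted[c]) for c, y in OBS.items() if y > 0)


n = margin(OBS, ())[()]
S, E, P = margin(OBS, (0,)), margin(OBS, (1,)), margin(OBS, (2,))

# m.l.e. under complete independence: y_{i..} y_{.j.} y_{..k} / n^2
fitted = {(i, j, k): S[(i,)] * E[(j,)] * P[(k,)] / n**2 for (i, j, k) in CELLS}

# Residual d.f.: 16 cells minus the constant and (I-1) + (J-1) + (K-1) main effects.
df = len(CELLS) - (1 + (I - 1) + (J - 1) + (K - 1))
print(f"S+E+P: deviance {deviance(fitted):.1f} on {df} d.f.")
```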

5.2.4  Block Independence

The next three log-linear models in Table 5.3 involve one of the two-factor interaction terms. As you might expect from our analysis of a two-by-two table, the presence of an interaction term indicates the existence of association between those two variables.

For example the model SE+P indicates that S and E are associated, but are jointly independent of P. In terms of our example this hypothesis would state that social status and parental encouragement are associated with each other, and are jointly independent of college plans.

Under this hypothesis the joint distribution of the three variables factors into the product of two blocks, representing S and E on one hand and P on the other. Specifically, the hypothesis of block independence is

$$H_0:\; p_{ijk} = p_{ij.}\, p_{..k}. \tag{5.10}$$
The m.l.e.'s of the cell probabilities turn out to be the product of the SE and P marginal proportions and can be calculated directly. The m.l.e.'s of the expected counts under block independence are then
$$\hat{m}_{ijk} = \frac{y_{ij.}\, y_{..k}}{n}.$$
Note the similarity between the structure of the probabilities and that of the estimates, depending on the combination of levels of S and E on the one hand, and levels of P on the other.

To test the hypothesis of block independence we compare the maximized multinomial log-likelihood under the restrictions imposed by Equation 5.10 with the maximized log-likelihood for the saturated model. Because of the equivalence between multinomial and Poisson models, however, the test statistic is exactly the same as the deviance for the model SE+P.

In our example the deviance for the model with the SE interaction and a main effect of P is 1877.4 on 7 d.f., and is highly significant. We therefore reject the hypothesis that college plans are independent of social status and parental encouragement.

There are two other models with one interaction term. The model SP+E has a deviance of 1920.4 on 7 d.f., so we reject the hypothesis that parental encouragement is independent of social status and college plans. The model EP+S is the best fitting of this lot, but the deviance of 1092.0 on 9 d.f. is highly significant, so we reject the hypothesis that parental encouragement and college plans are associated but are jointly independent of social status.
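Each block-independence fit is the product of a two-way margin and the remaining one-way margin, divided by $n$. A sketch in the same style (plain Python; the helper names are our own) computes all three fits and their deviances, which should agree with Table 5.3 up to rounding:

```python
import math

# Observed counts from Table 5.2, Y[i][j][k]: i = stratum S (0 = lower .. 3 = higher),
# j = encouragement E (0 = low, 1 = high), k = college plans P (0 = no, 1 = yes).
Y = [
    [[749, 35], [233, 133]],
    [[627, 38], [330, 303]],
    [[420, 37], [374, 467]],
    [[153, 26], [266, 800]],
]
CELLS = [(i, j, k) for i in range(4) for j in range(2) for k in range(2)]
OBS = {(i, j, k): Y[i][j][k] for (i, j, k) in CELLS}


def margin(table, axes):
    """Totals of `table` (dict cell -> count) over the kept axes (0 = S, 1 = E, 2 = P)."""
    totals = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        totals[key] = totals.get(key, 0) + value
    return totals


def deviance(fitted):
    """Poisson deviance 2 * sum y log(y / m); valid because fitted counts add up to n."""
    return 2 * sum(y * math.log(y / fitted[c]) for c, y in OBS.items() if y > 0)


n = margin(OBS, ())[()]
S, E, P = margin(OBS, (0,)), margin(OBS, (1,)), margin(OBS, (2,))
SE, SP, EP = margin(OBS, (0, 1)), margin(OBS, (0, 2)), margin(OBS, (1, 2))

# Block independence: a two-way margin times the remaining one-way margin, over n.
fits = {
    "SE+P": {(i, j, k): SE[(i, j)] * P[(k,)] / n for (i, j, k) in CELLS},
    "SP+E": {(i, j, k): SP[(i, k)] * E[(j,)] / n for (i, j, k) in CELLS},
    "S+EP": {(i, j, k): S[(i,)] * EP[(j, k)] / n for (i, j, k) in CELLS},
}
# Residual d.f.: 16 cells minus 6 parameters (constant and main effects)
# minus those of the interaction: 3 for SE or SP, 1 for EP.
DF = {"SE+P": 16 - 6 - 3, "SP+E": 16 - 6 - 3, "S+EP": 16 - 6 - 1}

for name, fit in fits.items():
    print(f"{name}: deviance {deviance(fit):.1f} on {DF[name]} d.f.")
```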

5.2.5  Partial Independence

The next three log-linear models in Table 5.3 involve two of the three possible two-factor interactions, and thus correspond to cases where two pairs of categorical variables are associated. For example the log-linear model SE+SP corresponds to the case where S and E are associated and so are S and P. In terms of our example we would assume that social status affects both parental encouragement and college plans. The figure below shows this model in path diagram form.

Note that we have assumed no direct link between E and P, that is, the model assumes that parental encouragement has no direct effect on college plans. In a two-way crosstabulation these two variables would appear to be associated because of their common dependency on social status S. However, conditional on social status S, parental encouragement E and college plans P would be independent.

Thus, the model assumes a form of partial or conditional independence, where the joint conditional distribution of EP given S is the product of the marginal conditional distributions of E given S and P given S. In symbols,

$$\Pr\{E=j, P=k \mid S=i\} = \Pr\{E=j \mid S=i\}\, \Pr\{P=k \mid S=i\}.$$
To translate this statement into unconditional probabilities we write each conditional distribution as the ratio of a joint to a marginal distribution, so that the above equation becomes
$$\frac{\Pr\{E=j, P=k, S=i\}}{\Pr\{S=i\}} = \frac{\Pr\{E=j, S=i\}}{\Pr\{S=i\}}\, \frac{\Pr\{P=k, S=i\}}{\Pr\{S=i\}},$$
from which we see that
$$\Pr\{S=i, E=j, P=k\} = \frac{\Pr\{S=i, E=j\}\, \Pr\{S=i, P=k\}}{\Pr\{S=i\}},$$
or, in our usual notation,
$$p_{ijk} = \frac{p_{ij.}\, p_{i.k}}{p_{i..}}. \tag{5.11}$$
The m.l.e.'s of the expected cell counts have a similar structure and depend only on the SE and SP margins:
$$\hat{m}_{ijk} = \frac{y_{ij.}\, y_{i.k}}{y_{i..}}.$$
To test the hypothesis of partial independence we need to compare the multinomial log-likelihood maximized under the constraints implied by Equation 5.11 with the unconstrained maximum. Because of the equivalence between multinomial and Poisson models, however, the resulting likelihood ratio test statistic is the same as the deviance of the model SE+SP.

In terms of our example, the deviance of the model with SE and SP interactions is 1083.8 on 4 d.f., and is highly significant. We therefore reject the hypothesis that parental encouragement and college plans are independent within each social stratum.

There are two other models with two interaction terms. Although both of them have smaller deviances than any of the models considered so far, they still show significant lack of fit. The model SP+EP has a deviance of 298.5 on 6 d.f., so we reject the hypothesis that, given college plans P, social status S and parental encouragement E are independent. The best way to view this model in causal terms is by assuming that S and E are unrelated and both have effects on P, as shown in the path diagram below.

The model SE+EP has a deviance of 255.5 on 6 d.f., and leads us to reject the hypothesis that given parental encouragement E, social class S and college plans P are independent. In causal terms one might interpret this model as postulating that social class affects parental encouragement which in turn affects college plans, with no direct effect of social class on college plans.
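By the same argument that led to Equation 5.11, the other two models of partial independence have fitted counts $\hat{m}_{ijk} = y_{ij.}\, y_{.jk} / y_{.j.}$ for SE+EP and $\hat{m}_{ijk} = y_{i.k}\, y_{.jk} / y_{..k}$ for SP+EP. The following Python sketch (our own helper names, standard library only) computes all three closed-form fits and their deviances for comparison with Table 5.3:

```python
import math

# Observed counts from Table 5.2, Y[i][j][k]: i = stratum S (0 = lower .. 3 = higher),
# j = encouragement E (0 = low, 1 = high), k = college plans P (0 = no, 1 = yes).
Y = [
    [[749, 35], [233, 133]],
    [[627, 38], [330, 303]],
    [[420, 37], [374, 467]],
    [[153, 26], [266, 800]],
]
CELLS = [(i, j, k) for i in range(4) for j in range(2) for k in range(2)]
OBS = {(i, j, k): Y[i][j][k] for (i, j, k) in CELLS}


def margin(table, axes):
    """Totals of `table` (dict cell -> count) over the kept axes (0 = S, 1 = E, 2 = P)."""
    totals = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        totals[key] = totals.get(key, 0) + value
    return totals


def deviance(fitted):
    """Poisson deviance 2 * sum y log(y / m); valid because fitted counts add up to n."""
    return 2 * sum(y * math.log(y / fitted[c]) for c, y in OBS.items() if y > 0)


S, E, P = margin(OBS, (0,)), margin(OBS, (1,)), margin(OBS, (2,))
SE, SP, EP = margin(OBS, (0, 1)), margin(OBS, (0, 2)), margin(OBS, (1, 2))

# Partial independence: two two-way margins divided by the shared one-way margin.
fits = {
    "SE+SP": {(i, j, k): SE[(i, j)] * SP[(i, k)] / S[(i,)] for (i, j, k) in CELLS},
    "SE+EP": {(i, j, k): SE[(i, j)] * EP[(j, k)] / E[(j,)] for (i, j, k) in CELLS},
    "SP+EP": {(i, j, k): SP[(i, k)] * EP[(j, k)] / P[(k,)] for (i, j, k) in CELLS},
}
# Residual d.f.: 16 cells minus 6 (constant and main effects) minus
# the interaction parameters: 3 for SE, 3 for SP, 1 for EP.
DF = {"SE+SP": 16 - 6 - 3 - 3, "SE+EP": 16 - 6 - 3 - 1, "SP+EP": 16 - 6 - 3 - 1}

for name, fit in fits.items():
    print(f"{name}: deviance {deviance(fit):.1f} on {DF[name]} d.f.")
```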

Note that all models considered so far have had explicit formulas for the m.l.e.'s, so no iteration has been necessary and we could have calculated all test statistics using the multinomial likelihood directly. An interesting property of the iterative proportional fitting algorithm mentioned earlier, which is used by software specializing in contingency tables, is that it converges in one cycle in all these cases. The same is not true of the iteratively re-weighted least squares algorithm used in Poisson regression, which will usually require a few iterations.
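The one-cycle property is easy to check numerically. The sketch below is a bare-bones version of iterative proportional fitting written for illustration (it is not the code used by any particular package): starting from a table of ones, each step rescales the fitted counts so that one observed margin is matched exactly. For the model SE+SP, a single cycle should reproduce the closed-form estimates, and a second cycle should leave them unchanged.

```python
# Observed counts from Table 5.2, keyed by cell (i, j, k) = (stratum, encouragement, plans).
Y = [
    [[749, 35], [233, 133]],
    [[627, 38], [330, 303]],
    [[420, 37], [374, 467]],
    [[153, 26], [266, 800]],
]
OBS = {(i, j, k): Y[i][j][k] for i in range(4) for j in range(2) for k in range(2)}


def margin(table, axes):
    """Totals of `table` (dict cell -> count) over the kept axes (0 = S, 1 = E, 2 = P)."""
    totals = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        totals[key] = totals.get(key, 0) + value
    return totals


def ipf_cycle(fitted, generators):
    """One IPF cycle: rescale `fitted` to match each listed observed margin in turn."""
    fitted = dict(fitted)
    for axes in generators:
        target, current = margin(OBS, axes), margin(fitted, axes)
        for cell in fitted:
            key = tuple(cell[a] for a in axes)
            fitted[cell] *= target[key] / current[key]
    return fitted


SE_SP = [(0, 1), (0, 2)]                       # margins defining the model SE+SP
one = ipf_cycle({c: 1.0 for c in OBS}, SE_SP)  # first cycle, from a table of ones
two = ipf_cycle(one, SE_SP)                    # a second cycle

# Closed-form m.l.e. for SE+SP: y_{ij.} y_{i.k} / y_{i..}
SE, SP, S = margin(OBS, (0, 1)), margin(OBS, (0, 2)), margin(OBS, (0,))
closed = {(i, j, k): SE[(i, j)] * SP[(i, k)] / S[(i,)] for (i, j, k) in OBS}

gap_closed = max(abs(one[c] - closed[c]) for c in OBS)  # cycle one vs closed form
gap_cycle = max(abs(two[c] - one[c]) for c in OBS)      # change made by cycle two
print(gap_closed, gap_cycle)  # both zero up to floating-point rounding
```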

5.2.6  Uniform Association

The only log-linear model remaining in Table 5.3 short of the saturated model is the model involving all three two-factor interactions. In this model we have a form of association between all pairs of variables, S and E, S and P, as well as E and P. Thus, social class is associated with parental encouragement and with college plans, and in addition parental encouragement has a direct effect on college plans.

How do we interpret the lack of a three-factor interaction? To answer this question we start from what we know about interaction effects in general and adapt it to the present context, where interaction terms in models for counts represent association between the underlying classification criteria. The conclusion is that in this model the association between any two of the variables is the same at all levels of the third.

This model has no simple interpretation in terms of independence, and as a result we cannot write the structure of the joint probabilities in terms of the two-way margins. In particular

$$p_{ijk} \text{ is not } \frac{p_{ij.}\, p_{i.k}\, p_{.jk}}{p_{i..}\, p_{.j.}\, p_{..k}},$$
nor any other simple function of the marginal probabilities.

A consequence of this fact is that the m.l.e.'s cannot be written in closed form and must be calculated using an iterative procedure. They do, however, depend only on the three two-way margins SE, SP and EP.

In terms of our example, the model SP+SE+EP has a deviance of 1.6 on three d.f., and therefore fits the data quite well. We conclude that we have no evidence against the hypothesis that all three variables are associated, but the association between any two is the same at all levels of the third. In particular, we may conclude that the association between parental encouragement E and college plans P is the same in all social strata.
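For the model of uniform association, iterative proportional fitting has to be repeated until the fitted counts settle down. A minimal Python sketch, assuming the same rescaling scheme cycled over the SE, SP and EP margins (the function names, the tolerance of $10^{-10}$ and the cap of 1000 cycles are arbitrary choices of ours):

```python
import math

# Observed counts from Table 5.2, keyed by cell (i, j, k) = (stratum, encouragement, plans).
Y = [
    [[749, 35], [233, 133]],
    [[627, 38], [330, 303]],
    [[420, 37], [374, 467]],
    [[153, 26], [266, 800]],
]
OBS = {(i, j, k): Y[i][j][k] for i in range(4) for j in range(2) for k in range(2)}


def margin(table, axes):
    """Totals of `table` (dict cell -> count) over the kept axes (0 = S, 1 = E, 2 = P)."""
    totals = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        totals[key] = totals.get(key, 0) + value
    return totals


def ipf(generators, tol=1e-10, max_cycles=1000):
    """Iterative proportional fitting to the observed margins listed in `generators`."""
    fitted = {c: 1.0 for c in OBS}
    for _ in range(max_cycles):
        previous = dict(fitted)
        for axes in generators:
            target, current = margin(OBS, axes), margin(fitted, axes)
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                fitted[cell] *= target[key] / current[key]
        # Stop once a full cycle no longer changes any fitted count.
        if max(abs(fitted[c] - previous[c]) for c in OBS) < tol:
            break
    return fitted


fitted = ipf([(0, 1), (0, 2), (1, 2)])  # generators SE, SP and EP
dev = 2 * sum(y * math.log(y / fitted[c]) for c, y in OBS.items())
df = 16 - (1 + 3 + 1 + 1 + 3 + 3 + 1)   # constant, main effects, SE, SP, EP
print(f"SE+SP+EP: deviance {dev:.3f} on {df} d.f.")
print(round(fitted[(0, 0, 0)], 1), round(fitted[(0, 0, 1)], 1))  # lower stratum, low encouragement
```

The deviance and the fitted counts printed by this sketch can be compared with Table 5.3 and Table 5.4.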

To further appreciate the nature of this model, we give the fitted values in Table 5.4. Comparison of the estimated expected counts in this table with the observed counts in Table 5.2 highlights the goodness of fit of the model.

Table 5.4: Fitted Values for Educational Aspirations Data
Based on Model of Uniform Association SE+SP+EP

Social          Parental          College plans
stratum         encouragement       No      Yes
Lower           Low              753.1     30.9
                High             228.9    137.1
Lower middle    Low              626.0     39.0
                High             331.0    302.0
Upper middle    Low              420.9     36.1
                High             373.1    467.9
Higher          Low              149.0     30.0
                High             270.0    796.0

We can also use the fitted values to calculate measures of association between parental encouragement E and college plans P for each social stratum. For the lowest group, the odds of making college plans are barely one to 24.4 with low parental encouragement, but increase to one to 1.67 with high encouragement, giving an odds ratio of 14.6. If you repeat the calculation for any of the other three social classes you will find exactly the same ratio of 14.6.
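The calculation can be repeated for all four strata at once. This short Python sketch works from the rounded fitted counts of Table 5.4, so the four odds ratios agree only to about one decimal place, and their logs all come out close to 2.68:

```python
import math

# Fitted counts (no, yes) from Table 5.4 under SE+SP+EP, by stratum and encouragement.
FITTED = {
    "lower": {"low": (753.1, 30.9), "high": (228.9, 137.1)},
    "lower middle": {"low": (626.0, 39.0), "high": (331.0, 302.0)},
    "upper middle": {"low": (420.9, 36.1), "high": (373.1, 467.9)},
    "higher": {"low": (149.0, 30.0), "high": (270.0, 796.0)},
}

odds_ratios = {}
for stratum, by_enc in FITTED.items():
    no_low, yes_low = by_enc["low"]
    no_high, yes_high = by_enc["high"]
    # Odds of making college plans (yes / no) at each encouragement level, then their ratio.
    odds_ratios[stratum] = (yes_high / no_high) / (yes_low / no_low)
    print(f"{stratum:>12}: odds ratio {odds_ratios[stratum]:.2f}, "
          f"log odds ratio {math.log(odds_ratios[stratum]):.3f}")
```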

We can verify that this result follows directly from the lack of a three-factor interaction in the model. The logs of the expected counts in this model are

$$\log m_{ijk} = \eta + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}.$$
The log-odds of making college plans in social stratum $i$ with parental encouragement $j$ are obtained by calculating the difference in the logs of the expected counts between $k = 2$ and $k = 1$, which is
$$\log(m_{ij2}/m_{ij1}) = \gamma_2 - \gamma_1 + (\alpha\gamma)_{i2} - (\alpha\gamma)_{i1} + (\beta\gamma)_{j2} - (\beta\gamma)_{j1},$$
because all terms involving only $i$, $j$ or $ij$ cancel out. Consider now the difference in log-odds between high and low encouragement, i.e. when $j = 2$ and $j = 1$:
$$\log\left(\frac{m_{i22}/m_{i21}}{m_{i12}/m_{i11}}\right) = (\beta\gamma)_{22} - (\beta\gamma)_{21} - (\beta\gamma)_{12} + (\beta\gamma)_{11},$$
which does not depend on $i$. Thus, we see that the log of the odds ratio is the same at all levels of S. Furthermore, under the reference cell restrictions all interaction terms involving level one of any of the factors are set to zero, so the log of the odds ratio in question is simply $(\beta\gamma)_{22}$. For the model with no three-factor interaction the estimate of this parameter is 2.683, and exponentiating this value gives 14.6.

5.2.7  Binomial Logits Revisited

Our analysis so far has treated the three classification criteria as responses, and has focused on their correlation structure. An alternative approach would treat one of the variables as a response and the other two as predictors in a regression framework. We now compare these two approaches in terms of our example on educational aspirations, treating college plans as a dichotomous response and socio-economic status and parental encouragement as discrete predictors.

To this end, we treat each of the eight rows in Table 5.2 as a group. Let $Y_{ij}$ denote the number of high school seniors who plan to attend college out of the $n_{ij}$ seniors in category $i$ of socio-economic status and category $j$ of parental encouragement. We assume that these eight counts are independent and have binomial distributions with $Y_{ij} \sim B(n_{ij}, \pi_{ij})$, where $\pi_{ij}$ is the probability of making college plans. We can then fit logistic regression models to study how the probabilities depend on social stratum and parental encouragement.

Table 5.5: Deviances for Logistic Regression Models
Fitted to the Educational Aspirations Data

Model    Deviance   d.f.
Null       1877.4      7
S          1083.8      4
E           255.5      6
S+E         1.575      3

Table 5.5 shows the results of fitting four possible logit models of interest, ranging from the null model to the additive model on socioeconomic status (S) and parental encouragement (E). It is clear from these results that both social class and encouragement have significant gross and net effects on the probability of making college plans. The best fitting model is the two-factor additive model, with a deviance of 1.6 on three d.f. Table 5.6 shows parameter estimates for the additive model.

Table 5.6: Parameter Estimates for Additive Logit Model
Fitted to the Educational Aspirations Data

Variable                 Category        Estimate   Std. Err.
Constant                                   -3.195       0.119
Socio-economic status    low                    -           -
                         lower middle       0.420       0.118
                         upper middle       0.739       0.114
                         high               1.593       0.115
Parental encouragement   low                    -           -
                         high               2.683       0.099
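The estimates in Table 5.6 can be reproduced with any logistic regression routine. To keep the example self-contained, the sketch below fits the additive logit model by Newton-Raphson (which coincides with IRLS for the logit link) in plain Python, using reference-cell coding for S and E. The helper names and the small Gaussian-elimination solver are our own; in practice a statistics package would be used instead. Standard errors are taken from the inverse of the information matrix.

```python
import math

# One binomial group per row of Table 5.2: (stratum 0..3, encouragement 0/1, yes, total).
GROUPS = [
    (0, 0, 35, 784), (0, 1, 133, 366),
    (1, 0, 38, 665), (1, 1, 303, 633),
    (2, 0, 37, 457), (2, 1, 467, 841),
    (3, 0, 26, 179), (3, 1, 800, 1066),
]
NAMES = ["constant", "lower middle", "upper middle", "high status", "high encouragement"]


def design_row(stratum, enc):
    """Reference-cell coding: constant, dummies for strata 1-3, dummy for high encouragement."""
    return [1.0] + [1.0 if stratum == s else 0.0 for s in (1, 2, 3)] + [float(enc)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        x[r] = (m[r][size] - sum(m[r][c] * x[c] for c in range(r + 1, size))) / m[r][r]
    return x


def fit_logit(groups, tol=1e-10, max_iter=50):
    """Newton-Raphson (equivalently IRLS for the logit link) for grouped binomial data."""
    X = [design_row(s, e) for s, e, _, _ in groups]
    p = len(X[0])
    beta = [0.0] * p
    for iteration in range(max_iter):
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        score = [0.0] * p                      # score vector X'(y - n * pi)
        for x, (_, _, y, n) in zip(X, groups):
            pi = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = n * pi * (1.0 - pi)
            for a in range(p):
                score[a] += x[a] * (y - n * pi)
                for c in range(p):
                    info[a][c] += w * x[a] * x[c]
        step = solve(info, score)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information.
    se = [math.sqrt(solve(info, [1.0 if r == a else 0.0 for r in range(p)])[a]) for a in range(p)]
    return beta, se


def binomial_deviance(groups, beta):
    """2 * sum [y log(y / mu) + (n - y) log((n - y) / (n - mu))]; all counts here are positive."""
    dev = 0.0
    for s, e, y, n in groups:
        eta = sum(b * v for b, v in zip(beta, design_row(s, e)))
        mu = n / (1.0 + math.exp(-eta))
        dev += y * math.log(y / mu) + (n - y) * math.log((n - y) / (n - mu))
    return 2 * dev


beta, se = fit_logit(GROUPS)
for name, b, s in zip(NAMES, beta, se):
    print(f"{name:>18}: {b:7.3f}  ({s:.3f})")
print(f"deviance {binomial_deviance(GROUPS, beta):.3f} on {len(GROUPS) - len(beta)} d.f.")
```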

Exponentiating the estimates we see that the odds of making college plans increase five-fold as we move from low to high socio-economic status. Furthermore, in each social stratum, the odds of making college plans among high school seniors with high parental encouragement are 14.6 times the odds among seniors with low parental encouragement.
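Exponentiating the estimates in Table 5.6 gives the odds ratios quoted above. The sketch below also adds approximate 95% Wald intervals, $\exp(b \pm 1.96\,\mathrm{se})$; these intervals are our own illustration and are not part of the original analysis.

```python
import math

# Estimates and standard errors from Table 5.6 (reference cells: low status, low encouragement).
TABLE_5_6 = {
    "lower middle": (0.420, 0.118),
    "upper middle": (0.739, 0.114),
    "high status": (1.593, 0.115),
    "high encouragement": (2.683, 0.099),
}
Z = 1.96  # approximate 97.5th percentile of the standard normal

odds_ratios = {}
for name, (b, se) in TABLE_5_6.items():
    # Build the interval on the log-odds scale, then exponentiate both ends.
    odds_ratios[name] = (math.exp(b), math.exp(b - Z * se), math.exp(b + Z * se))
    point, lo, hi = odds_ratios[name]
    print(f"{name:>18}: odds ratio {point:6.2f}  (95% CI {lo:.2f} to {hi:.2f})")
```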

The conclusions of this analysis are consistent with those from the previous subsection, except that this time we do not study the association between social stratification and parental encouragement, but focus on their effect on making college plans. In fact it is not just the conclusions, but all estimates and tests of significance, that agree. A comparison of the binomial deviances in Table 5.5 with the Poisson deviances in Table 5.3 shows the following 'coincidences':

log-linear model   logit model
SE+P               Null
SE+SP              S
SE+EP              E
SE+SP+EP           S+E

The models listed as equivalent have similar interpretations if you translate from the language of correlation analysis to the language of regression analysis. Note that all the log-linear models include the SE interaction, so they allow for association between the two predictors. Also, all of them include a main effect of the response P, allowing it to have a non-uniform distribution. The log-linear model with just these two terms assumes no association between P and either S or E, and is thus equivalent to the null logit model.

The log-linear model with an SP interaction allows for an association between S and P, and is therefore equivalent to the logit model where the response depends only on S. A similar remark applies to the log-linear model with an EP interaction. Finally, the log-linear model with all three two-factor interactions allows for associations between S and P, and between E and P, and assumes that in each case the strength of association does not depend on the other variable. But this is exactly what the additive logit model assumes: the response depends on both S and E, and the effect of each factor is the same at all levels of the other predictor.

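In general, log-linear and logit models are equivalent as long as the log-linear model is saturated in all the factors treated as predictors in the logit model, including all their main effects and interactions (in our example, SE); includes a main effect for the factor treated as the response (in our example, P); and includes an interaction between the response and each main effect or interaction included in the logit model (in our example, SP for the effect of S and EP for the effect of E).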

This equivalence extends to parameter estimates as well as tests of significance. For example, multiplying the fitted probabilities based on the additive logit model S+E by the sample sizes in each category of social status and parental encouragement leads to the same expected counts that we obtained earlier from the log-linear model SE+SP+EP. An interesting consequence of this fact is that one can use parameter estimates based on a log-linear model to calculate logits, as we did in Section 5.2.6, and obtain the same results as in logistic regression. For example, the log of the odds ratio summarizing the effect of parental encouragement on college plans within each social stratum was estimated as 2.683 in the previous subsection, and this value agrees exactly with the estimate in Table 5.6.
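The equivalence can be checked by turning the logit estimates of Table 5.6 into expected counts. In the sketch below (plain Python, names our own), fitted probabilities are multiplied by the group sizes $n_{ij}$ from the Total column of Table 5.2. Because the coefficients are rounded to three decimals, the results agree with Table 5.4 only up to small rounding differences.

```python
import math

# Logit estimates from Table 5.6: constant, stratum effects (lower = reference),
# and encouragement effects (low = reference).
CONSTANT = -3.195
STRATUM = [0.0, 0.420, 0.739, 1.593]
ENCOURAGEMENT = [0.0, 2.683]

# Group sizes n_ij from Table 5.2, by stratum (rows) and encouragement (low, high).
SIZES = [[784, 366], [665, 633], [457, 841], [179, 1066]]

expected = {}
for i, row in enumerate(SIZES):
    for j, n_ij in enumerate(row):
        eta = CONSTANT + STRATUM[i] + ENCOURAGEMENT[j]
        p_yes = 1.0 / (1.0 + math.exp(-eta))
        # Expected counts (no, yes) implied by the additive logit model.
        expected[(i, j)] = (n_ij * (1.0 - p_yes), n_ij * p_yes)

for (i, j), (no, yes) in expected.items():
    print(i, j, round(no, 1), round(yes, 1))
```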

In our example the equivalence depends crucially on the fact that the log-linear models include the SE interaction, and therefore reproduce exactly the binomial denominators used in the logistic regression. But what would have happened if the SE interaction had turned out to be not significant? There appear to be two schools of thought on this matter.

Bishop et al. (1975), in a classic book on the multivariate analysis of qualitative data, emphasize log-linear models because they provide a richer analysis of the structure of association among all factors, not just between the predictors and the response. If the SE interaction had turned out to be not significant, they would probably leave it out of the model. They would still be able to translate their parameter estimates into fitted logits, but the results would not coincide exactly with the logistic regression analysis (although they would be rather similar if the omitted interaction is small).

Cox (1972), in a classic book on the analysis of binary data, emphasizes logit models. He argues that if your main interest is in the effects of two variables, say S and E, on a third factor, say P, then you should condition on the SE margin. This means that if you are fitting log-linear models with the intention of understanding effects on P, you would include the SE interaction even if it is not significant. In that case you would get exactly the same results as a logistic regression analysis, which is probably what you should have done in the first place if you wanted to study specifically how the response depends on the predictors.


Copyright © Germán Rodríguez, 1993-2000. Please send feedback to grodri@princeton.edu
Conversion from LaTeX was done using TTH, version 2.34.