Germán Rodríguez

Multilevel Models
Princeton University
We will illustrate random intercept logit models using data from Lillard and Panis (2000) on 1060 births to 501 mothers. The outcome of interest is whether the birth was delivered in a hospital or elsewhere. The predictors include the log of income `loginc`

, the distance to the nearest hospital `distance`

, and two indicators of mothers’s education: `dropout`

for less than high school and `college`

for college graduates, so high school or some college is the reference cell.

First we read the data from the course website

. infile hosp loginc distance dropout college mother /// > using https://data.princeton.edu/pop510/hospital.dat, clear (1,060 observations read)

> hosp <- read.table("https://data.princeton.edu/pop510/hospital.dat", + header = FALSE) > names(hosp) <- c("hosp","loginc","distance","dropout","college","mother")

To fit a model with a woman-level random effect we can use `xtlogit`

we use `glmer()`

in the `lme4`

package

. xtlogit hosp loginc distance dropout college, i(mother) re Fitting comparison model: Iteration 0: log likelihood = -644.95401 Iteration 1: log likelihood = -541.44886 Iteration 2: log likelihood = -537.4711 Iteration 3: log likelihood = -537.45771 Iteration 4: log likelihood = -537.45771 Fitting full model: tau = 0.0 log likelihood = -537.45771 tau = 0.1 log likelihood = -534.03315 tau = 0.2 log likelihood = -530.99872 tau = 0.3 log likelihood = -528.53741 tau = 0.4 log likelihood = -526.88308 tau = 0.5 log likelihood = -526.38822 tau = 0.6 log likelihood = -527.67382 Iteration 0: log likelihood = -526.3876 Iteration 1: log likelihood = -522.68442 Iteration 2: log likelihood = -522.65042 Iteration 3: log likelihood = -522.65042 Random-effects logistic regression Number of obs = 1,060 Group variable: mother Number of groups = 501 Random effects u_i ~ Gaussian Obs per group: min = 1 avg = 2.1 max = 10 Integration method: mvaghermite Integration pts. = 12 Wald chi2(4) = 110.06 Log likelihood = -522.65042 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ hosp ¦ Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- loginc ¦ .5622009 .0727497 7.73 0.000 .4196141 .7047876 distance ¦ -.0765915 .0323473 -2.37 0.018 -.1399911 -.013192 dropout ¦ -1.997753 .2556249 -7.82 0.000 -2.498769 -1.496737 college ¦ 1.03363 .3884851 2.66 0.008 .2722135 1.795047 _cons ¦ -3.36984 .4794505 -7.03 0.000 -4.309546 -2.430134 -------------+---------------------------------------------------------------- /lnsig2u ¦ .4372018 .3161192 -.1823805 1.056784 -------------+---------------------------------------------------------------- sigma_u ¦ 1.244335 .1966791 .912844 1.696203 rho ¦ .3200274 .0687907 .2020988 .4665343 ------------------------------------------------------------------------------ LR test of rho=0: chibar2(01) = 29.61 Prob >= chibar2 = 0.000 . estimates store xt

> library(lme4) > pql <- glmer(hosp ~ loginc + distance + dropout + college + + (1 | mother), data = hosp, family=binomial) > summary(pql, corr = FALSE) Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod] Family: binomial ( logit ) Formula: hosp ~ loginc + distance + dropout + college + (1 | mother) Data: hosp AIC BIC logLik deviance df.resid 1061.2 1091.0 -524.6 1049.2 1054 Scaled residuals: Min 1Q Median 3Q Max -2.6758 -0.4606 -0.2779 0.5255 4.4536 Random effects: Groups Name Variance Std.Dev. mother (Intercept) 1.251 1.118 Number of obs: 1060, groups: mother, 501 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.29448 0.46666 -7.060 1.67e-12 *** loginc 0.55040 0.07094 7.759 8.59e-15 *** distance -0.07742 0.03169 -2.443 0.01456 * dropout -1.94727 0.24439 -7.968 1.61e-15 *** college 1.02322 0.37259 2.746 0.00603 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 convergence code: 0 Model failed to converge with max|grad| = 0.00154014 (tol = 0.001, component 1)

By default Stata uses Gauss-Hermite adaptive quadrature using the mean and variance with 12 integration points.

The same model can be fit using `melogit`

, which by default uses only 7 integration points. We can change the number to 12 to get better correspondence.

By default R uses the PQL approximation, which in this case doesn’t converge. Let us specify 12-point adaptive Gaussian quadrature instead and compare the estimates:

. melogit hosp loginc distance dropout college || mother:, intpoints(12) Fitting fixed-effects model: Iteration 0: log likelihood = -539.11554 Iteration 1: log likelihood = -537.46251 Iteration 2: log likelihood = -537.45771 Iteration 3: log likelihood = -537.45771 Refining starting values: Grid node 0: log likelihood = -526.3876 Fitting full model: Iteration 0: log likelihood = -526.3876 Iteration 1: log likelihood = -522.74579 Iteration 2: log likelihood = -522.65053 Iteration 3: log likelihood = -522.65042 Iteration 4: log likelihood = -522.65042 Mixed-effects logistic regression Number of obs = 1,060 Group variable: mother Number of groups = 501 Obs per group: min = 1 avg = 2.1 max = 10 Integration method: mvaghermite Integration pts. = 12 Wald chi2(4) = 110.06 Log likelihood = -522.65042 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ hosp ¦ Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- loginc ¦ .5622029 .0727497 7.73 0.000 .419616 .7047898 distance ¦ -.0765916 .0323474 -2.37 0.018 -.1399914 -.0131919 dropout ¦ -1.997762 .2556281 -7.82 0.000 -2.498784 -1.49674 college ¦ 1.033635 .3884878 2.66 0.008 .2722127 1.795057 _cons ¦ -3.369853 .4794516 -7.03 0.000 -4.309561 -2.430145 -------------+---------------------------------------------------------------- mother ¦ var(_cons)¦ 1.548407 .4894986 .8332867 2.877238 ------------------------------------------------------------------------------ LR test vs. logistic model: chibar2(01) = 29.61 Prob >= chibar2 = 0.0000 . estimates store me . estimates table xt me ---------------------------------------- Variable ¦ xt me -------------+-------------------------- hosp ¦ loginc ¦ .56220088 .56220292 distance ¦ -.07659154 -.07659161 dropout ¦ -1.9977529 -1.9977619 college ¦ 1.0336304 1.0336349 _cons ¦ -3.36984 -3.369853 -------------+-------------------------- lnsig2u ¦ _cons ¦ .4372018 -------------+-------------------------- var( ¦ _cons[mot~r])¦ _cons ¦ 1.5484071 ----------------------------------------

Note that `xtlogit`

reports the logged variance (and the standard deviation) whereas `melogit`

reports the variance, but the results are equivalent. The other estimates are all very close.

> agq <- glmer(hosp ~ loginc + distance + dropout + college + + (1 | mother), data = hosp, family=binomial, nAGQ = 12) > summary(agq, corr = FALSE) Generalized linear mixed model fit by maximum likelihood (Adaptive Gauss-Hermite Quadrature, nAGQ = 12) [glmerMod] Family: binomial ( logit ) Formula: hosp ~ loginc + distance + dropout + college + (1 | mother) Data: hosp AIC BIC logLik deviance df.resid 1057.3 1087.1 -522.7 1045.3 1054 Scaled residuals: Min 1Q Median 3Q Max -2.8085 -0.4497 -0.2659 0.4938 4.5728 Random effects: Groups Name Variance Std.Dev. mother (Intercept) 1.548 1.244 Number of obs: 1060, groups: mother, 501 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.36986 0.47945 -7.029 2.09e-12 *** loginc 0.56220 0.07275 7.728 1.09e-14 *** distance -0.07659 0.03235 -2.368 0.0179 * dropout -1.99775 0.25563 -7.815 5.49e-15 *** college 1.03365 0.38849 2.661 0.0078 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > cbind(fixef(pql), fixef(agq)) [,1] [,2] (Intercept) -3.29448256 -3.3698604 loginc 0.55039538 0.5622039 distance -0.07741503 -0.0765922 dropout -1.94726835 -1.9977473 college 1.02321857 1.0336453 > c(unlist(VarCorr(pql)), unlist(VarCorr(agq))) mother mother 1.250605 1.548383

We see that the fixed effects are similar, but PQL underestimates the variance at the mother level. By the way a simple quadrature check is to increase the number of quadrature points and verify that all estimates remain essentially unchanged.

The estimated fixed effects may be exponentiated and interpreted as odds ratios in the usual way. For example the coefficient of college, which exponentiated is 2.81, means that the odds of hospital delivery for a college graduate are 2.81 times those of a high school graduate with the same observed and unobserved characteristics, including income, distance to the hospital, and the mother-specific residual.

The standard deviation of the random effect may be interpreted exactly the same way, by treating it as the coefficient of a standard normal random effect. In this case the standard deviation of 1.24 exponentiated becomes 3.47, so the odds of hospital delivery for a woman whose unobserved characteristics put her one standard deviation above the mean, are three-and-a-half times those of an average woman with the same observed characteristics, including income, education, and distance to the hospital.

Stata’s `xtlogit`

reports the intra-class correlation “rho” in the *latent* scale as 0.32. We can verify this result We can compute the intra-class correlation in the *latent* scale using the estimated variance at the mother level and recalling that the level-1 variance is p^{2}/3:

. estimates restore xt (results xt are active now) . scalar v2 = exp(_b[/lnsig2u]) . di v2/(v2 + _pi^2/3) .32002744

> v <- unlist(VarCorr(agq)) > v / (v + pi^2/3) mother 0.3200294

The intraclass correlation of 0.32 reflects the correlation between the latent propensity to deliver in a hospital for two births of the same mother. It also means that about one third of the variance in the latent propensity to deliver in a hospital, beyond that explained by income, education and distance to the hospital, can be attributed to other characterisics of the mothers.

We can also calculate the *manifest* correlation at the median linear predictor, which requires “integrating out” the random effect. The `xtrho`

command can calculate this for us at the median or other quantiles. This command is only available after `xtlogit`

, `xtprobit`

or `xtcloglog`

. Fortunately our curent estimate is from `xtlogit`

. We can do this by first calculating the median linear predictor and then computing the average probability of one and two hospital deliveries using the `gauher()`

function to integrate out the random effect. The function can be sourced directly from this website as shown below.

. xtrho Measures of intra-class manifest association in random-effects logit Evaluated at median linear predictor Measure ¦ Estimate [95% Conf.Interval] -----------------+------------------------------------ Marginal prob. ¦ .239252 .217793 .268467 Joint prob. ¦ .093807 .067841 .132652 Odds ratio ¦ 2.72847 1.90755 4.28414 Pearson's r ¦ .200894 .119788 .308453 Yule's Q ¦ .463586 .312136 .621509

> X <- model.matrix(agq) > xb <- X %*% fixef(agq) > md <- median(xb) > source("https://data.princeton.edu/pop510/gauher.R") > ghi <- function(f, gq = gauher(12)) { + sum(f(gq$z) * gq$w) + } > m1 <- ghi(function(z) plogis(md + sqrt(v)*z)); m1 # margin [1] 0.2392533 > m11 <- ghi(function(z) plogis(md + sqrt(v)*z)^2); m11 # joint [1] 0.09381011 > M <- matrix(c(m11, m1 - m11, m1 - m11, 1 - 2*m1 + m11), 2, 2) > or <- M[1,1]*M[2,2]/(M[1,2]*M[2,1]) ; or # odds ratio [1] 2.72868 > r <- (m11 - m1^2)/(m1 * (1-m1)) ; r # Pearson's r [1] 0.2009107 > Q <- (or - 1)/(or + 1) ; Q # Yule's Q [1] 0.4636171

The probability of one birth delivered in a hospital is 23.9% and the probability of two is 9.38% leading to the two-by-two table above.

We see that the correlation between the actual place of delivery for two births of the same woman if she has median characteristics is equivalent to a Pearson’s *r* of 0.20, a Yule’s *Q* of 0.46, or perhaps in more familiar terms an odds ratio of 2.73. This means that the odds of hospital delivery for a woman who delivered a previous birth at a hospital are 2.73 times those of a comparable woman who delivered the previous birth elsewhere.

Earlier we noted that the effect of education implies that the odds of hospital delivery for a college graduate are 2.8 times those of a high school graduate with the same observed and unobserved characteristics. This is a *subject specific* effect, comparing the odds of hospital delivery for essentially the same woman under two different education scenarios. It is also a conditional effect given a fixed value of the random effect.

We can also compute a *population average* effect by averaging or “integrating out” the random effect. Essentially this entails computing the effect at different values of the random effect and averaging, and it can be computed by numerical integration or by simulation. Let us find the effect of education at the mean distance of 3.74 and the mean log income of 5.88 using Gauss-Hermite integration with the built-in function `_gauss_hermite_nodes()`

.

. egen first = tag(mother) . quietly sum dist if first . scalar mdist = r(mean) . quietly sum loginc if first . scalar mloginc = r(mean) . di mdist, mloginc 3.7377246 5.882525 . scalar xb = _b[_cons] + _b[loginc] * mloginc + _b[dist] * mdist . scalar sigma = exp(_b[/lnsig2u]/2) . scalar bcollege = _b[college] . mata: --------------------- mata (type end to exit) --------------------- : xb = st_numscalar("xb") : sigma = st_numscalar("sigma") : b = st_numscalar("bcollege") : gh = _gauss_hermite_nodes(12)' : gh[,1] = gh[, 1] :* sqrt(2) // change of variables : gh[,2] = gh[, 2] :/ sqrt(pi()) // to standard normal : p1 = sum( invlogit(xb :+ b :+ gh[,1] :* sigma) :* gh[,2]) : p0 = sum( invlogit(xb :+ gh[,1] :* sigma) :* gh[,2]) : (p1/(1-p1))/(p0/(1-p0)) 2.21217867 : end ------------------------------------------------------------

> library(dplyr) > x <- summarize(hosp, loginc = mean(loginc), distance= mean(distance)); x loginc distance 1 5.987951 3.918396 > b <- fixef(agq) > xb <- as.numeric(b[1] + x["loginc"] * b["loginc"] + x["distance"]* b["distance"]) > p1 <- ghi(function(z) plogis(xb + b["college"] + sqrt(v)*z)) > p0 <- ghi(function(z) plogis(xb + sqrt(v)*z)) > (p1/(1 - p1)) / (p0/(1-p0)) [1] 2.212816

We get an odds ratio of 2.21. This is smaller in magnitude than the subject-specific odds ratio of 2.81.

You will find on this website analyses of the same data using three Bayesian methods:

The last entry includes a comparison of estimates with all four methods.

Last updated 12 April 2018

Continue with Random Logit Coefficients