Germán Rodríguez
Statistics and Population, Princeton University

This page collects some resources related to my research on multilevel models. For materials related to a half-semester course on multilevel models, please click here.


Here is a list of my publications on multilevel models.  

Rodríguez, Germán (2015). Multilevel Models in Demography, in International Encyclopedia of the Social and Behavioral Sciences, 2nd Edition, Vol 16, pages 48-56. J. D. Wright, Editor-in-chief. Oxford: Elsevier.

We introduce multilevel models in demography, reviewing random intercept and random slope models, cross-level interactions, estimation of the fixed and random parameters, and calculation of subject-specific and population-average predictions. All ideas are discussed in the context of specific applications, a two-level analysis of contraceptive use in 15 countries and a three-level hazard model of infant and child survival in Kenya, with emphasis on the interpretation of results. Throughout the article we include references to other demographic applications of interest.

Rodríguez, Germán (2008). Multilevel Generalized Linear Models. Chapter 9 in Handbook of Multilevel Analysis, Jan de Leeuw and Erik Meijer, Editors. Springer. [The book is available from Amazon.]

This chapter explores extensions of generalized linear models (GLMs) to include random effects in a multilevel setting, viewed as special cases of the Multilevel Generalized Linear Model (MGLM). Section 1 starts with some historical remarks. Section 2 develops the modeling framework, noting the relationship between GLMs and survival models and drawing an important distinction between conditional and marginal models. Section 3 is devoted to a discussion of estimation procedures, including a review of approximate estimation procedures, maximum likelihood estimation using ordinary and adaptive Gaussian quadrature, and Bayesian estimation procedures using the Gibbs sampler. Section 4 is devoted to an application of MGLMs to the study of infant and child mortality in Kenya, using data from a national survey conducted in 1998 and a three-level shared-frailty piece-wise exponential survival model that allows for unobserved family and community effects on child survival.

Rodríguez, Germán and Elo, Irma (2003). Intra-class Correlation in Random-Effects Models for Binary Data. The Stata Journal, Vol. 3, No. 1, pp. 32-46. Article here.

We review the concept of intra-class correlation in random-effects models for binary outcomes as estimated by Stata's xtprobit, xtlogit, and xtclog. We consider the usual measures of correlation based on a latent variable formulation of these models and note corrections to the last two procedures. We also discuss alternative measures of association based on manifest variables or actual outcomes and introduce a new command xtrho for computing these measures for all three types of models.
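
As a rough illustration of the latent variable formulation mentioned above, the sketch below computes the usual latent intra-class correlation for a random-intercept model under each of the three links. It assumes the standard level-1 latent error variances (1 for probit, pi^2/3 for logit, pi^2/6 for complementary log-log). The function name latent_icc and its arguments are hypothetical; this is not the code behind xtrho, and it does not cover the manifest-variable measures discussed in the article.

```python
import math

# Variance of the level-1 latent error implied by each link (standard results):
# probit -> standard normal, logit -> standard logistic,
# complementary log-log -> extreme value.
RESIDUAL_VARIANCE = {
    "probit": 1.0,
    "logit": math.pi ** 2 / 3,
    "cloglog": math.pi ** 2 / 6,
}


def latent_icc(sigma_u, link):
    """Latent intra-class correlation for a random-intercept model.

    sigma_u is the standard deviation of the random intercept (hypothetical
    argument name); link is one of the keys of RESIDUAL_VARIANCE.
    """
    var_u = sigma_u ** 2
    return var_u / (var_u + RESIDUAL_VARIANCE[link])


if __name__ == "__main__":
    for link in RESIDUAL_VARIANCE:
        print(link, round(latent_icc(1.0, link), 3))
```

For the same random-effect standard deviation, the three links imply different latent correlations because the latent error variance differs across them.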

Glei, Dana; Goldman, Noreen and Rodríguez, Germán (2003). Utilization of Care During Pregnancy in Rural Guatemala: Does Obstetrical Need Matter? Social Science and Medicine Vol. 57, pp. 2447-2463.

This study examines factors associated with the use of biomedical care during pregnancy in Guatemala, focusing on the extent to which complications in an ongoing or previous pregnancy affect a woman's decision to seek care. The findings, based on multilevel models, suggest that obstetrical need, as well as demographic, social and cultural factors, are important predictors of pregnancy care. In contrast, measures of availability and access to health services have modest effects. The results also suggest the importance of unobserved variables--such as quality of care--in explaining women's decisions about pregnancy care. The results imply that improving proximity to biomedical services is unlikely to have a dramatic impact on utilization in the absence of additional changes that improve the quality of care or reduce barriers to access. Moreover, current efforts aimed at incorporating midwives into the formal health-care system may need to extend their focus beyond the modification of midwife practices to consider the provision of culturally appropriate, high-quality services by traditional and biomedical providers alike.

Rodríguez, Germán and Goldman, Noreen (2001). Improved Estimation Procedures for Multilevel Models with Binary Response: A Case Study. Journal of the Royal Statistical Society. Series A (Statistics in Society), Vol. 164, No. 2. [This article is available in JSTOR.]

During recent years, analysts have been relying on approximate methods of inference to estimate multilevel models for binary or count data. In an earlier study of random-intercept models for binary data we used simulated data to demonstrate that one such approximation, known as marginal quasi-likelihood, leads to a substantial attenuation bias in the estimates of both fixed and random effects whenever the random effects are non-trivial. In this paper, we fit three-level random-intercept models to actual data for two binary outcomes, to assess whether refined approximation procedures, namely penalized quasi-likelihood and second-order improvements to marginal and penalized quasi-likelihood, also underestimate the underlying parameters. The extent of the bias is assessed by two standards of comparison: exact maximum likelihood estimates, based on a Gauss-Hermite numerical quadrature procedure, and a set of Bayesian estimates, obtained from Gibbs sampling with diffuse priors. We also examine the effectiveness of a parametric bootstrap procedure for reducing the bias. The results indicate that second-order penalized quasi-likelihood estimates provide a considerable improvement over the other approximations, but all the methods of approximate inference result in substantial underestimation of the fixed and random effects when the random effects are sizable. We also find that the parametric bootstrap can eliminate the bias but is computationally very intensive.
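
To give a sense of what maximum likelihood by Gauss-Hermite quadrature involves, here is a minimal sketch of the marginal log-likelihood contribution of a single cluster. It assumes a two-level random-intercept logit model with a normally distributed intercept and ordinary (non-adaptive) quadrature, which is simpler than the three-level models fitted in the paper. The function cluster_loglik and its argument names are hypothetical, and this is not the code used for the published estimates.

```python
import numpy as np


def cluster_loglik(y, eta, sigma, n_points=20):
    """Marginal log-likelihood of one cluster, random-intercept logit model.

    y        : 0/1 responses for the units in the cluster
    eta      : fixed part of the linear predictor for each unit
    sigma    : standard deviation of the normal random intercept
    n_points : number of Gauss-Hermite quadrature points
    """
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)

    # Nodes and weights for integrals of the form int exp(-x^2) f(x) dx.
    x, w = np.polynomial.hermite.hermgauss(n_points)

    # Change of variable u = sqrt(2) * sigma * x maps the rule to N(0, sigma^2).
    u = np.sqrt(2.0) * sigma * x

    # Conditional success probabilities at each node: shape (n_units, n_points).
    p = 1.0 / (1.0 + np.exp(-(eta[:, None] + u[None, :])))

    # Conditional likelihood of the whole cluster at each node.
    cond = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)

    # Weighted sum approximates the integral over the random intercept.
    return np.log(np.sum(w * cond) / np.sqrt(np.pi))
```

Summing this quantity over clusters gives the log-likelihood that a numerical optimizer would maximize over the fixed effects and sigma. With sigma set to zero the expression reduces to the ordinary logit log-likelihood for the cluster.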

Pebley, Anne; Goldman, Noreen, and Rodríguez, Germán (1996). Prenatal and Delivery Care and Childhood Immunization in Guatemala: Do Family and Community Matter? Demography, Vol. 33, No. 2, pp. 231-247. [This article is available in JSTOR.]

In this paper we investigate family choices about pregnancy-related care and the use of childhood immunization. Estimates obtained from a multilevel logistic model indicate that use of formal (or "modern") health services differs substantially by ethnicity, by social and economic factors, and by availability of health services. The results also show that family and community membership are very important determinants of the use of health care, even in the presence of controls for a large number of observed characteristics of individuals, families, and communities.

Rodríguez, Germán and Goldman, Noreen (1995). An Assessment of Estimation Procedures for Multilevel Models with Binary Responses. Journal of the Royal Statistical Society. Series A (Statistics in Society), Vol. 158, No. 1, pp. 73-89. [This article is available in JSTOR.]

We evaluate two software packages that are available for fitting multilevel models to binary response data, namely VARCL and ML3, by using a Monte Carlo study designed to represent quite closely the actual structure of a data set used in an analysis of health care utilization in Guatemala. We find that the estimates of fixed effects and variance components produced by the software packages are subject to very substantial downward bias when the random effects are sufficiently large to be interesting. In fact, the fixed effect estimates are no better than the estimates obtained by using standard logit models that ignore the hierarchical structure of the data. The estimates of standard errors appear to be reasonably accurate and superior to those obtained by ignoring clustering, although one might question their utility in the presence of large biases. We conclude that alternative estimation procedures need to be developed and implemented for the binary response case.

Guo, Guang, and Rodríguez, Germán (1992). Estimating a Multivariate Proportional Hazards Model for Clustered Data Using the EM Algorithm, with an Application to Child Survival in Guatemala. Journal of the American Statistical Association, Vol. 87, No. 420, pp. 969-976. [This article is available in JSTOR.]

This article discusses a random-effects model for the analysis of clustered survival times, such as those reflecting the mortality experience of children in the same family. We describe parametric and non-parametric approaches to the specification of the random effect and show how the model may be fitted using an accelerated EM algorithm. We then fit two specifications of the model to child survival data from Guatemala. These data have been analyzed before using standard hazard models that ignore cluster effects. Key words: Frailty; infant and child mortality; mixture models; random effects.


The simulations used in our 1995 JRSS-A paper are available for public use here.

The actual data from Guatemala used in our 1996 Demography paper and in the 2001 JRSS-A paper are available here.