Germán Rodríguez
Multilevel Models, Princeton University

Variance Components

We illustrate basic ideas in variance-component models using data from Snijders and Bosker (1999). The dataset has 2287 children from 131 schools in the Netherlands, and is available in Stata format.

. use http://data.princeton.edu/pop510/snijders
(Scores in language test from Snijders and Bosker, 1999)
> library(foreign)

> snijders <- read.dta("http://data.princeton.edu/pop510/snijders.dta")

We are interested in scores in a language test. Here are some descriptive statistics:

. sum langpost

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    langpost |      2,287    40.93485    9.003676          9         58

. scalar om = r(mean)

. bysort schoolnr: gen sn = _N     // school n

. by schoolnr: gen first = _n == 1 // mark first obs in school

. sum sn if first

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          sn |        131    17.45802    7.231618          4         35
> library(dplyr)

> om <- summarize(snijders, m = mean(langpost)); om
         m
1 40.93485

> sm <- group_by(snijders, schoolnr) %>% summarize( n = n())

> summary(sm$n)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   4.00   11.50   17.00   17.46   23.00   35.00 

The overall mean is 40.93. The number of observations per school ranges from 4 to 35, with an average of 17.46.

Variance Components

Next we fit a simple variance components model of the form $Y_{ij} = \mu + \alpha_i + e_{ij}$, where $\alpha_i \sim N(0, \sigma^2_u)$ is a school-level random effect and $e_{ij} \sim N(0, \sigma^2_e)$ is an individual error term, using schoolnr as the grouping factor:

. xtreg langpost, i(schoolnr) mle
Iteration 0:   log likelihood =  -8128.005
Iteration 1:   log likelihood = -8126.6359
Iteration 2:   log likelihood = -8126.6093
Iteration 3:   log likelihood = -8126.6092

Random-effects ML regression                    Number of obs     =      2,287
Group variable: schoolnr                        Number of groups  =        131

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          4
                                                              avg =       17.5
                                                              max =         35

                                                Wald chi2(0)      =       0.00
Log likelihood  = -8126.6092                    Prob > chi2       =          .

------------------------------------------------------------------------------
    langpost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   40.36409   .4271561    94.49   0.000     39.52688     41.2013
-------------+----------------------------------------------------------------
    /sigma_u |   4.407781   .3516687                      3.769711    5.153852
    /sigma_e |   8.035411   .1227603                      7.798372    8.279656
         rho |    .231302   .0292466                      .1780689    .2924018
------------------------------------------------------------------------------
LR test of sigma_u=0: chibar2(01) = 287.98             Prob >= chibar2 = 0.000
> library(lme4)

> vc <- lmer(langpost ~ 1 + (1 | schoolnr), data=snijders, REML = FALSE); vc
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: langpost ~ 1 + (1 | schoolnr)
   Data: snijders
      AIC       BIC    logLik  deviance  df.resid 
16259.219 16276.424 -8126.609 16253.219      2284 
Random effects:
 Groups   Name        Std.Dev.
 schoolnr (Intercept) 4.408   
 Residual             8.035   
Number of obs: 2287, groups:  schoolnr, 131
Fixed Effects:
(Intercept)  
      40.36  

> v <- VarCorr(vc); v
 Groups   Name        Std.Dev.
 schoolnr (Intercept) 4.4078  
 Residual             8.0354  

> s2u <- as.numeric(v)  # sigma_u^2, the school-level variance

> s2e <- sigma(vc)^2    # sigma_e^2, the residual variance

> s2u/(s2u + s2e) # icc
[1] 0.231302

We find an intraclass correlation of 0.23, so 23% of the variation in language scores can be attributed to the schools. (Alternative estimation methods will be discussed later; here we used maximum likelihood estimation.)
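
Plugging the estimated standard deviations into the definition of the intraclass correlation reproduces this figure:

$$\hat\rho = \frac{\hat\sigma^2_u}{\hat\sigma^2_u + \hat\sigma^2_e} = \frac{4.408^2}{4.408^2 + 8.035^2} \approx 0.231.$$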

The Grand Mean

Note that the estimate of the constant is not the same as the overall mean obtained at the outset. The overall mean can be seen as averaging the school means with weights proportional to the number of students per school, which would be optimal if the observations were independent. At the other extreme one could compute an unweighted average of the school means, which would be optimal if the intraclass correlation were one. Let's compute these:

. egen sm = mean(langpost), by(schoolnr)    // school mean

.         quietly sum sm [fw=sn] if first

.         di r(mean)
40.934849

. quietly sum sm if first

.         di r(mean)
40.129894
> schools <- group_by(snijders, schoolnr) %>% summarize(sn = n(), sm = mean(langpost))

> summarize(schools, mw = weighted.mean(sm, sn), mu = mean(sm))
Source: local data frame [1 x 2]

        mw       mu
     (dbl)    (dbl)
1 40.93485 40.12989

We obtain estimates of 40.93 weighted by school sample size and 40.13 unweighted. The random effects estimate represents a compromise between these extremes, and can be reproduced using weights inversely proportional to the variance of the school means:
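
The weights follow from the model: averaging over the $n_i$ students in a school leaves the school effect intact but divides the residual variance by $n_i$, so

$$\operatorname{Var}(\bar y_i) = \sigma^2_u + \sigma^2_e/n_i \qquad\text{and}\qquad w_i = \frac{1}{\sigma^2_u + \sigma^2_e/n_i}.$$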

. gen sw = 1/(_b[/sigma_u]^2+_b[/sigma_e]^2/sn)

.         quietly sum sm [aw=sw] if first

.         di r(mean)
40.364088
> schools <- mutate(schools, sw = 1/(s2u + s2e/sn))

> summarize(schools, mle =  weighted.mean(sm, sw))
Source: local data frame [1 x 1]

       mle
     (dbl)
1 40.36409

We verify the estimate of 40.36.

School Effects

Next we consider 'estimation' of the school random effects.

If these effects were treated as fixed, so that the model is $Y_{ij} = \mu + \alpha_i + e_{ij}$ with the $\alpha_i$ as fixed parameters, we would estimate them by introducing a dummy variable for each school.

It turns out that the estimate of $\mu$ is the overall mean, the estimate of $\mu + \alpha_i$ is the school mean, and thus the estimate of $\alpha_i$ is the difference between the school mean and the overall mean.
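
As a quick check of this claim, one can fit the dummy-variable regression in R. This is a sketch rather than part of the original session; it assumes the snijders data frame and the schools summary created earlier are still in memory.

# one dummy per school and no intercept, so each coefficient estimates mu + alpha_i
fe <- lm(langpost ~ 0 + factor(schoolnr), data = snijders)
# the coefficients reproduce the school means computed earlier
head(cbind(coef(fe), schools$sm))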

With random effects we consider the (posterior) distribution of $\alpha_i$ given the data $y_i$ (the vector with elements $y_{ij}$ for $j = 1, \ldots, n_i$). This can be obtained using Bayes' theorem from the (prior) distribution of $\alpha_i$, the conditional distribution of the data $y_i$ given $\alpha_i$, and the marginal distribution of the data.
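
For reference, under normality this posterior has a closed form: writing $\bar y_i$ for the mean of school $i$, a standard normal-normal calculation gives

$$\alpha_i \mid y_i \sim N\!\left(\frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e/n_i}\,(\bar y_i - \mu),\; \frac{\sigma^2_u\,\sigma^2_e/n_i}{\sigma^2_u + \sigma^2_e/n_i}\right).$$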

The posterior mean is known as the BLUP (best linear unbiased predictor) of the school effect, and can be shown to be the difference between the school mean and the MLE of the constant, "shrunk" towards zero by the factor

$$R_i = \frac{\sigma^2_u}{\sigma^2_u + \sigma^2_e/n_i},$$

the ratio of the variance of the random effect to the variance of the school mean. Note that shrinkage is minimal when (1) $n_i$ is large, (2) $\sigma^2_e$ is small, so each school mean is estimated precisely, and/or (3) $\sigma^2_u$ is large, so schools account for most of the variation and little pooling towards the overall mean is needed. Let us calculate these 'by hand':

. gen R = _b[/sigma_u]^2/(_b[/sigma_u]^2 + _b[/sigma_e]^2/sn)

. gen ahat = R*(sm-_b[_cons])

.         gen alphahat = sm - om

.         scatter ahat alphahat, title(School effects on language scores) ///
>         xtitle(MLE) ytitle(BLUP)

.         graph export langblup.png, width(500) replace
(file langblup.png written in PNG format)

[Figure langblup.png: School effects on language scores, BLUP (vertical axis) versus MLE (horizontal axis)]

> schools <- mutate(schools, 
+   R = s2u/(s2u + s2e/sn), 
+   ahat = R * (sm - fixef(vc)),
+   alphahat = sm - om$m)

> library(ggplot2)

> ggplot(schools, aes(alphahat, ahat)) + geom_point() +
+   ggtitle("School Effects on Language Scores")

> ggsave("langblupr.png", width=500/72, height=400/72, dpi=72)

[Figure langblupr.png: School Effects on Language Scores, BLUP versus MLE]

BLUPs can be computed more easily using xtmixed in Stata or ranef() in R.

. xtmixed langpost || schoolnr: , mle 

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -8126.6092  
Iteration 1:   log likelihood = -8126.6092  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =      2,287
Group variable: schoolnr                        Number of groups  =        131

                                                Obs per group:
                                                              min =          4
                                                              avg =       17.5
                                                              max =         35

                                                Wald chi2(0)      =          .
Log likelihood = -8126.6092                     Prob > chi2       =          .

------------------------------------------------------------------------------
    langpost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   40.36409   .4263628    94.67   0.000     39.52843    41.19974
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolnr: Identity           |
                   sd(_cons) |   4.407783   .3516692      3.769713    5.153856
-----------------------------+------------------------------------------------
                sd(Residual) |   8.035411   .1227603      7.798372    8.279655
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 287.98        Prob >= chibar2 = 0.0000

. predict ahat2, reffects

.         list schoolnr ahat ahat2 if first & schoolnr < 16 

      +----------------------------------+
      | schoolnr        ahat       ahat2 |
      |----------------------------------|
   1. |        1   -3.498956   -3.498958 |
  26. |        2    -11.2898   -11.28981 |
  33. |       10   -5.985623   -5.985626 |
  38. |       12    -7.77485   -7.774852 |
  53. |       15   -6.704085   -6.704087 |
      +----------------------------------+
> eb <- ranef(vc)

>  cbind(head(eb$schoolnr), head(schools$ahat))
   (Intercept) head(schools$ahat)
1    -3.498957          -3.498957
2   -11.289804         -11.289804
10   -5.985623          -5.985623
12   -7.774851          -7.774851
15   -6.704085          -6.704085
16    0.802527           0.802527

We list a few cases to show that the two calculations yield the same result.
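
As a fuller check, one can compare all 131 schools at once; this sketch assumes the eb and schools objects created above are still available.

# largest absolute difference between ranef() and the hand-computed BLUPs;
# this should be essentially zero (differences only at numerical precision)
max(abs(eb$schoolnr[["(Intercept)"]] - schools$ahat))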