Home | GLMs | Multilevel | Survival | Demography | Stata | R
Home 2 Levels 3 Levels Logit winBUGS MLwiN

Variance-Components Models

We illustrate basic ideas in variance component models using data from Snijders and Boskers (1999). The dataset has 2287 children from 131 schools in The Netherlands.

.  infile schoolnr  pupilnr iq_verb  iq_perf sex  minority  repeatgr ///
>    aritpret  classnr aritpost langpret langpost ses denomina schoolses ///
>    satiprin natitest meetings currmeet mixedgra percmino aritdiff ///
>    homework classsiz groupsiz ///
>    using http://data.princeton.edu/wws509/datasets/snijders.dat
(2287 observations read)

We are interested in scores in a language test. Here are some descriptive stats:

. sum langpost
 
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    langpost |      2287    40.93485    9.003676          9         58
 
. scalar om = r(mean)
 
. bysort schoolnr: gen sn = _N      // school n
 
. by schoolnr: gen first = _n == 1 // mark first obs per school
 
. sum sn if first
 
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          sn |       131    17.45802    7.231618          4         35

The overall mean is 40.93. The number of observations per school ranges from 4 to 35, with an average of 17.45.

The Variance Components

Next we use xtreg to fit a simple variance components model of the form Yit = m + ai + ei t using schoolnr as the grouping factor:

. xtreg langpost, i(schoolnr) mle
Iteration 0:   log likelihood =  -8128.005
Iteration 1:   log likelihood = -8126.6359
Iteration 2:   log likelihood = -8126.6093
Iteration 3:   log likelihood = -8126.6092
 
Random-effects ML regression                    Number of obs      =      2287
Group variable: schoolnr                        Number of groups   =       131
 
Random effects u_i ~ Gaussian                   Obs per group: min =         4
                                                               avg =      17.5
                                                               max =        35
 
                                                Wald chi2(0)       =      0.00
Log likelihood  = -8126.6092                    Prob > chi2        =         .
 
------------------------------------------------------------------------------
    langpost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   40.36409   .4271561    94.49   0.000     39.52688     41.2013
-------------+----------------------------------------------------------------
    /sigma_u |   4.407781   .3516687                      3.769711    5.153852
    /sigma_e |   8.035411   .1227603                      7.798372    8.279656
         rho |    .231302   .0292466                      .1780689    .2924018
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)=  287.98 Prob>=chibar2 = 0.000

We find an intraclass correlation of 0.23, so 23% of the variation in language scores can be attributed to the schools. (Alternative estimation methods will be discussed later, here we just used maximum likelihood estimation.)

The Grand Mean

Note that the estimate of the constant is not the same as the overall mean obtained at the outset. The overall mean can be seen as averaging the school means with weights proportional to the number of students per school, which would be optimal if the observations were independent. At the other extreme one could compute an unweighted average of the school means, which would be optimal if the intra-class correlation was one. Let's compute these:

. egen sm = mean(langpost), by(schoolnr)    // school mean
 
. quietly sum sm [fw=sn] if first
 
. di r(mean)
40.934849
 
. quietly sum sm if first
 
. di r(mean)
40.129894

We obtain estimates of 40.93 weighted by school sample size and 40.13 unweighted. The random effects estimate represents a compromise between these extremes, and can be reproduced using weights inversely proportional to the variance of the school means:

. gen sw = 1/(_b[/sigma_u]^2+_b[/sigma_e]^2/sn)
 
. quietly sum sm [aw=sw] if first
 
. di r(mean)
40.364088

We verify the estimate of 40.36.

School Effects

Next we consider 'estimation' of the school random effects.

If these effects were treated as fixed, so the model is Yit = m + ai + ei t we would estimate them by introducing a dummy variable for each school.

It turns out that the estimate of m is the overall mean, the estimate of m+ ai is the school mean, and thus the estimate of ai is the difference between the shool mean and the overall mean.

With random effects we consider the (posterior) distribution of ai given the data yi (the vector with yi t for t=1,...,ni). This can be obtained using Bayes theorem from the (prior) distribution of ai, the conditional distribution of the data yi given ai and the marginal distribution of the data.

The resulting estimate is known as BLUP (for best linear unbiased predictor) and can be shown to be the difference between the school mean and the m.l.e. of the constant "shrunk" towards zero by a factor Ri=s2u/ (s2u + s2e/ni), the ratio of the variances of the random effect and the school mean. Note that shrinkage is minimal when (1) ni is large, (2) s2e is small (students are not informative), and/o r (3) s2u is large (schools are informative). let us calculate these 'by hand'

. gen R = _b[/sigma_u]^2/(_b[/sigma_u]^2+_b[/sigma_e]^2/sn)
 
. gen ahat = R*(sm-_b[_cons])
 
. gen alphahat = sm - om
 
. scatter ahat alphahat, title(School effects on language scores) ///
>  xtitle(MLE) ytitle(BLUP)
 
. graph export langblup.png, width(400) replace
(file langblup.png written in PNG format)

We see some substantial shrinkage for three small schools with fairly low averages:
. list schoolnr sn sm alphahat ahat if first & ahat < -10
 
      +--------------------------------------------------+
      | schoolnr   sn         sm    alphahat        ahat |
      |--------------------------------------------------|
  26. |        2    7   23.71428   -17.22056    -11.2898 |
 331. |       47    8       23.5   -17.43485   -11.91456 |
 838. |      103    4         20   -20.93485   -11.12282 |
      +--------------------------------------------------+

BLUPS can be computed more conveniently using xtmixed. Here's the same model, which is then used to predict the random effects using the reffects option of predict:

. xtmixed langpost || schoolnr: , mle 
 
Performing EM optimization: 
 
Performing gradient-based optimization: 
 
Iteration 0:   log likelihood = -8126.6092  
Iteration 1:   log likelihood = -8126.6092  
 
Computing standard errors:
 
Mixed-effects ML regression                     Number of obs      =      2287
Group variable: schoolnr                        Number of groups   =       131
 
                                                Obs per group: min =         4
                                                               avg =      17.5
                                                               max =        35
 
                                                Wald chi2(0)       =         .
Log likelihood = -8126.6092                     Prob > chi2        =         .
 
------------------------------------------------------------------------------
    langpost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   40.36409   .4263626    94.67   0.000     39.52843    41.19974
------------------------------------------------------------------------------
 
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolnr: Identity           |
                   sd(_cons) |   4.407781    .351669      3.769711    5.153853
-----------------------------+------------------------------------------------
                sd(Residual) |   8.035411   .1227603      7.798372    8.279656
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =   287.98 Prob >= chibar2 = 0.0000
 
. predict ahat2, reffects
 
. list schoolnr ahat ahat2 if first & schoolnr < 16 
 
      +----------------------------------+
      | schoolnr        ahat       ahat2 |
      |----------------------------------|
   1. |        1   -3.498956   -3.498957 |
  26. |        2    -11.2898    -11.2898 |
  33. |       10   -5.985623   -5.985623 |
  38. |       12    -7.77485   -7.774851 |
  53. |       15   -6.704085   -6.704085 |
      +----------------------------------+

We list a few cases to verify that the two calculations yield the same result.