Germán Rodríguez
Multilevel Models Princeton University

3-Level Models

we now consider 3-level models. We will use time-series data on 1721 students nested within 60 urban public primary schools. The outcome of interest is mathematics achievement. The data were collected at the end of first grade and anually thereafter up to sixth grade, but not all students have six observations.

This dataset is used in Chapter 4 of the manual for HLM 6. It came in three files, one for each level, but I have merged them into a single file available in the course website.

Variance-Components

We read the data and fit a simple variance-components model. Note that we have two specifications for the random part, one at the school level (level 3), and one at the child-level (level 2). They must be specified in this order, going down the hierarchy. Both are null models, so we are simply adding an error term at each level. We also specify maximum likelihood estimation (the default is REML).

. use http://data.princeton.edu/pop510/egm, clear
 
. xtmixed math || schoolid: || childid:  , mle
 
Performing EM optimization: 
 
Performing gradient-based optimization: 
 
Iteration 0:   log likelihood = -12652.991  
Iteration 1:   log likelihood =  -12652.99  
 
Computing standard errors:
 
Mixed-effects ML regression                     Number of obs      =      7230
 
-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
       schoolid |       60         18      120.5        387
        childid |     1721          2        4.2          6
-----------------------------------------------------------
 
                                                Wald chi2(0)       =         .
Log likelihood =  -12652.99                     Prob > chi2        =         .
 
------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   -.510021   .0779638    -6.54   0.000    -.6628271   -.3572148
------------------------------------------------------------------------------
 
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Identity           |
                   sd(_cons) |   .5635993    .060968      .4559221    .6967071
-----------------------------+------------------------------------------------
childid: Identity            |
                   sd(_cons) |   .7553051   .0223448      .7127556    .8003947
-----------------------------+------------------------------------------------
                sd(Residual) |   1.234461   .0117766      1.211594     1.25776
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =  1404.55   Prob > chi2 = 0.0000
 
Note: LR test is conservative and provided only for reference.

The output shows that we have 60 schools with an average of 120.5 students for a total of 1721 children, with between 2 and 6 observations each for a total of 7230.

Our estimate of the mean score is -0.51 and the standard deviations are 0.56, 0.76 and 1.23 at the school, child and observation levels. I haven't found a command or option to turn these into intraclass correlations, but this is easy to do from first principles:

. mat list e(b)
 
e(b)[1,4]
          math:   lns1_1_1:   lns2_1_1:    lnsig_e:
         _cons       _cons       _cons       _cons
y1  -.51002097  -.57341173  -.28063352   .21063432
 
. scalar v3 = exp(2*_b[lns1_1_1:_cons])
 
. scalar v2 = exp(2*_b[lns2_1_1:_cons])
 
. scalar v1 = exp(2*_b[lnsig_e:_cons])
 
. display v3/(v3+v2+v1), (v2+v3)/(v3+v2+v1)
.13169199 .36820949

I used mat list to find the names of the parameter estimates, which are "log-sigmas", and exponentiate twice the estimate to get the variance.

We find intra-class correlations of 0.13 at the school level and 0.37 at the child level. These numbers represent the correlation in math achievement between two children in the same school, and between two measurements on the same child (in the same school). We can also conclude that 13% of the variation in math achievement can be attributed to the schools and 37% to the children (which includes the school).

We can also obtain posterior Bayes estimates of the school and child random effects. These come up in the same order as the equations.

. predict r3 r2, reffects

We will now compute school and child means and compare with our Bayesian estimates:

. // 3
. egen m3 = mean(math), by(schoolid)
 
. sum m3
 
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          m3 |      7230   -.5369243    .5488036  -2.493857   .7969334
 
. gen b3 = _b[_cons] + r3
 
. bysort schoolid: gen lev3 = _n==1 // a level-3 marker
 
. twoway (scatter b3 m3 if lev3) (function y=x, range(-2.5 .8)), ///
>         legend(off) title(School Means) name(school, replace)
 
. // 2
. egen m2 = mean(math), by(childid)
 
. egen n2 = count(math), by(childid)
 
. sum m2 if n2 > 2
 
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          m2 |      7220   -.5357388    1.093914    -3.6545      3.141
 
. gen b2 = b3 + r2
 
. bysort schoolid childid: gen lev2 = _n==1 // a level-2 marker
 
. twoway (scatter b2 m2 if n2>2 & lev2) (function y=x, range(-3.6 3.1)), ///
>         legend(off) title(Child Means) name(child, replace)
 
. // graph
. graph combine school child, xsize(4) ysize(2)
 
. graph export egmfig1.png, width(500) replace
(file egmfig1.png written in PNG format)

We see the usual shrinkage, particularly for the child means.

Growth Curves

The obvious predictor at the observation level is study year, which is coded -2.5, -1.5, -0.5, 0.5, 1.5 and 2.5 for the six years of data collection. There's also a variable called grade reflecting the grade minus one, which is almost the same thing:

. tab year grade
 
           |                               grade
      year |         0          1          2          3          4          5 |     Total
-----------+------------------------------------------------------------------+----------
      -2.5 |       131          0          0          0          0          0 |       131 
      -1.5 |     1,346          0          0          0          0          0 |     1,346 
       -.5 |       104      1,412          4          0          0          0 |     1,520 
        .5 |         1        194      1,470          7          0          0 |     1,672 
       1.5 |         0          5        174      1,200          8          0 |     1,387 
       2.5 |         0          1          5        149      1,010          9 |     1,174 
-----------+------------------------------------------------------------------+----------
     Total |     1,582      1,612      1,653      1,356      1,018          9 |     7,230 

We will follow the HLM manual in using years to track progress in math achievement over time. Year zero is near the start of of third grade and I will refer to it as 'third grade' for simplicity.

Our first model will allow for random intercepts and slopes with no predictors at levels 2 or 3. Note that I specify covariance(unstructured) at each level, so the intercept and slope can be correlated at level 2 as well as level 3.

. xtmixed math year ///
>         || schoolid: year, covariance(unstructured) ///
>       || childid:  year, covariance(unstructured) mle 
 
Performing EM optimization: 
 
Performing gradient-based optimization: 
 
Iteration 0:   log likelihood = -8169.1304  
Iteration 1:   log likelihood = -8163.2241  
Iteration 2:   log likelihood =  -8163.116  
Iteration 3:   log likelihood = -8163.1156  
 
Computing standard errors:
 
Mixed-effects ML regression                     Number of obs      =      7230
 
-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
       schoolid |       60         18      120.5        387
        childid |     1721          2        4.2          6
-----------------------------------------------------------
 
                                                Wald chi2(1)       =   2499.87
Log likelihood = -8163.1156                     Prob > chi2        =    0.0000
 
------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |   .7630273   .0152609    50.00   0.000     .7331165    .7929382
       _cons |  -.7793052   .0578294   -13.48   0.000    -.8926487   -.6659616
------------------------------------------------------------------------------
 
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
                    sd(year) |   .1049623   .0122048      .0835713    .1318286
                   sd(_cons) |   .4065909   .0441625      .3286269    .5030512
            corr(year,_cons) |   .3994274   .1367405      .1037116    .6302611
-----------------------------+------------------------------------------------
childid: Unstructured        |
                    sd(year) |   .1060966   .0092415      .0894453    .1258477
                   sd(_cons) |   .8002871   .0157337      .7700361    .8317264
            corr(year,_cons) |   .5510119   .0670882      .4061691    .6688458
-----------------------------+------------------------------------------------
                sd(Residual) |   .5490335   .0060473      .5373081    .5610147
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(6) =  5595.58   Prob > chi2 = 0.0000
 
Note: LR test is conservative and provided only for reference.

The first thing you will notice is that as models become more complex they take a bit longer to run. The results can be compared with the output from HLM 6 starting on page 80 of the manual and are in very close agreement.

The fixed-effect estimates yield an intercept of -0.78 and a slope of 0.76. These values can be interpreted as the average achievement at third grade and the average rate of change per year. Clearly average math achievement increases significantly over time.

The best way to obtain variances or correlations of the random effects is to use estat recovariance, which has an option correlation to get the correlation matrix. (There is also an option to print just one level.)

. estat recov
 
Random-effects covariance matrix for level schoolid
 
             |      year      _cons 
-------------+----------------------
        year |  .0110171            
       _cons |  .0170463   .1653162 
 
Random-effects covariance matrix for level childid
 
             |      year      _cons 
-------------+----------------------
        year |  .0112565            
       _cons |  .0467852   .6404594 
 
. estat recov, corr
 
Random-effects correlation matrix for level schoolid
 
             |      year      _cons 
-------------+----------------------
        year |         1            
       _cons |  .3994274          1 
 
Random-effects correlation matrix for level childid
 
             |      year      _cons 
-------------+----------------------
        year |         1            
       _cons |  .5510119          1 

So the intercepts and slope have correlations of 0.40 at the school level and 0.55 at the child level. (In a hierarchical model the correlation at level 2 is never lower than at level 3.) The child estimate (0.55) can be interpreted as the correlation between a child's achievement around third grade and her rate of growth. The school estimate (0.40) represents the correlation between a school's achievement around third grade and the rate of change per year.

Most of the variance in the intercept comes from the child level, whereas the variance in the slope is about the same at each level. These results indicate fairly significant variation among schools in mean achievement around third grade as well as in the rates of change over time.

We can now estimate growth curves for each school and for each child, using predict with the reffects option. With a total of four random effects we need four variables to store the Bayes posterior means. I'll number the residuals by level, and will use a for intercept and b for slope:

. predict r3b r3a r2b r2a, reffects
 
. // 3
. capture drop b3
 
. gen a3 = _b[_cons] + r3a
 
. gen b3 = _b[year] + r3b
 
. local xtitle xtitle(intercept) ytitle(slope)
 
. twoway scatter b3 a3 if lev3, `xtitle' title(School) ///
>         name(school, replace)
 
. // 2
. capture drop b2
 
. gen a2 = _b[_cons] + r3a + r2a // same as a3 + r2a
 
. gen b2 = _b[year]  + r3b + r2b // same as b3 + r3b
 
. local xtitle xtitle(intercept) ytitle(slope)
 
. twoway (scatter b2 a2 if lev2) ///
>          (scatter b2 a2 if lev2 & b3 < .55), legend(off) ///
>         title(Child) `xtitle' name(child, replace)
 
. // graph
. graph combine school child, xsize(4) ysize(2) ///
>         title(School and Child Intercepts and Slopes)
 
. graph export egmfig2.png, width(400) replace
(note: file egmfig2.png not found)
(file egmfig2.png written in PNG format)

We see substantial variation in school levels and rates of change in math achievement. One school is notable for having a reasonable third-grade mean but a low rate of change per year (just above 0.5). I highlight the children from this school in the next graph, that shows their expected third-grade scores and rates of change.

Note the positive correlation between intercepts and slopes at both levels, so children and schools with higher third-grade math achievement also tend to show greater gains over time.

We can also plot fitted lines at the school and child level. It might not be bad idea to test for linearity of the growth curves.

Growth Curve Predictors

One way to look for school or individual characteristics that may be associated with levels and trends in mathematics achievement is to regress the Bayes estimates of intercepts and slopes on potential predictors at each level.

Here I'll follow the HLM manual in considering a model where a child's growth curve depends on ethnicity, with different intercepts and slopes for whites, blacks and hispanics, and where the school average curve depends on the percent of students with low income.

One way to develop a model of this type is to write down the equation for each level and then substitute down the hierarchy to obtain the fixed and random components in the complete model. This process shows the need to create cross-level interactions between year and the predictors at the school and child level.

Before creating cross-products we consider centering. This is not needed (not, in my view, advisable) for dummy variables, where zero is meaningful because it identifies the reference category. One might consider centering the percent poor, but for consistency with the HLM manual I will not. Bear in mind that the main effects then refer to a school with no poor students.

. gen yearBlack = year * black
 
. gen yearHispanic = year * hispanic
 
. gen yearLowinc = year * lowinc
 
. xtmixed math year black hispanic lowinc ///
>         yearBlack yearHispanic yearLowinc ///
>         || schoolid: year, covariance(unstructured) ///
>         || childid:  year, covariance(unstructured) mle
 
Performing EM optimization: 
 
Performing gradient-based optimization: 
 
Iteration 0:   log likelihood = -8125.8349  
Iteration 1:   log likelihood = -8119.6993  
Iteration 2:   log likelihood = -8119.6037  
Iteration 3:   log likelihood = -8119.6035  
 
Computing standard errors:
 
Mixed-effects ML regression                     Number of obs      =      7230
 
-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
       schoolid |       60         18      120.5        387
        childid |     1721          2        4.2          6
-----------------------------------------------------------
 
                                                Wald chi2(7)       =   3324.79
Log likelihood = -8119.6035                     Prob > chi2        =    0.0000
 
------------------------------------------------------------------------------
        math |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |   .8745122   .0391403    22.34   0.000     .7977987    .9512258
       black |  -.5021083   .0778753    -6.45   0.000    -.6547412   -.3494755
    hispanic |  -.3193816   .0860935    -3.71   0.000    -.4881217   -.1506414
      lowinc |  -.0075778   .0016908    -4.48   0.000    -.0108918   -.0042638
   yearBlack |  -.0309253   .0224586    -1.38   0.169    -.0749433    .0130926
yearHispanic |   .0430865    .024659     1.75   0.081    -.0052442    .0914172
  yearLowinc |  -.0013689   .0005226    -2.62   0.009    -.0023933   -.0003446
       _cons |   .1406379   .1274906     1.10   0.270    -.1092391    .3905149
------------------------------------------------------------------------------
 
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
schoolid: Unstructured       |
                    sd(year) |   .0893313   .0115087      .0693972    .1149913
                   sd(_cons) |   .2794454   .0351444      .2183964    .3575595
            corr(year,_cons) |   .0327362   .1782169     -.3067244    .3648084
-----------------------------+------------------------------------------------
childid: Unstructured        |
                    sd(year) |    .105327   .0092652      .0886468    .1251458
                   sd(_cons) |   .7888289   .0155546       .758924    .8199121
            corr(year,_cons) |   .5611815   .0680564      .4135205    .6800794
-----------------------------+------------------------------------------------
                sd(Residual) |   .5491733   .0060468      .5374488    .5611535
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(6) =  4797.28   Prob > chi2 = 0.0000
 
Note: LR test is conservative and provided only for reference.

Results can be compared with the HLM output starting on page 88 of the manual. In this model each child has her own growth curve, but the intercept and slope depend on the school's income composition, school unobserved characteristics, the child's owh ethnicity, and unobserved characteristics of the child.

One way to think about the estimates is as follows. We start with a growth curve that has 0.141 points of math achievement by the middle of third grade and increases by 0.875 points per year provided the school has no low income students. Each percentage point of low income students is associated with a third grade mean that's 0.008 lower and increases 0.001 points less per year, so a school with 100% poor students would have a third-grade mean of -0.617 and increase by 0.738 points per year. The intercept and slope in any particular school will vary around the average line for its percent poor.

Next come characteristics of the individual, in this case ethnicity. Blacks have a third-grade mean 0.502 points below whites in the same school, and the increase per year is 0.021 points less than for whites in the same school. The results for hispanics indicate a third-grade mean 0.319 points below whites but a growth rate per year 0.043 points higher than whites in the same school. Each child's growth curve, however, will vary around the average line for kids of his ethnicity in his school.

The random effects indicate that the average third-grade score varies by school with a standard deviation of 0.28, while the rate of change has a standard deviation of 0.089. Interestingly the correlation between intercept and slope has practically vanished, suggesting that it can be attributed to the school's socio-economic composition.

Variation at the child level is even more sustantial; expected third-grade scores vary with a standard deviation of 0.789 and the rates of change vary with a standard deviation of 0.105. These two random effects remain highly correlated, so kids who have high levels of math proficiency in third grade also tend to improve faster over time.

Plotting Illustrative Growth Curves

It's useful to compare fixed and random effects on a similar scale. Here I will consider four types of schools, with fixed and random effects one standard deviation above and below the mean. In each school I will consider kids from the three ethnic groups whose intercepts and slopes fall one standard deviation above and below the mean of their ethnic group in their school.

The easiest way to do this is to create a few variables stored in the first few observations to represent all the combinations we need. First I generate the patterns needed

. foreach var in sr sf cr cf X {
  2.         quietly gen `var' = .
  3. }
 
. local k = 0
 
. foreach sr in -1 1 { 
  2.         foreach sf in -1 1 { 
  3.                 foreach cr in -1 1 { 
  4.                         forvalues cf=1/3 {
  5.                                 foreach X in -2.5 2.5 {
  6.                                         local k=`k'+1
  7.                                         quietly replace sr=`sr' in `k'
  8.                                         quietly replace sf=`sf' in `k'
  9.                                         quietly replace cr=`cr' in `k'
 10.                                         quietly replace cf=`cf' in `k'
 11.                                         quietly replace X=`X' in `k'
 12.                                 }
 13.                         }
 14.                 }
 15.         }
 16. }

Next I compute the intercept and slope as well as the predicted regression line for each combination of values of the school and child fixed and random effects

. sum lowinc if lev3
 
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      lowinc |        60    73.73667    27.27405          0        100
 
. gen A = _b[math:_cons] + /// constant
>                 (75 + sf*25)*_b[math:lowinc] + /// school fixed
>                 sr*exp(_b[lns1_1_2:_cons]) +   /// school random
>                 _b[math:black]*(cf==2) + _b[math:hispanic]*(cf==3) + /// child fixed
>                 cr*exp(_b[lns2_1_2:_cons])                           // child random
(7182 missing values generated)
 
. capture drop B
 
. gen B = _b[math:year] + /// slope
>                 (75 + sf*25)*_b[math:yearLowinc] + /// school fixed
>                 sr*exp(_b[lns1_1_1:_cons]) +       /// school random
>                 _b[math:yearBlack]*(cf==2) + _b[math:yearHispanic]*(cf==3) + /// child fixed
>                 cr*exp(_b[lns2_1_1:_cons])                                   // child random
(7182 missing values generated)
 
. capture drop Y
 
. gen Y = A + B * X
(7182 missing values generated)

Finally I label the variables and plot the lines for the ethnic groups with individual random effects one standard deviation above and below the mean for each of the four combinations of school fixed and random effects:

 
. label define sr -1 "-1 school sigs" 1 "+1 school sigs"
 
. label define sf -1 "50% poor" 1 "100% poor"
 
. label values sr sr
 
. label values sf sf
 
. twoway (line Y X if cf==1 ,connect(ascending) lpat(solid)) ///
>        (line Y X if cf==2 ,connect(ascending) lpat(dash) ) ///
>        (line Y X if cf==3 ,connect(ascending) lpat(dot)  ) ///
>         , by(sf sr, legend(off) ///
>         note("Legend: solid=white, dash=black, dot=hispanic, for +1/-1 child sigmas") ///
>         title(Growth Curves by School and Child Characteristics) ) ///
>         xtitle(Year) ytitle(Math)
 
. graph export egmfig3.png, width(400) replace
(note: file egmfig3.png not found)
(file egmfig3.png written in PNG format)

The plot shows clearly the relative importance of fixed and random effects at the school and individual levels. At the school level the difference between 50% and 100% poor is smaller than the two-sigma difference in unobserved school characteristics. At the individual level the differences between whites, blacks and hispanics are dwarfed by the two-sigma differences in unobserved child characteristics. Note also that within each school type the average line for hispanics starts at about the average level of blacks in first grade but catches up with the average level of whites by sixth grade.