Germán Rodríguez
Multilevel Models Princeton University

The Stata manual [XT, p.249] analyzes data from the 1989 Bangladesh fertility survey, previously analyzed by Huq and Cleland, and by Ng et al.

```. webuse bangladesh, clear

. desc

obs:         1,934                          Bangladesh Fertility Survey, 1989
vars:             7                          28 May 2009 20:27
size:        27,076 (99.9% of memory free)   (_dta has notes)
---------------------------------------------------------------------------------------------------------------
storage  display     value
variable name   type   format      label      variable label
---------------------------------------------------------------------------------------------------------------
district        byte   %9.0g                  District
c_use           byte   %9.0g       yesno      Use contraception
urban           byte   %9.0g       urban      Urban or rural
age             float  %6.2f                  Age, mean centered
child1          byte   %9.0g                  1 child
child2          byte   %9.0g                  2 children
child3          byte   %9.0g                  3 or more children
---------------------------------------------------------------------------------------------------------------
Sorted by:  district
```

#### A Random-Intercept Model

We will fit a logit model predicting contraceptive use by urban residence, age and number of children, with a random intercept by district. We use xtmelogit to be able to obtain estimates of the district effect.

```. xtmelogit c_use urban age child1 child2 child3 || district: , mle

Refining starting values:

Iteration 0:   log likelihood = -1219.2682
Iteration 1:   log likelihood = -1209.3544
Iteration 2:   log likelihood = -1207.1895

Iteration 0:   log likelihood = -1207.1895
Iteration 1:   log likelihood = -1206.8323
Iteration 2:   log likelihood = -1206.8322
Iteration 3:   log likelihood = -1206.8322

Mixed-effects logistic regression               Number of obs      =      1934
Group variable: district                        Number of groups   =        60

Obs per group: min =         2
avg =      32.2
max =       118

Integration points =   7                        Wald chi2(5)       =    109.60
Log likelihood = -1206.8322                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
c_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
urban |   .7322764   .1194857     6.13   0.000     .4980887    .9664641
age |  -.0264982   .0078916    -3.36   0.001    -.0419654   -.0110309
child1 |   1.116002   .1580921     7.06   0.000     .8061466    1.425856
child2 |   1.365895    .174669     7.82   0.000      1.02355     1.70824
child3 |   1.344031   .1796549     7.48   0.000     .9919141    1.696148
_cons |   -1.68929   .1477592   -11.43   0.000    -1.978892   -1.399687
------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
district: Identity           |
sd(_cons) |   .4643477   .0789531      .3327464    .6479975
------------------------------------------------------------------------------
LR test vs. logistic regression: chibar2(01) =    43.39 Prob>=chibar2 = 0.0000

. estimates store xtme

. mata exp(st_matrix("e(b)"))
1             2             3             4             5             6             7
+---------------------------------------------------------------------------------------------------+
1 |   2.07980972   .9738498451   3.052623893   3.919229511   3.834469892   .1846506578   .4643477362  |
+---------------------------------------------------------------------------------------------------+
```

The estimates show that when we compare women with the same age and family size, contraceptive use is higher in urban than in rural areas, with double the odds of using contraception [exp(0.732)=2.08]. Within an area and family size, contraceptive use declines with age, with 2.6% lower odds per year of age [exp(-.026)-1 = -.026]. Within an area and age, contraceptive use is higher among women with a child, and much higher among women with two or more children, than among those with no children, with odds ratios of three and almost four [exp(1.116)=3.05, exp(1.366)=3.92 and exp(1.344)=3.834].

There is very subtantial variation in contraceptive use across districts. The standard deviation of 0.46 indicates that women in a district which is one standard deviation above the mean have odds of using contraception that are 59% higher than comparable women in an average district [exp(0.464) = 1.59]. The standard deviation is also equivalent to a correlation of 0.06 in the latent propensities to use contraception of comparable women in the same district [0.464^2/(0.464^2+ π^2/3) = 0.06].

#### ML and Empirical Bayes Estimates

We can identify districts where women are more or less likely to use contraception by "estimating" the random effects. We can obtain maximum likelihood estimates by treating the estimated linear predictor from this model as an offset and then fitting a separate regression model in each district, something easy to do with `statsby` as shown by Rabe-Hesketh and Skrondal(2008, p. 265). So here we go:

```. predict xb, xb

. statsby mle=_b[_cons], by(district) saving(ml,replace): ///
>   logit c_use, offset(xb)
(running logit on estimation sample)

command:  logit c_use, offset(xb)
mle:  _b[_cons]
by:  district

Statsby groups
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..x.......x.....................................x.    50
..........

. merge m:1 district using ml

Result                           # of obs.
-----------------------------------------
not matched                             0
matched                             1,934  (_merge==3)
-----------------------------------------
```

The mle is not defined when all women in a district have the same outcome, so everybody (or nobody) uses contraception. This occurs for 3 districts.

The empirical Bayes estimate, based on the mode of the posterior distribution of the random effects, can be obtained with `predict, ref`

```. estimates restore xtme
(results xtme are active now)

. predict re, ref

. bysort district: gen first=_n==1

. scatter re mle if first || function y=x, range(-1 1) ///
>         ,title(ML and EB Estimates of District Effects) ///
>         xtitle(Maximum likleihood) ytitle(Empirical Bayes) ///
>         legend(off)

. graph export bd1.png, width(400) replace
(file bd1.png written in PNG format)
``` Comparing the EB and ML estimates we see the usual shrinkage towards zero, which is particularly noticeable in four districts:

```. egen ni = count(1), by(district)

. egen pi = mean(c_use), by(district)

. list district ni pi mle re if first & mle < -1, clean

district   ni         pi         mle          re
352.         10   13   .0769231   -1.563583   -.3736235
764.         24   14   .0714286   -1.790981   -.4534406
1740.         55    6   .1666667   -1.609306   -.3196343
1851.         59   10         .1   -1.700366   -.3979423
```

These are all fairly small districts with less than 15 women each. However, there are five other districts with small sizes that do not exhibit this phenomenon. The reason is that in those cases the estimated random effect was close to zero anyway. (There are also two small districts with all zeroes or all ones.)

```. list district ni pi mle re if first & ni < 15 & mle > -1, clean

district   ni         pi         mle          re
138.          3    2          1           .    .2074448
845.         26   13   .3846154    .0580263    .0228793
1101.         33   14   .4285714   -.1926461   -.0801968
1215.         37   13   .5384616    .7878244    .3173832
1228.         38   14   .2857143   -.4925295   -.1871244
1335.         42   11   .5454546    .5789949    .2067375
1600.         49    4          0           .   -.1864481
```

One final word on EB estimates. If you use `gllamm` to fit this model you can use `gllapred` to obtain EB estimates. That procedure computes posterior means, whereas `predict, ref` computes posterior modes. Because the posterior distributions are not symmetric these two estimates can differ, but usually they are quite close.

#### Predicted Probabilities

Subject-specific probabilities can be computed easily from first principles by setting the fixed covariates and the random effects to illustrative values. In this dataset age is centered at the sample mean. Here are probabilities for a woman with one child and average age in urban and rural areas of the average district:

```. di invlogit(_b[_cons] + _b[child1])
.36047847

. di invlogit(_b[_cons] + _b[child1] + _b[urban])
.53966357
```

We see that the odds ratio of about two translates to probabilities of 36.0 and 54.0%, or 18 percentage points.

Population-average probabilities can also be computed, but they require integration over the distribution of the random effect. At this time these probabilities can only be computed by using `gllapred` after `gllamm`, as they have not (yet) been added to `predict` after `xtmelogit`. They can, however, be computed from first principles using Mata, as illustrated below.

I wrote a Mata function to compute abscissas and weights for integrating over a standard normal random variable by translating the C code for Gauss-Hermite quadrature in Numerical Recipes in C. The values for 12 quadrature points are also available for general use in a Stata file called `gauher12.dta` right here. (Gauss-Hermite integrates over exp(-t^2), but I changed variables to a standard normal, multipying the abcsissas by sqrt(2) and dividing the weights by sqrt(π).)

Here's the calculation of the rural and urban probabilities averaged over the distribution of the district effects. We list the estimates to find the location of the coefficients of interest: the constant is 6th, the urban coef is first, the one-child coef is 3rd, and the log of sigma is 7th:

```. mat list e(b)

e(b)[1,7]
eq1:        eq1:        eq1:        eq1:        eq1:        eq1:   lns1_1_1:
urban         age      child1      child2      child3       _cons       _cons
y1   .73227641  -.02649815   1.1160015   1.3658951   1.3440312  -1.6892896  -.76712158

. mata
------------------------------------------------- mata (type end to exit) -------------------------------------
: gh12 = gauher(12) // first column has abcissas, second weights

: b = st_matrix("e(b)")

: sig = exp(b)

: u = r = 0

: for (i=1; i <= 12; i++) {
>         r = r + invlogit(b + b + sig*gh12[i,1]) * gh12[i,2]
>         u = u + invlogit(b + b + sig*gh12[i,1] + b) * gh12[i,2]
> }

: r, u
1             2
+-----------------------------+
1 |  .3668268221    .537737042  |
+-----------------------------+

: end
---------------------------------------------------------------------------------------------------------------
```

The population-average effect of urban residence is a difference from 36.7 to 53.8 or 17.1 percentage points, averaged across all districts.

#### A Random Slope Model

The next model will treat the urban-rural difference as random at the district level. (Most districts have urban and rural areas.) This is somewhat equivalent to allowing an interation between urban residence and district, but instead of estimating a separate urban-rural difference for each district, we assume that the differences are drawn from a normal distribution:

```. xtmelogit c_use urban age child1 child2 child3 ///
>         || district: urban, covariance(unstructured) mle

Refining starting values:

Iteration 0:   log likelihood = -1215.8594  (not concave)
Iteration 1:   log likelihood = -1204.0802
Iteration 2:   log likelihood = -1199.7994

Iteration 0:   log likelihood = -1199.7994
Iteration 1:   log likelihood = -1199.4784
Iteration 2:   log likelihood = -1199.3158
Iteration 3:   log likelihood =  -1199.315
Iteration 4:   log likelihood =  -1199.315

Mixed-effects logistic regression               Number of obs      =      1934
Group variable: district                        Number of groups   =        60

Obs per group: min =         2
avg =      32.2
max =       118

Integration points =   7                        Wald chi2(5)       =     97.50
Log likelihood =  -1199.315                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
c_use |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
urban |   .8157872   .1715519     4.76   0.000     .4795516    1.152023
age |   -.026415    .008023    -3.29   0.001    -.0421398   -.0106902
child1 |    1.13252   .1603285     7.06   0.000      .818282    1.446758
child2 |   1.357739   .1770522     7.67   0.000     1.010724    1.704755
child3 |   1.353827   .1828801     7.40   0.000     .9953882    1.712265
_cons |   -1.71165   .1605617   -10.66   0.000    -2.026345   -1.396954
------------------------------------------------------------------------------

------------------------------------------------------------------------------
Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
district: Unstructured       |
sd(urban) |   .8162856   .1975237      .5080068     1.31164
sd(_cons) |   .6242943   .1035136       .451079    .8640247
corr(urban,_cons) |  -.7964729   .1151556     -.9361775   -.4394904
------------------------------------------------------------------------------
LR test vs. logistic regression:     chi2(3) =    58.42   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

. mata exp(st_matrix("e(b)"))
2.260954838
```

It is very important in this model to specify the covariance matrix of the random effects as unstructured, so we allow correlation between the level of contraceptive use and the urban-rural difference in each district.

The estimates show that the odds of using contraception on the average district are about double in urban compared to urban areas, but the difference varies substantially by district, as shown by the standard deviation of the urban coefficient, 0.82. For example the urban-rural difference in logits is essentially zero in a district with differential one standard deviation below the mean (0.8158-0.8163 = -0.001), and equivalent to an odds ratio of about five in a district with differential one standard deviation above the mean (0.8158+0.8163 = 1.632).

#### EB Estimates of the Intercept and Slope

We now compute Empirical Bayes (EB) estimates of the random intercept and random slop, representing the estimated district effects on the leel of contraceptive use and the urban-rural difference. (We could compute ML estimates as well, using a procedure similar to the previous section, but we will focus on the EB estimates.)

```. predict ri rs, reffects

. scatter rs ri, title(District Effects on Contraceptive Use) ///
>         xtitle(Effect on Level of Use) ///
>         ytitle(Effect on Urban-Rural Differential)

. graph export bd2.png, width(400) replace
(file bd2.png written in PNG format)
``` We see a clear negative correlation. Districts where contraceptive use is higher than average, after considering the age and number of children of the residents, tend to show a much smaller urban-rural difference, reflecting the correlation of -0.796 estimated above.

We could also estimate separate random effects for urban and rural areas in each district by including dummies for both urban and rural and omitting the constant in the fixed and random parts. If you do that you will find that the two effects are nearly independent.

Continue with