Germán Rodríguez
Multilevel Research, Princeton University

These notes document the January 2001 version of lr3, a library of S (and R) functions for fitting two and three-level logistic regression models using Gauss-Hermite quadrature. I am keeping this page here for the record, but these functions are no longer needed in light of the excellent adaptive quadrature routines implemented in Stata.

### 1. Overview

Please refer to Section 4 for installation instructions. Section 2 describes fitting two-level models and serves to introduce the software. Section 3 deals with issues that are specific to three-level models and therefore builds on the previous section.

### 2. Two-Level Models

The first two sections describe the main fitting procedure and present an example. The remaining (starred) sections deal with more technical issues and may be omitted at first reading.

#### 2.1 Model Fitting with lr2

The top-level function here is `lr2`. Its structure is very similar to that of `lm` and `glm`:

```
lr2(formula, level2.id, data = sys.parent(),
    frame = F, quad.pts = 12, beta = NULL, sigma = NULL, ...)
```

You specify the model using a model formula, a variable that represents the level-2 id, and optionally a data frame to be used in evaluating the outcome, predictors and id. The data frame defaults to the session frame.

The `frame` parameter is used to indicate that the function should set up all data structures to be used in estimation but stop short of actually fitting the model. (This option is useful in complex problems when one may want to call the function optimizer directly, as described in Section 2.4 below.)

Note that the data must be sorted by id. The function will check that the id's are indeed sorted before constructing a frame for estimation.

If a model is to be fitted then one may specify the number of quadrature points to be used, which defaults to 12. One may use fewer points to obtain initial estimates. One may also fit the model with different numbers of points to check if the quadrature procedure is working, as done by Stata.

Optionally one may provide starting values for the fixed effects. If this parameter is omitted, the function will use glm to fit an ordinary logit model and use the resulting estimates as starting values.

One may also provide starting values for the standard deviation of the random effects. If this parameter is omitted the function uses 0.6. Note that 0 is not an acceptable starting value, this parameter must be non-negative. Also, the iterative procedure works internally with the log of sigma, to avoid problems with small values becoming negative in the course of iteration.

The final ellipsis `...` refers to parameters that are passed directly to the iterative procedure. In S the procedure used is `ms`. I recommend setting `trace=T` to trace the iterations; it provides reassurance that the computer is actually doing something.

When the iterative procedure completes we make a final call to the evaluation routine to obtain not just the log-likelihood and its first derivatives, but also the second derivatives, which are needed for computing standard errors. This evaluation tends to take much longer than those that bypass the hessian calculation.
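In symbols, the covariance matrix of the estimates is the inverse of the hessian of the negative log-likelihood, and the standard errors are the square roots of its diagonal. A self-contained sketch (the matrix below is purely illustrative, standing in for what the hessian evaluation returns):

```
# H stands in for the hessian of the negative log-likelihood at the maximum;
# the numbers are purely illustrative.
H <- matrix(c(4, 1, 1, 2), 2, 2)
V <- solve(H)        # inverse hessian: estimated covariance matrix
se <- sqrt(diag(V))  # standard errors of the parameter estimates
```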

The return value of the function is an object of class "lr2". This is basically the object returned by `ms`, except that we try to tidy up the names of the parameters and add a couple of additional elements to the list, including the number of quadrature points used. The class has its own print and summary methods, as illustrated below.

#### 2.2 Example: Hospital Deliveries

The example we use is one of the samples in Lillard and Panis (2000), Chapter 3. The data are available as a data frame that can be read into S with `dget` and has five variables: id, educ, income, distance, and hospital, where education is coded 1 = less than high school, 2 = high school, 3 = college. Here is a listing of the first three lines:

```
> hosp[1:3,]
  id educ income distance hospital
1  4    3     76      1.7        0
2 17    2    275      7.9        0
3 17    2    200      1.8        1
```

The following instruction fits a two-level logit model using the natural log of income and introducing dummies for dropouts and college graduates, as done in Lillard and Panis (2000). We assign the resulting object to `h12`. I suggest you use the `trace=T` option when you try this.

```
> h12 <- lr2(hospital~log(income)+distance+(educ==1)+(educ==3),
+ level2.id=id, data=hosp)
RELATIVE FUNCTION CONVERGENCE.
```

Typing the name of the object prints it:

```
> h12
2-level logit model
observations:  1060 501
logL =  -522.654827451429
Coefficients:
(Intercept) log(income)    distance educ == 1 educ == 3 log(sigma)    sigma
  -3.368158   0.5620118 -0.07661271 -1.997531  1.033676  0.2174697 1.242928
```

In the spirit of S, if you want a more conventional table with estimates and standard errors, use the `summary` method:

```
> summary(h12)
2-level logit model
observations:  1060 501
logL =  -522.654827451429
                   Coef Std. Error   t value
(Intercept) -3.36815845 0.47892802 -7.032703
log(income)  0.56201176 0.07269351  7.731251
distance    -0.07661271 0.03234586 -2.368548
educ == 1   -1.99753084 0.25573359 -7.810983
educ == 3    1.03367611 0.38846467  2.660927
log(sigma)   0.21746975 0.15736486  1.381946
sigma        1.24292783 0.19559316  6.354659
```
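The standard error shown for sigma is derived from that of log(sigma) by the delta method: se(sigma) = sigma × se(log(sigma)). We can check this against the table above (the numbers below are copied from the summary output):

```
# Delta method check, using the estimates printed above
sigma <- exp(0.21746975)          # estimate of sigma, about 1.242928
se.log.sigma <- 0.15736486        # standard error of log(sigma)
se.sigma <- sigma * se.log.sigma  # about 0.1955932, matching the table
```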

The results agree with aML, and agree even better with the output of Stata's xtlogit procedure, shown below for the same data. Note that Stata parameterizes the model in terms of log(sigma^2). Note also that aML uses a different method for estimating standard errors, which gives results close to, but distinct from, ours and Stata's (not shown).

```
Random-effects logit                            Number of obs      =      1060
Group variable (i) : id                         Number of groups   =       501

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       2.1
                                                               max =        10

                                                Wald chi2(4)       =    110.08
Log likelihood  = -522.65483                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
hospital |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   login |   .5620118   .0726935      7.731   0.000       .4195351    .7044885
distance |  -.0766127   .0323459     -2.369   0.018      -.1400094    -.013216
 dropout |  -1.997532   .2557338     -7.811   0.000      -2.498761   -1.496303
 college |   1.033677   .3884649      2.661   0.008          .2723    1.795055
   _cons |  -3.368159   .4789281     -7.033   0.000      -4.306841   -2.429477
---------+--------------------------------------------------------------------
/lnsig2u |   .4349417   .3147296      1.382   0.167      -.1819169      1.0518
---------+--------------------------------------------------------------------
 sigma_u |   1.242929   .1955933                          .9130556    1.691981
     rho |   .6070531   .0750755                          .4546458    .7411205
------------------------------------------------------------------------------
Likelihood ratio test of rho=0:      chi2(1) =    29.61   Prob > chi2 = 0.0000
```

#### *2.3 Strategies for Convergence

Good starting values can help. You may want to try starting the procedure using 6 quadrature points or so, perhaps imposing a limit of 10 iterations. You can do this with the parameters `control = ms.control(maxiter = 10), trace = T`, which are passed directly to `ms`.

How do you get the previous estimates, so you can start from there? You extract them from the "lr2" (or "ms") object using the `coef()` function, or directly as the `par` component.

For 2-level models, if `np` is the length of the parameter vector then `object$par[-np]` gives the betas or fixed effects (everything but the last parameter), and `exp(object$par[np])` gives the sigma.
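Putting this together, a warm restart might look as follows (a sketch: `h6` is a hypothetical first pass with 6 quadrature points and at most 10 iterations):

```
# First pass: cheap fit with few quadrature points and capped iterations
h6 <- lr2(hospital ~ log(income) + distance + (educ == 1) + (educ == 3),
          level2.id = id, data = hosp, quad.pts = 6,
          control = ms.control(maxiter = 10), trace = T)
# Second pass: 12 points, starting from the first-pass estimates
np <- length(h6$par)
h12 <- lr2(hospital ~ log(income) + distance + (educ == 1) + (educ == 3),
           level2.id = id, data = hosp, quad.pts = 12,
           beta = h6$par[-np], sigma = exp(h6$par[np]), trace = T)
```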

#### *2.4 Data Structures

If you examine the code for `lr2` you will notice that it does two main things: it sets up some data structures and then calls S's function minimizer, which in turn calls back a function called `lr2.g` to compute (minus) the log-likelihood and its derivative.

The structures are simple: we need a vector y with the outcome and a model matrix X that codes all factors using dummy variables and has an extra column for the random effect. You could obtain a base X matrix by calling `lm` or `glm` with the model formula and `x=T`, and then use `cbind(X,0)` to add a new column.

The function also tabulates the number of level-1 units in each level-2 unit, something you could do yourself with `table(id2)`. Setting `frame=T` returns a list with y, X and the tabulation, called `n12`, which are what the criterion function actually needs.

The other thing this function does is calculate the quadrature abscissas and weights, by calling `gauher(quad.pts)`. You could do this yourself, and save often-used objects such as `gq12=gauher(12)` so you don't recalculate them all the time. A quadrature object is a data frame with a row for each quadrature point and two columns called x and w.
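For concreteness, here is how one might assemble these pieces by hand for the hospital data, following the recipe just described (a sketch; `lr2` with `frame=T` does all of this for you):

```
# Base model matrix from glm, plus an extra column for the random effect
fit <- glm(hospital ~ log(income) + distance + (educ == 1) + (educ == 3),
           family = binomial, data = hosp, x = T)
y    <- hosp$hospital
X    <- cbind(fit$x, 0)   # reserve a column for the random effect
n12  <- table(hosp$id)    # level-1 units per level-2 unit
gq12 <- gauher(12)        # quadrature points: columns x (abscissas) and w (weights)
```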

Once you have y, X, n12 and a quadrature object you can call the actual criterion function yourself:

```
lr2.g(y, X, theta, n12, gq12)
```

where `theta` is a vector of parameter estimates with the betas followed by log(sigma). This function calls C code in the lr3 library, and returns the (negative) log-likelihood with the gradient as an attribute. Its twin `lr2.h` also calculates and returns the hessian as an attribute.

Minimizing the function is a simple call to `ms`, with an indication of the parameters and their starting values:

```
ms(~lr2.g(y, X, par, n12, gq12), start=list(par=c(b,ls)))
```

where `c(b, ls)` is a vector of starting values for the betas and log(sigma). The minimizer takes the names of the parameters from the name used in the call (`par` in the example) and the names, if any, of the starting values, which will be informative if they come from a glm fit.

Note that if you are close to the m.l.e.'s, you might try using second derivatives by calling `lr2.h` instead of `lr2.g`. In practice, however, you are unlikely to gain much: each iteration will take longer and you might not save as many iterations as you might expect.

When you call `ms` yourself the resulting object is of class "ms", not "lr2", so you get the print, summary and coef methods for ms objects. You can assign the "lr2" class to the object or call the `lr2.print` and `lr2.summary` methods, but this is not guaranteed to work.

### 3. Three-Level Models

Please read the discussion of two-level models if you haven't done so already, as we describe only what's new with three-level models.

#### 3.1 Model Fitting with lr3

The top-level function for fitting three-level models is `lr3`, with arguments as follows:

```lr3(formula, level2.id, level3.id, data = sys.parent(),
frame = F, quad.pts = 12, beta = NULL, sigma = NULL, ...)
```

The structure of the function and the arguments are exactly the same as for the two-level model, except that we now have a `level2.id` and a `level3.id`, and obviously `sigma` must have two components, representing variation at levels 2 and 3 in that order.

#### 3.2 Example: Immunization in Guatemala

The example we will use is the model of complete immunization in Guatemala used in Rodríguez and Goldman (2001). The data are provided as an S data frame that can be read with `dget`. I called the data frame `guimr`. Here are the first three rows:

```
> guimr[1:3,]
  kid mom cluster numvac kid2p mom25p orderc    eth  wed  hed everwork rural   pcind81
1   2   2       1      1     T      F      1 Ladino Sec+ Sec+        F     F 0.1075042
2 269 185      36      0     T      F    2-3 Ladino Prim Prim        T     F 0.0437295
3 272 186      36      0     T      F      1 Ladino Prim Sec+        T     F 0.0437295
```

Note that in the spirit of S, categorical variables are coded as true/false when they are binary and using labels when they have more than two categories. Here is the model used in our paper:

```
> g12 <- lr3(numvac ~ kid2p + mom25p + orderc + eth + wed + hed + everwork +
+ rural + pcind81, mom, cluster, data=guimr)
RELATIVE FUNCTION CONVERGENCE.
```

Note that we specify "mom" as our `level2.id` and "cluster" (or community) as our `level3.id`. Here are the results of printing the object:

```
> g12
3-level logit model
observations:  2159 1595 161
logL =  -1323.9618222914
Coefficients:
(Intercept)    kid2p     mom25p orderc2-3 orderc4-6 orderc7-16 ethIndNoSpa   ethIndSpa  wedPrim
    -1.2334 1.716657 -0.2159945 -0.260719 0.1793735  0.4310829  -0.1724736 -0.08364888 0.433026
  wedSec+   hedPrim   hedSec+        hedDK  everwork      rural   pcind81 log(sigma2) log(sigma3)
0.4200052 0.5393486 0.5051097 -0.007241173 0.3900436 -0.8882568 -1.149777   0.8397855  0.02431688
 sigma2   sigma3
2.31587 1.024615
```

And here is the more detailed summary:

```
> summary(g12)
3-level logit model
observations:  2159 1595 161
logL =  -1323.9618222914
                    Coef Std. Error     t value
(Intercept) -1.233400431  0.4831806 -2.55266939
kid2p        1.716657175  0.2173257  7.89900598
mom25p      -0.215994520  0.2316288 -0.93250286
orderc2-3   -0.260719016  0.2319183 -1.12418458
orderc4-6    0.179373547  0.2944780  0.60912382
orderc7-16   0.431082926  0.3720310  1.15872859
ethIndNoSpa -0.172473621  0.4893662 -0.35244284
ethIndSpa   -0.083648877  0.3633018 -0.23024623
wedPrim      0.433026006  0.2224304  1.94679307
wedSec+      0.420005173  0.4843674  0.86712108
hedPrim      0.539348585  0.2322901  2.32187475
hedSec+      0.505109742  0.4142667  1.21928633
hedDK       -0.007241173  0.3568837 -0.02029001
everwork     0.390043560  0.2026816  1.92441556
rural       -0.888256819  0.3062250 -2.90066765
pcind81     -1.149777002  0.5003787 -2.29781355
log(sigma2)  0.839785461  0.1129482  7.43513559
log(sigma3)  0.024316884  0.1562851  0.15559313
sigma2       2.315870079  0.2615734  8.85361314
sigma3       1.024614951  0.1601320  6.39856332
```

These results are actually quite close to the published results obtained with an earlier version of the program using 20 quadrature points.

We have also verified the estimates by comparison with aML runs with 5, 12 and 20 quadrature points. As noted earlier, aML uses a different method for estimating standard errors. We have verified ours by using numerical estimates of the hessian or matrix of second derivatives.

#### *3.3 Strategies for Convergence

Everything we said about two-level models applies here as well, except that there is some help for extracting the coefficients.

The `coef()` method for lr3 fits has two true/false parameters that are used to indicate whether one wants beta and sigma, respectively. Thus, the function has signature

```
coef(object, beta=T, sigma=T)
```

The defaults ensure that the usual call, for example `coef(g12)`, returns all estimates, converting log(sigmas) to sigmas. You can also use

• `coef(g12,T,F)`, to get the fixed effects, and
• `coef(g12,F,T)`, to get the standard deviations of the random effects.

The next example shows how to estimate the model using 20 quadrature points starting from the results obtained using 12, which have been saved as `g12`:

```
g20 <- lr3(guim.model, mom, cluster, data=guimr, quad.pts=20,
           beta=coef(g12,T,F), sigma=coef(g12,F,T))
```

#### *3.4 The Three-Level Structures

The three-level function has to do all the work of its two-level counterpart, plus some. Here is what's new at this level.

In addition to checking that the data are sorted by the level-2 id, we need to check that they are sorted by the level-3 id. This is done by a helper function `is.sorted(id)`. We also need to check that the level-2 id is nested in the level-3 id. What we mean by this is that if the level-3 id changes, the level-2 id must change as well, a fact checked by the function `is.nested(id2,id3)`.
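Neither check is complicated. Minimal sketches consistent with these descriptions might look as follows (illustrative reimplementations, not the library's actual code):

```
# "Sorted" in the sense required here: each id occupies one contiguous run
is.sorted <- function(id) {
  leaders <- id[c(TRUE, id[-1] != id[-length(id)])]  # first id of each run
  !any(duplicated(leaders))
}

# Nested: whenever the level-3 id changes, the level-2 id changes too
is.nested <- function(id2, id3) {
  ch3 <- id3[-1] != id3[-length(id3)]
  ch2 <- id2[-1] != id2[-length(id2)]
  !any(ch3 & !ch2)
}
```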

If all goes well the id's are tabulated to count the number of level-1 units per level-2 unit, which goes in a vector `n12`, and the number of level-1 units per level-3 unit, which goes in `n13`. We also need the number of level-2 units per level-3 unit, which is calculated by the function `make.n23` and stored in `n23`.
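A sketch of this counting step, assuming the ids are sorted and nested as required (the `make.n23` below is an illustrative reimplementation, not the library's code, and the id vectors are toy data):

```
id2 <- c(1, 1, 2, 3, 3)        # illustrative level-2 ids (sorted)
id3 <- c(10, 10, 10, 20, 20)   # illustrative level-3 ids (nested)
n12 <- table(id2)              # level-1 units per level-2 unit: 2 1 2
n13 <- table(id3)              # level-1 units per level-3 unit: 3 2
make.n23 <- function(id2, id3) {
  # a new level-2 unit starts wherever the level-2 id changes;
  # tabulating the level-3 id at those starts counts level-2 units per level-3
  first <- c(TRUE, id2[-1] != id2[-length(id2)])
  table(id3[first])
}
n23 <- make.n23(id2, id3)      # 2 level-2 units in cluster 10, 1 in cluster 20
```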

The X matrix is expanded to add two columns. Note that forgetting to add room for the random effects will usually crash S. This is one reason why going through `lr3` pays.

The `frame` option returns a list with y, X and the vectors n12, n13 and n23, which together with a quadrature object are the needed input for the criterion function:

```
lr3.g(y, X, theta, n12, n13, n23, gq)
```

This function calculates the (negative) log-likelihood and its gradient by calling C code. The twin function `lr3.h` calculates the hessian as well.

### 4. Installation

The S source for all the functions discussed here is available in a file called `lr3.s`, which can be sourced to create the necessary objects. If the file is in the working directory use

```
source("lr3.s")
```

Otherwise you need a fully qualified path.

You also need the compiled C code, which is available as relocatable code `lr3.o` for our Unix system and as a dynamic link library `lr3.dll` for Windows systems.

This code has to be loaded before it can be used. On Unix use

```
dyn.load("lr3.o")
```

On S-Plus for Windows use

```
dll.load("lr3.dll", lr3.entry.points)
```

where `lr3.entry.points` is an object listing all entry points in the DLL. This object is created when you source "lr3.s".

The two sample datasets are available as "hospital.dat" and "guimr.dat".