Likelihood via Quadrature
These notes document the January 2001 version of lr3, a library of S (and R) functions for fitting two and three-level logistic regression models using Gauss-Hermite quadrature. I am keeping this page here for the record, but these functions are no longer needed in light of the excellent adaptive quadrature routines implemented in Stata.
Please refer to Section 4 for installation instructions. Section 2 describes fitting two-level models and serves to introduce the software. Section 3 deals with issues that are specific to three-level models and therefore builds on the previous section.
2. Two-Level Models
The first two sections describe the main fitting procedure and present an example. The remaining (starred) sections deal with more technical issues and may be omitted at first reading.
2.1 Model Fitting with lr2
The top-level function here is
It's structure is very similar to
lr2(formula, level2.id, data = sys.parent(), frame = F, quad.pts = 12, beta = NULL, sigma = NULL, ...)
You specify the model using a model formula, a variable that represents the level-2 id, and optionally a data frame to be used in evaluating the outcome, predictors and id. The data frame defaults to the session frame.
frame parameter is used to indicate that the
function should set up all data structures to be used in estimation
but stop short of actually fitting the model. (This option is
useful in complex problems when one may want to call the function
optimizer directly, as described in Section 2.4 below.)
Note that the data must be sorted by id. The function will check that the id's are indeed sorted before constructing a frame for estimation.
If a model is to be fitted then one may specify the number of quadrature points to be used, which defaults to 12. One may use fewer points to obtain initial estimates. One may also fit the model with different numbers of points to check if the quadrature procedure is working, as done by Stata.
Optionally one may provide starting values for the fixed effects. If this parameter is omitted, the function will use glm to fit an ordinary logit model and use the resulting estimates as starting values.
One may also provide starting values for the standard deviation of the random effects. If this parameter is omitted the function uses 0.6. Note that 0 is not an acceptable starting value, this parameter must be non-negative. Also, the iterative procedure works internally with the log of sigma, to avoid problems with small values becoming negative in the course of iteration.
The final dieresis
... refers to parameters that
are passed directly to the iterative procedure. In S the procedure
ms. I recommend setting
to trace the iterations; it provides reassurance that the computer
is actually doing something.
When the iterative procedure completes we make a final call to the evaluation routine to obtain not just the log-likelihood and its first derivatives, but also the second derivatives, which are needed for computing standard errors. This evaluation tends to take much longer than those that bypass the hessian calculation.
The return value of the function is an object of class "lr2".
This is basically the object returned by
that we try to tidy up the names of the parameters and add a
couple of additional elements to the list, including the
number of quadrature points used. The class has its own print
and summary methods, as illustrated below.
2.2 Example: Hospital Deliveries
The example we use is one of the samples in Lillard and Panis (2000),
Chapter 3. The data are available as a data frame that can be read
into S with
dget, and has 5 variables: id, hospital,
income, distance, and education (1= less than high school, 2= high school,
3 = college). Here is a listing of the first three lines:
> hosp[1:3,] id educ income distance hospital 1 4 3 76 1.7 0 2 17 2 275 7.9 0 3 17 2 200 1.8 1
The following instruction fits a two-level logit model using the
natural log of income and introducing dummies for dropouts and
college graduates, as done in Lillard and Panis (2000). We
assign the resulting object to
I suggest you use the
trace=T option when you
> h12 <- lr2(hospital~log(income)+distance+(educ==1)+(educ==3), + level2.id=id, data=hosp) RELATIVE FUNCTION CONVERGENCE.
Typing the name of the object prints it:
> h12 2-level logit model observations: 1060 501 logL = -522.654827451429 Coefficients: (Intercept) log(income) distance educ == 1 educ == 3 log(sigma) sigma -3.368158 0.5620118 -0.07661271 -1.997531 1.033676 0.2174697 1.242928
In the spirit of S, if you want a more conventional table with
estimates and standard errors, use the
> summary(h12) n2-level logit model observations: 1060 501 logL = -522.654827451429 Coef Std. Error t value (Intercept) -3.36815845 0.47892802 -7.032703 log(income) 0.56201176 0.07269351 7.731251 distance -0.07661271 0.03234586 -2.368548 educ == 1 -1.99753084 0.25573359 -7.810983 educ == 3 1.03367611 0.38846467 2.660927 log(sigma) 0.21746975 0.15736486 1.381946 sigma 1.24292783 0.19559316 6.354659
The results agree with aML, and agree even better with the output of Stata's xtlogit procedure, shown below for the same data. Note that Stata uses log(sigma^2) as the parameter. Note also that aML uses a different method for estimating standard errors, which gives results close to, but distinct from ours and Stata's (not shown).
Random-effects logit Number of obs = 1060 Group variable (i) : id Number of groups = 501 Random effects u_i ~ Gaussian Obs per group: min = 1 avg = 2.1 max = 10 Wald chi2(4) = 110.08 Log likelihood = -522.65483 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ hospital | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------+-------------------------------------------------------------------- login | .5620118 .0726935 7.731 0.000 .4195351 .7044885 distance | -.0766127 .0323459 -2.369 0.018 -.1400094 -.013216 dropout | -1.997532 .2557338 -7.811 0.000 -2.498761 -1.496303 college | 1.033677 .3884649 2.661 0.008 .2723 1.795055 _cons | -3.368159 .4789281 -7.033 0.000 -4.306841 -2.429477 ---------+-------------------------------------------------------------------- /lnsig2u | .4349417 .3147296 1.382 0.167 -.1819169 1.0518 ---------+-------------------------------------------------------------------- sigma_u | 1.242929 .1955933 .9130556 1.691981 rho | .6070531 .0750755 .4546458 .7411205 ------------------------------------------------------------------------------ Likelihood ratio test of rho=0: chi2(1) = 29.61 Prob > chi2 = 0.0000
*2.3 Strategies for Convergence
Good starting values can help. You may want to try
starting the procedure using 6 quadrature points or so,
perhaps imposing a limit of 10 iterations.
You can do this with the parameters
control = ms.control(maxiter = 10), trace= T,
which are passed directly to
How do you get the previous estimates, so you can
start from there? You extract them from the
"lr2" (or "ms") object using the
function, or directly as the
For 2-level models, if
np is the length
of the parameter vector then
gives the betas or fixed effects (everything but the
last parameter), and
gives the sigma.
*2.4 Data Structures
If you examine the code for
lr2 you will notice that it
does two main things: it sets up some data structures and then calls
S's function minimizer, which is turn calls back a function called
lr2.g to compute (minus) the log-likelihood and its
The structures are simple: we need a vector y with the outcome and
a model matrix X that codes all factors using dummy variables and
has an extra column for the random effect. You could obtain
a base X matrix by calling
the model formula and
x=T, and then use
cbind(X,0) to add a new column.
The function also tabulates the number of level-1 units in each
level-2 unit, something you could do yourself with
a list with y, X and the tabulation, called
which are what the criterion function actually needs.
The other thing this function does is calculate the quadrature
abscissas and weights, by calling
You could do this yourself, and save often-used objects
gq12=gauher(12) so you don't recalculate
them all the time. A quadrature object is a data frame with
a row for each quadrature point and two columns called x and w.
Once you have y, X, n12 and a quadrature object you can call the actual criterion function yourself:
lr2.g( y, X, theta, n12, gq12)
theta is a vector of parameter estimates with
the betas followed by log(sigma). This function calls C code in
the lr3 library, and returns the (negative) log-likelihood
with the gradient as an attribute. It's twin
also calculates and returns the hessian as an attribute.
Minimizing the function is a simple call to
an indication of the parameters and their starting values:
ms(~lr2.g(y, X, par, n12, gq12), start=list(par=c(b,ls)))
c(b, ls) is a vector of starting values for
the betas and log(sigma). The minimizer takes the names of the
parameters from the name used in the call (in the example par)
and the names, if any, of the starting values, which will be
informative if they come from a glm fit.
Note that if you are close to the m.l.e.'s, you might try
using second derivatives by calling
lr2.g. In practice, however, you
are unlikely to gain much: each iteration will take longer
and you might not save as many iterations as you might expect.
When you call
ms yourself the resulting object is
of class "ms", not "lr2",
so you get the print, summary and coef methods for ms objects.
You can assign the "lr2" class to the object or call
but this is not guaranteed to work.
3. Three-Level Models
Please read the discussion of two-level models if you haven't done so already, as we describe only what's new with three-level models.
3.1 Model Fitting with lr3
The top-level function for fitting three-level models is
with arguments as follows:
lr3(formula, level2.id, level3.id, data = sys.parent(), frame = F, quad.pts = 12, beta = NULL, sigma = NULL, ...)
The structure of the function and the arguments are eactly the same as
for the two-level model, except that we now have a
level.3, and obviously
sigma must have
two components, representing variation at levels 2 and 3 in that order.
3.2 Example: Immunization in Guatemala
The example we will use is the model of complete immunization
in Guatemala used in Rodríguez and Goldman(2001). The
data are provided as an S data frame that can be read with
dget. I called the data frame
Here are the first three rows:
> guimr[1:3,] kid mom cluster numvac kid2p mom25p orderc eth wed hed everwork rural pcind81 1 2 2 1 1 T F 1 Ladino Sec+ Sec+ F F 0.1075042 2 269 185 36 0 T F 2-3 Ladino Prim Prim T F 0.0437295 3 272 186 36 0 T F 1 Ladino Prim Sec+ T F 0.0437295
Note that in the spirit of S, categorical variables are coded as true/false when they are binary and using labels when they have more than two categories. Here is the model used in our paper:
> g12 <- lr3( numvac ~ kid2p + mom25p + orderc + eth + wed + hed + everwork + rural + pcind81, mom, cluster, data=guimr) RELATIVE FUNCTION CONVERGENCE.
Note that we specify "mom" as our
level2.id and "cluster"
(or community) as our
level3.id. Here are the results of
printing the object:
-level logit model observations: 2159 1595 161 logL = -1323.9618222914 Coefficients: (Intercept) kid2p mom25p orderc2-3 orderc4-6 orderc7-16 ethIndNoSpa ethIndSpa wedPrim -1.2334 1.716657 -0.2159945 -0.260719 0.1793735 0.4310829 -0.1724736 -0.08364888 0.433026 wedSec+ hedPrim hedSec+ hedDK everwork rural pcind81 log(sigma2) log(sigma3) 0.4200052 0.5393486 0.5051097 -0.007241173 0.3900436 -0.8882568 -1.149777 0.8397855 0.02431688 sigma2 sigma3 2.31587 1.024615
And here is the more detailed summary:
> summary(g12) 3-level logit model observations: 2159 1595 161 logL = -1323.9618222914 Coef Std. Error t value (Intercept) -1.233400431 0.4831806 -2.55266939 kid2p 1.716657175 0.2173257 7.89900598 mom25p -0.215994520 0.2316288 -0.93250286 orderc2-3 -0.260719016 0.2319183 -1.12418458 orderc4-6 0.179373547 0.2944780 0.60912382 orderc7-16 0.431082926 0.3720310 1.15872859 ethIndNoSpa -0.172473621 0.4893662 -0.35244284 ethIndSpa -0.083648877 0.3633018 -0.23024623 wedPrim 0.433026006 0.2224304 1.94679307 wedSec+ 0.420005173 0.4843674 0.86712108 hedPrim 0.539348585 0.2322901 2.32187475 hedSec+ 0.505109742 0.4142667 1.21928633 hedDK -0.007241173 0.3568837 -0.02029001 everwork 0.390043560 0.2026816 1.92441556 rural -0.888256819 0.3062250 -2.90066765 pcind81 -1.149777002 0.5003787 -2.29781355 log(sigma2) 0.839785461 0.1129482 7.43513559 log(sigma3) 0.024316884 0.1562851 0.15559313 sigma2 2.315870079 0.2615734 8.85361314 sigma3 1.024614951 0.1601320 6.39856332
These results are actually quite close to the published results obtained with an earlier version of the program using 20 quadrature points.
We have also verified the estimates by comparison with aML runs with 5, 12 and 20 quadrature points. As noted earlier, aML uses a different method for estimating standard errors. We have verified ours by using numerical estimates of the hessian or matrix of second derivatives.
*3.3. Strategies for Convergence
Everything we said about two-level models applies here as well, except that there is some help for extracting the coefficients.
coef() method for lr3 fits has
two true/false parameters that are used to indicate whether one
wants beta and sigma, respectively. Thus, the function has
coef(object, beta=T, sigma=T)
The defaults ensure that the usual call, for example
coef(g12), returns all estimates, converting
log(sigmas) to sigmas. You can also use
coef(g12,T,F), to get the fixed effects, and
coef(g12,F,T), to get the standard deviations of the random effects.
The next example shows how to estimate the model using 20
quadrature points starting from the results obtained using 12,
which have been saved as
g20 <- lr3(guim.model, mom, kid, data=guimr, quad.pts=20, beta=coef(g12,T,F), sigma=coef(g12,F,T))
3.4 The Three-level Structures
The three-level function has to do all the work of its two-level counterpart, plus some. Here is what's new at this level.
In addition to checking that the data are sorted by
level-2 id, we need to check that they are sorted by level-3
id. This is done by a helper function
Also, we need to check that the level-2 id is nested on
level-3. What we mean by this is that if the level-3 id
changes, the level-2 id must change as well, a fact checked
by the function
If all goes well the id's are tabulated to count the number
of level-1 units per level-2, which is a vector
and the number of level-1 units per level-3, which goes in
n13. We also need the number of unit-2 units
per level-3, which is calculated by the function
and stored in
The X matrix is expanded to add two columns. Note that
forgetting to add room for the random effects will usually
crash S. This is one reason why going through
frame option returns a list with y, X and the vectors
n12, n13 and n23, which together with a quadrature object are
the needed input for the criterion function:
lr3.g(y, X, theta, n12, n13, n23, gq)
This function calculates the (negative) log-likelihoood and its
gradient by calling C code.
The twin function
lr3.h calculates the hessian as well.
The S-source for all the functions discussed here is available on
a file called
lr3.s, which can be sourced to create
the necessary objects. If the file is in the working directory use
Otherwise you need a fully qualified path.
You also need the compiled C code, which is available as
lr3.o for our Unix system
and as a dynamic link library
This code has to be loaded before it can be used. On Unix use
On Splus for Windows use
lr3.entry.points is an object listing
all entry points in the DLL. This object is created when
you source "lr3.s".
The two sample datasets are available as "hospital.dat" and "guimr.dat".