Germán Rodríguez
Multilevel Models Princeton University

Random Effects Logistic Regression using WinBUGS

This page is an html version of a winBUGS compound document thatI first produced in the Spring of 2001. The document is available on this website, right click on hospBug.odc and select 'Save as ...' from the context menu. Then open it in WinBUGS. The file hospital.txt with the data is also available here. To see how to run WinBUGS in batch mode and import results into Stata for further analysis click here.

We will analyze the hospital delivery data from Lillard and Panis (2000) that we have already analyzed by approximate quasi-likelihood using MLwiN, and by maximum likelihood using Stata (and earlier using the R/S-Plus function lr2).

The Model

Recall that we have two-level data. In BUGS and WinBUGS it is simpler to work with a single subscript and use a vector to map level-1 units to level-2 units. We assume Yi ~ Binomial(pi,1) where

logit(pi) = b0+ b1 log(income) + b2 distance + b3 dropout + b4 college + uj(i)

We provide non-informative priors for all fixed effects, assuming bk ~ Normal(0,0.000001). The second parameter is the precision (the reciprocal of the variance), so the variance is one million. We assume that

uj(i) ~ N(0, t)

where the precision t has a gamma prior with parameters 0.001 and 0.001, so the mean is one and the variance is 1000.

To specify this model in WinBUGS we use a declarative language that lists deterministic and stochastic nodes. For each of the N observations we say the outcome is Bernoulli , and specify the logit of the probability, which depends on the x's and a random effect u. For each of the M groups we specify the random effect as normally distributed. We then specify the prior of each coefficient and the hyper prior of the precision.

model {
    # N observations
    for (i in 1:N) {
        hospital[i] ~ dbern(p[i])
        logit(p[i]) <- bcons + blonginc*loginc[i] + bdistance*distance[i] + 
            bdropout*dropout[i] + bcollege*college[i] + u[group[i]] 
    }
    # M groups
    for (j in 1:M) {
        u[j] ~ dnorm(0,tau)
    }
    # Priors
    bcons     ~ dnorm(0.0,1.0E-6)
    bloginc   ~ dnorm(0.0,1.0E-6)
    bdistance ~ dnorm(0.0,1.0E-6)
    bdropout  ~ dnorm(0.0,1.0E-6)
    bcollege  ~ dnorm(0.0,1.0E-6)
    # Hyperprior
    tau ~ dgamma(0.001,0.001)
}

To check this model first highlight the word model. Then go to the Model menu and select Specification. A small window known as the Specification Tool will pop up. Click on the button labelled "check model". The result will appear in the status bar (at the bottom of the main WinBUGS window). This model "is syntactically correct".

It is very important to realize that when you write a BUGS program you are not doing any calculations, you are just declaring the nodes in your model. The loops are simply a device to declare a large number of random variables in a few lines. Also, there are a few differences between Classic BUGS and WinBUGS.

Models can also be specified in graphical form using Doodles.

The Data

The data must be provided in R/S-Plus list format or as a rectangular matrix. We will use a combination of the two methods. First we declare the number of observations in a list:

list(N=1060,M=501)

Highlight the word list and then click the button labeled "load data" on the Specification Tool. The status bar should read "data loaded".

Next we read the data from a file. I first use R to do all necessary transformations, including calculating the log of income and the two dummies for education. We also need to calculate the group index. This can be done from the group Id, but these are not always consecutive numbers. Once you are done with the calculations you can dput or write the data to a file. I prefer write because the output is more readable, although you then have to add square brackets to denote vectors. For the record this is the R code I used:

hosp = read.table("filename")
attach(hosp)
loginc = log(income)
dropout = as.numeric(educ==1)
college = as.numeric(educ==3)
group = cumsum(c(1,diff(id)>0))
hosp = data.frame(hospital,loginc,distance,dropout,college,group)
write(t(hosp),"hosp.txt",ncolumns=6)

The file can be opened as another winBUGS window; make sure you specify the type as text. Insert a top line with the names of the variables, with each one followed by an empty pair of square brackets to indicate that these are arrays. WinBUGS now requires the keyword end to mark the end of the data. The test file should look as follows:

hospital[] loginc[] distance[] dropout[] college[] group[]
0 4.330733 1.7 0 1 1
0 5.616771 7.9 0 0 2
1 5.298317 1.8 0 0 2
0 3.850148 6.2 0 0 2
...
END

Now select the window with the data, highlight the name of the first variable (in this example the word hospital) and click on the "load data" button on the Specification Tool. The status bar should read "data loaded".

You can now compile the model by clicking on the "compile" button on the Specification Tool. If all goes well the status bar will read "model compiled". A common error is to omit one of the variables. If you make a mistake at this stage you have to go back to check the model again.

Initial Values

We also need initial values for the parameters. For the fixed effects we can use the estimates from an ordinary logistic regression. For tau we use the arbitrary value 1. These values must be on a list:

list(bcons=-2.69517150,bloginc=0.45852520,bdistance=-0.07627492,
    bdropout=-1.57007674,bcollege=0.82072827,tau=1)

To read the starting values into WinBUGS select the word list and then click the button labelled "load inits" in the specification tool. The status bar should read "initial values loaded: model contains uninitialized nodes". This occurs because we have provided initial values for the betas and tau but not for the random effects ui(j). Fortunately these can be generated by WinBUGS; click on the button labelled "gen inits". The status bar will now say "initial values generated: model initialized".

Gibbs Sampling

We are now ready to start our Gibbs sampler. Go to the model menu and select Update. A small dialog called the Update Tool will pop up. Often analysts start with a 'burn in' run intended to discard the first few hundred samples. If you want to do this enter 500 or 1000 on the updates box and click on "update". The status bar will read "updating". After a few seconds the message will change to indicate how long the updates took.

Now go to the Inference menu and select Samples... A small dialog called the Sample Monitor Tool will pop up. Under node type each of the parameters you want to monitor and click set. In our example you must type bcons, bloginc, ..., bcollege, eta. Remember to click on set after each one. WinBUGS will now store the sampled values of each parameter. Note that you can specify a beginning iteration, so this is another way of discarding a burn in. For purposes of this exercise I will store all.

Next type * in the node box and click on trace. A window will pop up with graphs that can show the values of all six parameters as they are generated. Now go to the update tool, select 1000 and click on "update". You may want to rearrange the windows so you have a good view. (Although I have noticed the traces don't always refresh as intended.) On my machine the updates took 50 seconds. Click on "history" on the Sample Monitor Tool to get the complete trace. You can also click on "density" to see the empirical posterior distribution of each parameter. I decided to run another 4000 iterations, which took 200 seconds.

Click on "stats" on the Sample Monitoring Tool to see statistics on the nodes you have monitored. In my run I obtained the results shown below but, since these are Monte Carlo Markov Chains, your results will be slighthly different. (Again, you can specify a beginning value in the tool, so we could exclude the first 1000 samples from the statistics. The results are very similar.)

 node	   mean	    sd       MC error    2.5%   median    97.5%     start sample
bcons	  -3.415    0.4591   0.03737   -4.312   -3.422    -2.497     1    5000
bloginc    0.569    0.06929  0.005454   0.4286   0.5691    0.702     1    5000
bdistance -0.07682  0.03361  0.001079  -0.1455  -0.07612  -0.01328   1    5000
bdropout  -2.037    0.2675   0.01387   -2.578   -2.029    -1.532     1    5000
bcollege   1.056    0.3994   0.01209    0.2857   1.045     1.841     1    5000
tau        0.6526   0.2497   0.02286    0.3505   0.5943    1.339     1    5000

It is interesting to compare these results with the maximum likelihod analysis. Note in particular that the estimate of s, calculated as the square root of 1/t, is 1.238, in close agreement with the m.l.e. of 1.243, and both larger than the MQL and PQL estimates. (Note, by the way, that transforming the mean t is not the same as computing the mean of the transformed t's. We should probably have monitored s as well.)

If you want to save the actual values for further analysis, click on "coda" on the Sample Monitor Tool. This will produce two new windows. One, labelled 'CODA for chain 1' contains two columns, one is an index and the other a value. The second window, labelled 'CODA indices' maps the values to the parameters. In my run, for example, the values of bcons are the first 5000 entries.