Germán Rodríguez
Multilevel Models Princeton University

Gibbs Sampling Using JAGS

JAGS is “Just Another Gibbs Sampler”, a program for the analysis of Bayesian models using MCMC. It uses essentially the same model language as WinBUGS and OpenBUGS, and works closely with R via the rjags package. Let’s try it. You can download it here.

We start by writing our model and saving it to a file, here hosp.bugs. This is exactly the same code we used for WinBUGS here.

model {
    # N observations
    for (i in 1:N) {
        hospital[i] ~ dbern(p[i])
        logit(p[i]) <- bcons + bloginc*loginc[i] + bdistance*distance[i] + 
            bdropout*dropout[i] + bcollege*college[i] + u[group[i]] 
    }
    # M groups
    for (j in 1:M) {
        u[j] ~ dnorm(0, tau) # mean 0, precision tau
    }
    # Priors
    bcons     ~ dnorm(0.0,1.0E-6)
    bloginc   ~ dnorm(0.0,1.0E-6)
    bdistance ~ dnorm(0.0,1.0E-6)
    bdropout  ~ dnorm(0.0,1.0E-6)
    bcollege  ~ dnorm(0.0,1.0E-6)
    # Hyperprior
    tau ~ dgamma(0.001,0.001)
}

Next we need the data, which must be in a list. We read them into R just as we did here, and then produce a list with the number of observations (N) and of groups (M), the outcome hospital, and the predictors loginc, distance, dropout and college. We also need mother, which identifies the groups of births.

> hosp <- read.table("http://data.princeton.edu/pop510/hospital.dat", 
+     header = FALSE)
> names(hosp) <- c("hosp","loginc","distance","dropout","college","mother")
> hosp_data <- list(N = 1060, M = 501, hospital = hosp$hosp, 
+     loginc=hosp$loginc, distance=hosp$distance, dropout=hosp$dropout, 
+     college=hosp$college, group=hosp$mother)
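
Hard-coding N and M works, but it is a little safer to compute them from the data. Here is a small sketch that builds the same list, assuming, as the original list suggests, that the mother identifiers run from 1 to M:

> hosp_data <- list(N = nrow(hosp), M = length(unique(hosp$mother)),
+     hospital = hosp$hosp, loginc = hosp$loginc, distance = hosp$distance,
+     dropout = hosp$dropout, college = hosp$college, group = hosp$mother)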

The other thing we need is initial values. We use two lists, one for each chain, with values scattered around the maximum likelihood estimates. Initial values for the random effects are generated by JAGS.

> hosp_inits <- list(
+   list(bcons=-2.5, bloginc=0.50, bdistance=-0.05, bdropout=-1.6, bcollege=0.7, tau=0.8),
+   list(bcons=-2.8, bloginc=0.40, bdistance=-0.08, bdropout=-1.4, bcollege=0.9, tau=1.2)
+ )
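
If you want the chains to be reproducible you can also tell JAGS which random number generator and seed to use for each chain, by adding .RNG.name and .RNG.seed to the same lists. A sketch, with arbitrary seeds:

> hosp_inits <- list(
+   c(hosp_inits[[1]], .RNG.name = "base::Mersenne-Twister", .RNG.seed = 81),
+   c(hosp_inits[[2]], .RNG.name = "base::Wichmann-Hill", .RNG.seed = 82)
+ )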

We are now ready. We load the rjags library and then compile the model by calling the jags.model() function.

> library(rjags)
> model <- jags.model("hosp.bugs", data = hosp_data, inits = hosp_inits, n.chains = 2)
Compiling model graph
   Resolving undeclared variables
   Allocating nodes
Graph information:
   Observed stochastic nodes: 1060
   Unobserved stochastic nodes: 507
   Total graph size: 9884

Initializing model

With the model ready, we run a burn-in of 1000 iterations using update().

> update(model, n.iter = 1000)
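
Note that jags.model() also ran an adaptation phase when it compiled the model (the n.adapt argument, which defaults to 1000 iterations), so by this point the sampler has gone through 2000 iterations in total; this is why the summary below reports iterations 2001 to 7000. If you wanted a longer burn-in you would simply increase n.iter, for example

> update(model, n.iter = 4000)

but we do not do that here.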

We can now run a further 5000 iterations, monitoring the parameters of interest. We use coda.samples(), which is a wrapper around jags.samples().

> samples <- coda.samples(model, 
+   variable.names = c("bcons","bloginc","bdistance","bdropout","bcollege","tau"), 
+   n.iter = 5000)
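
coda.samples() also takes a thin argument in case you want to keep only every k-th draw; a sketch we do not use here, which would keep every fifth iteration:

> thinned <- coda.samples(model, 
+   variable.names = c("bcons","bloginc","bdistance","bdropout","bcollege","tau"), 
+   n.iter = 5000, thin = 5)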

We can then summarize our samples, which are stored in an object of class mcmc.list.

> summary(samples)

Iterations = 2001:7000
Thinning interval = 1 
Number of chains = 2 
Sample size per chain = 5000 

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

              Mean      SD  Naive SE Time-series SE
bcollege   1.06253 0.39997 0.0039997      0.0099948
bcons     -3.46375 0.51629 0.0051629      0.0511755
bdistance -0.07609 0.03273 0.0003273      0.0009186
bdropout  -2.03040 0.26780 0.0026780      0.0124242
bloginc    0.57619 0.07844 0.0007844      0.0077968
tau        0.65677 0.22543 0.0022543      0.0211915

2. Quantiles for each variable:

             2.5%      25%      50%     75%  97.5%
bcollege   0.3100  0.78861  1.05059  1.3258  1.876
bcons     -4.4925 -3.82303 -3.44202 -3.0972 -2.505
bdistance -0.1418 -0.09771 -0.07619 -0.0538 -0.013
bdropout  -2.5790 -2.20816 -2.02336 -1.8454 -1.535
bloginc    0.4321  0.52144  0.57256  0.6293  0.734
tau        0.3530  0.49264  0.61235  0.7677  1.237
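
Before relying on these numbers it is a good idea to check convergence. The coda package, which rjags pulls in, provides the standard diagnostics; a quick sketch:

> gelman.diag(samples)
> effectiveSize(samples)

Note also that tau is the precision of the random effects, so to report their standard deviation we can transform the draws; a sketch (sigma_u is just an illustrative name):

> sigma_u <- 1/sqrt(as.matrix(samples)[, "tau"])
> summary(sigma_u)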

The class also has a plot() method that produces the usual trace and density plots. Plain R will pause between plots, but RStudio does not.
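
A minimal sketch follows; the devAskNewPage() call is one way to try to get RStudio to pause between pages as well (in any case you can page back through the Plots pane):

> devAskNewPage(TRUE)
> plot(samples)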

A good way to draw a trace plot is to change the aspect ratio, so that the trace is much wider than it is tall. I had good results with

# combine the two chains into a single data frame, tagging chain and iteration
mcmc <- data.frame(rbind(samples[[1]], samples[[2]]))
library(dplyr)
library(ggplot2)
mcmc <- mutate(mcmc,
  chain = factor(rep(c(1, 2), c(5000, 5000))),
  iteration = rep(1:5000, 2))
# pick a y/x ratio so the panel comes out about five times as wide as it is tall
r <- 5000 / (5 * diff(range(mcmc$bdistance)))
ggplot(mcmc, aes(iteration, bdistance, color = chain)) +
  geom_line() + coord_fixed(ratio = r)
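
The same idea extends to all of the monitored parameters by reshaping the data frame to long form and faceting the traces. A sketch assuming the tidyr package is available; the column selection assumes chain and iteration are the only non-parameter columns, and the y scales are freed because the parameters are on very different ranges.

library(tidyr)
long <- pivot_longer(mcmc, cols = -c(chain, iteration),
  names_to = "parameter", values_to = "value")
ggplot(long, aes(iteration, value, color = chain)) +
  geom_line() +
  facet_wrap(~ parameter, ncol = 2, scales = "free_y")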