Germán Rodríguez
Multilevel Models Princeton University

## Gibbs Sampling Using JAGS

JAGS is “Just Another Gibbs Sampler”, a program for the analysis of Bayesian models using MCMC. It uses essentially the same model language as WinBUGS or OpenBUGS. It works closely with R via `rjags`. Let’s try it. You can download it here.

We start by writing our model and saving it to a file, here `hosp.bugs`. This exactly the same code used for WinBUGS here.

``````model {
# N observations
for (i in 1:N) {
hospital[i] ~ dbern(p[i])
bdropout*dropout[i] + bcollege*college[i] + u[group[i]]
}
# M groups
for (j in 1:M) {
u[j] ~ dnorm(0,tau)
}
# Priors
bcons     ~ dnorm(0.0,1.0E-6)
bdistance ~ dnorm(0.0,1.0E-6)
bdropout  ~ dnorm(0.0,1.0E-6)
bcollege  ~ dnorm(0.0,1.0E-6)
# Hyperprior
tau ~ dgamma(0.001,0.001)
}``````

Next we need the data, which need to be in a list. We will read them into R just like we did here, and then produce a list with the number of observations (N) and groups (M), the outcome `hospital` and the predictors `loginc`, `distance`, `dropout` and `college`. We also need `mother` which groups the births.

```> hosp <- read.table("https://data.princeton.edu/pop510/hospital.dat",
> hosp_data <- list(N = 1060, M = 501, hospital = hosp\$hosp,
+     college=hosp\$college, group=hosp\$mother)
```

The other thing we need are initial values. We use two lists, one for each chain, with values scattered around the mle’s. The random effects are generated by JAGS.

```> hosp_inits= list(
+   list(bcons=-2.5, bloginc=0.50, bdistance=-0.05, bdropout=-1.6, bcollege=0.7, tau=0.8),
+   list(bcons=-2.8, bloginc=0.40, bdistance=-0.08, bdropout=-1.4, bcollege=0.9, tau=1.2)
+ )
```

We are now ready. We load the `rjags` library and then compile the model by calling the `jags.model()` function

```> library(rjags)
> model <- jags.model("hosp.bugs", data = hosp_data, inits =hosp_inits, n.chains=2)
Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph information:
Observed stochastic nodes: 1060
Unobserved stochastic nodes: 507
Total graph size: 9884

Initializing model
```

With the model ready we run a burn-in of 1000 iterations using `update()`

```> update(model, n.iter = 1000)
```

We can now run a further 5000 iterations monitoring the parameters of interest. We use `coda.samples()` which is a wrapper around `jags.samples()`.

```> samples <- coda.samples(model,
+   n.iter = 5000)
```

We can then summarize our samples, an object of class `mcmc.list`.

```> summary(samples)

Iterations = 2001:7000
Thinning interval = 1
Number of chains = 2
Sample size per chain = 5000

1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:

Mean      SD  Naive SE Time-series SE
bcollege   1.06253 0.39997 0.0039997      0.0099948
bcons     -3.46375 0.51629 0.0051629      0.0511755
bdistance -0.07609 0.03273 0.0003273      0.0009186
bdropout  -2.03040 0.26780 0.0026780      0.0124242
tau        0.65677 0.22543 0.0022543      0.0211915

2. Quantiles for each variable:

2.5%      25%      50%     75%  97.5%
bcollege   0.3100  0.78861  1.05059  1.3258  1.876
bcons     -4.4925 -3.82303 -3.44202 -3.0972 -2.505
bdistance -0.1418 -0.09771 -0.07619 -0.0538 -0.013
bdropout  -2.5790 -2.20816 -2.02336 -1.8454 -1.535
bloginc    0.4321  0.52144  0.57256  0.6293  0.734
tau        0.3530  0.49264  0.61235  0.7677  1.237
```

The class also has a `plot()` method to produce the usual trace and density plots. R will pause between plots but R Studio does not.

A good way to do a trace plot is to change the aspect ratio. I had good results with

``````
mcmc <- data.frame(rbind(samples[[1]], samples[[2]]))
library(dplyr)
mcmc <- mutate(mcmc,
chain=factor(rep(c(1,2),c(5000,5000))),
iteration = rep(1:5000, 2))
r <- 5000 / (5*range(mcmc\$bdistance) %*% c(-1,1) )
ggplot(mcmc, aes(iteration, bdistance, color=chain)) + geom_line() + coord_fixed(ratio=r)
``````

Continue with