Germán Rodríguez

Multilevel Models
Princeton University
We now illustrate the use of the latest Hamiltonian Monte Carlo (HMC)
algorithm, the No-U Turn Sampler (NUTS) of Hoffman and Gelman (2011), as implemented
in the package Stan, see mc-stan.org for
detailed information. Stan is written in C++ and can be run from the command line,
R or Python. Here we illustrate running it on the hospital data using the
R-interface `rstan`

.

A quick reminder of the data and model. We have information on hospital delivery
(yes or no) for 1060 pregnancies of 501 women. Predictors of interest are the
log of income, distance to the nearest hospital, and education, represented by
indicators of high school dropouts and college graduates. This is a simple
random-intercept logit model that can easily be fitted by maximum likelihood
using Stata's `xtlogit`

or `melogit`

, as well as
R's `glmer()`

in the `lme4`

package. We find comparison
with Bayesian estimates of interest.

Stan uses a language similar to Bugs, but is quite different under the hood. The model specification is translated into the C++ language, and the program is then compiled and run on your data. Sampling is very fast once the model has been compiled. The package is very well documented and installation proceeded uneventfully on my home and work machines. Once you have the necessary tools installed the process is fairly transparent.

Here's the Stan program for the hospital data, a long string assigned to an R object named`hosp_code`

hosp_code <- ' data { int N; // number of obs (pregnancies) int M; // number of groups (women) int K; // number of predictors int y[N]; // outcome row_vector[K] x[N]; // predictors int g[N]; // map obs to groups (pregnancies to women) } parameters { real alpha; real a[M]; vector[K] beta; real<lower=0,upper=10> sigma; } model { alpha ~ normal(0,100); a ~ normal(0,sigma); beta ~ normal(0,100); for(n in 1:N) { y[n] ~ bernoulli(inv_logit( alpha + a[g[n]] + x[n]*beta)); } } '

We start by specifying the data, which consist of an n-vector of outcomes (y), an n by k matrix of predictors (x), and an n-vector mapping pregnancies to women (g). The parameters are the constant (alpha), the woman-specific random effects (a), the coefficients of the k predictors (beta) and the standard deviation of the random effects (sigma) which is in (0,10).

The model provides non-informative normal priors for the fixed effects alpha and beta. Stan specifies normal distributions using the standard deviations, not the variance, nor the precision used by BUGS. The hyperprior for sigma is uniform(0,10) as is determined by the limits given. Finally the outcomes are Bernoulli with probability given by the inverse logit of the linear predictor.

To run the model in R we use the `rstan`

package.
First we create an object with the data using the same names as in the Stan
code. The data themselves are in an R object called `hosp`

.

> hosp_data <- list(N=nrow(hosp),M=501,K=4,y=hosp[,1],x=hosp[,2:5],g=hosp[,6])

We are now ready to call `stan()`

to run the model.
Here I specify 2 chains of 2000 observations each.

hfit <- stan(model_code=hosp_code, model_name="hospitals", data=hosp_data, iter=2000, chains=2) TRANSLATING MODEL 'hospitals' FROM Stan CODE TO C++ CODE NOW. COMPILING THE C++ CODE FOR MODEL 'hospitals' NOW. ... SAMPLING FOR MODEL 'hospitals' NOW (CHAIN 1). Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 58.065 seconds (Warm-up) 24.373 seconds (Sampling) 82.438 seconds (Total) SAMPLING FOR MODEL 'hospitals' NOW (CHAIN 2). Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 58.074 seconds (Warm-up) 23.186 seconds (Sampling) 81.26 seconds (Total)

Stan prints some useful information, including folder names I supressed, and provides feedback every 200 samples. As you can see it took less than three minutes to run the two chains.

The code below produces a summary of results.
The default `print()`

method in `rstan`

prints all
the woman-specific random effects, so I used the `pars`

parameter
to select the coefficients of primary interest. The default also prints
the 2.5, 25, 50, 75, and 97.5 percentiles but I trimmed the list to
obtain the median and a 95% credible interval. Finally the default prints
just one decimal place and I prefer a bit more. Results are very similar
to the Gibbs sampling estimates in hospBUGS

> print(hfit, pars=c("alpha","beta[1]","beta[2]","beta[3]","beta[4]","sigma"), + probs = c(0.025, 0.50, 0.975), digits_summary=3) Inference for Stan model: hospitals. 2 chains, each with iter=2000; warmup=1000; thin=1; post-warmup draws per chain=1000, total post-warmup draws=2000. mean se_mean sd 2.5% 50% 97.5% n_eff Rhat alpha -3.488 0.022 0.503 -4.495 -3.473 -2.575 508 1.001 beta[1] 0.581 0.003 0.074 0.446 0.579 0.736 544 1.000 beta[2] -0.077 0.001 0.033 -0.141 -0.077 -0.011 2000 0.999 beta[3] -2.088 0.016 0.279 -2.668 -2.078 -1.576 291 1.001 beta[4] 1.094 0.013 0.424 0.267 1.080 1.942 1139 1.000 sigma 1.385 0.023 0.199 1.039 1.364 1.813 76 1.012 Samples were drawn using NUTS(diag_e) at Mon Mar 24 21:12:06 2014. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1).

Finally, here is the tranditional trace plot for the same parameters. I stack them in one column with six rows, and exclude the warmup. Note that sigma exhibits slow mixing, also evident in the low effective sample size of 76.

> traceplot(hfit,c("alpha","beta[1]","beta[2]","beta[3]","beta[4]","sigma"), ncol=1,nrow=6,inc_warmup=F)

I realize the output would be better labelled if I had called the coefficients bcons, bloginc, bdistance, bdropout and bcollege, as I did with winBUGS, rather than alpha and beta, but I thought there was an advantage to keeping the code generic. In particular, the exact same Stan model (and compiled program) can be run on any two-level random-intercept logit model by changing just the data.

Continue with Metropolis Hastings