Police Stops in New York
Gelman and Hill have data on police stops by ethnicity for 75 precincts in New York City during a 15-month period. The dataset has 12 records for each precinct, one for each of the 12 combinations of ethnicity (black, hispanic, white) and type of crime, or "suspected charges" (violent, weapons, property, drug). The data are available on the course website as a Stata file, frisk.dta. Here we will focus on stops for weapons offenses, which are the most common, so make sure you keep only those records.
The outcome of interest is a count, the number of stops. To a first approximation this might be assumed to be proportional to the population in the precinct. However, we will follow Gelman and Hill and use as our measure of exposure the number of arrests recorded by the Department of Criminal Justice Services (DCJS) for each ethnic group in the precinct in the previous year. (We also multiply by 15/12 to scale to a 15-month period to match the stops, but this is not essential.) For later comparison, compute the relative stop rate, defined as the ratio of stops to arrests per year, and take its log.
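The data preparation can be sketched as follows. This is a minimal illustration, not the course's Stata code: it assumes each record is a dict with the hypothetical keys `precinct`, `eth`, `crime`, `stops` and `past_arrests`, and it assumes crime type 2 means weapons (check the codebook for frisk.dta). Dividing stops by arrests scaled by 15/12 is the same as dividing annual stops by annual arrests.

```python
import math

# Assumed coding (hypothetical; verify against the frisk.dta codebook):
# crime: 1=violent, 2=weapons, 3=property, 4=drug
WEAPONS = 2

def prepare(records):
    """Keep weapons records and add exposure and the log relative stop rate.

    Each record is a dict with hypothetical keys 'precinct', 'eth',
    'crime', 'stops' and 'past_arrests'.
    """
    out = []
    for r in records:
        if r["crime"] != WEAPONS:
            continue  # drop violent, property and drug records
        # Scale last year's arrests to the 15-month period of the stops.
        exposure = r["past_arrests"] * 15 / 12
        if r["stops"] > 0 and exposure > 0:
            log_rate = math.log(r["stops"] / exposure)
        else:
            log_rate = None  # log rate undefined for zero stops or zero arrests
        out.append(dict(r, exposure=exposure, log_rate=log_rate))
    return out
```

The exposure, not the log rate, is what enters the models below; the observed log rate is kept only for comparison with model-based predictions.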
Fit a Poisson model, using the log of exposure as an offset, and interpret the parameter estimates. You should find that blacks and hispanics are stopped much more often than whites, relative to the previous year's arrests in the same precinct. Describe the predicted relative stop rates by ethnicity, noting that these are the same for all precincts. (This model exhibits overdispersion, and a conventional analysis would probably move on to negative binomial models, but we will account for an important source of overdispersion through random precinct effects.)
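Why the predicted rates do not vary by precinct can be seen from the estimating equations. With an offset log(exposure) and only the black and hispanic dummies, the Poisson maximum likelihood estimates have a closed form: the fitted rate for each ethnic group is its total stops divided by its total exposure, pooled over precincts. The sketch below (the function name and the `(eth, stops, exposure)` input layout are hypothetical) computes the coefficients that way; it omits standard errors, which a regular Poisson routine would provide.

```python
import math
from collections import defaultdict

def poisson_ethnicity_fit(rows):
    """Closed-form Poisson MLE for
    log mu = log(exposure) + b0 + b_black*black + b_hispanic*hispanic.

    rows: iterable of (eth, stops, exposure), eth in {'black','hispanic','white'}.
    Because the model is saturated in ethnicity, the score equations force
    total fitted stops to equal total observed stops within each group.
    """
    stops = defaultdict(float)
    expo = defaultdict(float)
    for eth, y, e in rows:
        stops[eth] += y
        expo[eth] += e
    # Predicted relative stop rate per group, identical in every precinct.
    rate = {g: stops[g] / expo[g] for g in stops}
    b0 = math.log(rate["white"])  # whites are the reference group
    return {
        "b0": b0,
        "b_black": math.log(rate["black"]) - b0,      # log rate ratio, black vs white
        "b_hispanic": math.log(rate["hispanic"]) - b0,  # log rate ratio, hispanic vs white
        "rate": rate,
    }
```

Exponentiating `b_black` gives the multiplicative factor by which black stops exceed white stops per previous-year arrest; these point estimates should agree with those from any standard Poisson routine given the same offset.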
Fit a random intercept Poisson model where the constant is allowed to vary randomly from one precinct to another. Interpret carefully the estimated coefficients in the fixed and random parts of the model, with special emphasis on the standard deviation of the random intercept. Compute best linear unbiased predictors (BLUPs) of the random intercepts and combine these with the fixed effects to predict the relative stop rate for each precinct and ethnic group. Use your estimates to identify any precincts that have disproportionately high or low stop rates. How do these estimates differ from the observed relative stop rates?
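To see how the predicted random intercepts shrink the observed rates, here is a sketch that computes, for one precinct, the posterior mode of its intercept u given the fixed-effect fits and the estimated standard deviation sigma. This is the conditional mode, one common version of the empirical Bayes prediction; software may report posterior means instead, which differ slightly. Because u shifts every record in the precinct equally, the log posterior reduces to Y·u − M·exp(u) − u²/(2σ²), where Y is the precinct's total stops and M its total expected stops from the fixed part alone. The function names are hypothetical.

```python
import math

def eb_mode_intercept(y, mu_fixed, sigma, tol=1e-10, max_iter=100):
    """Posterior mode of a precinct's random intercept in a Poisson GLMM.

    y: observed stops for the precinct's records.
    mu_fixed: exposure * exp(fixed-effect linear predictor) for the same records.
    sigma: estimated standard deviation of the random intercept.
    Maximizes Y*u - M*exp(u) - u**2 / (2*sigma**2) by Newton-Raphson.
    """
    Y = sum(y)
    M = sum(mu_fixed)
    # The root lies between 0 and the unshrunk estimate log(Y/M). Starting at
    # the right end makes Newton converge monotonically (the gradient is
    # concave and decreasing), so exp(u) never overflows.
    u = max(0.0, math.log(Y / M)) if Y > 0 else 0.0
    for _ in range(max_iter):
        grad = Y - M * math.exp(u) - u / sigma**2
        hess = -M * math.exp(u) - 1.0 / sigma**2
        step = grad / hess
        u -= step
        if abs(step) < tol:
            break
    return u

def precinct_rate(b0, b_group, u):
    """Plug-in predicted relative stop rate; b_group is 0 for whites."""
    return math.exp(b0 + b_group + u)
```

The predicted intercept always lies between 0 and the unshrunk log(Y/M), and it is pulled further toward 0 when sigma is small or the precinct has few stops. This is the main reason the model-based rates differ from the observed ones. As a guide to interpreting sigma, a precinct one standard deviation above average has stop rates exp(sigma) times the average for every ethnic group.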
Fit a random coefficients Poisson model where both the constant and the dummy variables for blacks and hispanics are allowed to vary randomly from one precinct to another. Make sure you allow these random effects to be freely correlated. Interpret carefully the estimated coefficients in the fixed and random parts of the model, with special emphasis on the interpretation of the standard deviations of the random effects. Compute BLUPs of the random effects and use these to compute Bayesian estimates of the predicted relative stop rates by precinct and ethnicity. Comment on any precincts that have disproportionately high or low stop rates for a given ethnic group. How do these estimates compare with the observed relative stop rates?
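With whites as the reference category, the random intercept is the precinct effect on the white log rate, while the random dummy coefficients are precinct effects on the black-white and hispanic-white log rate ratios. The standard deviation of the precinct effect on the black log rate itself therefore combines the intercept variance, the slope variance and their covariance. The sketch below does this arithmetic for a 3×3 covariance matrix ordered (intercept, black, hispanic) and forms plug-in predicted rates from fixed effects and BLUPs. The dict layout and function names are hypothetical, and exponentiating predicted random effects gives a plug-in rather than a full posterior-mean estimate of the rate.

```python
import math

def group_effect_sd(cov):
    """SD across precincts of the log relative stop rate for each group.

    cov: 3x3 covariance of (u0, u_black, u_hispanic). The black log rate in
    precinct j is b0 + b_black + u0j + u_black_j, so its precinct variance
    is var(u0) + var(u_black) + 2*cov(u0, u_black); likewise for hispanics.
    """
    v0 = cov[0][0]
    return {
        "white": math.sqrt(v0),
        "black": math.sqrt(v0 + cov[1][1] + 2 * cov[0][1]),
        "hispanic": math.sqrt(v0 + cov[2][2] + 2 * cov[0][2]),
    }

def predicted_rate(fixed, blup, group):
    """Plug-in predicted relative stop rate for one precinct and group.

    fixed: {'b0', 'black', 'hispanic'} fixed-effect estimates.
    blup:  {'u0', 'black', 'hispanic'} predicted random effects for the precinct.
    """
    eta = fixed["b0"] + blup["u0"]
    if group != "white":
        eta += fixed[group] + blup[group]  # add group shift and its precinct deviation
    return math.exp(eta)
```

A negative covariance between the intercept and a dummy coefficient means that precincts with high white rates tend to have smaller ethnic differentials, so the standard deviation for that group can be smaller than the intercept standard deviation.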
How would you explore in a multilevel framework the hypothesis that a precinct's ethnic composition affects the relative stop rates? Of particular interest is the notion that blacks are more likely to be stopped a disproportionate number of times in precincts that have a higher proportion of black residents, even after taking into account last year's black arrests.
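One possible formulation, offered only as a sketch, adds a precinct-level covariate and its cross-level interactions to the random coefficients model. Here mu_ij is the expected number of stops for ethnic group i in precinct j, E_ij the scaled arrests, B_ij and H_ij the black and hispanic dummies, and p_j the proportion black in precinct j. p_j is an assumption: it would have to be built from population counts by ethnicity, if available in the data, or merged in from census figures.

```latex
\log \mu_{ij} = \log E_{ij} + \beta_0 + \beta_1 B_{ij} + \beta_2 H_{ij}
  + \gamma_0 p_j + \gamma_1 p_j B_{ij} + \gamma_2 p_j H_{ij}
  + u_{0j} + u_{1j} B_{ij} + u_{2j} H_{ij}

% implied log relative stop rate for blacks in precinct j
\log r_{Bj} = (\beta_0 + \beta_1) + (\gamma_0 + \gamma_1)\, p_j + u_{0j} + u_{1j}

% implied log ratio of black to white relative stop rates in precinct j
\log r_{Bj} - \log r_{Wj} = \beta_1 + \gamma_1 p_j + u_{1j}
```

Under this parametrization, the hypothesis that the black relative stop rate rises with the proportion black corresponds to gamma_0 + gamma_1 > 0, and gamma_1 > 0 says the effect is specific to blacks rather than shared with whites. Comparing the estimated variance of u_1j with and without the p_j terms indicates how much of the between-precinct variation in the black-white differential is accounted for by ethnic composition.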
