Germán Rodríguez

Generalized Linear Models
Princeton University
Due Friday, November 18, 2016

Cameron and Trivedi (2009) have some interesting data on the number
of office-based doctor visits by adults aged 25-64 based on the 2002
Medical Expenditure Panel Survey. We will use data for the most recent
wave, available in the datasets section of the website as
`docvis.dta`.

Part 1: Poisson regression

(a) Fit a Poisson regression model with the number of doctor visits
(`docvis`) as the outcome. We will use the same predictors
as Cameron and Trivedi, namely health insurance status
(`private`), health status (`chronic`), gender
(`female`) and income (`income`), but will add
two indicators of ethnicity (`black` and `hispanic`). There are many more variables one could add,
but we'll keep things simple.
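
Any GLM routine will do the fitting. As a minimal sketch of what happens under the hood, the pure-Python function below fits a Poisson regression with log link by iteratively reweighted least squares (IRLS); for the canonical log link IRLS coincides with Newton-Raphson, so it converges to the maximum likelihood estimates. The names `solve` and `poisson_irls` are hypothetical, and the code assumes the design matrix is a list of rows whose first entry is 1 (the constant).

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row for column c
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def poisson_irls(X, y, tol=1e-10, max_iter=50):
    """Fit log(mu) = X beta by IRLS.

    X is a list of rows whose first entry is 1 (the constant); y holds counts.
    """
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start at the null model
    for _ in range(max_iter):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        # Working response z = eta + (y - mu) / mu with weights w = mu (log link).
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        XtWX = [[sum(m * row[j] * row[k] for m, row in zip(mu, X)) for k in range(p)]
                for j in range(p)]
        XtWz = [sum(m * row[j] * zi for m, row, zi in zip(mu, X, z)) for j in range(p)]
        new_beta = solve(XtWX, XtWz)
        if max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta
```

With the homework data, each row of `X` would hold a 1 followed by `private`, `chronic`, `female`, `income`, `black` and `hispanic`, and `y` would hold `docvis`.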

(b) Interpret the coefficient of `black` and test its
significance using a Wald test and a likelihood ratio test.
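
Because the link is the log, $e^{\hat\beta}$ for `black` is the ratio of expected visits for black respondents relative to respondents coded 0 on both ethnicity indicators, holding the other predictors fixed. The sketch below (hypothetical helper names) computes the Wald statistic $z = \hat\beta/\mathrm{se}$ and the likelihood ratio statistic $2(\log L_{\text{full}} - \log L_{\text{reduced}})$, where the reduced model drops `black` and must be fit on the same observations; both p-values use only the standard library.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal statistic: P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


def chi2_1df_p(x):
    """Upper tail of a chi-squared with 1 df: P(chi2_1 > x) = P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(x / 2))


def wald_test(coef, se):
    z = coef / se
    return z, normal_two_sided_p(z)


def lr_test(loglik_full, loglik_reduced):
    """LR test for dropping one coefficient, referred to chi-squared(1)."""
    stat = 2 * (loglik_full - loglik_reduced)
    return stat, chi2_1df_p(stat)


def percent_change(coef):
    """Percent change in expected visits implied by a log-link coefficient."""
    return 100 * (math.exp(coef) - 1)
```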

(c) Compute a 95% confidence interval for the effect of private insurance and interpret this result in terms of doctor visits.
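
A natural approach, sketched below under the assumption that you have the coefficient and its standard error, is to build the interval $\hat\beta \pm 1.96\,\mathrm{se}$ on the log scale and exponentiate both endpoints. The result is a 95% interval for the ratio of expected visits with versus without private insurance, other predictors held fixed. The function name is hypothetical.

```python
import math


def rate_ratio_ci(coef, se, z=1.959964):
    """95% CI for exp(coef): exponentiate the endpoints of coef +/- z * se."""
    return math.exp(coef - z * se), math.exp(coef + z * se)
```

Note that the interval is symmetric on the log scale but not around $e^{\hat\beta}$ itself.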

(d) Compute the deviance and Pearson chi-squared statistics for this model. Does the model fit the data? Is there evidence of overdispersion?
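
For reference, the Poisson deviance is $D = 2\sum_i [y_i \log(y_i/\hat\mu_i) - (y_i - \hat\mu_i)]$, with $y_i \log(y_i/\hat\mu_i)$ taken as 0 when $y_i = 0$, and Pearson's statistic is $X^2 = \sum_i (y_i - \hat\mu_i)^2/\hat\mu_i$. A minimal sketch (hypothetical function names) is below; either statistic can be compared with its degrees of freedom $n - p$, where $p = 7$ here (a constant plus six predictors), and values far above $n - p$ point to lack of fit or extra-Poisson variation.

```python
import math


def poisson_deviance(y, mu):
    """D = 2 * sum[y log(y / mu) - (y - mu)], with y log(y / mu) = 0 when y = 0."""
    d = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d += term - (yi - mi)
    return 2 * d


def pearson_chi2(y, mu):
    """X^2 = sum (y - mu)^2 / mu, since the Poisson variance equals mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
```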

(e) Predict the proportion expected to have exactly zero doctor
visits and compare with the observed proportion. You will find the
formula for Poisson probabilities in the notes. The probability of zero
is simply $e^{-\mu}$.
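
One step worth making explicit: the predicted proportion should average the individual probabilities $e^{-\hat\mu_i}$ over respondents rather than plug the average $\hat\mu$ into $e^{-\mu}$, because $e^{-\mu}$ is convex and the two generally differ. A minimal sketch with hypothetical helper names:

```python
import math


def predicted_zero_share(mu):
    """Average over respondents of the Poisson probability of zero, exp(-mu_i)."""
    return sum(math.exp(-m) for m in mu) / len(mu)


def observed_zero_share(y):
    """Fraction of respondents with an observed count of zero."""
    return sum(1 for yi in y if yi == 0) / len(y)
```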

Part 2: Over-dispersion

(a) Suppose the variance is proportional to the mean rather than equal to the mean. Estimate the proportionality parameter using Pearson's chi-squared and use this estimate to correct the standard errors.
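
If $\mathrm{var}(Y_i) = \phi\mu_i$, the point estimates are unchanged, $\phi$ can be estimated as $\hat\phi = X^2/(n-p)$, and each standard error is multiplied by $\sqrt{\hat\phi}$ (so each Wald $z$ is divided by $\sqrt{\hat\phi}$). A minimal sketch, with hypothetical function names:

```python
import math


def dispersion_from_pearson(pearson_chi2, n_obs, n_params):
    """phi-hat = Pearson X^2 / (n - p)."""
    return pearson_chi2 / (n_obs - n_params)


def adjust_se(se_list, phi):
    """Quasi-Poisson standard errors: multiply each Poisson SE by sqrt(phi)."""
    return [s * math.sqrt(phi) for s in se_list]
```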

(b) What happens to the significance of the black coefficient once we allow for extra-Poisson variation? Could we test this coefficient using a likelihood ratio test? Explain.

(c) Compare the standard errors adjusted for over-dispersion with the robust or "sandwich" estimator of the standard errors.
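
The sandwich estimator replaces the assumed variance with squared residuals: $\hat V = B\,M\,B$ with bread $B = (X'WX)^{-1}$, $W = \mathrm{diag}(\hat\mu_i)$ for the log link, and meat $M = X'\,\mathrm{diag}((y_i - \hat\mu_i)^2)\,X$. The pure-Python sketch below (hypothetical names, design matrix as a list of rows) returns both the model-based and robust standard errors at the fitted means. Software may also apply a small-sample factor such as $n/(n-1)$, so results can differ slightly from this uncorrected version.

```python
import math


def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def cross(X, w):
    """X' diag(w) X for X given as a list of rows."""
    p = len(X[0])
    return [[sum(wi * row[j] * row[k] for wi, row in zip(w, X)) for k in range(p)]
            for j in range(p)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def poisson_se(X, y, mu):
    """Model-based and robust (sandwich) standard errors at the Poisson fit mu."""
    bread = inverse(cross(X, mu))                               # (X'WX)^{-1}, W = diag(mu)
    meat = cross(X, [(yi - mi) ** 2 for yi, mi in zip(y, mu)])  # X' diag((y - mu)^2) X
    V = matmul(matmul(bread, meat), bread)
    model = [math.sqrt(bread[j][j]) for j in range(len(bread))]
    robust = [math.sqrt(V[j][j]) for j in range(len(V))]
    return model, robust
```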

Part 3: Negative binomial regression

(a) Fit a negative binomial regression model using the same outcome and predictors as in part 1.a. Comment on any remarkable changes in the coefficients.
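
As a sketch of the likelihood being maximized, assuming the gamma-mixture parametrization of the notes with $\alpha = \beta = 1/\sigma^2$, the negative binomial has mean $\mu$ and variance $\mu(1 + \sigma^2\mu)$, and its log-likelihood can be written with `math.lgamma`. The function name is hypothetical. Some software reports the variance parameter $\sigma^2$ itself under the label "alpha", which is the reciprocal of the $\alpha$ used here.

```python
import math


def negbin_loglik(y, mu, sigma2):
    """Negative binomial log-likelihood with mean mu and variance mu * (1 + sigma2 * mu).

    Uses alpha = 1 / sigma2 as in the gamma-mixture parametrization.
    """
    a = 1.0 / sigma2
    ll = 0.0
    for yi, mi in zip(y, mu):
        ll += (math.lgamma(yi + a) - math.lgamma(a) - math.lgamma(yi + 1)
               + a * math.log(a / (a + mi)) + yi * math.log(mi / (a + mi)))
    return ll
```

As $\sigma^2 \to 0$ this tends to the Poisson log-likelihood, which is what makes the test in part (d) possible.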

(b) Interpret the coefficient of `black` and test its significance using a Wald test and a likelihood ratio test. Compare your results with parts 1.b and 2.b.

(c) Predict the percent of respondents with zero doctor visits
according to this model and compare with part 1.e. You will find a
formula for negative binomial probabilities in the addendum to the
notes. The probability of zero is given by
$[\beta/(\mu+\beta)]^{\alpha}$, where $\alpha = \beta = 1/\sigma^2$.
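
As in part 1.e, the model-based proportion is an average of individual probabilities. A minimal sketch (hypothetical name) of $\frac{1}{n}\sum_i [\alpha/(\hat\mu_i + \alpha)]^{\alpha}$ with $\alpha = 1/\hat\sigma^2$:

```python
def negbin_zero_share(mu, sigma2):
    """Average of the negative binomial P(Y = 0) = (alpha / (mu + alpha))^alpha."""
    a = 1.0 / sigma2
    return sum((a / (m + a)) ** a for m in mu) / len(mu)
```

For small $\sigma^2$ this approaches the Poisson value $e^{-\mu}$.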

(d) Interpret the estimate of $\sigma^2$ in this model and test its
significance, noting carefully the distribution of the criterion.
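
The null hypothesis $\sigma^2 = 0$ lies on the boundary of the parameter space, so the likelihood ratio statistic comparing the negative binomial and Poisson fits is not $\chi^2_1$ under the null; it is usually treated as a 50:50 mixture of a point mass at zero and a $\chi^2_1$. The sketch below (hypothetical name) therefore halves the usual $\chi^2_1$ tail probability.

```python
import math


def boundary_lr_pvalue(loglik_negbin, loglik_poisson):
    """LR test of sigma2 = 0 using a 50:50 mixture of 0 and chi-squared(1)."""
    stat = 2 * (loglik_negbin - loglik_poisson)
    # Half the chi-squared(1) upper tail; max() guards against tiny negative values.
    p = 0.5 * math.erfc(math.sqrt(max(stat, 0.0) / 2))
    return stat, p
```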

(e) Use predicted values from this model to divide the sample into
twenty groups of about equal size, compute the mean and variance of
`docvis` in each group, and plot these values. Superimpose
curves representing the over-dispersed Poisson and negative binomial
variance functions and comment.
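
A minimal sketch of the grouping step, assuming the fitted values and counts are plain lists: sort by fitted value, cut into twenty nearly equal groups, and take the sample mean and variance of the outcome in each. The two variance functions to superimpose are $\phi\mu$ and $\mu(1 + \sigma^2\mu)$. All names are hypothetical, ties in the fitted values are broken arbitrarily by the sort, and plotting is left to whatever graphics tool you use.

```python
from statistics import mean, variance


def grouped_mean_variance(fitted, y, n_groups=20):
    """Sort by fitted value, split into n_groups nearly equal groups,
    and return (mean, sample variance) of y in each group."""
    order = sorted(range(len(y)), key=lambda i: fitted[i])
    n = len(order)
    out = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        ys = [y[i] for i in idx]
        out.append((mean(ys), variance(ys)))  # needs at least 2 cases per group
    return out


def overdispersed_poisson_var(mu, phi):
    """Variance function phi * mu."""
    return phi * mu


def negbin_var(mu, sigma2):
    """Variance function mu * (1 + sigma2 * mu)."""
    return mu * (1 + sigma2 * mu)
```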

Part 4: Zero-inflated Poisson

(a) Try a zero-inflated Poisson model with the same predictors as in part 1.a in both the Poisson and inflate equations.
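
In the zero-inflated Poisson model a respondent is a structural zero with probability $\pi$ (modeled with a logit link in the inflate equation) and otherwise has a Poisson count with mean $\mu$ (log link), so $P(Y=0) = \pi + (1-\pi)e^{-\mu}$ and $P(Y=y) = (1-\pi)e^{-\mu}\mu^y/y!$ for $y > 0$. A minimal sketch of these probabilities, with hypothetical function names and assuming $\mu > 0$:

```python
import math


def zip_prob(y, mu, pi):
    """P(Y = y) under a zero-inflated Poisson with inflation probability pi (mu > 0)."""
    poisson = math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))
    return (pi if y == 0 else 0.0) + (1 - pi) * poisson


def inv_logit(x):
    """Map a linear predictor from the inflate equation to pi."""
    return 1 / (1 + math.exp(-x))
```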

(b) Predict the proportion of respondents with zero doctor visits according to this model and compare with parts 1.e and 3.c. (Don't forget that there are two ways of having an outcome of zero in this model.)
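
Both routes to a zero enter the prediction: the structural zeros $\hat\pi_i$ and the Poisson zeros $(1-\hat\pi_i)e^{-\hat\mu_i}$, averaged over respondents. A minimal sketch (hypothetical name):

```python
import math


def zip_zero_share(mu, pi):
    """Average of pi_i + (1 - pi_i) * exp(-mu_i): structural plus Poisson zeros."""
    return sum(p + (1 - p) * math.exp(-m) for m, p in zip(mu, pi)) / len(mu)
```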

(c) Interpret the coefficients of black in the two equations. Is the effect related to whether blacks visit the doctor at all? To how often they visit?

Part 5: Model selection

Considering the results obtained so far and bearing in mind parsimony and goodness of fit, which of the models used here provides the best description of the data? Make sure you provide a clear justification of your choice.
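
The negative binomial and zero-inflated Poisson models are not nested, so a likelihood ratio test cannot compare them directly; information criteria are one common way to weigh fit against parsimony. A minimal sketch (hypothetical names) of $\mathrm{AIC} = -2\log L + 2k$ and $\mathrm{BIC} = -2\log L + k\log n$, where lower is better. Assuming each equation includes a constant, $k = 7$ for the Poisson model, 8 for the negative binomial (adding $\sigma^2$), and 14 for the zero-inflated Poisson (seven coefficients in each equation).

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2 * loglik + 2 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 log L + k log(n)."""
    return -2 * loglik + k * math.log(n)
```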

Posted November 9, 2016