
B.2  Maximum Likelihood Estimation

An important practical feature of generalized linear models is that they can all be fit to data using the same algorithm, a form of iteratively re-weighted least squares. In this section we describe the algorithm.

Given a trial estimate of the parameters $\hat{\beta}$, we calculate the estimated linear predictor $\hat{\eta}_i = x_i'\hat{\beta}$ and use that to obtain the fitted values $\hat{\mu}_i = g^{-1}(\hat{\eta}_i)$. Using these quantities, we calculate the working dependent variable

$$ z_i = \hat{\eta}_i + (y_i - \hat{\mu}_i) \frac{d\eta_i}{d\mu_i}, \tag{B.6} $$
where the rightmost term is the derivative of the link function evaluated at the trial estimate.
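
As an illustration of (B.6), the following minimal sketch computes the working dependent variable for a log link, where $\mu_i = \exp(\eta_i)$ and $d\eta_i/d\mu_i = 1/\mu_i$. The choice of link, the function name `working_response` and the numbers are assumptions made for this example only.

```python
import math

def working_response(eta_hat, y):
    """Return z_i = eta_i + (y_i - mu_i) * d(eta_i)/d(mu_i) for a log link."""
    z = []
    for eta_i, y_i in zip(eta_hat, y):
        mu_i = math.exp(eta_i)      # inverse link: mu = exp(eta)
        deta_dmu = 1.0 / mu_i       # derivative of eta = log(mu) with respect to mu
        z.append(eta_i + (y_i - mu_i) * deta_dmu)
    return z

# Made-up trial linear predictor and responses, for illustration only.
eta_hat = [0.0, math.log(2.0), math.log(4.0)]
y = [1.0, 3.0, 2.0]
z = working_response(eta_hat, y)
```

When an observation equals its fitted value, as in the first case here, the correction term vanishes and $z_i$ is just $\hat{\eta}_i$.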

Next we calculate the iterative weights

$$ w_i = p_i \Big/ \left[ b''(\theta_i) \left( \frac{d\eta_i}{d\mu_i} \right)^2 \right], \tag{B.7} $$
where $b''(\theta_i)$ is the second derivative of $b(\theta_i)$ evaluated at the trial estimate and we have assumed that $a_i(\phi)$ has the usual form $\phi/p_i$. This weight is inversely proportional to the variance of the working dependent variable $z_i$ given the current estimates of the parameters, with proportionality factor $\phi$.
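
The weight in (B.7) can be sketched as a one-line function. The example values assume Poisson data with a log link, for which $b(\theta) = e^{\theta}$ so that $b''(\theta_i) = \mu_i$, and prior weights $p_i = 1$; the function name `iterative_weight` and the fitted values are hypothetical.

```python
def iterative_weight(p_i, b2_theta_i, deta_dmu_i):
    """w_i = p_i / [ b''(theta_i) * (d eta_i / d mu_i)^2 ], following (B.7)."""
    return p_i / (b2_theta_i * deta_dmu_i ** 2)

# Assumed Poisson/log-link case: b''(theta_i) = mu_i and d eta_i / d mu_i = 1 / mu_i.
mu_hat = [1.0, 2.0, 4.0]   # made-up fitted values
w = [iterative_weight(1.0, mu_i, 1.0 / mu_i) for mu_i in mu_hat]
```

Under these assumptions the weight simplifies to $w_i = \mu_i$, which is what the list `w` reproduces.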

Finally, we obtain an improved estimate of $\beta$ by regressing the working dependent variable $z_i$ on the predictors $x_i$ using the weights $w_i$, i.e. we calculate the weighted least-squares estimate

$$ \hat{\beta} = (X'WX)^{-1} X'Wz, \tag{B.8} $$
where $X$ is the model matrix, $W$ is a diagonal matrix of weights with entries $w_i$ given by (B.7) and $z$ is a response vector with entries $z_i$ given by (B.6).
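
A single weighted least-squares step of the form (B.8) might look as follows. This is a sketch using only the standard library: the helper names `solve` and `wls` and the small data set are assumptions, and the normal equations $X'WX\,\beta = X'Wz$ are solved by Gaussian elimination rather than by forming the inverse explicitly.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]   # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def wls(X, w, z):
    """Return the solution of (X'WX) beta = X'Wz, where W = diag(w)."""
    k = len(X[0])
    XtWX = [[sum(w_i * x[a] * x[c] for x, w_i in zip(X, w)) for c in range(k)]
            for a in range(k)]
    XtWz = [sum(w_i * x[a] * z_i for x, w_i, z_i in zip(X, w, z)) for a in range(k)]
    return solve(XtWX, XtWz)

# Made-up data: the working response lies exactly on the line 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
w = [1.0, 2.0, 2.0, 1.0]
z = [1.0, 3.0, 5.0, 7.0]
beta = wls(X, w, z)
```

Because the made-up response is exactly linear in the predictor, the estimate recovers the intercept and slope of that line whatever positive weights are used.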

The procedure is repeated until successive estimates change by less than a specified small amount. McCullagh and Nelder (1989) prove that this algorithm is equivalent to Fisher scoring and leads to maximum likelihood estimates. These authors consider the case of general $a_i(\phi)$ and include $\phi$ in their expression for the iterative weight. In other words, they use $w_i^* = \phi w_i$, where $w_i$ is the weight used here. The proportionality factor $\phi$ cancels out when you calculate the weighted least-squares estimates using (B.8), so the estimator is exactly the same. I prefer to show $\phi$ explicitly rather than include it in $W$.
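
The cancellation can be checked in one line. Writing $W^* = \phi W$ for the diagonal matrix of rescaled weights (a symbol introduced here only for this check), substitution into (B.8) gives

$$ (X'W^*X)^{-1} X'W^*z = (\phi\, X'WX)^{-1} \phi\, X'Wz = \frac{1}{\phi}\,(X'WX)^{-1}\, \phi\, X'Wz = (X'WX)^{-1} X'Wz. $$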

Example: For normal data with identity link $\eta_i = \mu_i$, so the derivative is $d\eta_i/d\mu_i = 1$ and the working dependent variable is $y_i$ itself. Since in addition $b''(\theta_i) = 1$ and $p_i = 1$, the weights are constant and no iteration is required. □
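
Substituting these values into (B.6), (B.7) and (B.8) makes the reduction explicit: the trial estimate drops out and a single step gives the ordinary least-squares estimate.

$$ z_i = \hat{\eta}_i + (y_i - \hat{\mu}_i) \cdot 1 = y_i, \qquad w_i = \frac{1}{1 \cdot 1^2} = 1, \qquad \hat{\beta} = (X'X)^{-1} X'y. $$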

In Sections B.4 and B.5 we derive the working dependent variable and the iterative weights required for binomial data with link logit and for Poisson data with link log. In both cases iteration will usually be necessary.

Starting values may be obtained by applying the link to the data, i.e. we take $\hat{\mu}_i = y_i$ and $\hat{\eta}_i = g(\hat{\mu}_i)$. Sometimes this requires a few adjustments, for example to avoid taking the log of zero, and we will discuss these at the appropriate time.
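
Putting the pieces together, the whole procedure can be sketched for one concrete case. The code below assumes Poisson data with a log link and prior weights $p_i = 1$, so that $d\eta_i/d\mu_i = 1/\mu_i$ and $w_i = \mu_i$. It starts from $\hat{\mu}_i = y_i$, which requires every $y_i > 0$, and makes no attempt at the adjustments needed for zero counts. The function names, tolerance and data are hypothetical choices for this illustration.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def wls(X, w, z):
    """Weighted least-squares estimate solving (X'WX) beta = X'Wz, as in (B.8)."""
    k = len(X[0])
    XtWX = [[sum(w_i * x[a] * x[c] for x, w_i in zip(X, w)) for c in range(k)]
            for a in range(k)]
    XtWz = [sum(w_i * x[a] * z_i for x, w_i, z_i in zip(X, w, z)) for a in range(k)]
    return solve(XtWX, XtWz)

def irls_poisson_log(X, y, tol=1e-10, max_iter=50):
    """Iteratively re-weighted least squares for an assumed Poisson/log-link model."""
    eta = [math.log(y_i) for y_i in y]        # starting values: mu = y, eta = g(mu)
    beta = None
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        z = [e + (y_i - m) / m for e, y_i, m in zip(eta, y, mu)]   # (B.6)
        w = mu                                                      # (B.7), p_i = 1
        new_beta = wls(X, w, z)                                     # (B.8)
        eta = [sum(x_j * b_j for x_j, b_j in zip(x, new_beta)) for x in X]
        if beta is not None and max(abs(a - b) for a, b in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

# Made-up counts with an intercept and one predictor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [2.0, 3.0, 6.0, 7.0, 12.0]
beta = irls_poisson_log(X, y)
```

One way to check such a fit is to evaluate the likelihood equations at the returned estimate: for this model, $X'(y - \hat{\mu})$ should be numerically zero at convergence.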


Copyright © Germán Rodríguez, 1993-2000. Please send feedback to grodri@princeton.edu
Conversion from LaTeX was done using TTH, version 2.34.