
B.4  Binomial Errors and Link Logit

We apply the theory of generalized linear models to the case of binary data, and in particular to logistic regression models.

B.4.1  The Binomial Distribution

First we verify that the binomial distribution $B(n_i,\pi_i)$ belongs to the exponential family of Nelder and Wedderburn (1972). The binomial probability distribution function (p.d.f.) is

$$ f_i(y_i) = \binom{n_i}{y_i} \pi_i^{y_i} (1-\pi_i)^{n_i-y_i}. \tag{B.14} $$
Taking logs we find that
$$ \log f_i(y_i) = y_i \log(\pi_i) + (n_i-y_i)\log(1-\pi_i) + \log\binom{n_i}{y_i}. $$
Collecting terms on $y_i$ we can write
$$ \log f_i(y_i) = y_i \log\left(\frac{\pi_i}{1-\pi_i}\right) + n_i \log(1-\pi_i) + \log\binom{n_i}{y_i}. $$

This expression has the general exponential form

$$ \log f_i(y_i) = \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i,\phi) $$
with the following equivalences. Looking first at the coefficient of $y_i$, we note that the canonical parameter is the logit of $\pi_i$:
$$ \theta_i = \log\left(\frac{\pi_i}{1-\pi_i}\right). \tag{B.15} $$

Solving for $\pi_i$ we see that

$$ \pi_i = \frac{e^{\theta_i}}{1+e^{\theta_i}}, \quad\text{so}\quad 1-\pi_i = \frac{1}{1+e^{\theta_i}}. $$
If we rewrite the second term in the p.d.f. as a function of $\theta_i$, so $\log(1-\pi_i) = -\log(1+e^{\theta_i})$, we can identify the cumulant function $b(\theta_i)$ as
$$ b(\theta_i) = n_i \log(1+e^{\theta_i}). $$
The remaining term in the p.d.f. is a function of $y_i$ but not $\pi_i$, leading to
$$ c(y_i,\phi) = \log\binom{n_i}{y_i}. $$
Note finally that we may set $a_i(\phi) = \phi$ and $\phi = 1$.
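
The identification above can be checked numerically. The following is a minimal sketch, using only the Python standard library, that evaluates the binomial log p.d.f. directly and again through the exponential-family pieces (canonical parameter, cumulant function and the remaining term). The function names and the values of `n` and `p` are illustrative assumptions, not part of the original notes.

```python
import math

def binom_logpmf(y, n, p):
    """Log of the binomial p.d.f. (B.14), computed directly."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1 - p)

def expfam_logpmf(y, n, p):
    """Same quantity written as y*theta - b(theta) + c(y), taking a(phi) = 1."""
    theta = math.log(p / (1 - p))            # canonical parameter: the logit of p
    b = n * math.log(1 + math.exp(theta))    # cumulant function b(theta)
    c = math.log(math.comb(n, y))            # term that does not involve p
    return y * theta - b + c

n, p = 10, 0.3   # illustrative values only
max_gap = max(abs(binom_logpmf(y, n, p) - expfam_logpmf(y, n, p))
              for y in range(n + 1))
print(max_gap)   # largest discrepancy over y = 0, 1, ..., n
```

The two expressions should agree up to floating-point rounding for every value of the count.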

Let us verify the mean and variance. Differentiating $b(\theta_i)$ with respect to $\theta_i$ we find that

$$ \mu_i = b'(\theta_i) = n_i \frac{e^{\theta_i}}{1+e^{\theta_i}} = n_i \pi_i, $$
in agreement with what we knew from elementary statistics. Differentiating again using the quotient rule, we find that
$$ v_i = a_i(\phi)\, b''(\theta_i) = n_i \frac{e^{\theta_i}}{(1+e^{\theta_i})^2} = n_i \pi_i (1-\pi_i), $$
again in agreement with what we knew before.
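
As a rough numerical check of these two derivatives, the sketch below differentiates the cumulant function by central finite differences and compares the results with the familiar binomial mean and variance. The helper names, step sizes and the values of `n` and `p` are assumptions chosen for illustration.

```python
import math

def b(theta, n):
    """Binomial cumulant function b(theta) = n log(1 + e^theta)."""
    return n * math.log(1 + math.exp(theta))

def first_derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

n, p = 20, 0.25                      # illustrative values only
theta = math.log(p / (1 - p))        # canonical parameter for this p

mean_numeric = first_derivative(lambda t: b(t, n), theta)
var_numeric = second_derivative(lambda t: b(t, n), theta)

print(mean_numeric, n * p)             # both close to the binomial mean
print(var_numeric, n * p * (1 - p))    # both close to the binomial variance
```

The agreement is only approximate, because finite differences carry truncation and rounding error.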

In this development I have worked with the binomial count $y_i$, which takes values $0, 1, \ldots, n_i$. McCullagh and Nelder (1989) work with the proportion $p_i = y_i/n_i$, which takes values $0, 1/n_i, \ldots, 1$. This explains the differences between my results and their Table 2.1.

B.4.2  Fisher Scoring in Logistic Regression

Let us now find the working dependent variable and the iterative weight used in the Fisher scoring algorithm for estimating the parameters in logistic regression, where we model

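$$ \eta_i = \mathrm{logit}(\pi_i). \tag{B.16} $$
It will be convenient to write the link function in terms of the mean $\mu_i$, as
$$ \eta_i = \log\left(\frac{\pi_i}{1-\pi_i}\right) = \log\left(\frac{\mu_i}{n_i-\mu_i}\right), $$
which can also be written as $\eta_i = \log(\mu_i)-\log(n_i-\mu_i)$.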

Differentiating with respect to $\mu_i$ we find that

$$ \frac{d\eta_i}{d\mu_i} = \frac{1}{\mu_i} + \frac{1}{n_i-\mu_i} = \frac{n_i}{\mu_i(n_i-\mu_i)} = \frac{1}{n_i\pi_i(1-\pi_i)}. $$

The working dependent variable, which in general is

$$ z_i = \eta_i + (y_i-\mu_i)\frac{d\eta_i}{d\mu_i}, $$
turns out to be
$$ z_i = \eta_i + \frac{y_i - n_i\pi_i}{n_i\pi_i(1-\pi_i)}. \tag{B.17} $$

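The iterative weight turns out to be

$$ w_i = 1 \Big/ \left[ b''(\theta_i) \left(\frac{d\eta_i}{d\mu_i}\right)^2 \right] = \frac{1}{n_i\pi_i(1-\pi_i)} \left[ n_i\pi_i(1-\pi_i) \right]^2, $$
and simplifies to
$$ w_i = n_i \pi_i (1-\pi_i). \tag{B.18} $$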

Note that the weight is inversely proportional to the variance of the working dependent variable. The results here agree exactly with the results in Chapter 4 of McCullagh and Nelder (1989).
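
To make the algorithm concrete, here is a minimal sketch of Fisher scoring for a logistic regression with an intercept and a single predictor, fitted to grouped binomial data. Each iteration computes the working dependent variable (B.17) and the iterative weight (B.18), then regresses the former on the predictor by weighted least squares. The function name, the starting values, the stopping rule and the small data set are all assumptions made up for illustration. The closed-form weighted regression is used only because there is one predictor; a general implementation would solve the weighted normal equations in matrix form.

```python
import math

def fisher_scoring(x, y, n, max_iter=25, tol=1e-10):
    """Fit logit(pi_i) = alpha + beta * x_i to counts y_i out of n_i trials."""
    alpha, beta = 0.0, 0.0                        # assumed starting values
    for _ in range(max_iter):
        eta = [alpha + beta * xi for xi in x]     # current linear predictor
        pi = [1 / (1 + math.exp(-e)) for e in eta]
        # Iterative weight (B.18) and working dependent variable (B.17).
        w = [ni * p * (1 - p) for ni, p in zip(n, pi)]
        z = [e + (yi - ni * p) / wi
             for e, yi, ni, p, wi in zip(eta, y, n, pi, w)]
        # Weighted least squares regression of z on x with weights w.
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxz = sum(wi * (xi - xbar) * (zi - zbar)
                  for wi, xi, zi in zip(w, x, z))
        new_beta = sxz / sxx
        new_alpha = zbar - new_beta * xbar
        converged = abs(new_alpha - alpha) + abs(new_beta - beta) < tol
        alpha, beta = new_alpha, new_beta
        if converged:
            break
    return alpha, beta

# Made-up grouped data: y successes out of n trials at each value of x.
x = [0.0, 1.0, 2.0, 3.0]
n = [20, 20, 20, 20]
y = [3, 7, 12, 16]

alpha, beta = fisher_scoring(x, y, n)
print(alpha, beta)
```

At convergence the estimates should satisfy the likelihood equations, so the residuals $y_i - n_i\hat\pi_i$ sum to zero both unweighted and weighted by $x_i$.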

Exercise: Obtain analogous results for Probit analysis, where one models

$$ \eta_i = \Phi^{-1}(\mu_i), $$
where $\Phi()$ is the standard normal cdf. Hint: To calculate the derivative of the link function find $d\mu_i/d\eta_i$ and take reciprocals.

B.4.3  The Binomial Deviance

Finally, let us figure out the binomial deviance. Let $\hat\mu_i$ denote the m.l.e. of $\mu_i$ under the model of interest, and let $\tilde\mu_i = y_i$ denote the m.l.e. under the saturated model. From first principles,

$$ D = 2 \sum \left[ y_i \log\left(\frac{y_i}{n_i}\right) + (n_i-y_i)\log\left(\frac{n_i-y_i}{n_i}\right) - y_i\log\left(\frac{\hat\mu_i}{n_i}\right) - (n_i-y_i)\log\left(\frac{n_i-\hat\mu_i}{n_i}\right) \right]. $$
Note that all terms involving $\log(n_i)$ cancel out. Collecting terms on $y_i$ and on $n_i-y_i$ we find that
$$ D = 2 \sum \left[ y_i \log\left(\frac{y_i}{\hat\mu_i}\right) + (n_i-y_i)\log\left(\frac{n_i-y_i}{n_i-\hat\mu_i}\right) \right]. \tag{B.19} $$

Alternatively, you may obtain this result from the general form of the deviance given in Section B.3.

Note that the binomial deviance has the form

$$ D = 2 \sum o_i \log\left(\frac{o_i}{e_i}\right), $$
where $o_i$ denotes observed, $e_i$ denotes expected (under the model of interest) and the sum is over both "successes" and "failures" for each $i$ (i.e. we have a contribution from $y_i$ and one from $n_i-y_i$).
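
The observed-versus-expected form translates almost directly into code. The sketch below computes the deviance as twice the sum of observed times log(observed/expected) over successes and failures, treating a cell with zero observed count as contributing nothing (the limit of $o \log o$ as $o \to 0$). The function name and the data are illustrative assumptions. The fitted values come from a null model with a common probability, chosen only because its m.l.e. is the pooled proportion.

```python
import math

def binomial_deviance(y, n, mu_hat):
    """Deviance 2 * sum o*log(o/e) over successes and failures, as in (B.19)."""
    def term(observed, expected):
        # o * log(o / e) tends to 0 as o -> 0, so empty cells add nothing.
        return observed * math.log(observed / expected) if observed > 0 else 0.0
    return 2 * sum(term(yi, mi) + term(ni - yi, ni - mi)
                   for yi, ni, mi in zip(y, n, mu_hat))

# Made-up grouped data: y successes out of n trials in each group.
y = [3, 7, 12, 16]
n = [20, 20, 20, 20]

# Null model: one common probability, estimated by the pooled proportion.
pooled = sum(y) / sum(n)
mu_null = [ni * pooled for ni in n]

d_null = binomial_deviance(y, n, mu_null)
d_saturated = binomial_deviance(y, n, [float(yi) for yi in y])
print(d_null, d_saturated)
```

Under the saturated model the fitted values equal the observed counts, so every log term vanishes and the deviance is zero.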

For grouped data the deviance has an asymptotic chi-squared distribution as $n_i \to \infty$ for all $i$, and can be used as a goodness of fit test.

More generally, the difference in deviances between nested models (i.e. minus twice the log of the likelihood ratio test criterion) has an asymptotic chi-squared distribution as the number of groups $k \to \infty$ or the size of each group $n_i \to \infty$, provided the number of parameters stays fixed.

As a general rule of thumb due to Cochran (1950), the asymptotic chi-squared distribution provides a reasonable approximation when all expected frequencies (both $\hat\mu_i$ and $n_i-\hat\mu_i$) under the larger model exceed one, and at least 80% exceed five.
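
The rule of thumb is easy to automate. The following sketch collects the expected successes and failures for each group and checks the two conditions, reading "exceed" as a strict inequality. The function name and the example inputs are hypothetical.

```python
def chi_squared_approx_ok(n, mu_hat):
    """Check the rule of thumb on expected successes and failures."""
    expected = []
    for ni, mi in zip(n, mu_hat):
        expected.extend([mi, ni - mi])       # expected successes and failures
    all_exceed_one = all(e > 1 for e in expected)
    count_exceed_five = sum(1 for e in expected if e > 5)
    # "At least 80%" written with integers to avoid floating-point comparison.
    enough_exceed_five = 5 * count_exceed_five >= 4 * len(expected)
    return all_exceed_one and enough_exceed_five

# Hypothetical fitted values for groups of 20 and of 10 trials.
large_cells = chi_squared_approx_ok([20, 20, 20], [9.5, 9.5, 9.5])
small_cells = chi_squared_approx_ok([10, 10], [0.5, 6.0])
print(large_cells, small_cells)
```

In the second call one expected frequency is below one, so the check fails regardless of the other cells.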


Copyright © Germán Rodríguez, 1993-2000. Please send feedback to grodri@princeton.edu