B.4 Binomial Errors and Link Logit
We apply the theory of generalized linear models to
the case of binary data, and in particular to
logistic regression models.
B.4.1 The Binomial Distribution
First we verify that the binomial distribution $B(n_i, \pi_i)$
belongs to the exponential family of Nelder and Wedderburn (1972).
The binomial probability distribution function (p.d.f.) is
\[
f_i(y_i) = \binom{n_i}{y_i} \pi_i^{y_i} (1-\pi_i)^{n_i-y_i}. \tag{B.14}
\]
Taking logs we find that
\[
\log f_i(y_i) = y_i \log(\pi_i) + (n_i-y_i)\log(1-\pi_i) + \log\binom{n_i}{y_i}.
\]
Collecting terms on $y_i$ we can write
\[
\log f_i(y_i) = y_i \log\Big(\frac{\pi_i}{1-\pi_i}\Big) + n_i \log(1-\pi_i) + \log\binom{n_i}{y_i}.
\]
This expression has the general exponential form
\[
\log f_i(y_i) = \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi)
\]
with the following equivalences.
Looking first at the coefficient of $y_i$ we note that
the canonical parameter is the logit of $\pi_i$:
\[
\theta_i = \log\Big(\frac{\pi_i}{1-\pi_i}\Big).
\]
Solving for $\pi_i$ we see that
\[
\pi_i = \frac{e^{\theta_i}}{1 + e^{\theta_i}}, \quad\text{so}\quad 1-\pi_i = \frac{1}{1 + e^{\theta_i}}.
\]
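If we rewrite the second term in the p.d.f. as
a function of $\theta_i$,
so $\log(1-\pi_i) = -\log(1+e^{\theta_i})$,
we can identify the cumulant function $b(\theta_i)$ as
\[
b(\theta_i) = n_i \log(1 + e^{\theta_i}).
\]
The remaining term in the p.d.f. is a function of $y_i$ but not $\pi_i$,
leading to
\[
c(y_i, \phi) = \log\binom{n_i}{y_i}.
\]
Note finally that we may set $a_i(\phi) = \phi$ and $\phi = 1$.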
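As a quick numerical check of these equivalences, the following sketch compares the binomial log p.d.f. computed directly with the exponential-family form, taking the canonical parameter to be the logit, the cumulant function to be $n \log(1+e^{\theta})$ and the dispersion to be one. The function names and the values of `n` and `p` are illustrative assumptions, not part of the notes.

```python
import math

def binom_logpmf(y, n, p):
    """Log of the binomial p.d.f. (B.14), computed directly."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1 - p)

def expfam_logpmf(y, n, p):
    """Same quantity written as y*theta - b(theta) + c(y), with a(phi) = 1."""
    theta = math.log(p / (1 - p))          # canonical parameter: logit of p
    b = n * math.log(1 + math.exp(theta))  # cumulant function b(theta)
    c = math.log(math.comb(n, y))          # term that does not involve p
    return y * theta - b + c

# Illustrative values; any n >= 1 and 0 < p < 1 would do.
n, p = 10, 0.3
max_gap = max(abs(binom_logpmf(y, n, p) - expfam_logpmf(y, n, p))
              for y in range(n + 1))
print(max_gap)  # expected to be at rounding-error level
```

The two functions should agree for every possible count, up to floating-point rounding.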
Let us verify the mean and variance. Differentiating $b(\theta_i)$
with respect to $\theta_i$ we find that
\[
\mu_i = b'(\theta_i) = n_i \frac{e^{\theta_i}}{1+e^{\theta_i}} = n_i \pi_i,
\]
in agreement with what we knew from elementary statistics.
Differentiating again using the quotient rule, we find that
\[
v_i = a_i(\phi)\, b''(\theta_i) = n_i \frac{e^{\theta_i}}{(1+e^{\theta_i})^2} = n_i \pi_i (1-\pi_i),
\]
again in agreement with what we knew before.
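The two derivatives can also be checked numerically. The sketch below approximates the first and second derivatives of the cumulant function $n \log(1+e^{\theta})$ by central differences and compares them with the binomial mean and variance. The step size `h` and the values of `n` and `p` are arbitrary choices made for illustration.

```python
import math

def b(theta, n):
    """Binomial cumulant function: n * log(1 + exp(theta))."""
    return n * math.log(1 + math.exp(theta))

# Illustrative values (assumptions, not from the notes).
n, p = 20, 0.35
theta = math.log(p / (1 - p))
h = 1e-4

# Central differences approximate b'(theta) and b''(theta).
mean_fd = (b(theta + h, n) - b(theta - h, n)) / (2 * h)
var_fd = (b(theta + h, n) - 2 * b(theta, n) + b(theta - h, n)) / h**2

print(mean_fd, n * p)            # numerical b' versus n * p
print(var_fd, n * p * (1 - p))   # numerical b'' versus n * p * (1 - p)
```

Each printed pair should agree to several decimal places; the agreement is limited by the finite-difference approximation, not by the formulas.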
In this development I have worked with the binomial count $y_i$, which
takes values $0(1)n_i$. McCullagh and Nelder (1989) work with the proportion
$p_i = y_i/n_i$, which takes values $0(1/n_i)1$. This explains the
differences between my results and their Table 2.1.
B.4.2 Fisher Scoring in Logistic Regression
Let us now find the working dependent variable and the iterative weight
used in the Fisher scoring algorithm for estimating the parameters in
logistic regression, where we model
\[
\eta_i = \operatorname{logit}(\pi_i).
\]
It will be convenient to write the link function in terms of the
mean $\mu_i$, as:
\[
\eta_i = \log\Big(\frac{\pi_i}{1-\pi_i}\Big) = \log\Big(\frac{\mu_i}{n_i-\mu_i}\Big),
\]
which can also be written as $\eta_i = \log(\mu_i) - \log(n_i-\mu_i)$.
Differentiating with respect to $\mu_i$ we find that
\[
\frac{d\eta_i}{d\mu_i} = \frac{1}{\mu_i} + \frac{1}{n_i-\mu_i}
= \frac{n_i}{\mu_i(n_i-\mu_i)} = \frac{1}{n_i \pi_i (1-\pi_i)}.
\]
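A small numerical check of this derivative, comparing a central-difference approximation of the logit link (written in terms of the mean) with the closed form $1/[n\pi(1-\pi)]$. The function name and the values of `n`, `p` and `h` are illustrative assumptions.

```python
import math

def logit_link(mu, n):
    """Logit link in terms of the mean: log(mu) - log(n - mu)."""
    return math.log(mu) - math.log(n - mu)

# Illustrative values (assumptions, not from the notes).
n, p = 15, 0.6
mu = n * p
h = 1e-6

# Central-difference approximation to d(eta)/d(mu) at mu = n * p.
deriv_fd = (logit_link(mu + h, n) - logit_link(mu - h, n)) / (2 * h)
deriv_closed = 1 / (n * p * (1 - p))

print(deriv_fd, deriv_closed)
```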
The working dependent variable, which in general is
\[
z_i = \eta_i + (y_i-\mu_i) \frac{d\eta_i}{d\mu_i},
\]
turns out to be
\[
z_i = \eta_i + \frac{y_i - n_i\pi_i}{n_i \pi_i (1-\pi_i)}. \tag{B.17}
\]
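The iterative weight turns out to be
\[
w_i = \frac{1}{b''(\theta_i) \big(\frac{d\eta_i}{d\mu_i}\big)^2}
= \frac{1}{n_i \pi_i (1-\pi_i)} \big[ n_i \pi_i (1-\pi_i) \big]^2,
\]
and simplifies to
\[
w_i = n_i \pi_i (1-\pi_i).
\]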
Note that the weight is inversely proportional to the variance of the
working dependent variable. The results here agree exactly with the
results in Chapter 4 of McCullagh and Nelder (1989).
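The working dependent variable and the iterative weight can be combined into a short Fisher scoring routine. The following is a minimal sketch, assuming a model with an intercept and a single covariate so that each weighted least-squares step reduces to a 2-by-2 system; the function name, the convergence tolerance and the data are made up for illustration and are not part of the notes.

```python
import math

def fisher_scoring_logit(x, y, n, iterations=25, tol=1e-10):
    """Fit logit(pi_i) = alpha + beta * x_i to grouped binomial counts.

    Each iteration regresses the working dependent variable z on x by
    weighted least squares with weights w_i = n_i * pi_i * (1 - pi_i).
    Assumes fitted probabilities stay strictly between 0 and 1.
    """
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        eta = [alpha + beta * xi for xi in x]
        pi = [1 / (1 + math.exp(-e)) for e in eta]
        w = [ni * p * (1 - p) for ni, p in zip(n, pi)]
        # Working dependent variable (B.17).
        z = [e + (yi - ni * p) / wi
             for e, yi, ni, p, wi in zip(eta, y, n, pi, w)]
        # Weighted normal equations for a straight line.
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sz = sum(wi * zi for wi, zi in zip(w, z))
        sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * sxx - sx * sx
        new_alpha = (sxx * sz - sx * sxz) / det
        new_beta = (sw * sxz - sx * sz) / det
        converged = abs(new_alpha - alpha) + abs(new_beta - beta) < tol
        alpha, beta = new_alpha, new_beta
        if converged:
            break
    return alpha, beta

# Made-up grouped data, for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
n = [20, 20, 20, 20]
y = [3, 7, 12, 17]
alpha_hat, beta_hat = fisher_scoring_logit(x, y, n)
print(alpha_hat, beta_hat)
```

At convergence the fitted values should satisfy the likelihood equations, so the residuals $y_i - n_i\hat\pi_i$ sum to zero both unweighted and weighted by $x_i$. A general implementation would solve the weighted least-squares step with a matrix routine instead of the explicit 2-by-2 formulas used here.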
Exercise:
Obtain analogous results for Probit analysis,
where one models
\[
\eta_i = \Phi^{-1}(\pi_i),
\]
where $\Phi()$ is the standard normal cdf. Hint: To calculate the
derivative of the link function find $d\mu_i/d\eta_i$
and take reciprocals.
B.4.3 The Binomial Deviance
Finally, let us figure out the binomial deviance. Let $\hat\mu_i$
denote the m.l.e. of $\mu_i$ under the model of interest, and let
$\tilde\mu_i = y_i$ denote the m.l.e. under the saturated model.
From first principles,
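\[
\begin{aligned}
D = 2 \sum \Big[\, & y_i \log\Big(\frac{y_i}{n_i}\Big) + (n_i-y_i)\log\Big(\frac{n_i-y_i}{n_i}\Big) \\
& - y_i \log\Big(\frac{\hat\mu_i}{n_i}\Big) - (n_i-y_i)\log\Big(\frac{n_i-\hat\mu_i}{n_i}\Big) \Big].
\end{aligned}
\]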
Note that all terms involving $\log(n_i)$ cancel out. Collecting
terms on $y_i$ and on $n_i-y_i$ we find that
\[
D = 2 \sum \Big[\, y_i \log\Big(\frac{y_i}{\hat\mu_i}\Big)
+ (n_i-y_i) \log\Big(\frac{n_i-y_i}{n_i-\hat\mu_i}\Big) \Big]. \tag{B.19}
\]
Alternatively, you may obtain this result from the general form
of the deviance given in Section B.3.
Note that the binomial deviance has the form
\[
D = 2 \sum o_i \log\Big(\frac{o_i}{e_i}\Big),
\]
where $o_i$ denotes observed, $e_i$ denotes expected (under the model
of interest) and the sum is over both "successes" and "failures"
for each $i$ (i.e. we have a contribution from $y_i$ and one from
$n_i-y_i$).
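The deviance is easy to compute once fitted means are available. The sketch below evaluates (B.19) directly, then again as twice the difference between the saturated and fitted log-likelihoods, and finally in the observed/expected form, using the convention that $0 \log 0 = 0$. The function names, counts and fitted means are hypothetical values chosen for illustration.

```python
import math

def xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def binomial_deviance(y, n, mu_hat):
    """Binomial deviance (B.19) for counts y, group sizes n, fitted means mu_hat."""
    total = 0.0
    for yi, ni, mi in zip(y, n, mu_hat):
        total += xlogy(yi, yi / mi) + xlogy(ni - yi, (ni - yi) / (ni - mi))
    return 2 * total

def loglik(y, n, mu):
    """Binomial log-likelihood, omitting the binomial coefficient."""
    return sum(xlogy(yi, mi / ni) + xlogy(ni - yi, (ni - mi) / ni)
               for yi, ni, mi in zip(y, n, mu))

# Hypothetical counts and fitted means, for illustration only.
y = [0, 4, 9, 10]
n = [10, 10, 10, 10]
mu_hat = [1.2, 3.9, 7.8, 9.6]

d_formula = binomial_deviance(y, n, mu_hat)
d_first_principles = 2 * (loglik(y, n, y) - loglik(y, n, mu_hat))

# Observed/expected form: successes followed by failures.
observed = y + [ni - yi for ni, yi in zip(n, y)]
expected = mu_hat + [ni - mi for ni, mi in zip(n, mu_hat)]
d_obs_exp = 2 * sum(xlogy(o, o / e) for o, e in zip(observed, expected))

print(d_formula, d_first_principles, d_obs_exp)
```

The three numbers should coincide up to rounding. Groups with $y_i = 0$ or $y_i = n_i$ contribute only one of the two terms, which is why the helper `xlogy` is used.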
For grouped data the deviance has an asymptotic chi-squared distribution
as $n_i \to \infty$ for all $i$, and can be used as a goodness
of fit test.
More generally, the difference in deviances between nested models
(i.e. minus twice the log of the likelihood ratio test criterion) has an
asymptotic chi-squared distribution as the number of groups
$k \to \infty$ or the size of each group $n_i \to \infty$,
provided the number of parameters stays fixed.
As a general rule of thumb due to Cochran (1950),
the asymptotic chi-squared distribution provides a reasonable
approximation when all expected frequencies
(both $\hat\mu_i$ and $n_i-\hat\mu_i$)
under the larger model exceed one, and at least
80% exceed five.
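The rule of thumb is simple to apply in code. The following sketch checks it for a set of fitted means, treating expected successes and expected failures as separate frequencies; the function name and the example values are hypothetical.

```python
def chi_squared_approx_ok(n, mu_hat):
    """Check the rule of thumb on expected successes and failures.

    Returns True when every expected frequency exceeds one and at
    least 80% of them exceed five.
    """
    expected = list(mu_hat) + [ni - mi for ni, mi in zip(n, mu_hat)]
    all_above_one = all(e > 1 for e in expected)
    share_above_five = sum(e > 5 for e in expected) / len(expected)
    return all_above_one and share_above_five >= 0.8

# Hypothetical fitted means, for illustration only.
ok_large = chi_squared_approx_ok([20, 20, 20], [6.0, 10.0, 14.5])
ok_small = chi_squared_approx_ok([10, 10], [0.8, 5.0])
print(ok_large, ok_small)
```

In the first example every expected frequency exceeds five, so the check passes; in the second one expected frequency is below one, so it fails.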
Copyright © Germán Rodríguez, 1993-2000.