B.4 Binomial Errors and Link Logit
We apply the theory of generalized linear models to
the case of binary data, and in particular to
logistic regression models.
B.4.1 The Binomial Distribution
First we verify that the binomial distribution $B(n_i, \pi_i)$
belongs to the exponential family of Nelder and Wedderburn (1972).
The binomial probability distribution function (p.d.f.) is
$$
f_i(y_i) = \binom{n_i}{y_i}\, \pi_i^{y_i} (1-\pi_i)^{n_i-y_i}. \tag{B.14}
$$
Taking logs we find that
$$
\log f_i(y_i) = y_i \log(\pi_i) + (n_i-y_i)\log(1-\pi_i) + \log\binom{n_i}{y_i}.
$$
Collecting terms on $y_i$ we can write
$$
\log f_i(y_i) = y_i \log\left(\frac{\pi_i}{1-\pi_i}\right) + n_i\log(1-\pi_i) + \log\binom{n_i}{y_i}.
$$
This expression has the general exponential form
$$
\log f_i(y_i) = \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i,\phi)
$$
with the following equivalences. Looking first at the coefficient of $y_i$,
we note that the canonical parameter is the logit of $\pi_i$:
$$
\theta_i = \log\left(\frac{\pi_i}{1-\pi_i}\right).
$$
Solving for $\pi_i$ we see that
$$
\pi_i = \frac{e^{\theta_i}}{1+e^{\theta_i}}, \quad\text{so}\quad 1-\pi_i = \frac{1}{1+e^{\theta_i}}.
$$
If we rewrite the second term in the p.d.f. as a function of $\theta_i$,
so that $\log(1-\pi_i) = -\log(1+e^{\theta_i})$,
we can identify the cumulant function $b(\theta_i)$ as
$$
b(\theta_i) = n_i \log(1+e^{\theta_i}).
$$
The remaining term in the p.d.f. is a function of $y_i$ but not $\pi_i$,
leading to
$$
c(y_i,\phi) = \log\binom{n_i}{y_i}.
$$
Note finally that we may set $a_i(\phi) = \phi$ and $\phi = 1$.
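As a quick numerical sanity check of these equivalences, the Python sketch below (our own illustration; the function names are hypothetical) evaluates the log p.d.f. (B.14) directly and through the exponential-family form with the canonical parameter, cumulant function and remaining term identified above, taking $a_i(\phi) = 1$. The values $n_i = 10$ and $\pi_i = 0.3$ are arbitrary.

```python
import math

# Hypothetical helper names; a numerical check, not part of the original notes.

def log_binom_pmf(y, n, p):
    """Log of the binomial p.d.f. (B.14), computed directly."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1 - p)

def log_expfam(y, n, p):
    """Same quantity via (y*theta - b(theta)) / a(phi) + c(y, phi)."""
    theta = math.log(p / (1 - p))          # canonical parameter: logit of pi
    b = n * math.log(1 + math.exp(theta))  # cumulant function b(theta)
    c = math.log(math.comb(n, y))          # term that does not involve pi
    a = 1.0                                # a(phi) = phi = 1
    return (y * theta - b) / a + c

# Compare the two forms for every possible count with n = 10, pi = 0.3.
for y in range(11):
    assert math.isclose(log_binom_pmf(y, 10, 0.3), log_expfam(y, 10, 0.3))
```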
Let us verify the mean and variance. Differentiating $b(\theta_i)$
with respect to $\theta_i$ we find that
$$
\mu_i = b'(\theta_i) = n_i\,\frac{e^{\theta_i}}{1+e^{\theta_i}} = n_i\pi_i,
$$
in agreement with what we knew from elementary statistics.
Differentiating again using the quotient rule, we find that
$$
v_i = a_i(\phi)\, b''(\theta_i) = n_i\,\frac{e^{\theta_i}}{(1+e^{\theta_i})^2} = n_i\pi_i(1-\pi_i),
$$
again in agreement with what we knew before.
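Both derivatives can also be checked numerically. This sketch (an illustration only; the helper names and the step size `h` are our own choices) differentiates $b(\theta) = n\log(1+e^{\theta})$ by central finite differences at $\theta = \mathrm{logit}(\pi)$ and, with $a_i(\phi) = 1$, should recover $n\pi$ and $n\pi(1-\pi)$ up to discretization error.

```python
import math

def b(theta, n):
    """Binomial cumulant function b(theta) = n * log(1 + e^theta)."""
    return n * math.log1p(math.exp(theta))

def numeric_derivs(f, x, h=1e-4):
    """Central finite differences for the first and second derivatives of f at x."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
    return d1, d2

n, p = 20, 0.25
theta = math.log(p / (1 - p))  # canonical parameter for pi = 0.25
mean, var = numeric_derivs(lambda t: b(t, n), theta)
# mean should be close to n*p = 5.0 and var close to n*p*(1-p) = 3.75
```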
In this development I have worked with the binomial count $y_i$, which
takes values $0, 1, \ldots, n_i$. McCullagh and Nelder (1989) work with the proportion
$p_i = y_i/n_i$, which takes values $0, 1/n_i, \ldots, 1$. This explains the
differences between my results and their Table 2.1.
B.4.2 Fisher Scoring in Logistic Regression
Let us now find the working dependent variable and the iterative weight
used in the Fisher scoring algorithm for estimating the parameters in
logistic regression, where we model
$$
\eta_i = \mathrm{logit}(\pi_i) = \boldsymbol{x}_i'\boldsymbol{\beta}.
$$
It will be convenient to write the link function in terms of the
mean $\mu_i$, as:
$$
\eta_i = \log\left(\frac{\pi_i}{1-\pi_i}\right) = \log\left(\frac{\mu_i}{n_i-\mu_i}\right),
$$
which can also be written as
$\eta_i = \log(\mu_i) - \log(n_i-\mu_i)$.
Differentiating with respect to $\mu_i$ we find that
$$
\frac{d\eta_i}{d\mu_i} = \frac{1}{\mu_i} + \frac{1}{n_i-\mu_i} = \frac{n_i}{\mu_i(n_i-\mu_i)} = \frac{1}{n_i\pi_i(1-\pi_i)}.
$$
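The closed form for the derivative of the link can be checked against a finite difference. In this sketch the values $n_i = 12$ and $\mu_i = 4$ are arbitrary, and the function names are hypothetical.

```python
import math

def eta(mu, n):
    """Logit link written in terms of the mean: log(mu) - log(n - mu)."""
    return math.log(mu) - math.log(n - mu)

def deta_dmu(mu, n):
    """Closed-form derivative of the link: n / (mu * (n - mu))."""
    return n / (mu * (n - mu))

n, mu, h = 12, 4.0, 1e-6
numeric = (eta(mu + h, n) - eta(mu - h, n)) / (2 * h)  # central difference
pi = mu / n
closed = 1 / (n * pi * (1 - pi))  # same derivative written as 1 / (n pi (1 - pi))
```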
The working dependent variable, which in general is
$$
z_i = \eta_i + (y_i-\mu_i)\,\frac{d\eta_i}{d\mu_i},
$$
turns out to be
$$
z_i = \eta_i + \frac{y_i - n_i\pi_i}{n_i\pi_i(1-\pi_i)}. \tag{B.17}
$$
The iterative weight turns out to be
$$
w_i = \frac{1}{n_i\pi_i(1-\pi_i)\left[\dfrac{1}{n_i\pi_i(1-\pi_i)}\right]^2},
$$
and simplifies to
$$
w_i = n_i\pi_i(1-\pi_i).
$$
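Note that the weight is inversely proportional to the variance of the
working dependent variable: treating $\eta_i$ and $d\eta_i/d\mu_i$ as fixed,
$\operatorname{var}(z_i) = \operatorname{var}(y_i)\,(d\eta_i/d\mu_i)^2 = 1/[n_i\pi_i(1-\pi_i)]$.
The results here agree exactly with the
results in Chapter 4 of McCullagh and Nelder (1989).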
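Putting the working dependent variable (B.17) and the weight $w_i = n_i\pi_i(1-\pi_i)$ together gives the Fisher scoring iteration: at each step, regress $z_i$ on the covariates by weighted least squares with weights $w_i$. The sketch below is a minimal illustration, not a general implementation. It assumes grouped data with an intercept and a single covariate, $\eta_i = \beta_0 + \beta_1 x_i$, so that the $2\times 2$ normal equations can be solved by hand with the Python standard library. The starting values (empirical logits with 0.5 added to each cell) and the function name are our own choices.

```python
import math

def fit_logistic_irls(x, y, n, iters=25, tol=1e-10):
    """Fisher scoring for logit(pi_i) = b0 + b1 * x_i with grouped binomial data.

    Each pass regresses the working variable z_i on (1, x_i) by weighted
    least squares with weights w_i = n_i * pi_i * (1 - pi_i).
    """
    # Assumed starting values: empirical logits, nudged away from 0 and 1.
    eta = [math.log((yi + 0.5) / (ni - yi + 0.5)) for yi, ni in zip(y, n)]
    beta = (0.0, 0.0)
    for _ in range(iters):
        pi = [1 / (1 + math.exp(-e)) for e in eta]
        w = [ni * p * (1 - p) for ni, p in zip(n, pi)]  # iterative weights
        # Working dependent variable (B.17): eta + (y - n pi) / (n pi (1 - pi)).
        z = [e + (yi - ni * p) / wi for e, yi, ni, p, wi in zip(eta, y, n, pi, w)]
        # Weighted least squares normal equations for (b0, b1).
        s_w = sum(w)
        s_wx = sum(wi * xi for wi, xi in zip(w, x))
        s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        s_wz = sum(wi * zi for wi, zi in zip(w, z))
        s_wxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s_w * s_wxx - s_wx ** 2
        new_beta = ((s_wxx * s_wz - s_wx * s_wxz) / det,
                    (s_w * s_wxz - s_wx * s_wz) / det)
        eta = [new_beta[0] + new_beta[1] * xi for xi in x]  # updated linear predictor
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

With only two groups and two parameters the model is saturated, so the fitted probabilities reproduce the observed proportions $y_i/n_i$; for example, `fit_logistic_irls([0.0, 1.0], [3, 6], [10, 10])` should return $\beta_0 \approx \log(3/7)$ and $\beta_1 \approx \log(6/4) - \log(3/7)$, which makes a convenient check.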
Exercise:
Obtain analogous results for Probit analysis,
where one models
$$
\Phi^{-1}(\pi_i) = \eta_i = \boldsymbol{x}_i'\boldsymbol{\beta},
$$
where $\Phi(\cdot)$ is the standard normal cdf.
Hint: To calculate the derivative of the link function, find
$d\mu_i/d\eta_i$ and take reciprocals.
B.4.3 The Binomial Deviance
Finally, let us figure out the binomial deviance. Let $\hat{\mu}_i$
denote the m.l.e. of $\mu_i$ under the model of interest, and let
$\tilde{\mu}_i = y_i$ denote the m.l.e. under the saturated model.
From first principles,
$$
D = 2\sum_i \left[ y_i\log\left(\frac{y_i}{n_i}\right) + (n_i-y_i)\log\left(\frac{n_i-y_i}{n_i}\right) - y_i\log\left(\frac{\hat{\mu}_i}{n_i}\right) - (n_i-y_i)\log\left(\frac{n_i-\hat{\mu}_i}{n_i}\right)\right].
$$
Note that all terms involving $\log(n_i)$ cancel out. Collecting
terms on $y_i$ and on $n_i - y_i$ we find that
$$
D = 2\sum_i \left[ y_i\log\left(\frac{y_i}{\hat{\mu}_i}\right) + (n_i-y_i)\log\left(\frac{n_i-y_i}{n_i-\hat{\mu}_i}\right)\right]. \tag{B.19}
$$
Alternatively, you may obtain this result from the general form
of the deviance given in Section B.3.
Note that the binomial deviance has the form
$$
D = 2\sum o_i \log\left(\frac{o_i}{e_i}\right),
$$
where $o_i$ denotes observed, $e_i$ denotes expected (under the model
of interest) and the sum is over both ``successes'' and ``failures''
for each $i$ (i.e. we have a contribution from $y_i$ and one from
$n_i - y_i$).
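The deviance (B.19) is straightforward to compute as a sum of $o\log(o/e)$ terms over successes and failures. The sketch below (hypothetical helper names) uses the usual convention $0\log 0 = 0$, which the formula needs whenever $y_i = 0$ or $y_i = n_i$, and also computes the log-likelihood without the $\log\binom{n_i}{y_i}$ terms so that the "first principles" route can be compared against the collected form. The fitted values in the usage note are made up for illustration and do not come from any fitted model.

```python
import math

def xlogy_ratio(a, b):
    """a * log(a / b), with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(a / b)

def binomial_deviance(y, n, mu_hat):
    """Binomial deviance (B.19): o*log(o/e) over successes and failures."""
    return 2 * sum(
        xlogy_ratio(yi, mi) + xlogy_ratio(ni - yi, ni - mi)
        for yi, ni, mi in zip(y, n, mu_hat)
    )

def binomial_loglik(y, n, mu):
    """Log-likelihood in terms of the means, dropping the log C(n, y) terms."""
    total = 0.0
    for yi, ni, mi in zip(y, n, mu):
        if yi > 0:
            total += yi * math.log(mi / ni)
        if ni - yi > 0:
            total += (ni - yi) * math.log((ni - mi) / ni)
    return total
```

For example, with `y = [3, 7, 0]`, `n = [10, 10, 5]` and illustrative fitted counts `mu_hat = [4.0, 6.0, 1.0]`, `binomial_deviance(y, n, mu_hat)` agrees with `2 * (binomial_loglik(y, n, y) - binomial_loglik(y, n, mu_hat))`, since the saturated model has $\tilde{\mu}_i = y_i$.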
For grouped data the deviance has an asymptotic chi-squared distribution
as $n_i \to \infty$ for all $i$, and can be used as a goodness
of fit test.
More generally, the difference in deviances between nested models
(i.e. minus twice the log of the likelihood ratio test criterion) has an
asymptotic chi-squared distribution as the number of groups
$k \to \infty$ or the size of each group $n_i \to \infty$,
provided the number of parameters stays fixed.
As a general rule of thumb due to Cochran (1950),
the asymptotic chi-squared distribution provides a reasonable
approximation when all expected frequencies
(both $\hat{\mu}_i$ and $n_i - \hat{\mu}_i$)
under the larger model exceed one, and at least
80% exceed five.
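A small case where everything can be computed with the Python standard library is the comparison of $k = 2$ groups against the one-parameter model $\pi_1 = \pi_2$. The m.l.e. under that model is the pooled proportion, the deviance has $2 - 1 = 1$ degree of freedom, and the upper tail of a chi-squared variable with 1 d.f. is $P(X > D) = \operatorname{erfc}(\sqrt{D/2})$. The sketch below (hypothetical function name) returns the deviance, its asymptotic p-value and a check of the rule of thumb quoted above. Applying that rule to the fitted counts of the one-parameter model, rather than to some other set of expected frequencies, is an assumption on our part.

```python
import math

def deviance_term(o, e):
    """o * log(o / e), taking 0 * log(0) = 0."""
    return 0.0 if o == 0 else o * math.log(o / e)

def two_group_homogeneity_test(y, n):
    """Deviance test of pi_1 = pi_2 for two binomial groups (1 d.f.)."""
    pooled = sum(y) / sum(n)          # m.l.e. of the common probability
    mu_hat = [ni * pooled for ni in n]
    D = 2 * sum(deviance_term(yi, mi) + deviance_term(ni - yi, ni - mi)
                for yi, ni, mi in zip(y, n, mu_hat))
    # Chi-squared survival function with 1 d.f.: P(X > D) = erfc(sqrt(D / 2)).
    p_value = math.erfc(math.sqrt(max(D, 0.0) / 2))
    # Assumption: check the rule of thumb on the null model's fitted counts,
    # counting both successes and failures.
    expected = mu_hat + [ni - mi for ni, mi in zip(n, mu_hat)]
    rule_ok = (min(expected) > 1
               and sum(e > 5 for e in expected) >= 0.8 * len(expected))
    return D, p_value, rule_ok
```

For instance, `two_group_homogeneity_test([10, 10], [20, 20])` gives a deviance of zero and a p-value of one, while very small groups such as `[0, 1]` out of `[3, 3]` fail the rule of thumb because some expected counts fall below one.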