3.2 Estimation and Hypothesis Testing
The logistic regression model just developed is a
generalized linear model with binomial errors and link logit.
We can therefore rely on the general theory developed in
Appendix B to obtain estimates of the parameters and to test
hypotheses. In this section we summarize the most important
results needed in the applications.
3.2.1 Maximum Likelihood Estimation
Although you will probably use a statistical package to compute
the estimates, here is a brief description of the underlying
procedure.
The likelihood function for n independent binomial observations
is a product of densities given by Equation 3.3.
Taking logs we find that, except for a constant involving
the combinatorial terms, the log-likelihood function is
|
logL(b) = |
| { yi log(pi) + (ni-yi)log(1-pi)}, |
|
where pi depends on the covariates xi and
a vector of p parameters b through the logit transformation of
Equation 3.9.
At this point we could take first and expected second derivatives
to obtain the score and information matrix and develop a
Fisher scoring procedure for maximizing the log-likelihood.
As shown in Appendix B, the procedure is equivalent to
iteratively re-weighted least squares (IRLS).
Given a current estimate [^(b)] of the parameters,
we calculate the linear predictor [^(h)] = xi[^(b)]
and the fitted values [^(m)] = logit-1(h).
With these values we calculate the working dependent variable
z, which has elements
where ni are the binomial denominators.
We then regress
z on the covariates calculating the weighted least squares
estimate
where W is a diagonal matrix of weights with entries
|
wii = |
^ m
|
i
|
(ni- |
^ m
|
i
|
)/ni. |
|
(You may be interested to know that the weight is inversely proportional
to the estimated variance of the working dependent variable.)
The resulting estimate of b is used to obtain
improved fitted values and the procedure is iterated to convergence.
Suitable initial values can be obtained by applying the link
to the data. To avoid problems with counts of 0 or ni (which
is always the case with individual zero-one data), we calculate
empirical logits adding 1/2 to both the numerator and
denominator, i.e. we calculate
|
zi = log |
yi+1/2 ni-yi+1/2
|
, |
|
and then regress this quantity on xi to obtain an initial
estimate of b.
The resulting estimate is consistent and its large-sample
variance is given by
where W is the matrix of weights evaluated in the last iteration.
Alternatives to maximum likelihood estimation include
weighted least squares, which can be used with grouped data,
and a method that minimizes Pearson's chi-squared statistic,
which can be used with both grouped and individual data.
We will not consider these alternatives further.
3.2.2 Goodness of Fit Statistics
Suppose we have just fitted a model and want to assess how well
it fits the data.
A measure of discrepancy between observed and fitted values
is the deviance statistic, which is given by
|
D = 2 |
| { yi log( |
yi
|
) +(ni-yi)log( |
ni-yi
|
) }, |
| (3.13) |
where yi is the observed and [^(mi)] is the fitted
value for the i-th observation. Note that this statistic
is twice a sum of `observed times log of observed over expected',
where the sum is over both successes and failures (i.e. we
compare both yi and ni-yi with their expected values).
In a perfect fit the ratio observed over expected is one and its
logarithm is zero, so the deviance is zero.
In Appendix B we show that this statistic may be constructed
as a likelihood ratio test that compares the model of interest
with a saturated model that has one parameter for each
observation.
With grouped data, the distribution of the deviance statistic
as the group sizes ni for all i,
converges to a chi-squared distribution with n-p d.f.,
where n is the number of groups and p is the number
of parameters in the model, including the constant.
Thus, for reasonably large groups, the deviance provides a
goodness of fit test for the model.
With individual data the distribution of the deviance does
not converge to a chi-squared (or any other known) distribution,
and cannot be used as a goodness of fit test. We will, however,
consider other diagnostic tools that can be used with individual
data.
An alternative measure of goodness of fit is
Pearson's chi-squared statistic, which for binomial
data can be written as
Note that each term in the sum is the squared difference
between observed and fitted values yi and [^(m)]i,
divided by the variance of yi, which is
mi(ni-mi)/ni, estimated using [^(m)]i for
mi.
This statistic can also be derived as a sum of `observed
minus expected squared over expected', where the sum
is over both successes and failures.
With grouped data Pearson's statistic has approximately in
large samples a chi-squared distribution with n-p d.f.,
and is asymptotically equivalent to the deviance or likelihood-ratio
chi-squared statistic. The statistic can not be used as a goodness
of fit test with individual data, but provides a basis for
calculating residuals, as we shall see when we discuss
logistic regression diagnostics.
3.2.3 Tests of Hypotheses
Let us consider the problem of testing hypotheses in logit
models.
As usual, we can calculate Wald tests based on the large-sample
distribution of the m.l.e., which is approximately normal with
mean b and variance-covariance matrix as given in
Equation 3.12.
In particular, we can test the hypothesis
concerning the significance of a single coefficient by
calculating the ratio of the estimate to its standard error
This statistic has approximately a standard normal distribution
in large samples.
Alternatively, we can treat the square of this statistic
as approximately a chi-squared with one d.f.
The Wald test can be use to calculate a confidence interval
for bj. We can assert with 100(1-a)% confidence
that the true parameter lies in the interval with boundaries
where z1-a/2 is the normal critical value for a two-sided
test of size a. Confidence intervals for effects in the
logit scale can be translated into confidence intervals for
odds ratios by exponentiating the boundaries.
The Wald test can be applied to tests hypotheses concerning
several coefficients by calculating the usual quadratic form.
This test can also be inverted to obtain confidence regions for
vector-value parameters, but we will not consider this extension.
For more general problems we consider the likelihood ratio
test. A key to construct these tests is the deviance statistic
introduced in the previous subsection. In a nutshell, the
likelihood ratio test to compare two nested models is based
on the difference between their deviances.
To fix ideas, consider partitioning the model matrix
and the vector of coefficients into two components
with p1 and p2 elements, respectively. Consider
testing the hypothesis
that the variables in X2 have no effect on the response,
i.e. the joint significance of the coefficients in b2.
Let D(X1) denote the deviance of a model that includes
only the variables in X1 and let D(X1+X2) denote
the deviance of a model that includes all variables in X.
Then the difference
has approximately in large samples a chi-squared distribution
with p2 d.f. Note that p2 is the difference in the number
of parameters between the two models being compared.
The deviance plays a role similar to the residual sum of squares.
In fact, in Appendix B we show that in models for normally
distributed data the deviance is the residual sum of squares.
Likelihood ratio tests in generalized linear models are based on
scaled deviances, obtained by dividing the deviance by a scale factor.
In linear models the scale factor was s2, and
we had to divide the RSS's (or their difference) by an estimate
of s2 in order to calculate the test criterion.
With binomial data the scale factor is one, and there is no need
to scale the deviances.
The Pearson chi-squared statistic in the previous subsection,
while providing an alternative goodness of fit test for grouped
data, cannot be used in general to compare nested models.
In particular, differences in deviances have chi-squared
distributions but differences in Pearson chi-squared statistics
do not. This is the main reason why in statistical modelling
we use the deviance or likelihood ratio chi-squared statistic
rather than the more traditional Pearson chi-squared of
elementary statistics.
Continue with 3.3. The Comparison of Two Groups
Copyright © Germán Rodríguez, 1993-2000.
Please send feedback to grodri@princeton.edu
Conversion from LaTeX was done using TTH, version 2.34.