
A.2  Tests of Hypotheses

We consider three different types of tests of hypotheses.

A.2.1  Wald Tests

Under certain regularity conditions, the maximum likelihood estimator $\hat\theta$ has approximately in large samples a (multivariate) normal distribution with mean equal to the true parameter value and variance-covariance matrix given by the inverse of the information matrix, so that

$$\hat\theta \sim N_p\left(\theta, I^{-1}(\theta)\right). \tag{A.20}$$

The regularity conditions include the following: the true parameter value $\theta$ must be interior to the parameter space, the log-likelihood function must be thrice differentiable, and the third derivatives must be bounded.

This result provides a basis for constructing tests of hypotheses and confidence regions. For example, under the hypothesis

$$H_0: \theta = \theta_0 \tag{A.21}$$

for a fixed value $\theta_0$, the quadratic form

$$W = (\hat\theta - \theta_0)' \, \mathrm{var}^{-1}(\hat\theta) \, (\hat\theta - \theta_0) \tag{A.22}$$

has approximately in large samples a chi-squared distribution with $p$ degrees of freedom.
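The quadratic form (A.22) can be computed without explicitly inverting the variance-covariance matrix, by solving a linear system instead. The sketch below is a minimal pure-Python illustration, assuming small dense matrices given as lists of lists; the function names `solve` and `wald_statistic` and the numbers in the usage line are hypothetical, and in practice one would rely on a numerical library.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense a)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-15:
            raise ValueError("variance-covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def wald_statistic(theta_hat, theta0, cov):
    """W = (theta_hat - theta0)' cov^{-1} (theta_hat - theta0), as in (A.22)."""
    d = [h - t for h, t in zip(theta_hat, theta0)]
    v = solve(cov, d)  # v = cov^{-1} d, without forming the inverse
    return sum(di * vi for di, vi in zip(d, v))


# Hypothetical two-parameter estimate and covariance matrix, testing theta = (0, 0)
print(wald_statistic([1.0, 1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]]))
```

The resulting $W$ would be referred to a chi-squared distribution with $p$ degrees of freedom, $p$ being the length of $\hat\theta$.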

This result can be extended to arbitrary linear combinations of $\theta$, including sets of elements of $\theta$. For example, if we partition $\theta = (\theta_1, \theta_2)$, where $\theta_2$ has $p_2$ elements, then we can test the hypothesis that the last $p_2$ parameters are zero,

$$H_0: \theta_2 = 0,$$

by treating the quadratic form

$$W = \hat\theta_2' \, \mathrm{var}^{-1}(\hat\theta_2) \, \hat\theta_2$$

as a chi-squared statistic with $p_2$ degrees of freedom. When the subset has only one element we usually take the square root of the Wald statistic and treat the ratio

$$z = \frac{\hat\theta_j}{\sqrt{\mathrm{var}(\hat\theta_j)}}$$

as a z-statistic (or a t-ratio).
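For a single parameter the Wald test reduces to the ratio of the estimate to its standard error. A minimal sketch using the standard-library `statistics.NormalDist`; the function name `wald_z`, the optional null value `theta_j0` (the text uses 0) and the numbers in the usage line are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def wald_z(theta_j_hat, var_j, theta_j0=0.0):
    """Wald z-ratio for one parameter and its two-sided normal p-value."""
    z = (theta_j_hat - theta_j0) / sqrt(var_j)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative values: estimate 0.5 with variance 0.04 (standard error 0.2)
z, p = wald_z(0.5, 0.04)
print(z, z ** 2, p)  # z ** 2 is the one-parameter Wald statistic
```

Squaring $z$ recovers the Wald statistic with one degree of freedom, and the two-sided normal p-value of $z$ equals the upper-tail chi-squared p-value of $z^2$.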

These results can be modified by replacing the variance-covariance matrix of the mle with any consistent estimator. In particular, we often use the inverse of the expected information matrix evaluated at the mle

$$\widehat{\mathrm{var}}(\hat\theta) = I^{-1}(\hat\theta).$$

Sometimes calculation of the expected information is difficult, and we use the observed information instead.
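To see how the two kinds of information can differ, here is a small sketch for the geometric model used in the example that follows. It assumes the probability function $\Pr(Y = y) = \pi(1-\pi)^y$ for $y = 0, 1, 2, \ldots$, so that $\log L = n\log\pi + n\bar y\log(1-\pi)$; this parametrization is not stated in this section but reproduces the mle $\hat\pi = 1/(1+\bar y) = 0.25$ and the information 426.67 quoted there. The function names are hypothetical.

```python
def expected_information(pi, n):
    """I(pi) = n / (pi^2 (1 - pi)), using E(Y) = (1 - pi) / pi."""
    return n / (pi ** 2 * (1.0 - pi))


def observed_information(pi, n, ybar):
    """Minus the second derivative of log L = n log(pi) + n*ybar*log(1 - pi)."""
    return n / pi ** 2 + n * ybar / (1.0 - pi) ** 2


n, ybar = 20, 3.0
for pi in (0.25, 0.15):  # the mle and the hypothesized value used in the examples
    print(pi, expected_information(pi, n), observed_information(pi, n, ybar))
```

At the mle both give 426.67, but at 0.15 the expected information is about 1045.8 while the observed information is about 971.9, so the choice matters when the information is evaluated away from the mle.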

Example: Wald Test in the Geometric Distribution. Consider again our sample of $n = 20$ observations from a geometric distribution with sample mean $\bar y = 3$. The mle was $\hat\pi = 0.25$ and its variance, using the estimated expected information, is $1/426.67 = 0.00234$. Testing the hypothesis that the true probability is $\pi = 0.15$ gives

$$\chi^2 = (0.25 - 0.15)^2 / 0.00234 = 4.27$$

with one degree of freedom. The associated p-value is 0.039, so we would reject $H_0$ at the 5% significance level. □
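The arithmetic of this example can be checked with a few lines of Python. The sketch assumes the geometric parametrization $\Pr(Y = y) = \pi(1-\pi)^y$, $y = 0, 1, \ldots$, which gives $\hat\pi = 1/(1+\bar y)$ and $I(\pi) = n/(\pi^2(1-\pi))$; the helper name `chi2_1_sf` is hypothetical. It uses the fact that a chi-squared variate with one degree of freedom is the square of a standard normal.

```python
from math import erfc, sqrt


def chi2_1_sf(x):
    """Upper tail of chi-squared with 1 df: P(chi2_1 > x) = P(|Z| > sqrt(x))."""
    return erfc(sqrt(x / 2.0))


n, ybar, pi0 = 20, 3.0, 0.15
pi_hat = 1.0 / (1.0 + ybar)                    # mle: 0.25
info_hat = n / (pi_hat ** 2 * (1.0 - pi_hat))  # expected information at the mle: 426.67
var_hat = 1.0 / info_hat                       # estimated variance: 0.00234
wald = (pi_hat - pi0) ** 2 / var_hat
p_value = chi2_1_sf(wald)
print(round(wald, 2), round(p_value, 3))       # 4.27 0.039
```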

A.2.2  Score Tests

Under some regularity conditions the score itself has an asymptotic normal distribution with mean 0 and variance-covariance matrix equal to the information matrix, so that

$$u(\theta) \sim N_p\left(0, I(\theta)\right). \tag{A.23}$$

This result provides another basis for constructing tests of hypotheses and confidence regions. For example, under

$$H_0: \theta = \theta_0$$

the quadratic form

$$Q = u(\theta_0)' \, I^{-1}(\theta_0) \, u(\theta_0)$$

has approximately in large samples a chi-squared distribution with $p$ degrees of freedom.

The information matrix may be evaluated at the hypothesized value $\theta_0$ or at the mle $\hat\theta$. Under $H_0$ both versions of the test are valid; in fact, they are asymptotically equivalent. One advantage of using $\theta_0$ is that calculation of the mle may be bypassed. In spite of their simplicity, score tests are rarely used.

Example: Score Test in the Geometric Distribution. Continuing with our example, let us calculate the score test of $H_0: \pi = 0.15$ when $n = 20$ and $\bar y = 3$. The score evaluated at 0.15 is $u(0.15) = 62.7$ (positive, because 0.15 lies below the mle 0.25), and the expected information evaluated at 0.15 is $I(0.15) = 1045.8$, leading to

$$\chi^2 = 62.7^2 / 1045.8 = 3.76$$

with one degree of freedom. Since the 5% critical value is $\chi^2_{1,0.95} = 3.84$, we would not reject $H_0$, though only just. □
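The score test can be reproduced under the same assumed geometric parametrization, $\log L = n\log\pi + n\bar y\log(1-\pi)$. The sketch also evaluates the information at the mle instead of at $\pi_0$, the alternative mentioned above; the function names `score` and `expected_information` are illustrative.

```python
def score(pi, n, ybar):
    """u(pi) = n/pi - n*ybar/(1 - pi), the derivative of n log(pi) + n*ybar*log(1 - pi)."""
    return n / pi - n * ybar / (1.0 - pi)


def expected_information(pi, n):
    """I(pi) = n / (pi^2 (1 - pi)) for the assumed geometric parametrization."""
    return n / (pi ** 2 * (1.0 - pi))


n, ybar, pi0 = 20, 3.0, 0.15
pi_hat = 1.0 / (1.0 + ybar)
u0 = score(pi0, n, ybar)                               # positive, since pi0 < mle
q_at_null = u0 ** 2 / expected_information(pi0, n)     # information at pi0
q_at_mle = u0 ** 2 / expected_information(pi_hat, n)   # information at the mle
print(round(u0, 1), round(q_at_null, 2), round(q_at_mle, 2))
```

Although the two versions are asymptotically equivalent under $H_0$, in a sample this small they differ noticeably: about 3.76 with the information at $\pi_0$ and about 9.23 with the information at the mle.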

A.2.3  Likelihood Ratio Tests

The third type of test is based on a comparison of maximized likelihoods for nested models. Suppose we are considering two models, $\omega_1$ and $\omega_2$, such that $\omega_1 \subset \omega_2$. In words, $\omega_1$ is a subset of (or can be considered a special case of) $\omega_2$. For example, one may obtain the simpler model $\omega_1$ by setting some of the parameters in $\omega_2$ to zero, and we want to test the hypothesis that those elements are indeed zero.

The basic idea is to compare the maximized likelihoods of the two models. The maximized likelihood under the smaller model $\omega_1$ is

$$\max_{\theta \in \omega_1} L(\theta, y) = L(\hat\theta_{\omega_1}, y), \tag{A.24}$$

where $\hat\theta_{\omega_1}$ denotes the mle of $\theta$ under model $\omega_1$.

The maximized likelihood under the larger model $\omega_2$ has the same form

$$\max_{\theta \in \omega_2} L(\theta, y) = L(\hat\theta_{\omega_2}, y), \tag{A.25}$$

where $\hat\theta_{\omega_2}$ denotes the mle of $\theta$ under model $\omega_2$.

The ratio of these two quantities,

$$\lambda = \frac{L(\hat\theta_{\omega_1}, y)}{L(\hat\theta_{\omega_2}, y)}, \tag{A.26}$$

is bound to be between 0 (likelihoods are non-negative) and 1 (the likelihood of the smaller model can't exceed that of the larger model because it is nested in it). Values close to 0 indicate that the smaller model is not acceptable, compared to the larger model, because it would make the observed data very unlikely. Values close to 1 indicate that the smaller model is almost as good as the larger model, making the data almost as likely.

Under certain regularity conditions, minus twice the log of the likelihood ratio has approximately in large samples a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters between the two models. Thus,

$$-2\log\lambda = 2\log L(\hat\theta_{\omega_2}, y) - 2\log L(\hat\theta_{\omega_1}, y) \sim \chi^2_\nu, \tag{A.27}$$

where the degrees of freedom are $\nu = \dim(\omega_2) - \dim(\omega_1)$, the number of parameters in the larger model $\omega_2$ minus the number of parameters in the smaller model $\omega_1$.
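Given the two maximized log-likelihoods, the statistic (A.27) and its p-value take only a few lines. The sketch below is pure Python: the chi-squared upper tail is computed from the standard closed-form series that hold for integer degrees of freedom, which suffices here because $\nu$ is a difference in parameter counts. The names `chi2_sf` and `lr_test` are hypothetical; in practice one would use a library routine such as `scipy.stats.chi2.sf`.

```python
import math


def chi2_sf(x, df):
    """P(chi-squared with df degrees of freedom > x), for a positive integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # odd df: P(|Z| > sqrt(x)) + 2 phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * density * total


def lr_test(loglik_smaller, loglik_larger, df):
    """-2 log(lambda) from the two maximized log-likelihoods, and its chi-squared p-value."""
    stat = 2.0 * (loglik_larger - loglik_smaller)
    return stat, chi2_sf(stat, df)


# Log-likelihoods from the geometric example, one parameter difference
print(lr_test(-47.69, -44.99, 1))
```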

Note that calculation of a likelihood ratio test requires fitting two models (w1 and w2), compared to only one model for the Wald test (w2) and sometimes no model at all for the score test.

Example: Likelihood Ratio Test in the Geometric Distribution. Consider testing $H_0: \pi = 0.15$ with a sample of $n = 20$ observations from a geometric distribution, and suppose the sample mean is $\bar y = 3$. The value of the log-likelihood under $H_0$ is $\log L(0.15) = -47.69$. Its unrestricted maximum value, attained at the mle, is $\log L(0.25) = -44.99$. Minus twice the difference between these values is

$$\chi^2 = 2(47.69 - 44.99) = 5.4$$

with one degree of freedom. This value is significant at the 5% level and we would reject $H_0$. Note that in our example the Wald, score and likelihood ratio tests give similar, but not identical, results. □
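The same numbers follow from the assumed geometric log-likelihood $\log L(\pi) = n\log\pi + n\bar y\log(1-\pi)$. The sketch also computes the likelihood ratio $\lambda$ of (A.26) itself, to show that it lies between 0 and 1; the function name `loglik` is illustrative.

```python
from math import erfc, exp, log, sqrt


def loglik(pi, n, ybar):
    """Geometric log-likelihood n log(pi) + n*ybar*log(1 - pi) (assumed parametrization)."""
    return n * log(pi) + n * ybar * log(1.0 - pi)


n, ybar, pi0 = 20, 3.0, 0.15
pi_hat = 1.0 / (1.0 + ybar)
ll_null = loglik(pi0, n, ybar)    # log L(0.15)
ll_max = loglik(pi_hat, n, ybar)  # log L(0.25)
lam = exp(ll_null - ll_max)       # likelihood ratio (A.26), between 0 and 1
lr = -2.0 * log(lam)              # equals 2 * (ll_max - ll_null)
p_value = erfc(sqrt(lr / 2.0))    # chi-squared upper tail with 1 df
print(round(lam, 3), round(lr, 2), round(p_value, 3))
```

Here $\lambda$ is about 0.067, so $-2\log\lambda \approx 5.41$ with a p-value of about 0.02.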

The three tests discussed in this section are asymptotically equivalent, and are therefore expected to give similar results in large samples. Their small-sample properties are not known in general, but some simulation studies suggest that the likelihood ratio test may be better than its competitors in small samples.
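Claims about small-sample behaviour are typically examined by simulation. The sketch below is a hypothetical setup of that kind: it draws repeated geometric samples of size $n = 20$ under $H_0: \pi = 0.15$ (same assumed parametrization as above, sampled by inverse CDF), computes the Wald, score and likelihood ratio statistics, and records how often each exceeds the 5% critical value 3.84. The rejection rates estimate each test's actual size; they depend on the seed, $n$, $\pi_0$ and the number of replications, so a run like this illustrates the method rather than establishing which test is best. All names are illustrative.

```python
import random
from math import log


def loglik(p, n, ybar):
    return n * log(p) + n * ybar * log(1.0 - p)


def three_statistics(ybar, n, p0):
    """Wald, score and likelihood ratio statistics for H0: pi = p0."""
    p_hat = 1.0 / (1.0 + ybar)
    info = lambda p: n / (p ** 2 * (1.0 - p))  # expected information
    wald = (p_hat - p0) ** 2 * info(p_hat)
    u0 = n / p0 - n * ybar / (1.0 - p0)
    score = u0 ** 2 / info(p0)
    lr = 2.0 * (loglik(p_hat, n, ybar) - loglik(p0, n, ybar))
    return wald, score, lr


def simulate_size(n=20, p0=0.15, reps=10000, crit=3.841459, seed=1):
    """Empirical rejection rates of the three tests when H0 is true."""
    rng = random.Random(seed)
    rejections = [0, 0, 0]
    used = 0
    for _ in range(reps):
        # floor(log U / log(1 - p0)) with U in (0, 1] has P(Y = y) = p0 (1 - p0)^y
        total = sum(int(log(1.0 - rng.random()) / log(1.0 - p0)) for _ in range(n))
        if total == 0:
            continue  # the mle would be 1, on the boundary; skip this rare sample
        used += 1
        for i, stat in enumerate(three_statistics(total / n, n, p0)):
            if stat > crit:
                rejections[i] += 1
    if used == 0:
        raise ValueError("no usable replications")
    return [r / used for r in rejections]


print(simulate_size())  # rejection rates for Wald, score, likelihood ratio
```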


Copyright © Germán Rodríguez, 1993-2000. Please send feedback to grodri@princeton.edu
Conversion from LaTeX was done using TTH, version 2.34.