B.1  The Model

Let $y_1, \ldots, y_n$ denote $n$ independent observations on a response. We treat $y_i$ as a realization of a random variable $Y_i$. In the general linear model we assume that $Y_i$ has a normal distribution with mean $\mu_i$ and variance $\sigma^2$,

$$ Y_i \sim N(\mu_i, \sigma^2), $$
and we further assume that the expected value $\mu_i$ is a linear function of $p$ predictors that take values $x_i' = (x_{i1}, \ldots, x_{ip})$ for the $i$-th case, so that

$$ \mu_i = x_i'\beta, $$
where $\beta$ is a vector of unknown parameters.

We will generalize this in two steps, dealing with the stochastic and systematic components of the model.

B.1.1  The Exponential Family

We will assume that the observations come from a distribution in the exponential family with probability density function

$$ f(y_i) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\}. \tag{B.1} $$
Here $\theta_i$ and $\phi$ are parameters and $a_i(\phi)$, $b(\theta_i)$ and $c(y_i, \phi)$ are known functions. In all models considered in these notes the function $a_i(\phi)$ has the form
$$ a_i(\phi) = \phi/p_i, $$
where $p_i$ is a known prior weight, usually 1.

The parameters $\theta_i$ and $\phi$ are essentially location and scale parameters. It can be shown that if $Y_i$ has a distribution in the exponential family then it has mean and variance

$$ E(Y_i) = \mu_i = b'(\theta_i) \tag{B.2} $$
$$ \operatorname{var}(Y_i) = \sigma_i^2 = b''(\theta_i)\, a_i(\phi), \tag{B.3} $$
where $b'(\theta_i)$ and $b''(\theta_i)$ are the first and second derivatives of $b(\theta_i)$. When $a_i(\phi) = \phi/p_i$ the variance has the simpler form
$$ \operatorname{var}(Y_i) = \sigma_i^2 = \phi\, b''(\theta_i)/p_i. $$
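
As a numerical illustration of (B.2) and (B.3), the following Python sketch compares the derivative formulas with moments computed directly from the density (B.1). It assumes the cumulant function $b(\theta) = e^{\theta}$ with $a_i(\phi) = 1$ and $c(y, \phi) = -\log y!$, which is the Poisson case taken up in Section B.5. The helper names `b`, `d1`, `d2` and `density` are hypothetical, and the derivatives are approximated by central finite differences rather than computed analytically.

```python
import math

def b(theta):
    # Cumulant function. ASSUMPTION: exp(theta), i.e. the Poisson case.
    return math.exp(theta)

def d1(f, x, h=1e-4):
    # Central finite-difference approximation to the first derivative.
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    # Central finite-difference approximation to the second derivative.
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def density(y, theta):
    # Density (B.1) with a_i(phi) = 1 and c(y, phi) = -log(y!).
    return math.exp(y * theta - b(theta) - math.lgamma(y + 1))

theta = math.log(3.0)   # canonical parameter chosen for illustration
ys = range(200)         # support truncated where the tail is negligible
probs = [density(y, theta) for y in ys]

# Moments computed directly from the density.
total = sum(probs)
mean_direct = sum(y * q for y, q in zip(ys, probs))
var_direct = sum((y - mean_direct) ** 2 * q for y, q in zip(ys, probs))

# Moments from (B.2) and (B.3), with a_i(phi) = 1.
mean_formula = d1(b, theta)
var_formula = d2(b, theta) * 1.0

print(mean_direct, mean_formula)
print(var_direct, var_formula)
```

With $\theta = \log 3$ both routes should give a mean and a variance close to 3, up to finite-difference error.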

The exponential family just defined includes as special cases the normal, binomial, Poisson, exponential, gamma and inverse Gaussian distributions.

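Example: The normal distribution has density

$$ f(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \frac{(y_i - \mu_i)^2}{\sigma^2} \right\}. $$
Expanding the square in the exponent we get $(y_i - \mu_i)^2 = y_i^2 + \mu_i^2 - 2 y_i \mu_i$, so the coefficient of $y_i$ is $\mu_i/\sigma^2$. This result identifies $\theta_i$ as $\mu_i$ and $\phi$ as $\sigma^2$, with $a_i(\phi) = \phi$. Now write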
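$$ f(y_i) = \exp\left\{ \frac{y_i\mu_i - \frac{1}{2}\mu_i^2}{\sigma^2} - \frac{y_i^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right\}. $$
This shows that $b(\theta_i) = \frac{1}{2}\theta_i^2$ (recall that $\theta_i = \mu_i$). Let us check the mean and variance:
$$ E(Y_i) = b'(\theta_i) = \theta_i = \mu_i, $$
$$ \operatorname{var}(Y_i) = b''(\theta_i)\, a_i(\phi) = \sigma^2. $$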
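
The factorization in this example can be checked numerically. The sketch below evaluates the usual normal density and the exponential-family form (B.1) with $\theta_i = \mu_i$, $\phi = \sigma^2$, $b(\theta_i) = \frac{1}{2}\theta_i^2$ and $c(y_i, \phi) = -y_i^2/(2\phi) - \frac{1}{2}\log(2\pi\phi)$ at a few points. The function names and the test values of $\mu_i$ and $\sigma^2$ are illustrative assumptions.

```python
import math

def normal_pdf(y, mu, sigma2):
    # Normal density in its usual form.
    return math.exp(-0.5 * (y - mu) ** 2 / sigma2) / math.sqrt(2 * math.pi * sigma2)

def expfam_pdf(y, theta, phi):
    # Exponential-family form (B.1) with a_i(phi) = phi.
    b = 0.5 * theta ** 2
    c = -y ** 2 / (2 * phi) - 0.5 * math.log(2 * math.pi * phi)
    return math.exp((y * theta - b) / phi + c)

mu, sigma2 = 1.5, 4.0   # arbitrary values chosen for illustration
points = [-3.0, 0.0, 0.7, 1.5, 10.0]

# Largest absolute difference between the two forms over the test points.
max_gap = max(abs(normal_pdf(y, mu, sigma2) - expfam_pdf(y, mu, sigma2))
              for y in points)
print(max_gap)
```

The two forms should agree up to floating-point rounding, so `max_gap` is expected to be negligible.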

Try to generalize this result to the case where $Y_i$ has a normal distribution with mean $\mu_i$ and variance $\sigma^2/n_i$ for known constants $n_i$, as would be the case if the $Y_i$ represented sample means. □

Example: In Problem Set 1 you will show that the exponential distribution with density

$$ f(y_i) = \lambda_i \exp\{ -\lambda_i y_i \} $$
belongs to the exponential family. □

In Sections B.4 and B.5 we verify that the binomial and Poisson distributions also belong to this family.

B.1.2  The Link Function

The second element of the generalization is that instead of modeling the mean, as before, we will introduce a one-to-one, continuous, differentiable transformation $g(\mu_i)$ and focus on

$$ \eta_i = g(\mu_i). \tag{B.4} $$
The function $g(\mu_i)$ will be called the link function. Examples of link functions include the identity, log, reciprocal, logit and probit.

We further assume that the transformed mean follows a linear model, so that

$$ \eta_i = x_i'\beta. \tag{B.5} $$

The quantity $\eta_i$ is called the linear predictor. Note that the model for $\eta_i$ is pleasantly simple. Since the link function is one-to-one we can invert it to obtain

$$ \mu_i = g^{-1}(x_i'\beta). $$
The model for $\mu_i$ is usually more complicated than the model for $\eta_i$.
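
The relation between the linear predictor and the mean can be sketched in Python for the link functions listed above. The table `LINKS`, the helpers `linear_predictor` and `fitted_mean`, and the values of the predictors and coefficients are all hypothetical choices made for illustration. The probit link uses the standard normal quantile function from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, used for the probit link

# Each entry pairs a link g(mu) with its inverse g^{-1}(eta).
LINKS = {
    "identity":   (lambda mu: mu, lambda eta: eta),
    "log":        (math.log, math.exp),
    "reciprocal": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "logit":      (lambda mu: math.log(mu / (1.0 - mu)),
                   lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit":     (_std_normal.inv_cdf, _std_normal.cdf),
}

def linear_predictor(x, beta):
    # eta_i = x_i' beta, as in (B.5).
    return sum(xj * bj for xj, bj in zip(x, beta))

def fitted_mean(x, beta, link):
    # mu_i = g^{-1}(x_i' beta).
    _, g_inv = LINKS[link]
    return g_inv(linear_predictor(x, beta))

x_i = (1.0, 2.0)      # illustrative predictor values (constant and one covariate)
beta = (-0.5, 0.5)    # illustrative coefficients
eta_i = linear_predictor(x_i, beta)
means = {name: fitted_mean(x_i, beta, name) for name in LINKS}

for name, mu in means.items():
    print(name, mu)
```

The same value of $\eta_i$ maps to a different mean under each link, which is why the model is simple on the scale of $\eta_i$ but not on the scale of $\mu_i$.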

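Note that we do not transform the response $y_i$, but rather its expected value $\mu_i$. A model where $\log y_i$ is linear in $x_i$, for example, is not the same as a generalized linear model where $\log \mu_i$ is linear in $x_i$, because the expected value of $\log Y_i$ is not in general the log of the expected value of $Y_i$.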
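
A small numerical example may make the distinction concrete. The sketch below takes a hypothetical response with only two equally likely values, 1 and $e^2$, and compares the mean of the logs with the log of the mean. The distribution is an assumption chosen purely for illustration.

```python
import math

# Hypothetical two-point distribution for the response.
values = [1.0, math.exp(2.0)]
probs = [0.5, 0.5]

# E(log Y): what a linear model for the log of the response targets.
mean_of_log = sum(p * math.log(y) for y, p in zip(values, probs))

# log E(Y): what a generalized linear model with log link targets.
log_of_mean = math.log(sum(p * y for y, p in zip(values, probs)))

print(mean_of_log, log_of_mean)
```

Here the mean of the logs is 1 while the log of the mean is about 1.43, so the two models describe different quantities.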

Example: The standard linear model we have studied so far can be described as a generalized linear model with normal errors and identity link, so that

$$ \eta_i = \mu_i. $$
It also happens that $\mu_i$, and therefore $\eta_i$, is the same as $\theta_i$, the parameter in the exponential family density. □

When the link function makes the linear predictor $\eta_i$ the same as the canonical parameter $\theta_i$, we say that we have a canonical link. The identity is the canonical link for the normal distribution. In later sections we will see that the logit is the canonical link for the binomial distribution and the log is the canonical link for the Poisson distribution. This leads to some natural pairings:



Error       Link
Normal      Identity
Binomial    Logit
Poisson     Log



However, other combinations are also possible. An advantage of canonical links is that a minimal sufficient statistic for $\beta$ exists, i.e. all the information about $\beta$ is contained in a function of the data of the same dimensionality as $\beta$.
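
To sketch why, suppose the link is canonical, so that $\theta_i = \eta_i = x_i'\beta$, and assume $a_i(\phi) = \phi/p_i$ as in Section B.1.1. Under these assumptions the log-likelihood implied by (B.1) can be written as follows. This is an illustrative derivation that treats $\phi$ as known, not a formal proof of minimal sufficiency.

```latex
\begin{align*}
\log L(\beta)
  &= \sum_{i=1}^{n} \frac{p_i \{ y_i x_i'\beta - b(x_i'\beta) \}}{\phi}
     + \sum_{i=1}^{n} c(y_i, \phi) \\
  &= \frac{1}{\phi} \Big\{ \beta' \sum_{i=1}^{n} p_i x_i y_i
     - \sum_{i=1}^{n} p_i\, b(x_i'\beta) \Big\}
     + \sum_{i=1}^{n} c(y_i, \phi).
\end{align*}
```

The data enter the terms involving $\beta$ only through the $p$-vector $\sum_i p_i x_i y_i$, which has the same dimension as $\beta$. When all prior weights equal 1 this is $X'y$, where $X$ (notation introduced here for illustration) denotes the matrix with rows $x_i'$.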


Copyright © Germán Rodríguez, 1993-2000. Please send feedback to grodri@princeton.edu
Conversion from LaTeX was done using TTH, version 2.34.