B.1 The Model
Let $y_1, \ldots, y_n$ denote $n$ independent observations
on a response.
We treat $y_i$ as a realization of a random variable
$Y_i$. In the general linear model we assume that $Y_i$
has a normal distribution with mean $\mu_i$ and variance
$\sigma^2$,
and we further assume that the expected value $\mu_i$ is
a linear function of $p$ predictors that take values
$x_i' = (x_{i1}, \ldots, x_{ip})$ for the $i$-th case, so that
$$ \mu_i = x_i'\beta, $$
where $\beta$ is a vector of unknown parameters.
We will generalize this in two steps, dealing with
the stochastic and systematic components of the model.
B.1.1 The Exponential Family
We will assume that the observations come from a distribution
in the exponential family with probability density function
$$ f(y_i) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\}. \tag{B.1} $$
Here $\theta_i$ and $\phi$ are parameters and
$a_i(\phi)$, $b(\theta_i)$ and $c(y_i, \phi)$ are known
functions.
In all models considered in these notes the function
$a_i(\phi)$ has the form
$$ a_i(\phi) = \phi / p_i, $$
where $p_i$ is a known prior weight, usually 1.
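The parameters $\theta_i$ and $\phi$ are essentially
location and scale parameters. It can be shown that if
$Y_i$ has a distribution in the exponential family then it
has mean and variance
$$ E(Y_i) = \mu_i = b'(\theta_i), $$
$$ \mathrm{var}(Y_i) = \sigma_i^2 = b''(\theta_i)\, a_i(\phi), $$
where $b'(\theta_i)$ and $b''(\theta_i)$ are the first and
second derivatives of $b(\theta_i)$.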
When $a_i(\phi) = \phi/p_i$ the variance has the simpler form
$$ \mathrm{var}(Y_i) = \sigma_i^2 = \phi\, b''(\theta_i) / p_i. $$
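As a rough numerical illustration of the mean and variance formulas, the
sketch below approximates the derivatives of $b(\theta)$ by central
differences. The helper names (`derivative`, `second_derivative`,
`mean_and_variance`, `b_normal`) are hypothetical, and the choice
$b(\theta) = \theta^2/2$ is the one that arises for the normal
distribution in the example further on.

```python
def derivative(f, x, h=1e-3):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-3):
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


def mean_and_variance(b, theta, phi, p=1.0):
    """Return (b'(theta), phi * b''(theta) / p) for a_i(phi) = phi / p."""
    mean = derivative(b, theta)
    var = phi * second_derivative(b, theta) / p
    return mean, var


def b_normal(theta):
    """Function b() for the normal case (assumed here): theta^2 / 2."""
    return 0.5 * theta ** 2


# Illustrative values only: theta = 3, phi = 4, prior weight 1.
mean, var = mean_and_variance(b_normal, theta=3.0, phi=4.0)
print(mean, var)  # approximately 3 and 4
```

Passing `p=2.0` halves the variance, which is how a prior weight enters
through $a_i(\phi) = \phi/p_i$.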
The exponential family just defined includes as special
cases the normal, binomial, Poisson, exponential, gamma
and inverse Gaussian distributions.
Example: The normal distribution has density
$$ f(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \frac{(y_i - \mu_i)^2}{\sigma^2} \right\}. $$
Expanding the square in the exponent we get
$(y_i - \mu_i)^2 = y_i^2 + \mu_i^2 - 2 y_i \mu_i$,
so the coefficient of
$y_i$ is $\mu_i/\sigma^2$.
This result identifies $\theta_i$
as $\mu_i$ and $\phi$ as $\sigma^2$, with $a_i(\phi) = \phi$.
Now write
$$ f(y_i) = \exp\left\{ \frac{y_i\mu_i - \frac{1}{2}\mu_i^2}{\sigma^2} - \frac{y_i^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right\}. $$
This shows that $b(\theta_i) = \frac{1}{2}\theta_i^2$ (recall that
$\theta_i = \mu_i$). Let us check the mean and variance:
$$ E(Y_i) = b'(\theta_i) = \theta_i = \mu_i, $$
$$ \mathrm{var}(Y_i) = b''(\theta_i)\, a_i(\phi) = \sigma^2. $$
Try to generalize this result to the case where $Y_i$ has
a normal distribution with mean $\mu_i$ and variance $\sigma^2/n_i$
for known constants $n_i$, as would be the case if the $Y_i$
represented sample means. □
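The identification used in the normal example can be checked
numerically. The sketch below evaluates the usual normal density and
the exponential-family form with $\theta_i = \mu_i$, $\phi = \sigma^2$,
$a_i(\phi) = \phi$ and $b(\theta_i) = \theta_i^2/2$, and compares the
two. The function names and the test values are hypothetical choices
made for illustration.

```python
import math


def normal_pdf(y, mu, sigma2):
    """Normal density written in its usual form."""
    return math.exp(-0.5 * (y - mu) ** 2 / sigma2) / math.sqrt(2 * math.pi * sigma2)


def expfam_normal_pdf(y, mu, sigma2):
    """Same density written as exp{(y*theta - b(theta))/a(phi) + c(y, phi)}."""
    theta = mu                # canonical parameter
    phi = sigma2              # dispersion parameter, with a(phi) = phi
    b = 0.5 * theta ** 2      # b(theta) = theta^2 / 2
    c = -0.5 * y ** 2 / phi - 0.5 * math.log(2 * math.pi * phi)
    return math.exp((y * theta - b) / phi + c)


# Arbitrary illustrative values: mu = 1, sigma^2 = 2.
for y in (-1.0, 0.0, 2.5):
    print(y, normal_pdf(y, 1.0, 2.0), expfam_normal_pdf(y, 1.0, 2.0))
```

The two columns printed for each `y` should agree up to floating-point
rounding.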
Example: In Problem Set 1 you will show that the
exponential distribution with density
$$ f(y_i) = \lambda_i \exp\{-\lambda_i y_i\} $$
belongs to the exponential family. □
In Sections B.4 and B.5 we verify that
the binomial and Poisson distributions
also belong to this family.
B.1.2 The Link Function
The second element of the generalization is that
instead of modeling the mean, as before, we will introduce a
one-to-one continuous differentiable transformation
$g(\mu_i)$ and focus on
$$ \eta_i = g(\mu_i). $$
The function $g(\mu_i)$ will be called the link
function. Examples of link functions include the
identity, log, reciprocal, logit and probit.
We further assume that the transformed mean follows a linear
model, so that
$$ \eta_i = x_i'\beta. $$
The quantity $\eta_i$ is called the linear predictor.
Note that the model for $\eta_i$ is pleasantly simple.
Since the link function is one-to-one we can invert it to
obtain
$$ \mu_i = g^{-1}(x_i'\beta). $$
The model for $\mu_i$ is usually more complicated than the
model for $\eta_i$.
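A minimal sketch of how a link and its inverse move between the mean
and the linear predictor, using only the Python standard library. The
dictionary `LINKS`, the function `fitted_mean` and the numbers in `x`
and `beta` are hypothetical, chosen only for illustration; the probit
pair relies on `statistics.NormalDist` (Python 3.8 or later).

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, used for the probit link

# Each entry is a pair (link g, inverse link g^{-1}).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "reciprocal": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_std_normal.inv_cdf, _std_normal.cdf),
}


def fitted_mean(x, beta, link="log"):
    """Compute eta = x'beta, then map it back to the mean with g^{-1}."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    _, inverse_link = LINKS[link]
    return inverse_link(eta)


x = (1.0, 2.0)        # hypothetical predictor values for one case
beta = (0.5, -0.25)   # hypothetical coefficients, so eta = 0
print(fitted_mean(x, beta, "log"))    # exp(0)
print(fitted_mean(x, beta, "logit"))  # inverse logit of 0
```

The linear predictor is the same in every case; only the inverse link
changes how it is translated into a mean.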
Note that we do not transform the response $y_i$, but rather its
expected value $\mu_i$. A model where $\log y_i$ is linear on
$x_i$, for example, is not the same as a generalized linear
model where $\log \mu_i$ is linear on $x_i$.
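The distinction between transforming the response and transforming its
mean can be seen in a small made-up example: for a random variable
taking the values 1 and 3 with equal probability, the mean of the log
differs from the log of the mean. The values and variable names below
are hypothetical.

```python
import math

values = [1.0, 3.0]   # possible values of Y (made up)
probs = [0.5, 0.5]    # their probabilities

mean_y = sum(p * y for p, y in zip(probs, values))

log_of_mean = math.log(mean_y)                                     # log E(Y)
mean_of_log = sum(p * math.log(y) for p, y in zip(probs, values))  # E(log Y)

print(log_of_mean)   # log(2)
print(mean_of_log)   # log(3) / 2, which is smaller
```

A linear model for `mean_of_log` and a log-link model for `log_of_mean`
therefore describe different quantities, even with the same predictors.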
Example:
The standard linear model we have studied
so far can be described as a generalized linear model with
normal errors and identity link, so that
$$ \eta_i = \mu_i. $$
It also happens that $\mu_i$, and therefore $\eta_i$, is the
same as $\theta_i$, the parameter in the exponential family
density. □
When the link function makes the linear predictor $\eta_i$
the same as the canonical parameter $\theta_i$, we say that we have
a canonical link. The identity is the canonical link
for the normal distribution. In later sections we will see
that the logit is the canonical link for the binomial
distribution and the log is the canonical link for the
Poisson distribution. This leads to some natural pairings:
| Error | Link |
| Normal | Identity |
| Binomial | Logit |
| Poisson | Log |
However, other combinations are also possible.
An advantage of canonical links is that a minimal sufficient
statistic for $\beta$ exists, i.e. all the information about
$\beta$ is contained in a function of the data of the
same dimensionality as $\beta$.
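To make the dimensionality remark concrete: with a canonical link,
$\theta_i = x_i'\beta$, so the data enter the likelihood for $\beta$
only through the sums $\sum_i p_i\, y_i\, x_{ij}$, one for each
predictor. This is a standard result stated here as an assumption
rather than derived in this section. The sketch below computes that
vector for a small made-up data set; `sufficient_statistic`, `X` and
`y` are hypothetical names and values.

```python
def sufficient_statistic(X, y, weights=None):
    """Return the vector with j-th element sum_i w_i * y_i * x_ij."""
    n, p = len(y), len(X[0])
    if weights is None:
        weights = [1.0] * n   # prior weights p_i, usually 1
    return [sum(weights[i] * X[i][j] * y[i] for i in range(n))
            for j in range(p)]


# Made-up design matrix (a constant and one predictor) and responses.
X = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]
y = [2.0, 3.0, 6.0, 7.0]

t = sufficient_statistic(X, y)
print(t)        # [18.0, 36.0]
print(len(t))   # 2, the same length as beta
```

Four observations are reduced to two numbers, matching the two
coefficients in this illustrative model.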
Copyright © Germán Rodríguez, 1993-2000.