Consider for now a rather abstract model where $\mu_i = x_i'\beta$ for some predictors $x_i$. How do we estimate the parameters $\beta$ and $\sigma^2$?
The likelihood principle instructs us to pick the values of the parameters that maximize the likelihood, or equivalently, the logarithm of the likelihood function. If the observations are independent, then the likelihood function is a product of normal densities of the form given in Equation 2.1. Taking logarithms we obtain the normal log-likelihood
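$$ \log L(\beta, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2} \sum (y_i - \mu_i)^2 / \sigma^2, \tag{2.5} $$
where $\mu_i = x_i'\beta$. For a fixed value of $\sigma^2$, maximizing this log-likelihood with respect to $\beta$ is equivalent to minimizing the residual sum of squares
$$ \mathrm{RSS}(\beta) = \sum (y_i - \mu_i)^2 = (y - X\beta)'(y - X\beta). \tag{2.6} $$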
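Taking derivatives of the residual sum of squares with respect to $\beta$ and setting the derivative equal to zero leads to the so-called normal equations for the maximum-likelihood estimator $\hat\beta$:
$$ X'X\hat\beta = X'y. $$
If the model matrix $X$ is of full column rank, then $X'X$ can be inverted, which gives an explicit formula for the ordinary least squares (OLS) or maximum likelihood estimator of the linear parameters:
$$ \hat\beta = (X'X)^{-1} X'y. \tag{2.7} $$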
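As an illustration only, the normal equations $X'X\hat\beta = X'y$ can be solved for a tiny data set with plain Gaussian elimination. The sketch below assumes a straight-line model with an intercept; the data values and the helper name `solve_normal_equations` are hypothetical and are not taken from these notes.

```python
def solve_normal_equations(X, y):
    """Solve (X'X) beta = X'y by Gaussian elimination with partial pivoting.

    X is a list of n rows, each a list of p predictor values; y has length n.
    """
    n, p = len(X), len(X[0])
    # Cross-product matrix X'X (p x p) and vector X'y (length p).
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Augmented matrix [X'X | X'y].
    a = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:  # crude absolute tolerance
            raise ValueError("X'X is singular: X is not of full column rank")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * p
    for j in reversed(range(p)):
        s = sum(a[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = (a[j][p] - s) / a[j][j]
    return beta


# Hypothetical data: four observations, one predictor plus an intercept.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 8.0]
X = [[1.0, xi] for xi in x]  # model matrix: a column of ones, then x

beta_hat = solve_normal_equations(X, y)
# Residual sum of squares evaluated at the estimate.
rss = sum((yi - (beta_hat[0] + beta_hat[1] * xi)) ** 2 for xi, yi in zip(x, y))
print(beta_hat, rss)
```

In practice a statistical package would use a more stable factorization of $X$; elimination on $X'X$ is used here only because it mirrors the formula directly.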
If $X$ is not of full column rank one can use generalized inverses, but interpretation of the results is much more straightforward if one simply eliminates redundant columns. Most current statistical packages are smart enough to detect and omit redundancies automatically.
There are several numerical methods for solving the normal equations, including methods that operate on $X'X$, such as Gaussian elimination or the Cholesky decomposition, and methods that attempt to simplify the calculations by factoring the model matrix $X$, including Householder reflections, Givens rotations and Gram-Schmidt orthogonalization. We will not discuss these methods here, assuming that you will trust the calculations to a reliable statistical package. For further details see McCullagh and Nelder (1989, Section 3.8) and the references therein.
The foregoing results were obtained by maximizing the log-likelihood with respect to $\beta$ for a fixed value of $\sigma^2$. The result obtained in Equation 2.7 does not depend on $\sigma^2$, and is therefore a global maximum.
For the null model $X$ is a vector of ones, $X'X = n$ and $X'y = \sum y_i$ are scalars, and $\hat\beta = \bar{y}$, the sample mean. For our sample data $\bar{y} = 14.3$. Thus, the calculation of a sample mean can be viewed as the simplest case of maximum likelihood estimation in a linear model.
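The null-model result can be checked in one line, assuming the estimator has the form $\hat\beta = (X'X)^{-1}X'y$ of Equation 2.7 and writing the column of $n$ ones as $\mathbf{1}$ (a symbol introduced here for illustration):

```latex
\hat\beta = (\mathbf{1}'\mathbf{1})^{-1}\,\mathbf{1}'y
          = \frac{1}{n}\sum_{i=1}^{n} y_i
          = \bar{y}
```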
The least squares estimator $\hat\beta$ of Equation 2.7 has several interesting properties. If the model is correct, in the (weak) sense that the expected value of the response $Y_i$ given the predictors $x_i$ is indeed $x_i'\beta$, then the OLS estimator is unbiased, that is, its expected value equals the true parameter value:
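$$ E(\hat\beta) = \beta. \tag{2.8} $$
If, in addition, the observations are uncorrelated and have constant variance $\sigma^2$, then the variance-covariance matrix of the OLS estimator is
$$ \mathrm{var}(\hat\beta) = (X'X)^{-1}\sigma^2. \tag{2.9} $$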
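Both properties follow in a couple of lines from the fact that $\hat\beta = (X'X)^{-1}X'y$ is a linear function of the data. The sketch below treats $X$ as fixed and assumes only $E(Y) = X\beta$ and $\mathrm{var}(Y) = \sigma^2 I$:

```latex
\begin{aligned}
E(\hat\beta) &= (X'X)^{-1}X'\,E(Y) = (X'X)^{-1}X'X\beta = \beta, \\
\mathrm{var}(\hat\beta) &= (X'X)^{-1}X'\,\mathrm{var}(Y)\,X(X'X)^{-1}
  = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.
\end{aligned}
```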
A further property of the estimator is that it has minimum variance among all unbiased estimators that are linear functions of the data, i.e. it is the best linear unbiased estimator (BLUE). Since no other unbiased estimator can have lower variance for a fixed sample size, we say that OLS estimators are fully efficient.
Finally, it can be shown that the sampling distribution of the OLS estimator $\hat\beta$ in large samples is approximately multivariate normal with the mean and variance given above, i.e.
$$ \hat\beta \sim N_p(\beta, (X'X)^{-1}\sigma^2). $$
Applying these results to the null model we see that the sample mean $\bar{y}$ is an unbiased estimator of $\mu$, has variance $\sigma^2/n$, and is approximately normally distributed in large samples.
All of these results depend only on second-order assumptions concerning the mean, variance and covariance of the observations, namely the assumption that $E(Y) = X\beta$ and $\mathrm{var}(Y) = \sigma^2 I$.
Of course, $\hat\beta$ is also a maximum likelihood estimator under the assumption of normality of the observations. If $Y \sim N_n(X\beta, \sigma^2 I)$ then the sampling distribution of $\hat\beta$ is exactly multivariate normal with the indicated mean and variance.
The significance of these results cannot be overstated: the assumption of normality of the observations is required only for inference in small samples. The really important assumption is that the observations are uncorrelated and have constant variance, and this is sufficient for inference in large samples.
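Substituting the OLS estimator of $\beta$ into the log-likelihood in Equation 2.5 gives a profile likelihood for $\sigma^2$:
$$ \log L(\sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2}\, \mathrm{RSS}(\hat\beta) / \sigma^2. $$
Differentiating this expression with respect to $\sigma^2$ and setting the derivative to zero leads to the maximum likelihood estimator $\hat\sigma^2 = \mathrm{RSS}(\hat\beta)/n$. This estimator is biased; the usual unbiased estimator divides by $n-p$ rather than $n$, where $p$ is the number of linear parameters:
$$ \hat\sigma^2 = \mathrm{RSS}(\hat\beta)/(n-p). $$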
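For reference, the maximization of the profile log-likelihood over $\sigma^2$ can be written out as follows. This is a sketch that assumes the profile log-likelihood is Equation 2.5 with $\mathrm{RSS}(\hat\beta)$ in place of the sum of squares:

```latex
\frac{\partial}{\partial\sigma^2}
\left[ -\frac{n}{2}\log(2\pi\sigma^2) - \frac{\mathrm{RSS}(\hat\beta)}{2\sigma^2} \right]
= -\frac{n}{2\sigma^2} + \frac{\mathrm{RSS}(\hat\beta)}{2\sigma^4} = 0
\quad\Longrightarrow\quad
\hat\sigma^2 = \frac{\mathrm{RSS}(\hat\beta)}{n}
```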
Under the assumption of normality, the ratio $\mathrm{RSS}/\sigma^2$ of the residual sum of squares to the true parameter value has a chi-squared distribution with $n-p$ degrees of freedom and is independent of the estimator of the linear parameters. You might be interested to know that using the chi-squared distribution as a likelihood to estimate $\sigma^2$ (instead of the normal likelihood to estimate both $\beta$ and $\sigma^2$) leads to the unbiased estimator.
For the sample data the RSS for the null model is 2650.2 on 19 d.f. and therefore $\hat\sigma = 11.81$, the sample standard deviation.
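The arithmetic behind this figure is easy to reproduce. The sketch below uses the RSS and degrees of freedom quoted above and assumes the null model has a single parameter ($p = 1$), so that the implied sample size is $n = 20$; the variable names are illustrative only.

```python
import math

rss = 2650.2  # residual sum of squares for the null model (from the text)
df = 19       # residual degrees of freedom, n - p (from the text)
p = 1         # assumption: the null model has one parameter, the mean
n = df + p    # implied sample size

sigma2_unbiased = rss / df  # divides by n - p
sigma2_mle = rss / n        # divides by n, so it is slightly smaller
sigma_hat = math.sqrt(sigma2_unbiased)

print(round(sigma_hat, 2))  # 11.81
```

Dividing by $n$ instead of $n-p$ gives a smaller value, which shows the downward bias of the maximum likelihood estimator in a sample of this size.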
Copyright © Germán Rodríguez, 1993-2000.