The Lee-Carter Model
In this unit we will examine some features of the Lee-Carter approach to forecasting U.S. mortality.
The Mortality Surface
The basic model seeks to summarize an age-period surface of log-rates in terms of vectors a and b along the age dimension and k along the time dimension so that
with restrictions such that the b's are normalized to sum to unity and the k's sum to zero, so the a's are average log-rates.
The vector a can be interpreted as an average age-profile, the vector k tracks changes over time, and the vector b determines how much each age group changes when k changes. When k is linear on time each age group changes at its own exponential rate, but this is not a requirement of the model. The error term reflects age-period effects not captured by the model.
Lee and Carter estimated the a's, b's and k's with U.S. mortality data for 1933 to 1987 using a singular value decomposition that will be illustrated below. In a second step they re-estimate k so it predicts the correct total number of deaths each year. (This essentially changes the weight of each age group.) Estimates of k are obtained for each year from 1900 to 1989 (and then forecast into the future).
The basic data used in the original paper consisted of rates up to ages 80-84 and 85+. Because a large fraction of the population survives to age 80 (30% in 1987) they extended the model to older ages on the basis of work by Coale and Kisker and Coale and Guo showing that after age 80 mortality increases at a linearly declining rate, rather than a constant rate as implied in a Gompertz model.
The Singular Value Decomposition
The Human Mortality Database (HMD) has U.S. mortality rates for five-year
age groups (0,1-4,5-9,...,105-109,110+) and single calendar years for
1933 to 2001. We will use these data to examine the model. The file
usm3301.dat has rates for both sexes combined for each year.
Note that age is coded using the starting value to simplify numerical
manipulation. We will reshape this file to a year by age matrix.
. infile year age m using ///
> http://data.princeton.edu/eco572/datasets/usm3301.dat, clear
(1656 observations read)
. reshape wide m, i(year) j(age)
(note: j = 0 1 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105 110)
Data long -> wide
-----------------------------------------------------------------------------
Number of obs. 1656 -> 69
Number of variables 3 -> 25
j variable (24 values) age -> (dropped)
xij variables:
m -> m0 m1 ... m110
-----------------------------------------------------------------------------
The next step is to move the data to Mata, compute logs, and get the mean of each rate over the period 1933-1987. The resulting values are remarkably close to the a values in Table 1 of the original paper. (The data have undergone revisions over the years, including a breakdown of the 85+ age group into six categories from 85-89 to 110+. I get similar but not identical results using the Berkeley Mortality Database (BMD), the precursor to the HMD.)
. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: m = st_data(.,2..25)
: logm = log(m)
: a = J(55,1,1)'logm[1::55,]/55
: a
1 2 3 4 5 6 7
+----------------------------------------------------------------------------------------------------------
1 | -3.643170505 -6.700417965 -7.512215746 -7.564984417 -6.76168763 -6.447905619 -6.405776592
+----------------------------------------------------------------------------------------------------------
8 9 10 11 12 13 14
----------------------------------------------------------------------------------------------------------
1 -6.2290075 -5.908802131 -5.515643379 -5.088885493 -4.653696046 -4.262810817 -3.859119136
----------------------------------------------------------------------------------------------------------
15 16 17 18 19 20 21
----------------------------------------------------------------------------------------------------------
1 -3.477254832 -3.063920697 -2.645334242 -2.224175214 -1.799165973 -1.43413825 -1.172633239
----------------------------------------------------------------------------------------------------------
22 23 24
----------------------------------------------+
1 -.9799156522 -1.046744126 -1.391566259 |
----------------------------------------------+
: end
---------------------------------------------------------------------------------------------------------------------
Next we center the matrix of log-rates by subtracting the means for
each age. We create working vectors u, s
and vt and carry out a singular value decomposition
using only ages up to 85-89. (Not quite the same as 85+ in the
original.)
. mata: ------------------------------------------------- mata (type end to exit) ------------------------------------------- : u = J(0,0,.) : s = J(0,0,.) : vt = J(0,0,.) : y = logm[1::55,1..19] :- a[1..19] : svd(y,u,s,vt) : end ---------------------------------------------------------------------------------------------------------------------
The first column of u times the first row of vt
scaled by s[1] are the best rank-1 approximation to the
input matrix. The first row of vt scaled to sum to unity
provides our estimate of b:
. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: b = vt[1,]/sum(vt[1,])
: b
1 2 3 4 5 6 7
+---------------------------------------------------------------------------------------------------
1 | .0904425587 .1106737347 .0931384583 .0827091973 .0492225803 .0538767039 .0596364163
+---------------------------------------------------------------------------------------------------
8 9 10 11 12 13 14
---------------------------------------------------------------------------------------------------
1 .0617492195 .0606307977 .0521365879 .0441875056 .0386793121 .032573913 .0287306923
---------------------------------------------------------------------------------------------------
15 16 17 18 19
-----------------------------------------------------------------------+
1 .0292107126 .0299360984 .0308602801 .027456992 .0241482393 |
-----------------------------------------------------------------------+
: end
---------------------------------------------------------------------------------------------------------------------
The values are again, remarkably close to the b's published in Table 2,
considering that the data have changed a bit. The k's would be given
by the first column of u, which needs to be multiplied by
s[1] and sum(vt[,1]) (to cancel the division).
. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: k = u[,1] * s[1] * sum(vt[1,])
: st_store(1::55,st_addvar("float","k"),k)
: end
---------------------------------------------------------------------------------------------------------------------
. line k year, title("The Mortality Index, 1933-1987")
. graph export leeCarter1.png, replace
(file leeCarter1.png written in PNG format)

These are not quite the same k's as the original because we haven't carried out the second step and our values sum to 0 for 1933-1987, the years used here, rather than 1900-1989. But the pattern of fairly linear decline over time is exactly the same, as can be seen by comparing the figure above with Figure 2 in the paper.
Goodness of Fit
How well does the model fit? We will reproduce part of Figure 4 in the paper using the previous results.
. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: fit = a[1..19] :+ k*b
: j = st_addvar("float",("age","obs33","obs87","fit33","fit87"))
: d = (0\1\range(5,85,5)), logm[(1\55), 1..19]' , fit[(1\55),.]'
: st_store(1::19, j, d)
: end
---------------------------------------------------------------------------------------------------------------------
. scatter obs33 obs87 age || line fit33 fit87 age , ///
> title("Fit to Age-Specific Mortality") ///
> legend(order(1 "1933" 2 "1987") ring(0) pos(5))
. graph export leeCarter2.png, replace
(file leeCarter2.png written in PNG format)

An alternative to using the a's, b's and k's estimated here,
which give, of course, the best possible fit, is to use
the published values of a and b, which are
available in a file called LeeCarterAge.dat
and estimate k for 1933 and 1987.
The Time Series Model
The second distinguishing feature of the Lee-Carter approach is that, having reduced the time dimension of mortality to a single index k, they use statistical time series methods to model and forecast this index.
In their application to U.S. mortality they discover that, except for the flu epidemic of 1918, the index behaves as a simple random walk with drift,
where d is the drift, estimated as -0.365, and the et are independent error terms with variance v, estimated as 0.651^2. Note that the k's are not independent, it is only successive differences (or innovations) that are independent.
Also, the variance of kt increases with t. Using the law of iterated expectations it is easy to show that starting from a fixed value at t0, the variance of kt is (t-t0)v. This is important because it gives us the standard error of a forecast.
Simulating the Random Walk
Perhaps the best way to understand the stochastic nature
of the projection is to do a bit of simulation.
The code below sets a 50 year horizon.
We use a variable e to simulate the errors,
and generate 50 random 50-year trajectories,
starting with a value of k=-11.05,
which is my estimate of k for 1989.
. clear
. set obs 50
obs was 0, now 50
. gen year = 1989 + _n
. gen e = .
(50 missing values generated)
. forvalues i=1/50 {
2. qui replace e = invnormal(uniform())*0.651
3. qui gen k`i' = -11.41
4. qui replace k`i' = k`i'[_n-1] - 0.365 + e if _n > 1
5. }
Now let's see what we got. The forecast for year n is -11.05 - 0.365*n and has a standard deviation of .651 sqrt(n). Below we compute these values. You may want to list them and compare with Table 2 in the original paper. We can also compute the mean and standard deviation of each row, as a check on our simulation. I'll show results for 2025
. gen kf = -11.05 - 0.365 * _n
. gen skf = 0.651 * sqrt(_n)
. egen ko = rowmean(k1-k50)
. egen sko = rowsd(k1-k50)
. list kf ko skf sko if year == 2025
+---------------------------------------+
| kf ko skf sko |
|---------------------------------------|
36. | -24.19 -24.40414 3.906 3.943747 |
+---------------------------------------+
So we forecast a k of -24.19 for 2025 with a standard deviation of 3.96. Our 50 simulations have a mean of -24.40 and a sd of 3.94, so they look pretty reasonable.
I will use the theoretical results to construct a confidence band, which I will plot below using grey shading, and I will then superimpose all trajectories
. gen bot = kf - 1.96 * skf . gen top = kf + 1.96 * skf . qui sum bot . local min = r(min) . twoway (area top year, color(gs12) base(`min')) /// > (area bot year, color(white) base(`min')) /// > (line k1-k50 year) /// > (line kf year, lw(thick) color(black)) , /// > legend(off) title(Stochastic Forecasts) . graph export leeCarter3.png, replace (file leeCarter3.png written in PNG format)

The key thing to note here is how our uncertainty increases as the projection horizon gets longer.
Forecasting Age-Specific Mortality
Once we have forecast a value of k, we combine it with a and b to produce a forecast of age specific mortality. Our central forecast for 2025 uses the published values k=-24.19, which has a standard deviation of 3.906. This can be used to produce a low and a high profile, reflecting our uncertainty.
. use leeCarterAB, clear
. scalar ktop = -24.19 + 1.96 * 3.906
. scalar kbot = -24.19 - 1.96 * 3.906
. gen top = a + ktop * b
. gen bot = a + kbot * b
. twoway (area bot age )(area top age, color(white)) ///
> , title("Age-Specific Forecast for 2025") legend(off)
. graph export leeCarter4.png, replace
(file leeCarter4.png written in PNG format)

In Problem Set 4 you will compare forecasts such as these with the observed results for recent years.
