Germán Rodríguez
Demographic Methods Princeton University

Model Mortality Schedules

The state-of the art model mortality schedules are the modified logit relational system of Murray et al (2003) and the log-quadratic model of Wilmoth et al (2012). Let's see how well they do with the life table in Box 3.1 of the textbook, for Austrian males in 1992.

The Code

I wrote some code to facilitate fitting these models, so they first step will be to load it. (I suggest you first download the Stata or R code, which is available on this website if you add the extension .txt. For example mrlogit.R is at http://data.princeton.edu/eco572/mrlogit.R.txt.)

. clear mata

. quietly do mrlogit.mata 

. quietly do logquad.mata
> library(dplyr)
> library(ggplot2)
> source("mrlogit.R")
> source("logquad.R")

The Data

I have also saved the complete life table in a plain text format as b31.dat, so you can easily read it in Stata or R.

. cd d:\onedrive\pop502
d:\onedrive\pop502

. import delimited using http://data.princeton.edu/eco572/b31.dat, ///
>     delim(" ", collapse) varn(1) case(preserve) clear
(15 vars, 19 obs)

. quietly replace lx = lx/100000 //!

. mata b31 = st_data(., ("age", "a", "q", "m", "lx"))

. mata b31[1::6,]
                 1             2             3             4             5
    ┌───────────────────────────────────────────────────────────────────────┐
  1 │            0   .0680000037   .0086719999   .0087430002             1  │
  2 │            1   1.626000047       .001479        .00037   .9913278222  │
  3 │            5           2.5       .000766       .000153   .9898614287  │
  4 │           10   3.142999887       .000963       .000193      .9891029  │
  5 │           15   2.723999977   .0048719998       .000976   .9881504774  │
  6 │           20   2.519999981   .0064050001       .001285   .9833365679  │
    └───────────────────────────────────────────────────────────────────────┘
>   b31 <- read.table("http://data.princeton.edu/eco572/b31.dat", header=TRUE)
>   b31x <- mutate(b31, lx=lx/100000) %>% select(age, a, q, m, lx)
>   head(b31x)
  age     a        q        m        lx
1   0 0.068 0.008672 0.008743 1.0000000
2   1 1.626 0.001479 0.000370 0.9913278
3   5 2.500 0.000766 0.000153 0.9898614
4  10 3.143 0.000963 0.000193 0.9891029
5  15 2.724 0.004872 0.000976 0.9881505
6  20 2.520 0.006405 0.001285 0.9833366

The Modified Logit System

You may want to try taking logits of the survival function and plotting it against the Brass standard. The relationship doesn't look very linear. So let's try instead the modified transformation, which uses corrections based on survival to ages 5 and 60.

. mata zx = mrlogit_zx(b31[2::19, 5], "male") //transform data

. mata zs = mrlogit_zs("male")                // transform standard

. getmata zx, force replace

. getmata zs, force replace

. // scatter zx zs
. quietly regress zx zs                       // fit by ols

. predict zfit
(option xb assumed; fitted values)
(1 missing value generated)

. mata b = st_matrix("e(b)") // constant last!

. mata lfit = mrlogit_lx(b[2], b[1], "male")

.     // might want to plot
>   zx <- mrlogit_zx(b31x$lx[-1], "male") # transform data
>   zs <- mrlogit_zs("male")              # transform standard
>   # data.frame(zx, zs) %>%
>   #  ggplot(aes(zs, zx)) + geom_point() + geom_smooth(method="lm")
>   b <- coef(lm(zx ~ zs))               # ols fit
>   lfit<- mrlogit_lx(b[1], b[2], "male")   # convert to survival
>   # data.frame(age=b31$age, lx=b31$lx/100000, fit=c(1, lfit)) %>%
>   #  ggplot(aes(age, lx)) + geom_point() + geom_line(aes(age, fit))
>     

Of course the fit is going to look good on the survival scale, but let's get a closer look by computing rates. I first compute death probabilities and then convert to rates with the same assumptions used to construct the table.

. mata qfit = 1 :- (lfit\0) :/ (1\lfit)      // convert to probabilities

. mata n = 1\ 4\ J(17, 1, 5)

. mata mfit = qfit :/ (n - qfit :* (n - b31[,2]))     // and then to rates    

. getmata mfit, replace

. twoway scatter m age, yscale(log) ///      // plot
>   ||   line mfit age, yscale(log) legend(off)

.  graph export mrlogits.png, width(500) replace
(file mrlogits.png written in PNG format)

Modified Logit Fit

> qfit <- 1 - c(lfit, 0)/c(1, lfit)  # death probabilities
> a <- b31$a                         # nax
> n <- c(1, 4, rep(5,17))            # width of intervals
> mfit <- qfit/(n - qfit*(n - a))    # probs to rates
> data.frame(age = b31$agem, mobs = b31$m, mfit) %>%
+   ggplot(aes(age, log(mobs))) + geom_point() +
+     geom_line(aes(age,log(mfit)))
> ggsave("mrlogit.png", width=500/72, height=400/72, dpi=72)

Modified Lofit Fit
The model gets infant mortality right and does an excellent job at age 30 and above, but has a bit of trouble in the 5 to 25 range.

The Log-Quadratic System

Because this model is non-linear on the parameters we need to use an optimization routine, R's optim() or Stata's optimize(). In both cases we write a simple "residual sum of squares" function. Our function is "hardwired" to use b31 and the male model. Note that the data go up to age 85+ but the model continues to 110+. For simplicity I will only use ages 0 to 85 and skip the open-ended group. I use as starting values log(q10) and zero.

. mata:
───────────────────────────────────────────────── mata (type end to exit) ──────
:     obs = log(b31[1::18, 4])        

:     void rss(todo, b, f, g, H) {
>         external obs
>         fit = logquad(b[1], b[2], "male")[1::18]
>         f = -sum((obs - fit):^2)
>     }
note: argument todo unused
note: argument g unused
note: argument H unused

:     S = optimize_init()

:     optimize_init_evaluator(S, &rss())

:     h = log(1 - b31[3, 5])  // log(5q0)

:     optimize_init_params(S, (h, .2))

:     optimize(S)
Iteration 0:   f(p) = -.42176202  
Iteration 1:   f(p) = -.28965267  
Iteration 2:   f(p) = -.28942899  
Iteration 3:   f(p) = -.28942899  
                  1              2
    ┌───────────────────────────────┐
  1 │  -4.801873991    .3079241028  │
    └───────────────────────────────┘

: end
────────────────────────────────────────────────────────────────────────────────
>   obs <- log(b31$m[-19])
>   rss <- function(b) {
+   fit <- logquad(b[1], b[2], "male")$logmx[1:18]
+   sum((obs - fit)^2)
+   }
>   b <- c(log(1- b31$lx[3]/100000), 0)
>   r <- optim(b, rss)
>   r
$par
[1] -4.8019124  0.3079723

$value
[1] 0.289429

$counts
function gradient 
      55       NA 

$convergence
[1] 0

$message
NULL

The optimization procedure has converged. All we need now is to plot the result.

. mata b = optimize_result_params(S)

. mata lqfit = logquad(b[1], b[2], "male")[1::18]

. getmata lqfit, force replace

. gen logm = log(m)

. twoway scatter logm age ///
>    ||  line lqfit age,  legend(off)

. graph export logquads.png, width(500) replace
(file logquads.png written in PNG format)

Log_Quadratic Model

>   fit <- logquad(r$par[1], r$par[2], "male")$logmx[1:18]
>   lqfit <- data.frame(age = b31$agem[-19], obs, fit)
>   p <- ggplot(lqfit, aes(age, obs)) + geom_point() + geom_line(aes(age, fit))
>   ggsave("logquad.png", plot=p, width=500/72, height=400/72, dpi=72)

Log_Quadratic Model
The fit is pretty good overall but, not unlike the previous model, it seems to have some difficulty at ages 10 to 25.

Comparison

Here is a graph that overlays the two fits

. gen lmfit = log(mfit)

. twoway scatter logm age ///
>     ||  line lmfit age , lpat(dash) ///
>     ||  line lqfit age , legend(ring(0) pos(5) cols(1) ///
>         order(2 "Modified Logit" 3 "Log Quadratic"))

. graph export mortmods.png, width(500) replace
(file mortmods.png written in PNG format)

Model Comparison

> fits <- mutate(lqfit, fitm = log(mfit[-19]))
> p <- ggplot(fits, aes(age, obs)) + geom_point() +  
+   geom_line(aes(age, fit, color="c1")) + 
+   geom_line(aes(age, fitm, color="c2")) +
+   scale_color_manual(name="Model", values=c("c1" = "blue", "c2" = "red"), 
+   labels=c("Log-quadratic", "Modified logit")) + 
+   theme(legend.position =c(.9, .12))
> ggsave("mormodsr.png", plot=p, width=500/72, height=400/72, dpi=72)

Model Comparison
There seems to be some room for improvement, but in my opinion it will be hard to do much better without adding a third parameter.