Germán Rodríguez
Demographic Methods Princeton University

Coale's Marital Fertility Model

Here's a quick run showing how to fit Coale's model of marital fertility using Poisson regression. This is the original example in Brostrom (1985), Demography, 22:625-631. File brostrom.dat in the datasets section has the births, exposure, and the five-year standards na and va in comma-separated format.

. insheet  using http://data.princeton.edu/eco572/datasets/brostrom.dat, comma
(5 vars, 6 obs)
> br <- read.csv("http://data.princeton.edu/eco572/datasets/brostrom.dat", header=TRUE)

We treat births as Poisson and the log of exposure time times natural fertility as an offset:

. gen os = log(expo * na)

. poisson births va, offset(os)

Iteration 0:   log likelihood = -22.290459  
Iteration 1:   log likelihood = -19.308134  
Iteration 2:   log likelihood = -19.303226  
Iteration 3:   log likelihood = -19.303226  

Poisson regression                              Number of obs     =          6
                                                LR chi2(1)        =      19.05
                                                Prob > chi2       =     0.0000
Log likelihood = -19.303226                     Pseudo R2         =     0.3304

------------------------------------------------------------------------------
      births |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          va |   .3584112   .0827943     4.33   0.000     .1961373     .520685
       _cons |  -.0274854   .0599075    -0.46   0.646     -.144902    .0899312
          os |          1  (offset)
------------------------------------------------------------------------------
> pr <- glm(births ~ va + offset(log(exposure * na)), data=br, family=poisson)

> pr

Call:  glm(formula = births ~ va + offset(log(exposure * na)), family = poisson, 
    data = br)

Coefficients:
(Intercept)           va  
   -0.02749      0.35841  

Degrees of Freedom: 5 Total (i.e. Null);  4 Residual
Null Deviance:      20.19 
Residual Deviance: 1.146    AIC: 42.61

We find that log M = -0.027 (so M = 0.973) and m = 0.358, indicating substantial control of fertility. Brostrom obtained log M = -0.026 and m = 0.361 using GLIM. Stata and R usually go an extra iteration.

To check model fit we can compare observed and fitted rates or use a formal chi-squared test:

. estat gof

         Deviance goodness-of-fit =   1.14649
         Prob > chi2(4)           =    0.8868

         Pearson goodness-of-fit  =  1.133856
         Prob > chi2(4)           =    0.8889

. gen am = age + 2.5

. gen obs = births/expo

. predict fv
(option n assumed; predicted number of events)

. gen fit = fv/expo

. scatter obs am || line fit am, title(Brostrom's Example) ///
>    legend(off) xtitle(age) ytitle(marital fertility)

. graph export brostrom.png, width(500) replace
(file brostrom.png written in PNG format)

Brostrom

> br <- mutate(br, obs = births/exposure, fit = fitted(pr)/exposure)

> ggplot(br, aes(age, obs)) + geom_point() + geom_line(aes(age,fit))

> ggsave("brostromr.png", width=500/72, height=400/72, dpi=72)

Brostrom The model fits pretty well, although it slightly overestimates fertility in the youngest age group, 20-24. For comparison, here are the estimates one would obtain using OLS as in the textbook (page 206).

. gen y = log(obs/na)

. reg y va, noheader
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          va |   .3930404   .0391997    10.03   0.001     .2842045    .5018763
       _cons |  -.0288292   .0404857    -0.71   0.516    -.1412356    .0835772
------------------------------------------------------------------------------
> lm(log(obs/na) ~ va, data=br)

Call:
lm(formula = log(obs/na) ~ va, data = br)

Coefficients:
(Intercept)           va  
   -0.02883      0.39304  

The results are similar, with logM = -0.29 and m = 0.393.

A useful diagnostic plot is log(f(a)/n(a)) versus v(a), as that relationship should be a straight line, which is then estimated by OLS or Poisson maximum likelihood. For an example see Box 9.3 in the textbook.