Germán Rodríguez
Demographic Methods Princeton University

Age-Specific Fertility Rates

I will illustrate the computation of single-year fertility rates from survey data using two approaches, one based on an exact tally of events and exposure by age, and a simple approximate method.

We will use WFS data from Colombia and compute rates for the three-year period before the survey.

Here's our final product

As usual, we start by reading the data, an extract already prepared.

. use http://data.princeton.edu/eco572/datasets/cofertx, clear
(COSR02 extract)
> library(foreign)

> co <- read.dta("http://data.princeton.edu/eco572/datasets/cofertx.dta")

This file is in "wide" format, with one record per woman having her data followed by data for up to 24 births, one after the other. The layout of the file is as follows

v007 v008 m012 [bn1 bn2 bn3 bn4 bn5] v702 v703

where n goes from 01 to 24 but at most 19 slots are used, the rest are filled with 8's for "not used". We are particularly interested in bn2, the date of birth.

We can work with the file in "wide" format or create separate files for women and births, with the births in "long" format". I'll use both approaches, wide with Stata and long with R. The variables m012, v702 and v703 are date of first union, type of place of residence, and childhood place of residence, respectively.

> co <- mutate(co, id = row_number())

> cow <- select(co, id, v007, v008, m012, v702, v703)

> bvars <- c(paste("b0", 1:9, 2, sep=""), paste("b", 10:24, 2, sep=""))

> bwide <- co[, c("v007", "v008", bvars)]

> cob <- reshape(bwide, direction="long", varying = bvars, v.names = "bdate") %>%
+   filter(bdate < 8888)

Tallying Events and Exposure

I will create variables called bot and top to define the window of observation. Exposure starts 36 months before the survey or when the woman turns 15, whichever is later. The date of interview is v007 and the date ot birth of the woman is v008.

. gen top = v007 - 1

. gen bot = v007 - 36

. gen turn15 = v008 + 180

. replace bot = turn15 if turn15 > bot
(895 real changes made)

. drop if bot > top // 15 on month of interview
(15 observations deleted)
> cow <- mutate(cow,
+   top = v007 - 1,     
+   turn15 = v008 + 180,
+   bot = ifelse(turn15 > v007 - 36, turn15, v007 - 36)) %>%
+   filter(bot <= top) # exclude 15 on month of interview           

A woman may contribute events and exposure to up to four different ages. The easiest way to handle this is to create a separate record for each year of age

. gen agebot = int((bot - v008)/12)

. gen agetop = int((top - v008)/12) // same as current age

. gen nages = agetop - agebot + 1

. gen id = _n
> cow <- mutate(cow, 
+   agebot = floor((bot - v008)/12),
+   agetop = floor((top - v008)/12),
+   nages = agetop - agebot + 1)

To show exactly what's going on I'll list case 1 before and after the split

. list v007 v008 bot top agebot agetop nages b012 b022 in 1

     +---------------------------------------------------------------------+
     | v007   v008   bot   top   agebot   agetop   nages   b012       b022 |
     |---------------------------------------------------------------------|
  1. |  917    532   881   916       29       32       4    890   No birth |
     +---------------------------------------------------------------------+

. expand nages
(13,841 observations created)

. bysort id: gen age = agebot + _n - 1

. list v007 v008 id age bot top b012 b022 if id==1

       +------------------------------------------------------+
       | v007   v008   id   age   bot   top   b012       b022 |
       |------------------------------------------------------|
    1. |  917    532    1    29   881   916    890   No birth |
    2. |  917    532    1    30   881   916    890   No birth |
    3. |  917    532    1    31   881   916    890   No birth |
    4. |  917    532    1    32   881   916    890   No birth |
       +------------------------------------------------------+
> head(select(cow, id, v007, v008, bot, top, agebot, agetop, nages), 1)
  id v007 v008 bot top agebot agetop nages
1  1  917  532 881 916     29     32     4

> i <- rep(1:nrow(cow), cow$nages)

> cow <- cow[i, ]

> cow <- group_by(cow, id) %>% mutate(age = agebot + row_number() - 1)  

> filter(cow, id==1) %>% 
+ select(id, v007, v008, bot, top, agebot, agetop, nages, age)
Source: local data frame [4 x 9]
Groups: id [1]

     id  v007  v008   bot   top agebot agetop nages   age
  (int) (int) (int) (dbl) (dbl)  (dbl)  (dbl) (dbl) (dbl)
1     1   917   532   881   916     29     32     4    29
2     1   917   532   881   916     29     32     4    30
3     1   917   532   881   916     29     32     4    31
4     1   917   532   881   916     29     32     4    32

Now we have a record for each woman-year, with the age of the woman that year. We just have to fix the start and end date of each segment. A segment starts at bot or at a birthday, and ends a year later or at top.

. gen bday = v008 + 12*age

. replace bot = bday if bday > bot
(13,841 real changes made)

. replace top = bday + 11 if bday + 11 < top
(13,841 real changes made)

. gen expo = top - bot + 1 // in months for now

. list v007 v008 id age bot top expo b012 b022 if id==1

       +-------------------------------------------------------------+
       | v007   v008   id   age   bot   top   expo   b012       b022 |
       |-------------------------------------------------------------|
    1. |  917    532    1    29   881   891     11    890   No birth |
    2. |  917    532    1    30   892   903     12    890   No birth |
    3. |  917    532    1    31   904   915     12    890   No birth |
    4. |  917    532    1    32   916   916      1    890   No birth |
       +-------------------------------------------------------------+
> cow <- mutate(cow, 
+   bday = v008 + 12 * age,
+   bot = ifelse(bday > bot, bday, bot),
+   top = ifelse(bday + 11 < top, bday + 11, top),
+   expo = top - bot + 1) 

> filter(select(cow, v007, v008, age, bot, top, expo), id==1)       
Source: local data frame [4 x 7]
Groups: id [1]

     id  v007  v008   age   bot   top  expo
  (int) (int) (int) (dbl) (dbl) (dbl) (dbl)
1     1   917   532    29   881   891    11
2     1   917   532    30   892   903    12
3     1   917   532    31   904   915    12
4     1   917   532    32   916   916     1

All that remains is to count births in each age segment, which we do by looping over the variables b012 ... b242. which we do by joining births by id and age of mother.

. gen births = 0

. forvalues i=1/24 {
  2.       local n = "`i'"
  3.       if `i' < 10 local n = "0`i'"
  4.       qui replace births = births + 1 if b`n'2 >= bot & b`n'2 <= top
  5. }

. tab births

     births |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     17,102       89.05       89.05
          1 |      2,081       10.84       99.89
          2 |         21        0.11      100.00
------------+-----------------------------------
      Total |     19,204      100.00

. list v007 v008 id age bot top expo births if id==1

       +----------------------------------------------------+
       | v007   v008   id   age   bot   top   expo   births |
       |----------------------------------------------------|
    1. |  917    532    1    29   881   891     11        1 |
    2. |  917    532    1    30   892   903     12        0 |
    3. |  917    532    1    31   904   915     12        0 |
    4. |  917    532    1    32   916   916      1        0 |
       +----------------------------------------------------+
> cob <- mutate(cob, 
+   age = floor((bdate - v008)/12)) %>%
+   filter(bdate >= v007 - 36 & bdate < v007 & age >= 15)

> coba <- group_by(cob, id, age) %>% summarize(births = n())        

> cowb <- left_join(cow, coba, by = c("id", "age")) 

> cowb$births[is.na(cowb$births)] <- 0

Finally we collapse the dataset by age (and any additional variables of interest, such as residence or education), express exposure in years rather than months, and compute the rates

. collapse (sum) births (sum) expo, by(age)

. replace expo=expo/12
(35 real changes made)

. gen asfr = births/expo
> cofr <- group_by(cowb, age) %>% 
+   summarize(
+       births = sum(births),
+       expo = sum(expo)/12,
+       asfr = births/expo)

Let us compute the Total Fertility rate (TFR) and the mean age of the fertility schedule, for wich we need the midpoints of the age groups.

. sum asfr

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        asfr |         35    .1293665    .0811364          0   .2645395

. di r(sum)
4.527827

. gen agem = age + 0.5

. sum agem [aw=asfr]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
        agem |      33  4.52782704     28.6831   7.300789       15.5       47.5
> cofr <- mutate(cofr, agem = age + 0.5)

> summarize(cofr, 
+   tfr = sum(asfr), 
+   mac = weighted.mean(agem, asfr))
Source: local data frame [1 x 2]

       tfr     mac
     (dbl)   (dbl)
1 4.527827 28.6831

The TFR is 4.53 and the mean age of childbearing is 28.7. To plot the rates we use the midpoints of the age groups,

. scatter asfr agem, xtitle(age)
> ggplot(cofr, aes(agem, asfr)) + geom_line()

The pattern looks quite reasonable, except perhaps for the rates at ages 22 and 29 which seem a bit out of line. I'll save these results for later use

. save coasfr, replace
file coasfr.dta saved

To compute rates for five-year age groups one can simply recode age and collapse again. You might find it instructive to do the calculation for five-year groups from scratch.

Bruno Schoumaker (2013) has written a Stata command called tfr2 to implement the procedures described here, see Demographic Research, vol 28, article 38. His figure 5 should look familiar.

A Simple Approximation

A much simple approach is to attribute events and exposure to the age of each woman in the middle of her observation period. Results are often very similar. (This is my preferred approach for fitting regression models, among other things because it keeps a single observation per woman.)

We start by defining the observation window just as before

. use http://data.princeton.edu/eco572/datasets/cofertx, clear
(COSR02 extract)

. gen top = v007 - 1

. gen bot = v007 - 36

. gen turn15 = v008 + 180

. replace bot = turn15 if turn15 > bot
(895 real changes made)

. drop if bot > top // 15 on month of interview
(15 observations deleted)
> cos <- mutate(co,
+   turn15 = v008 + 180,
+   top = v007 - 1,
+   bot = ifelse(turn15 > v007 - 36, turn15, v007 - 36)) %>%
+   filter( bot <= top)

But we then simply counts events and exposure in the window and attribute them to age at the midpoint,

. gen age = int( ((bot + top)/2 - v008)/12)

. gen expo = top - bot + 1

. gen births = 0

. forvalues i=1/24 {
  2.       local n = "`i'"
  3.       if `i' < 10 local n = "0`i'"
  4.       qui replace births = births+1 if b`n'2 >= bot & b`n'2 <= top
  5. }

. tab births

     births |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      3,720       69.36       69.36
          1 |      1,210       22.56       91.93
          2 |        386        7.20       99.12
          3 |         47        0.88      100.00
------------+-----------------------------------
      Total |      5,363      100.00
> cos$births <- 0

> for(j in 1:24) {
+   name <- paste(ifelse(j < 10, "b0", "b"), j, "2", sep="")
+   cos$births <- cos$births + (cos[,name] >= cos$bot & cos[,name] <= cos$top)
+ }

> cos <- mutate(cos, 
+   age = floor(((bot + top)/2 - v008)/12),
+   expo = top - bot + 1)

We now collapse and compute rates, as well as the TFR and mean age of childbearing

. collapse (sum) births (sum) expo, by(age)

. replace expo = expo/12
(34 real changes made)

. gen asfr = births/expo

. sum asfr

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        asfr |         34    .1323788     .077477          0    .245098

. di r(sum)
4.5008806

. gen agem = age + 0.5

. sum agem [aw=asfr]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
        agem |      32  4.50088064    28.62576   7.279611       15.5       46.5
> cofers <- group_by(cos, age) %>%
+   summarize(
+       births = sum(births),
+       expo = sum(expo)/12) %>%
+   mutate( 
+       asfr = births/expo,
+       agem = age + 0.5)

> summarize(cofers,
+   tfr = sum(asfr),
+   mac = weighted.mean(agem, asfr))
Source: local data frame [1 x 2]

       tfr      mac
     (dbl)    (dbl)
1 4.500881 28.62576

The TFR is 4.50 and the mean age of childbearing is 28.6.

Let me merge the previous results to compare the exact and approximate estimates up to age 48. I will rename the rates and drop births, exposure, and the age midpoints, to avoid name conflicts.

. rename asfr asfra

. drop births expo agem

. merge 1:1 age using coasfr

    Result                           # of obs.
    -----------------------------------------
    not matched                             1
        from master                         0  (_merge==1)
        from using                          1  (_merge==2)

    matched                                34  (_merge==3)
    -----------------------------------------

. twoway (line asfr agem ) (line asfra agem, lp(dash)) , ///
>    title("Age-Specific Fertility Rates") xtitle(age)  ///
>    subtitle("Colombia WFS, 1976") note(3 Years Preceding the Survey) ///
>    legend(ring(0) pos(1) order(1 "Exact" 2 "Approx.") cols(1) size(small))

. graph export coasfr.png, width(500) replace
(file coasfr.png written in PNG format)
> codf <- data.frame( 
+   age = rep(15:48, 2),
+   asfr = c(cofr$asfr[-35], cofers$asfr),
+   method = factor(rep(c("exact","approx"),rep(34,2)))
+ )

> ggplot(codf, aes(age, asfr, color=method)) + geom_line() +
+   ggtitle("Age-Specific Fertility Rates, Colombia WFS, 1976")

> ggsave("coasfrr.png", width = 500/72, height = 400/72, dpi = 72)

This, of course, is the figure at the top of this page.