
Age-Specific Fertility Rates

I will illustrate the computation of single-year fertility rates from survey data using two approaches: an exact tally of events and exposure by age, and a simple approximation.

We will use World Fertility Survey (WFS) data from Colombia and compute rates for the three-year period before the survey. Here's our final product:

As usual, we start by reading the data, an extract already prepared.

. use http://data.princeton.edu/eco572/datasets/cofertx, clear
(COSR02 extract)

Tallying Events and Exposure

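We create variables bot and top to define the window of observation. Dates are coded in months, with v007 the date of interview and v008 the woman's date of birth. Exposure starts 36 months before the survey or when the woman turns 15, whichever is later, and ends the month before the interview.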

. gen top = v007-1

. gen bot = v007-36

. gen turn15 = v008 + 180

. replace bot = turn15 if turn15 > bot
(895 real changes made)

. drop if bot > top // turned 15 in the month of interview
(15 observations deleted)
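
A quick sanity check one might add at this point, offered as a sketch rather than as part of the session above: every remaining woman should have a window of between 1 and 36 months. The assert is silent when the condition holds and stops with an error otherwise.

. assert inrange(top - bot + 1, 1, 36)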

A woman may contribute events and exposure to up to four different ages, because a 36-month window that does not start on a birthday straddles four years of age. The easiest way to handle this is to create a separate record for each year of age.

. gen agebot = int((bot-v008)/12)

. gen agetop = int((top-v008)/12) // same as current age

. gen nages = agetop-agebot+1

. gen id = _n

To show exactly what's going on I'll list case 1 before and after the split:

. list v007 v008 agebot bot top agetop nages b012 b022 in 1

     +---------------------------------------------------------------------+
     | v007   v008   agebot   bot   top   agetop   nages   b012       b022 |
     |---------------------------------------------------------------------|
  1. |  917    532       29   881   916       32       4    890   No birth |
     +---------------------------------------------------------------------+
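
The ages in this listing follow directly from the dates. As a hand check, here is the same arithmetic with the values shown for case 1 typed in as literals, purely for illustration:

. di int((881-532)/12) // agebot: completed years of age at bot
29

. di int((916-532)/12) // agetop: completed years of age at top
32

so that nages = 32 - 29 + 1 = 4.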

. expand nages
(13841 observations created)

. bysort id: gen age = agebot + _n - 1

. list v007 v008 id age bot top b012 b022 if id==1

       +------------------------------------------------------+
       | v007   v008   id   age   bot   top   b012       b022 |
       |------------------------------------------------------|
    1. |  917    532    1    29   881   916    890   No birth |
    2. |  917    532    1    30   881   916    890   No birth |
    3. |  917    532    1    31   881   916    890   No birth |
    4. |  917    532    1    32   881   916    890   No birth |
       +------------------------------------------------------+

Now I have a record for each woman-year, with the age of the woman that year. I will now fix the start and end date of each segment. A segment starts at bot or at a birthday, and ends a year later or at top.

. gen bday = v008 + 12*age

. replace bot = bday if bday > bot
(13841 real changes made)

. replace top = bday+11 if bday+11 < top
(13841 real changes made)

. gen expo = top-bot+1 // in months for now

. list v007 v008 id age bot top expo b012 b022 if id==1

       +-------------------------------------------------------------+
       | v007   v008   id   age   bot   top   expo   b012       b022 |
       |-------------------------------------------------------------|
    1. |  917    532    1    29   881   891     11    890   No birth |
    2. |  917    532    1    30   892   903     12    890   No birth |
    3. |  917    532    1    31   904   915     12    890   No birth |
    4. |  917    532    1    32   916   916      1    890   No birth |
       +-------------------------------------------------------------+
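
Because the segments partition the original window, each woman's exposure should add up to the length of that window. A minimal check along these lines follows. It is a sketch, not part of the original session: totexpo is a hypothetical helper variable, and the window length is recomputed from v007 and turn15 because bot and top have been overwritten.

. bysort id: egen totexpo = total(expo)

. assert totexpo == (v007-1) - max(v007-36, turn15) + 1

For case 1 the segments add up to 11 + 12 + 12 + 1 = 36 months, the full window. The assert is silent when the check passes.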

All that remains is to count the births in each segment:

. gen births = 0

. forvalues i=1/24 {
  2.         local n = "`i'"
  3.         if `i' < 10 local n = "0`i'"
  4.         qui replace births = births+1 if b`n'2 >= bot & b`n'2 <= top
  5. }

. tab births

     births |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     17,102       89.05       89.05
          1 |      2,081       10.84       99.89
          2 |         21        0.11      100.00
------------+-----------------------------------
      Total |     19,204      100.00

. list v007 v008 id age bot top expo births if id==1

       +----------------------------------------------------+
       | v007   v008   id   age   bot   top   expo   births |
       |----------------------------------------------------|
    1. |  917    532    1    29   881   891     11        1 |
    2. |  917    532    1    30   892   903     12        0 |
    3. |  917    532    1    31   904   915     12        0 |
    4. |  917    532    1    32   916   916      1        0 |
       +----------------------------------------------------+
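
The loop pads the birth order with a leading zero to build the variable names b012, b022, and so on. The same count can probably be written a little more compactly with a display format and inrange(). This is an alternative sketch, not the code that produced the results above, and births2 is a hypothetical name used only to compare the two versions:

. gen births2 = 0

. forvalues i=1/24 {
  2.         local n : display %02.0f `i'
  3.         qui replace births2 = births2+1 if inrange(b`n'2, bot, top)
  4. }

. assert births2 == births

The final assert is silent if the two counts agree for every woman-year.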

Finally we collapse the dataset by age (and any additional variables of interest, such as residence or education).

. collapse (sum) births (sum) expo, by(age)

. replace expo=expo/12
(35 real changes made)

. gen asfr = births/expo

Let us compute the total fertility rate (TFR) and the mean age of the fertility schedule, for which we need the midpoints of the age groups. (Note that Stata's analytic weights are just what we need, as they are standardized to average one.)

. sum asfr

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        asfr |        35    .1293665    .0811364          0   .2645395

. di r(sum)
4.527827

. gen agem = age + 0.5

. sum agem [aw=asfr]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
        agem |      33  4.52782704     28.6831   7.300789       15.5       47.5
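
The weighted mean reported by summarize is just the sum of midpoint times rate divided by the sum of the rates. A sketch of the same calculation done by hand follows; tfr and agexf are hypothetical names, and I omit the output, which should agree with the mean shown above.

. quietly sum asfr

. scalar tfr = r(sum)

. gen agexf = agem*asfr

. quietly sum agexf

. di r(sum)/tfr

. drop agexf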

The TFR is 4.53 and the mean age of childbearing is 28.7. To plot the rates we use the midpoints of the age groups,

. scatter asfr agem, xtitle(age)

The pattern looks quite reasonable, except perhaps for the rates at ages 22 and 29, which seem a bit out of line. I'll save these results for later use:

. save coasfr, replace
(note: file coasfr.dta not found)
file coasfr.dta saved

To compute rates for five-year age groups one can simply recode age and collapse again. You might find it instructive to do the calculation for five-year groups from scratch.
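
A minimal sketch of the recode-and-collapse route, assuming the single-year results are still in memory with births and exposure in years: age5 and asfr5 are hypothetical names, and preserve and restore keep the single-year data intact. Because each rate now covers five years of age, the TFR is five times the sum of the rates. Output is not shown.

. preserve

. gen age5 = 5*int(age/5)

. collapse (sum) births expo, by(age5)

. gen asfr5 = births/expo

. quietly sum asfr5

. di 5*r(sum)

. restore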

A Simple Approximation

A much simpler approach is to attribute events and exposure to the age of each woman in the middle of her observation period. Results are often very similar. (This is my preferred approach for fitting regression models, among other things because it keeps a single observation per woman.)

We start by defining the observation window just as before

. use http://data.princeton.edu/eco572/datasets/cofertx, clear
(COSR02 extract)

. gen top = v007-1

. gen bot = v007-36

. gen turn15 = v008 + 180

. replace bot = turn15 if turn15 > bot
(895 real changes made)

. drop if bot > top // turned 15 in the month of interview
(15 observations deleted)

But we then simply count events and exposure in the window and attribute them to age at the midpoint,

. gen age = int( ((bot+top)/2 - v008)/12)

. gen expo = top-bot+1

. gen births = 0

. forvalues i=1/24 {
  2.         local n = "`i'"
  3.         if `i' < 10 local n = "0`i'"
  4.         qui replace births = births+1 if b`n'2 >= bot & b`n'2 <= top
  5. }

. tab births

     births |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      3,720       69.36       69.36
          1 |      1,210       22.56       91.93
          2 |        386        7.20       99.12
          3 |         47        0.88      100.00
------------+-----------------------------------
      Total |      5,363      100.00
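
To see what the midpoint rule does, take case 1 from the exact tally, with bot = 881, top = 916 and v008 = 532. The values are typed in as literals here for illustration only:

. di int(((881+916)/2 - 532)/12)
30

All 36 months of her exposure and her one birth in the window are therefore attributed to age 30, whereas the exact tally spread her exposure over ages 29 to 32 and counted the birth at age 29.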

We now collapse and compute rates, as well as the TFR and mean age of childbearing:

. collapse (sum) births (sum) expo, by(age)

. replace expo=expo/12
(34 real changes made)

. gen asfr = births/expo

. sum asfr

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        asfr |        34    .1323788     .077477          0    .245098

. di r(sum)
4.5008806

. gen agem = age + 0.5

. sum agem [aw=asfr]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
        agem |      32  4.50088064    28.62576   7.279611       15.5       46.5

The TFR is 4.50 and the mean age of childbearing is 28.6.

Let me merge the previous results to compare the exact and approximate results. I will rename the rates and drop births, exposure, and the age midpoints, to avoid name conflicts.

. rename asfr asfra

. drop births expo agem

. merge age using coasfr, sort
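
In Stata 11 or later the same merge can be written with the newer syntax, which states the match type explicitly. This is a sketch of an alternative to the command above, not something to run in addition to it. Sorting by age afterwards is a precaution I am assuming is worthwhile, so that the line plot connects the points in age order. Output is not shown.

. merge 1:1 age using coasfr

. sort age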

. twoway (line asfr agem ) (line asfra agem, lp(dash)) , ///
>         title("Age-Specific Fertility Rates") xtitle(age)  ///
>         subtitle("Colombia WFS, 1976") note(3 Years Preceding the Survey) ///
>         legend(ring(0) pos(1) order(1 "Exact" 2 "Approx.") cols(1) size(small))

. graph export coasfr.png
(file coasfr.png written in PNG format)

This, of course, is the figure at the top of this page.