Eco572: Research Methods in Demography

Solutions to Problem Set 4

[1] Marriage in France

(a) We compute and plot the period total marriage rate (TMR) and the period mean age at first marriage. (The plot of TMR is shown further below, together with the tempo-adjusted adjusted TMR.)

. global DATASETS http://data.princeton.edu/eco572/datasets

. infile age15-age50 using $DATASETS/FranceAfmPeriodAge.dat, clear
(33 observations read)

. gen year = 1967 + _n

. egen tmr = rowtotal(age15-age50)

. line tmr year, title(Period Total Marriage Rate)

. gen afm = 0

. forvalues age=15/50 {
  2.         quietly replace afm = afm + `age' * age`age'
  3. }

. quietly replace afm = afm/tmr

. line afm year, title(Period Mean Age at First Marriage) ///
>         subtitle(France 1968-200)

. graph export ps4Fig1.png, replace
(file ps4Fig1.png written in PNG format)

We see a steady decline in period total marriage rates, reaching a low just below 50% in 1993-95, followed by a sligh increase to reach about 60% at the turn of the century. In the meantime period mean age at marriage increased steadily in the last quarter century, climbing from historic levels of 22-23 years to 28 years of age in 2000.

(b) To compute the Bongaarts-Feeney adjustment we estimate r, the rate of change in period mean age at first marriage, using half the change from the year before to the year after each calendar year, and then simply divide the TMR by 1-r.

. gen r = (afm[_n+1]-afm[_n-1])/2
(2 missing values generated)

. gen bftmr = tmr/(1-r)
(2 missing values generated)

. line tmr bftmr year, lpat(solid dash) ///
>         title(Observed and Tempo-Adjusted Total Marriage Rate) ///
>         subtitle(France 1969-1999) ///
>         legend( ring(0) pos(1) cols(1) order(1 "Observed" 2 "Adjusted"))

. graph export ps4Fig2.png, replace
(file ps4Fig2.png written in PNG format)

Not surprinsingly, the tempo-adjusted total marriage rate exceeds the observed rate for the last quarter of the century, when period mean age at marriage was increasing, with some large fluctuations in recent years.

This is a counterfactual estimate of how many women would have married each year if they had not postponed marriage that year, under the assumption that women of all ages delay marriage by the same amount of time in any given year. The implication is that as much as 70% of women will eventually marry.

(c) To do this part we turn to a dataset showing marriage frequencies by age and cohort (the diagonals of the current data)

. infile coh1968-coh2000 using $DATASETS/FranceAfmAgeCohort.dat, clear
(36 observations read)

. drop in 34/36 // always missing
(3 observations deleted)

. gen age = 14 + _n

. forvalues year=1968/1987 {
  2.         gen cum`year' = sum(coh`year')/10000
  3.         qui replace cum`year' = . if missing(coh`year')
  4. }

. gen agep = age+0.5 // at end of year of age

. twoway (line cum1968-cum1986 agep ) ///
>         (line cum1987 agep, lwidth(thick) ) , xtitle(age) ///
>         title(Cumulative Proportions Married by Age) ///
>         subtitle(French Cohorts Turning 15 in 1968-1987) legend(off)

. graph export ps4Fig3.png, replace
(file ps4Fig3.png written in PNG format)

We see that each cohort has followed a progressively lower curve, with fewer women marrying and those who marry doing it later in life. The thick line is the cohort that turned 15 in 1987. It is following a lower course but the curve in the late twenties is steeper than for older cohorts.

(d) We now compute discrete hazards, dividing the marriage frequencies by the complements of the cumulative proportions married before each year of age

. forvalues year=1968/1987 {
  2.         qui gen haz`year' = coh`year'/(1-cum`year'[_n-1])
  3.         qui replace haz`year' = coh`year' in 1
  4. }

. twoway (line haz1968-haz1986 age ) ///
>         (line haz1987 agep, lwidth(thick) ) , xtitle(age) ///
>         title(Proportions Marrying Among Single Women by Age) ///
>         subtitle(French Cohorts Turning 15 in 1968-1987) legend(off)

. graph export ps4Fig4.png, replace
(file ps4Fig4.png written in PNG format)

We see that the hazard of first marriage has declined substantially at younger ages, say up to the historic mean age at marriage of 22 or 23. It also has increased a little bit in the late twenties, but remains substantially unchanged past age 30.

(e) This is anybody's guess because we don't know the future. It is clear, however, that each cohort has shown fewer women married at each age. There is a clear indication that this is due at least in part to marriage delay, as the hazard has increased at older ages. But it also seems clear that fewer women will eventually marry, as the hazard in the late twenties doesn't appear to have risen enough to make up for the delay. The cohort that turned 15 in 1987 was only 28 in 2000, but with only 41% married it doesn't seem likely that as many as 70% will eventually marry.

[2] Mortality in the U.S.

We read the data and merge the Lee-Carter parameters. Note that we need only 3 years, 1989, 2000, and 2001, and that we will treat 105+ as the highest age group.

. infile year age mx using $DATASETS/usm3301.dat, clear
(1656 observations read)

. keep if (year == 1989 | year >= 2000) & age < 110
(1587 observations deleted)

. sort age

. merge age using $DATASETS/LeeCarterAb
variable age does not uniquely identify observations in the master data

. tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |         69      100.00      100.00
------------+-----------------------------------
      Total |         69      100.00

. sort year age

(a) The published forecast of k for 2001 is -15.43 with a standard deviation of 2.26 (from the JASA article). Below I compute a 95% confidence interval and then estimate the observed k.

. scalar k = -15.43

. scalar se = 2.26

. di k - 1.96*se, k + 1.96*se
-19.8596 -11.0004

. gen y = log(mx) - a if year == 2001
(46 missing values generated)

. regress y b if age < 85 , noconstant 

      Source |       SS       df       MS              Number of obs =      18
-------------+------------------------------           F(  1,    17) =  426.01
       Model |  9.67047781     1  9.67047781           Prob > F      =  0.0000
    Residual |    .3859054    17  .022700318           R-squared     =  0.9616
-------------+------------------------------           Adj R-squared =  0.9594
       Total |  10.0563832    18  .558687956           Root MSE      =  .15067

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           b |  -12.31804   .5968066   -20.64   0.000    -13.57719   -11.05889
------------------------------------------------------------------------------

The estimated value of k is -12.09, well within the 95% confidence interval.

(b) We now plot the 2001 rates on top of a 95% confidence region based on the 1989 forecast

. gen low  = a + (k-1.96*se) * b  if year == 2001 
(46 missing values generated)

. gen high = a + (k+1.96*se) * b  if year == 2001
(46 missing values generated)

. gen obs = log(mx) if year == 2001
(46 missing values generated)

. gen agem = age + 2.5 if age < 85
(15 missing values generated)

. twoway (area low agem, color(gs12)) ///
>         (area high agem, color(white)) ///
>         (scatter obs agem, color(black)), ///
>         legend(order(1 "Forecast" 3 "Observed") ring(0) pos(5)) ///
>         title(U.S. Age-Specific Mortality Rates in 2001) xtitle(age) ///
>         subtitle(and 95% Confidence Region from 1989 Forecast)

. graph export ps4Fig5.png, replace
(file ps4Fig5.png written in PNG format)

We see that the forecast was a bit too optimistic for ages between 20 and 50, which tend to fall just outside the confidence region.

(c) We revise the forecast to use the correct jump off in 1989. This corrects for the fact than when the forecast was made the age pattern of mortality differed from the average in the 1900s. We recompute k so it is 0 in 1989. The standard error doesn't change. (With 23 age groups, the values for 1989 are 46 lines before those for 2001.)

. scalar k = -12*0.365

. gen a89 = log(mx) if year==1989
(46 missing values generated)

. gen lowj  = a89[_n-46] + (k-1.96*se) * b  if year == 2001 
(46 missing values generated)

. gen highj = a89[_n-46] + (k+1.96*se) * b  if year == 2001
(46 missing values generated)

. twoway (area lowj agem, color(gs12)) ///
>         (area highj agem, color(white)) ///
>         (scatter obs agem, color(black)), ///
>         legend(order(1 "Forecast" 3 "Observed") ring(0) pos(5)) ///
>         title(U.S. Age-Specific Mortality Rates in 2001) xtitle(age) ///
>         subtitle(and 95% Forecast Region With 1989 Correct Jump-Off)

. graph export ps4Fig6.png, replace
(file ps4Fig6.png written in PNG format)

We see that with the correct jump-off all the observed values lie inside the 95% confidence region based on a forecast with a 12-year time horizon.

(d) The next task is to project to 2050, this time using 2000 as the jump-off. I'll keep just the data for 2000. We need to reset k and the standard error to reflect the 50-year projection horizon.

. keep if year==2000
(46 observations deleted)

. gen a2000 = log(mx)

. scalar k = -0.365*50

. scalar se = sqrt(50)*0.651

. gen mid = a2000 + k*b

. replace low = a2000 + (k-1.96*se) * b
(23 real changes made)

. replace high= a2000 + (k+1.96*se) * b
(23 real changes made)

We are going to have to compute three life tables to get the expectation of life for each forecast. Obviously it helps to encapsulate these calculations. The program below computes the life table and stores the life expectancy at birth in r(e0). The question called for assuming constant risk except for the first three age groups. Few did this. I will assume constant risk at all ages to emphasize the procedure.

. capture program drop e0

. program define e0, rclass
  1.         syntax varlist 
  2.         tempvar mx nx ax qx lx Lx
  3. qui {
  4.         gen `nx' = 5
  5.         replace `nx' = 1 in 1
  6.         replace `nx' = 4 in 2
  7.         gen `mx' = exp(`varlist') // argument is log rates
  8.         gen `qx' = 1-exp(-`nx'*`mx')            
  9.         gen `lx' = 1    
 10.         replace `lx' = `lx'[_n-1] * (1-`qx'[_n-1]) if _n > 1
 11.         gen `Lx' = (`lx' - `lx'[_n+1])/`mx'
 12.         replace `Lx' = `lx'/`mx' in -1
 13.         summarize `Lx'
 14. }
 15.         display "Life expectancy at birth = ", r(sum)
 16.         //list `mx' `qx' `lx' `Lx' 
.         return scalar e0 = r(sum)
 17. end

. e0 low
Life expectancy at birth =  87.86055

. e0 mid
Life expectancy at birth =  84.500317

. e0 high
Life expectancy at birth =  80.966722

So our point estimate is 84.50 with a 95% confidence interval going from 80.97 to 87.86. (Using the nax values given, with 2.5 at ages above 10, gives 84.55 with bounds 81.02 and 87.90.) The results are essentially the same.

(e) The Social Security Administration has their own data and they produce forecasts which usually are more pessimistic than the Lee-Carter estimates. For 2050 they forecast a life expectancy of 80.2, with lower and upper bounds of 77.9 and 83.8. These differences were also apparent at the time the original paper was published, and Lee and Carter comment on them. They express concern that the SSA will be unprepared for the high dependency ratios that will accompany life expectancy that substantially exceeds their forecasts.

[3] The Population of Malaysia

The last question deals with the population of (peninsular) Malaysia. We start by reading the data

. infile age pop deaths births Lx ///
>     using $DATASETS/malaysia85rev.dat, clear
(19 observations read)

. replace pop = pop[1] + pop[2] in 2
(1 real change made)

. replace births = births[1] + births[2] in 2
(0 real changes made)

. replace Lx = Lx[1] + Lx[2] in 2
(1 real change made)

. drop in 1
(1 observation deleted)

(a) I will project the population using the Leslie matrix that we programmed earlier. The question can also be answered in Stata as shown in the projection handout.

. quietly do http://data.princeton.edu/eco572/Leslie

. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: pop = st_data(.,"pop")

: sum(pop)
  6468454

: L  = st_data(.,"Lx")

: s = 198384/(198384+208422)

: m  = s * st_data(.,"births") :/ pop

: M = Leslie(L,m)

: pop2000 = M * M * M * pop

: sum(pop2000)
  9435114.731

: pop2015 = M * M * M * pop2000

: sum(pop2015)
  13134842.96

: end
---------------------------------------------------------------------------------------------------------------------

So at current rates we expect Malaysia's population of 6,468,454 to become 9.435,114 by 2000 and 13,134,843 by 2015. According to the U.N. (or the Malaysian Census of 2000) the female population of Malaysia was about 11.317 million in 2000. It turns out the data in Watcher pertain only to peninsular Malaysia, whereas the U.N. data include the entire federation. Once you allow for this fact, the projection is quite reasonable.

(b) We now compute Lotka's r and the stable equivalent age distribution. I estimate the mean age of the net maternity schedule, but using a value such as 27 or 29 works too.

. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: nrr = sum(m :* L)

: nrr
  1.822210393

: a = range(0,85,5) :+ 2.5

: a[18] = 85 + 6.803

: ma = sum(a :* m :* L)/nrr

: ma
  29.76516892

: // initial value
: mata

: r = log(nrr)/ma

: r
  .0201594779

: le = sum( exp(-r*a) :* m :* L)

: le
  1.007762626

: // two iterations
: for (ni=1; ni <= 2; ni++) {
> r = r + (le-1)/ma
> r
> le = sum( exp(-r*a) :* m :* L)
> le
> }
  .0204202736
  1.000169999
  .0204259849
  1.000004393

: na = exp(-r*a) :* L

: stable = na/sum(na)

: a, stable
                  1             2
     +-----------------------------+
   1 |          2.5   .1270117557  |
   2 |          7.5   .1138140308  |
   3 |         12.5    .102496592  |
   4 |         17.5   .0922870005  |
   5 |         22.5     .08303487  |
   6 |         27.5   .0746494583  |
   7 |         32.5   .0670347346  |
   8 |         37.5   .0600971822  |
   9 |         42.5   .0536901591  |
  10 |         47.5   .0477183224  |
  11 |         52.5   .0419878143  |
  12 |         57.5   .0363537628  |
  13 |         62.5   .0305660588  |
  14 |         67.5   .0246641674  |
  15 |         72.5    .018458545  |
  16 |         77.5   .0124334607  |
  17 |         82.5   .0074336777  |
  18 |       91.803   .0062684078  |
     +-----------------------------+

: end
---------------------------------------------------------------------------------------------------------------------

A plot wasn't required, but this distribution is not too different from the observed age distribution, suggesting that peninsular Malaysia wasn't too far from stable in 1985.

(c) We now look at momentum using three methods. The first task is to reduce fertility and get a new Leslie matrix:

. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: ms = m/nrr

: sum(ms :* L) // check it is 1
  1

: S = Leslie(L,ms)

: ps = pop

: for (i=1; i <= 20; i++) {
>         ps = S * ps
> }

: sum(ps)/sum(pop)
  1.624122438

: end
---------------------------------------------------------------------------------------------------------------------

So even if fertility dropped to replacement level the population of peninsular Malaysia would still grow by 62.4% in the next 100 years. Next we try the Preston-Guillot method:

. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: ba = ms :* L

: as = sum( a :* ba)

: tail = sum(ba)

: w = J(length(ba),1,0)

: for (i=1; i <= length(ba); i++) {
>         tail = tail - ba[i]
>         w[i] = 5*(ba[i]/2 + tail)/as
> }

: current = pop/sum(pop)

: stationary = L/sum(L)

: sum( w :* current :/ stationary)
  1.623385173

: end
---------------------------------------------------------------------------------------------------------------------

We get an estimate of 1.623, or additional growth of 62.3%, in excellent agreement with our projection. (The method is exact, so any difference would be due to lack of stationarity after 100 years.) Finally we compute Keyfitz's estimate

. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: b = sum(m:*pop)/sum(pop)

: e0 = sum(L)

: b * e0 / sqrt(nrr)
  1.656044615

: end
---------------------------------------------------------------------------------------------------------------------

We get 1.656 or 65.6% additional growth. The method works well because in 1985 Malaysia was not foo far from stable.

(d) The following graph shows the current and stationary age distributions

. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: st_store(.,st_addvar("float","current"),current)

: st_store(.,st_addvar("float","stationary"),stationary)

: end
---------------------------------------------------------------------------------------------------------------------

. gen agem = age + 2.5

. quietly replace agem = 91.8 in -1

. line current stationary agem, xtitle(age) lpat(solid dash) ///
>         title(Current and Stationary Age Distributions) ///
>         subtitle(Peninsular Malaysia 1985) ///
>         legend(ring(0) pos(1) col(1))

. graph export ps4fig7.png, replace
(file ps4fig7.png written in PNG format)

We see that the age distribution of peninsular Malaysia, compared to the stationary distribution, is heavily weighted towards the young ages. The substantial momentum we have calculated results from the large numbers of young women in or about the enter the reproductive ages; just replacing themselves generates large numbers of births for a number of years.