Eco572: Research Methods in Demography

Solutions to Problem Set 3

[1] Nuptiality in the U.S.

We start by reading the data from the website.

. infile age n using ///
>   http://data.princeton.edu/eco572/datasets/sipp01w3741.dat, clear
(31 observations read)

(a) Since these are cohort data and we are only interested in the experience up to age 37, which is complete for women aged 37-41 at interview, we can compute dx directly, just dividing the frequencies by the total number of women. The other life table functions follow directly. The only time we need to make an assumption about the distribution of events in a year of age is in computing Lx, and we assume a uniform distribution.

. quietly summarize n

. gen dx = n/r(sum) if age < 37
(6 missing values generated)

. gen lx = 1

. replace lx = lx[_n-1] - dx[_n-1] if _n > 1
(30 real changes made, 5 to missing)

. gen qx = dx/lx
(6 missing values generated)

. gen Lx = (lx + lx[_n+1])/2
(6 missing values generated)

. format %8.6f dx qx lx Lx

. list age dx qx lx Lx if age <= 37

     +-------------------------------------------------+
     | age         dx         qx         lx         Lx |
     |-------------------------------------------------|
  1. |  12   0.000654   0.000654   1.000000   0.999673 |
  2. |  13   0.000654   0.000655   0.999346   0.999018 |
  3. |  14   0.002618   0.002621   0.998691   0.997382 |
  4. |  15   0.008181   0.008213   0.996073   0.991983 |
  5. |  16   0.025851   0.026168   0.987893   0.974967 |
     |-------------------------------------------------|
  6. |  17   0.040903   0.042517   0.962042   0.941590 |
  7. |  18   0.077552   0.084192   0.921139   0.882363 |
  8. |  19   0.081152   0.096199   0.843586   0.803010 |
  9. |  20   0.070353   0.092275   0.762435   0.727258 |
 10. |  21   0.062500   0.090307   0.692081   0.660831 |
     |-------------------------------------------------|
 11. |  22   0.068390   0.108628   0.629581   0.595386 |
 12. |  23   0.064463   0.114869   0.561191   0.528959 |
 13. |  24   0.050393   0.101449   0.496728   0.471531 |
 14. |  25   0.051374   0.115103   0.446335   0.420648 |
 15. |  26   0.040903   0.103563   0.394961   0.374509 |
     |-------------------------------------------------|
 16. |  27   0.035013   0.098891   0.354058   0.336551 |
 17. |  28   0.031414   0.098462   0.319045   0.303338 |
 18. |  29   0.022906   0.079636   0.287631   0.276178 |
 19. |  30   0.017016   0.064277   0.264725   0.256217 |
 20. |  31   0.022579   0.091149   0.247709   0.236420 |
     |-------------------------------------------------|
 21. |  32   0.018652   0.082849   0.225131   0.215805 |
 22. |  33   0.013416   0.064976   0.206479   0.199771 |
 23. |  34   0.013089   0.067797   0.193063   0.186518 |
 24. |  35   0.012762   0.070909   0.179974   0.173593 |
 25. |  36   0.010144   0.060665   0.167212   0.162140 |
     |-------------------------------------------------|
 26. |  37          .          .   0.157068          . |
     +-------------------------------------------------+

. quietly sum Lx

. di 12 + r(sum)
25.715641

. drop if age > 37
(5 observations deleted)

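The average time lived in the single state by age 37.0 is 25.7 years: the 12 years before the first recorded marriages plus 13.7 years lived single between ages 12 and 37.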
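As a cross-check outside Stata, the life table can be rebuilt from the dx column alone. The Python sketch below is an illustration, not part of the original solution. It uses the dx values as printed in the listing (rounded to six decimals, so results agree with the log only to about four decimals), and the names `life_table` and `years_single` are made up for this example.

```python
# dx for ages 12..36, copied from the listing above (rounded to six decimals)
DX = [0.000654, 0.000654, 0.002618, 0.008181, 0.025851,
      0.040903, 0.077552, 0.081152, 0.070353, 0.062500,
      0.068390, 0.064463, 0.050393, 0.051374, 0.040903,
      0.035013, 0.031414, 0.022906, 0.017016, 0.022579,
      0.018652, 0.013416, 0.013089, 0.012762, 0.010144]


def life_table(dx, first_age=12):
    """Return rows (age, dx, qx, lx, Lx) and the survivors at the final age."""
    rows = []
    lx = 1.0
    for i, d in enumerate(dx):
        next_lx = lx - d              # proportion still single one year later
        qx = d / lx                   # probability of marrying at this age
        Lx = (lx + next_lx) / 2       # uniform distribution within the year
        rows.append((first_age + i, d, qx, lx, Lx))
        lx = next_lx
    return rows, lx


rows, l37 = life_table(DX)
years_single = 12 + sum(row[4] for row in rows)
print(round(l37, 4), round(years_single, 2))
```

The 12 added at the end accounts for the years before age 12, when everyone is single.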

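(b) To answer these questions we need l20, l25, and l37, which I store in scalars for clarity. The four results below are, in order: the probability of marrying by age 20, the probability of marrying by age 25, the probability of marrying between ages 20 and 25 for a woman still single at 20, and the same probability for a woman who is single at 20 and marries by age 37.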

. scalar lx20 = lx[9]

. scalar lx25 = lx[14]

. scalar lx37 = lx[26]

. di 1 - lx20
.23756546

. di 1 - lx25
.55366492

. di (lx20-lx25)/lx20
.41459227

. di (lx20-lx25)/(lx20-lx37)
.52216214
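The four probabilities are simple functions of the proportions single at exact ages 20, 25 and 37. A minimal Python sketch, using the lx values as printed in the life table (so the last digits can differ slightly from the Stata output); the variable names are illustrative only:

```python
# Proportions single at exact ages 20, 25 and 37, copied from the lx column
l20, l25, l37 = 0.762435, 0.446335, 0.157068

married_by_20 = 1 - l20
married_by_25 = 1 - l25
# marries between 20 and 25, given single at 20
p_20_25_given_single = (l20 - l25) / l20
# marries between 20 and 25, given single at 20 and married by 37
p_20_25_given_marries = (l20 - l25) / (l20 - l37)

print(married_by_20, married_by_25,
      p_20_25_given_single, p_20_25_given_marries)
```

The only difference between the last two quantities is the denominator: all women single at 20, or only those among them who marry before 37.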

(c) We fit a Hernes model following the suggested procedure.

. gen y = log(qx/(1-lx))
(2 missing values generated)

. gen am = (age+0.5)

. gen am15 = am - 15

. reg y am15

      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F(  1,    22) =  196.19
       Model |  26.4196017     1  26.4196017           Prob > F      =  0.0000
    Residual |  2.96252171    22  .134660078           R-squared     =  0.8992
-------------+------------------------------           Adj R-squared =  0.8946
       Total |  29.3821234    23  1.27748363           Root MSE      =  .36696

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        am15 |  -.1515703   .0108211   -14.01   0.000    -.1740119   -.1291288
       _cons |   .2402122    .131607     1.83   0.082     -.032724    .5131485
------------------------------------------------------------------------------

. scalar r = _b[am15]

. gen z = logit(1-lx)
(1 missing value generated)

. gen x = exp(r*(age-15))

. reg z x

      Source |       SS       df       MS              Number of obs =      25
-------------+------------------------------           F(  1,    23) = 5546.13
       Model |  169.889412     1  169.889412           Prob > F      =  0.0000
    Residual |  .704536948    23  .030632041           R-squared     =  0.9959
-------------+------------------------------           Adj R-squared =  0.9957
       Total |  170.593949    24  7.10808121           Root MSE      =  .17502

------------------------------------------------------------------------------
           z |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -7.001133   .0940098   -74.47   0.000    -7.195607   -6.806658
       _cons |   1.828685   .0497742    36.74   0.000     1.725719     1.93165
------------------------------------------------------------------------------

. di invlogit(_b[_cons])
.86160494
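One way to read the two regressions, assuming the usual form of the Hernes model: the first estimates the rate r at which the ratio of the marriage probability to the proportion already married declines with age, and the second uses the fact that the logit of the proportion married is then linear in exp(r*(age-15)). Because that term goes to zero at older ages, the intercept is the logit of the proportion who eventually marry. The Python sketch below only re-evaluates the fitted curve from the printed coefficients; the function names are made up for this example.

```python
import math

# Coefficients copied from the two regressions above
R = -0.1515703       # slope of log(qx/(1-lx)) on age
CONS = 1.828685      # intercept of the logit regression
SLOPE = -7.001133    # coefficient of x = exp(R*(age-15))


def invlogit(z):
    return 1 / (1 + math.exp(-z))


def prop_married(age):
    """Fitted proportion ever married at exact age `age`."""
    return invlogit(CONS + SLOPE * math.exp(R * (age - 15)))


# Limit at older ages, where exp(R*(age-15)) goes to zero
ultimate = invlogit(CONS)

for age in (20, 25, 37):
    print(age, round(prop_married(age), 3))
print("ultimate", round(ultimate, 4))
```

Under this fit about 86% of the cohort would eventually marry, compared with the 84% observed to have married by age 37.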

We then predict the survival function, difference it to obtain marriage frequencies, and construct the plot.

. predict zfit
(option xb assumed; fitted values)

. gen lfit = 1 - invlogit(zfit)

. format %3.1f lx dx

. scatter lx age || line lfit age, ylabels(0(0.2)1) ///
>   title(Proportion Single) legend(off) name(A,replace)

. gen dfit = lfit - lfit[_n+1]
(1 missing value generated)

. scatter dx am || line dfit am, xtitle(age) ///
>   title(Marriage Frequency) legend(off) name(B,replace)

. graph combine A B, title(Hernes Fit) subtitle(U.S. Women 37-41 in SIPP)

. graph export ps3fig1.png, replace
(file ps3fig1.png written in PNG format)

The fit is not great, suggesting that forecasts based on the model would not work very well.

(d) We fit a Coale-McNeil model. For simplicity we work with marriages by age 37, but you could, if you wanted, adjust the parameters slightly to allow for marriages after age 37.0. This doesn't make a huge difference.

. gen w = dx/(1-lx[26])
(1 missing value generated)

. sum am [aw=w]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
          am |      25  .999999968    23.61297   5.077051       12.5       36.5

. egen pcm = pnupt(age), mean(23.613) stdev(5.077) pem(.8429)

. gen lcm = 1-pcm

. gen dcm = lcm - lcm[_n+1]
(1 missing value generated)

. scatter lx age || line lcm age, ylabels(0(0.2)1) ///
>   title(Proportion Single) legend(off) name(A,replace)

. scatter dx am || line dcm am, xtitle(age) ///
>         title(Marriage Frequency) legend(off) name(B,replace)

. graph combine A B, xsize(6) ysize(3) /// 
>         title(Coale-McNeil Fit) subtitle(U.S. Women 37-41 in SIPP)

. graph export ps3fig2.png, replace
(file ps3fig2.png written in PNG format)

The fit of the Coale-McNeil is comparable to that of the Hernes model. Note that we could improve the fit of both models by estimating the parameters using maximum likelihood.
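Two of the three inputs passed to pnupt above can be recovered directly from the dx column: the proportion ever married by age 37 and the mean age at marriage among those who marry by 37. The Python sketch below recomputes only those two (the standard deviation is left to Stata). It uses the rounded dx values from the listing, and the variable names are illustrative.

```python
# dx for ages 12..36, copied from the life table listing (six decimals)
DX = [0.000654, 0.000654, 0.002618, 0.008181, 0.025851,
      0.040903, 0.077552, 0.081152, 0.070353, 0.062500,
      0.068390, 0.064463, 0.050393, 0.051374, 0.040903,
      0.035013, 0.031414, 0.022906, 0.017016, 0.022579,
      0.018652, 0.013416, 0.013089, 0.012762, 0.010144]

# Midpoints of each year of age, matching the variable am
ages_mid = [12.5 + i for i in range(len(DX))]

pem = sum(DX)                        # proportion ever married by 37, 1 - l37
weights = [d / pem for d in DX]      # marriage frequencies rescaled to sum to 1
mean_age = sum(a * w for a, w in zip(ages_mid, weights))

print(round(pem, 4), round(mean_age, 3))
```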

[2] Birth Rates in the Dominican Republic

We have data from two surveys. I have stored the commands needed to do the analysis in a do file which follows the handout on rates. (See it here.) Note that I pass the year as a parameter to the do file.

. quietly do do\DrFertRates 75

. poisson // replay last estimates

Poisson regression                                Number of obs   =       2254
                                                  LR chi2(3)      =     146.31
                                                  Prob > chi2     =     0.0000
Log likelihood = -2101.5858                       Pseudo R2       =     0.0336

------------------------------------------------------------------------------
      births |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         dur |  -.0212533   .0048206    -4.41   0.000    -.0307015   -.0118051
       urban |  -.1565696   .0860627    -1.82   0.069    -.3252495    .0121102
    urbanDur |  -.0309201    .008294    -3.73   0.000     -.047176   -.0146641
       _cons |  -2.509833   .0558191   -44.96   0.000    -2.619237    -2.40043
          os |   (offset)
------------------------------------------------------------------------------

. display exp(_b[_cons]),exp(_b[_cons]+_b[urban])
.08128177 .06950177

. display 1-exp(_b[dur]), 1-exp(_b[dur]+_b[urbanDur])
.02102907 .05083573

The estimates show similar levels of natural fertility (at duration 0) in rural and urban areas (.0813 and .0695, or 14% lower in urban), but substantially higher control of fertility in urban than rural areas, with declines of 2.10 and 5.08 percent per year of marriage, respectively. The figure illustrates the duration profile (not required). To obtain predicted levels one would need to specify age as well.

Now for 1980:

. quietly do do\DrFertRates 80

. poisson // replay last estimates

Poisson regression                                Number of obs   =       3600
                                                  LR chi2(3)      =     287.43
                                                  Prob > chi2     =     0.0000
Log likelihood = -3078.2992                       Pseudo R2       =     0.0446

------------------------------------------------------------------------------
      births |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         dur |  -.0374255   .0042582    -8.79   0.000    -.0457714   -.0290795
       urban |  -.1565444   .0719618    -2.18   0.030    -.2975869   -.0155019
    urbanDur |  -.0282096     .00734    -3.84   0.000    -.0425958   -.0138235
       _cons |  -2.537071   .0458957   -55.28   0.000    -2.627025   -2.447117
          os |   (offset)
------------------------------------------------------------------------------

. display exp(_b[_cons]),exp(_b[_cons]+_b[urban])
.07909777 .06763599

. display 1-exp(_b[dur]), 1-exp(_b[dur]+_b[urbanDur])
.0367338 .06352749

Again we see similar levels of natural fertility (at duration 0) in rural and urban areas (.0791 and .0676, or 14% lower in urban), but substantially higher control of fertility in urban than rural areas, with declines of 3.67 and 6.35 percent per year of marriage, respectively. The figure illustrates the duration profile for 1980.

The main change between 1975 and 1980 is increased control of fertility, particularly in rural areas, which went from duration slopes of 2.10 to 3.67 percent per year, compared to 5.08 to 6.35 percent per year in urban areas. So the gap has narrowed somewhat.
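The levels and slopes quoted for both surveys come from exponentiating the Poisson coefficients: exp(_cons) is the rural rate at duration 0, adding the urban coefficient gives the urban rate, and 1-exp(slope) is the proportionate decline per year of marriage. A short Python sketch that repeats the arithmetic from the printed coefficients (the dictionary and function names are illustrative, not taken from the do file):

```python
import math

# (dur, urban, urbanDur, _cons) copied from the two Poisson fits
COEF = {
    1975: (-0.0212533, -0.1565696, -0.0309201, -2.509833),
    1980: (-0.0374255, -0.1565444, -0.0282096, -2.537071),
}


def summarize(dur, urban, urban_dur, cons):
    """Rates at duration 0 and proportionate decline per year of marriage."""
    return {
        "rural_level": math.exp(cons),
        "urban_level": math.exp(cons + urban),
        "rural_decline": 1 - math.exp(dur),
        "urban_decline": 1 - math.exp(dur + urban_dur),
    }


results = {year: summarize(*b) for year, b in COEF.items()}
for year, r in results.items():
    print(year, {k: round(v, 4) for k, v in r.items()})
```

The urban/rural ratio of the levels is exp of the urban coefficient, about 0.855 in both years, which is the "14% lower" quoted in the text.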

[3] Birth Intervals in the D.R.

Again I have stored the commands needed to do the analysis in a do file, so I can call it for each survey. (See it here.) I store the quantum, the quartiles, and the trimean in a matrix. Here are the results for 1975:

. quietly do do\DrBirthInts 75

. mat list q75

q75[3,5]
         Quintum    Trimean         Q1         Q2         Q3
Rural  .88741291  19.783598  12.757429  19.440124  27.496717
 Town  .81248641   21.40552  13.979024  20.189366  31.264324
 City   .7389999  20.049581  13.131694  18.600382  29.865864

We see a monotonic decrease in quantum as we move from women who grew up in rural areas to those who grew up in small towns and cities, with very small differences in tempo. Curiously, it is women from towns who have slightly longer birth intervals, with no difference between cities and rural areas. Note, however, that only 74% of women who grew up in cities move on from parity two to three; those who do simply don't wait any longer than their rural counterparts.
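The tempo summary in the matrix is consistent with Tukey's trimean, (Q1 + 2*Q2 + Q3)/4, which can be confirmed from the printed quartiles. The Python sketch below does that for the 1975 rows; it is a check on the arithmetic only, and says nothing about how the do file (not shown here) estimates the quartiles themselves.

```python
# Rows of q75: (quantum, Q1, Q2, Q3) by childhood residence
Q75 = {
    "Rural": (0.88741291, 12.757429, 19.440124, 27.496717),
    "Town":  (0.81248641, 13.979024, 20.189366, 31.264324),
    "City":  (0.7389999,  13.131694, 18.600382, 29.865864),
}


def trimean(q1, q2, q3):
    """Tukey's trimean: the median counted twice, the other quartiles once."""
    return (q1 + 2 * q2 + q3) / 4


trimeans = {place: trimean(*row[1:]) for place, row in Q75.items()}
for place, value in trimeans.items():
    print(place, round(value, 3))
```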

We now look at the 1980 survey.

. quietly do do\DrBirthInts 80

. mat list q80

q80[3,5]
         Quintum    Trimean         Q1         Q2         Q3
Rural  .84864891  20.051037  13.164762  20.100345  26.838696
 Town  .74025041  21.003418  12.467427  20.180363   31.18552
 City  .54947275  23.025648  13.383775  21.120355  36.478106

We see much larger differences in 1980. The proportion who are moving on to have a third child is now only 55% for women who grew up in cities, and for those who grew up in small towns it is now 74%, the value we saw for city folk five years earlier. We also see the emergence of differences in tempo, with women who grew up in small towns and particularly in cities showing somewhat longer birth intervals than women of rural origin.

. mat q7580 = q75[1..3,1],q80[1..3,1],q75[1..3,2],q80[1..3,2]

. mat colnames q7580 = Q75 Q80 T75 T80

. mat list q7580

q7580[3,4]
             Q75        Q80        T75        T80
Rural  .88741291  .84864891  19.783598  20.051037
 Town  .81248641  .74025041   21.40552  21.003418
 City   .7389999  .54947275  20.049581  23.025648

When we compare results over time we see very large changes in quantum in towns and particularly in cities, and a modest increase in birth interval length that is largely confined to cities.

[4] Proximate Determinants

We use the reports for the DHS surveys in the Philippines in 1993, 1998 and 2003. I will use Mata for my calculations, but they can all be done in Excel.

Cm. This is usually estimated as the ratio of the TFR to the TMFR, but the DHS doesn't publish marital fertility rates by age. We can get a rough estimate of these rates, however, by inflating the age-specific fertility rates, that is, dividing them by the proportion married at each age, assuming that all births occur within marriage. (An alternative would be to use the proportion of time spent within marriage.)

. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: asfr = ((50\190\217\181\120\51\8) ,     // 1993 Tab 3.1 p 26
>           (46\177\210\155\111\40\7) ,   // 1998 Tab 3.1 p 32
>           (53\178\191\142\ 95\43\5) )   // 2003 Tab 4.1 p 41

: tfr = J(1,7,.005) * asfr

: tfr
           1       2       3
    +-------------------------+
  1 |  4.085    3.73   3.535  |
    +-------------------------+

: m = (
>         (.047\.384\.663\.779\.816\.81\.773)     // 1993 Tab 5.1 p 59    
>   :+  (.027\.067\.063\.058\.059\.054\.056) ,
>         (.048\.345\.643\.769\.797\.775\.783)    // 1998 Tab 5.1 p 75
>   :+  (.036\.076\.075\.072\.073\.064\.041) ,
>         (.039\.369\.664\.771\.800\.790\.787)    // 2003 Tab 6.1 p 79
>   :+  (.051\.127\.097\.080\.072\.068\.066) )

: m
          1      2      3
    +----------------------+
  1 |  .074   .084    .09  |
  2 |  .451   .421   .496  |
  3 |  .726   .718   .761  |
  4 |  .837   .841   .851  |
  5 |  .875    .87   .872  |
  6 |  .864   .839   .858  |
  7 |  .829   .824   .853  |
    +----------------------+

: round(asfr :/ m)
         1     2     3
    +-------------------+
  1 |  676   548   589  |
  2 |  421   420   359  |
  3 |  299   292   251  |
  4 |  216   184   167  |
  5 |  137   128   109  |
  6 |   59    48    50  |
  7 |   10     8     6  |
    +-------------------+

: tmfr = J(1,7,.005) * (asfr :/ m)

: tmfr
                 1             2             3
    +-------------------------------------------+
  1 |  9.089645504   8.142936331   7.652655428  |
    +-------------------------------------------+

: end
---------------------------------------------------------------------------------------------------------------------

Unfortunately the age-specific marital fertility rates for the two youngest age groups, where a large proportion of births may in fact occur outside marriage, are just not credible. Taking these rates at face value would overestimate the effect of marriage and would require implausibly high estimates of natural fertility to match the observed TFRs. Clearly some adjustment is necessary.

Fortunately the proportions married have not changed much over time (if anything they may have increased slightly), so the choice of weights is not likely to obscure trends, even if the level of the marriage index may be off. While fairly sophisticated adjustments are possible, I'll simply set the marital fertility rates of the first two age groups equal to the rate for ages 25-29. I also show the unweighted average proportions married for comparison.

. mata
------------------------------------------------- mata (type end to exit) -------------------------------------------
: mfr = asfr :/ m

: mfr[1,] = mfr[3,]

: mfr[2,] = mfr[3,]

: cm = tfr :/ (J(1,7,.005) * mfr)

: cm
                 1             2             3
    +-------------------------------------------+
  1 |  .6195197068   .5989567799   .6517676938  |
    +-------------------------------------------+

: J(1,7,1/7) * m
                 1             2             3
    +-------------------------------------------+
  1 |  .6651428571   .6567142857          .683  |
    +-------------------------------------------+

: end
---------------------------------------------------------------------------------------------------------------------
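For readers without Mata, the marriage index can be reproduced with plain Python lists. The sketch below is an illustration rather than the original code: it copies the ASFRs and the summed proportions from the matrix m as printed above, and the helper names `total_rate` and `column` are made up for this example.

```python
# Births per 1000 women, ages 15-19 ... 45-49; columns are 1993, 1998, 2003
ASFR = [(50, 46, 53), (190, 177, 178), (217, 210, 191), (181, 155, 142),
        (120, 111, 95), (51, 40, 43), (8, 7, 5)]
# Proportions from the matrix m above, same layout
M = [(.074, .084, .090), (.451, .421, .496), (.726, .718, .761),
     (.837, .841, .851), (.875, .870, .872), (.864, .839, .858),
     (.829, .824, .853)]


def total_rate(rates):
    """Sum rates per 1000 over five-year age groups (the .005 factor)."""
    return 5 * sum(rates) / 1000


def column(table, j):
    return [row[j] for row in table]


tfr, tmfr, cm = [], [], []
for j in range(3):
    asfr, m = column(ASFR, j), column(M, j)
    mfr = [a / p for a, p in zip(asfr, m)]   # rough marital fertility rates
    tfr.append(total_rate(asfr))
    tmfr.append(total_rate(mfr))
    adjusted = [mfr[2], mfr[2]] + mfr[2:]    # ages 15-24 set to the 25-29 rate
    cm.append(tfr[j] / total_rate(adjusted))

print([round(x, 3) for x in tfr])
print([round(x, 2) for x in tmfr])
print([round(x, 4) for x in cm])
```

Without the adjustment, Cm would be tfr/tmfr, roughly 0.45 in each year, which shows how much weight the implausible rates at the youngest ages carry.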

Cc. We estimate a measure of contraceptive use u as the average of the proportions using in each age group, and a measure of efficiency e as the average effectiveness of the methods used by married women. Below I store proportions using contraception among married women in each age group, ready to average. I also store the method mix and efficiencies borrowed from Bongaarts's paper.

. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: cuse = ((.172\.319\.391\.458\.482\.431\.272) , // 1993 Tab 4.4 p 43
>      (.183\.374\.486\.521\.541\.486\.343) ,    // 1998 Tab 4.4 p 54
>      (.256\.427\.513\.534\.566\.499\.377) )    // 2003 Tab 5.4 p 59

: u = J(1,7,1/7) * cuse

: u
                 1             2             3
    +-------------------------------------------+
  1 |  .3607142857   .4191428571   .4531428571  |
    +-------------------------------------------+

: //  methods are pill iud inj condom fster mster nat with other
: eff = (.98\.96\.98\.91\1\1\.82\.8\.9) // From Bongaarts p 112

: mix = ((.085\.030\.001\.010\.119\.040\.073\.074\.040),  // 1993 Tab 4.4 p 43
>          (.099\.037\.024\.016\.103\.001\.089\.089\.008),  // 1998 Tab 4.4 p 54
>          (.132\.041\.031\.019\.105\.001\.068\.082\.009))  // 2003 Tab 5.4 p 59

: pmix = mix :/ (J(1,9,1)*mix)    // divide by sums

: e = J(1,9,1) * (pmix :* eff)  // average effectiveness

: e
                 1             2             3
    +-------------------------------------------+
  1 |  .9242372881   .9141630901   .9259221311  |
    +-------------------------------------------+

: cc = 1 :- 1.18 * u :* e

: cc
                 1             2             3
    +-------------------------------------------+
  1 |      .606605   .5478653832      .5049015  |
    +-------------------------------------------+

: end
---------------------------------------------------------------------------------------------------------------------

We see that contraceptive use has increased markedly while efficiency has essentially remained constant.
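The same calculation in plain Python, as a cross-check: u is the unweighted mean of the age-specific proportions using, e is the mix-weighted mean of method effectiveness, and the index is 1 - 1.18*u*e as in the Mata code. The inputs are copied from the session above; the variable names are illustrative.

```python
# Proportion using contraception by age group; columns 1993, 1998, 2003
CUSE = [(.172, .183, .256), (.319, .374, .427), (.391, .486, .513),
        (.458, .521, .534), (.482, .541, .566), (.431, .486, .499),
        (.272, .343, .377)]
# Use-effectiveness by method, in the same order as the rows of MIX
EFF = [.98, .96, .98, .91, 1, 1, .82, .8, .9]
# Proportion using each method; columns 1993, 1998, 2003
MIX = [(.085, .099, .132), (.030, .037, .041), (.001, .024, .031),
       (.010, .016, .019), (.119, .103, .105), (.040, .001, .001),
       (.073, .089, .068), (.074, .089, .082), (.040, .008, .009)]

u, e, cc = [], [], []
for j in range(3):
    use = [row[j] for row in CUSE]
    mix = [row[j] for row in MIX]
    u.append(sum(use) / len(use))                                # average use
    e.append(sum(p * f for p, f in zip(mix, EFF)) / sum(mix))    # weighted mean
    cc.append(1 - 1.18 * u[j] * e[j])

print([round(x, 4) for x in u])
print([round(x, 4) for x in e])
print([round(x, 4) for x in cc])
```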

Ci. The DHS has direct estimates of the length of the post-partum non-susceptible period, so we use those.

. mata:
------------------------------------------------- mata (type end to exit) -------------------------------------------
: i =  (8.8,  // 1993 Tab 5.8 p 66
>         9.0,  // 1998 Tab 5.7 p 83
>         10.0) // 2003 Tab 6.8 p 86

: ci = 20 :/ (18.5 :+ i)

: ci
                 1             2             3
    +-------------------------------------------+
  1 |  .7326007326   .7272727273    .701754386  |
    +-------------------------------------------+

: end
---------------------------------------------------------------------------------------------------------------------

We see that post-partum insusceptibility has increased slightly over time.

As a check on the model we can divide the TFR by the product of the indices. The result is an estimate of the total fecundity rate (often loosely called total natural fertility) assuming no abortion, and should be in the neighborhood of 15.3, or less if there is abortion.

. mata tfr :/ (cm :* cc :* ci)
                 1             2             3
    +-------------------------------------------+
  1 |  14.83759801   15.62939561   15.30751845  |
    +-------------------------------------------+

The values are not unreasonable, except perhaps for 1993, unless there was some abortion which was replaced by contraceptive use. I would not put much credence on these absolute levels anyway, considering our difficulty estimating total marital fertility.
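The check is easy to repeat by hand. The Python sketch below recomputes the post-partum index from the durations quoted above and divides the TFR by the product of the three indices; the Cm and Cc values are copied from the Mata output rather than recomputed, and the variable names are illustrative.

```python
# Columns are 1993, 1998, 2003
TFR = (4.085, 3.73, 3.535)
CM = (0.6195197068, 0.5989567799, 0.6517676938)   # marriage index, from above
CC = (0.606605, 0.5478653832, 0.5049015)          # contraception index, from above
INSUSCEPTIBLE = (8.8, 9.0, 10.0)                  # months, from the DHS reports

# Post-partum index: 20 / (18.5 + months of insusceptibility)
ci = [20 / (18.5 + i) for i in INSUSCEPTIBLE]

# Fertility implied in the absence of all three inhibiting effects
implied = [t / (m * c * i) for t, m, c, i in zip(TFR, CM, CC, ci)]

print([round(x, 4) for x in ci])
print([round(x, 2) for x in implied])
```

These implied values are the quantities compared with the benchmark of 15.3.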

It should be clear, however, that the fertility decline has been driven almost entirely by increased contraceptive use, as the only index that shows substantial change over time is the index of contraception:

. mata cm\cc\ci
                 1             2             3
    +-------------------------------------------+
  1 |  .6195197068   .5989567799   .6517676938  |
  2 |      .606605   .5478653832      .5049015  |
  3 |  .7326007326   .7272727273    .701754386  |
    +-------------------------------------------+

The marriage indices show, if anything, more exposure in 2003 than ten years earlier, while post-partum insusceptibility shows only a modest three-point change.