Nuptiality Models
We apply nuptiality models to age at marriage in Colombia using WFS data. The main objective is to show the use of models to extrapolate from incomplete cohort experience.
The Data
We read an extract with age at interview and age at marriage for the cohort 35-39 in the Colombia WFS.
. use http://data.princeton.edu/eco572/datasets/co3539, clear
(COSR02 extract)
. desc
Contains data from http://data.princeton.edu/eco572/datasets/co3539.dta
  obs:           579                          COSR02 extract
 vars:             2                          26 Mar 2006 12:53
 size:         3,474 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
v010            byte   %8.0g                  Age in completed years
v109            byte   %8.0g       v109       Age at first union
-------------------------------------------------------------------------------
Sorted by:
We treat women married at their current age as single at the start of the year, thereby avoiding assumptions about partial exposure, and then group the data:
. replace v109 = . if v109 >= v010
(74 real changes made, 74 to missing)
. gen n=1
. rename v010 age
. rename v109 agemar
. collapse (sum) n, by (age agemar)
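The effect of these steps is easy to check on a handful of records. The sketch below uses made-up values, not the WFS extract, and starts from variables already named age and agemar.

```stata
* Toy data with made-up values (not the WFS extract)
clear
input byte age byte agemar
35 20
35 35
36 22
36 22
36 .
end

* A woman married at her current age is treated as still single.
* Missing values compare as larger than any number in Stata, so the
* condition is also true for never-married women, who stay missing.
replace agemar = . if agemar >= age

* One row per (age, agemar) cell, with n counting the women in it
gen n = 1
collapse (sum) n, by(age agemar)
list, sepby(age)
```

The result has four rows: for each age, one row per observed age at marriage plus one row with agemar missing that holds the women still single.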
Modeling the Complete Experience
We want to plot proportions ever married and first marriage frequencies for each of the single-year cohorts aged 35 to 39, together with fitted Coale-McNeil and Hernes models.
We start by computing the observed frequencies:
. egen N = sum(n), by(age)
. gen f = n/N if !missing(agemar)
(5 missing values generated)
. bysort age (agemar): gen F = sum(f)-f
(5 missing values generated)
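These three commands can be verified by hand on a tiny example. The counts below are invented for illustration. The sketch uses total(), the current documented name of the egen function that appears as sum() in the log.

```stata
* Made-up counts for one cohort: 10 women, 4 of them still single
clear
input byte age byte agemar int n
39 18 2
39 19 3
39 20 1
39 .  4
end

egen N = total(n), by(age)               // denominator includes single women
gen f = n/N if !missing(agemar)          // .2, .3, .1: share marrying at each age
bysort age (agemar): gen F = sum(f) - f  // 0, .2, .5: share married BEFORE each age
list age agemar n f F
```

Because the denominator N includes women who are still single, F levels off at the proportion ever married rather than at one. Subtracting f from the running sum makes F refer to the start of each age, which matches the treatment of women married at their current age as single.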
Next we compute cumulative model schedules. The estimates for the Coale-McNeil model were obtained from the paper by Rodríguez and Trussell (1980), who report a mean of 20.4, a standard deviation of 5.4, and a proportion eventually marrying of 89%. The estimates for the Hernes model were computed using a similar maximum likelihood procedure, and yield an "attractiveness" parameter of 0.65 (at age 15), a decay rate of 0.15, and a proportion eventually marrying of 89%.
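For readers who want to see how the three Coale-McNeil parameters enter the schedule, the standardized form of the model, as it is usually quoted from Rodríguez and Trussell (1980), can be written as follows. The symbols g_0, G_0, \mu, \sigma and p are notation introduced here, and I am assuming, without having checked the package source, that the fitted curve is evaluated in this way.

```latex
% Standard schedule with mean 0 and standard deviation 1
g_0(z) = 1.2813 \exp\{-1.145\,(z + 0.805) - \exp[-1.896\,(z + 0.805)]\}

% Proportion ever married by exact age a, with mean \mu,
% standard deviation \sigma and proportion eventually marrying p
F(a) = p \, G_0\!\left(\frac{a - \mu}{\sigma}\right),
\qquad G_0(z) = \int_{-\infty}^{z} g_0(u)\, du
```

With the estimates quoted above, \mu = 20.4, \sigma = 5.4 and p = 0.89, so the curve levels off at about 89% ever married.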
The fitted curves were computed using egen-extension commands available on the course Stata site. (In Stata type
net from http://data.princeton.edu/eco572/stata
and install the nuptfer package.)
. egen CM = pnupt(agemar) if age==39, ///
>     mean(20.436) s(5.377) p(.885)
(89 missing values generated)
. egen H = phernes(agemar) if age==39, ///
>     a(0.6526) r(.1506) p(.8873)
(89 missing values generated)
Finally, we difference these to obtain actual marriage frequencies by single years of age:
. sort age agemar // to be sure
. gen cm = (CM[_n+1] - CM)/(agem[_n+1] - agem)
(90 missing values generated)
. gen h = (H[_n+1] -H)/(agem[_n+1] - agem)
(90 missing values generated)
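The differencing step gives the average frequency over each interval between consecutive ages, so the result is best read as a value for the midpoint of the interval. A minimal sketch, using a logistic curve as a stand-in for the fitted schedules (the curve and its constants are arbitrary and chosen only for illustration):

```stata
* Stand-in cumulative curve on ages 15 to 20 (not a fitted model)
clear
set obs 6
gen agemar = 14 + _n
gen F = invlogit((agemar - 18)/2)

* Slope between consecutive ages: average frequency over [agemar, agemar+1)
gen f = (F[_n+1] - F)/(agemar[_n+1] - agemar)

* Midpoint of the interval, the natural place to plot f
gen agep = agemar + 0.5
list
```

The last row has no successor, so f is missing there, which is why differencing produces one more missing value than the cumulative curve had.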
We are ready to plot our results:
. twoway (scatter F agem )(line CM H agem, lp(solid dash)) ///
>     , title(Proportion Ever Married) xtitle(age) name(F,replace) ///
>     legend(order(2 "Coale-McNeil" 3 "Hernes") ring(0) pos(5) col(1))
. gen agep = agem+0.5
(5 missing values generated)
. twoway (scatter f agep )(line cm h agep, lp(solid dash)) ///
>     , title(First Marriage Frequencies) xtitle(age) ///
>     legend(off) name(f,replace)
. graph combine F f, xsize(6) ysize(3)
. graph export co3539a.png, replace
(file co3539a.png written in PNG format)

The general fit of the models to noisy data is excellent. As usual, it is easier to appreciate differences working with densities than with cumulative distributions. The Hernes model leads to a distribution a bit more peaked than the Coale-McNeil.
Predicting 15 Years Earlier
We will now simulate what would have happened if we had interviewed these cohorts 15 years earlier. We assume women who had married by then would have reported the same age at marriage, but the rest would have reported themselves as single. To emphasize how little data we would have, I have redrawn the plot below using light gray for data points "in the future".
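A minimal sketch of this censoring rule on made-up ages, using the same comparison that the code for the first panel uses to flag observed marriages:

```stata
* Made-up records (not the WFS extract)
clear
input byte age byte agemar
39 19
39 23
39 24
39 30
39 .
35 18
35 20
end

* A marriage is already on record 15 years earlier only if it took
* place before the age the woman had at that earlier interview.
* Missing agemar compares as larger than any number, so single women get 0.
gen observed = agemar < age - 15
list, sepby(age)
```

A woman aged 39 would have been 24 at the earlier interview, so her marriages at 19 and 23 count as observed while those at 24 and 30 lie "in the future". For a woman aged 35 the cutoff is 20, so only the marriage at 18 is observed.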
Rodríguez and Trussell (1980) fitted the Coale-McNeil model to the censored cohort and obtained a mean of 20.2, a standard deviation of 5.1, and an estimated 87% eventually marrying. The fact that the estimates obtained when the cohort was 20-24 are so close to the estimates obtained when it was 35-39 is nothing short of remarkable.
I have gone ahead and fitted the Hernes model to the same censored experience and obtained an attractiveness parameter of 0.73 (at age 15), a rate of decay of 0.19 per year, and a proportion eventually marrying of 83%.
Here's the code used for the first panel:
. gen observed = agem < age-15
. egen CMc = pnupt(agem) if age==39, mean(20.1528) s(5.0955) p(.8740)
(89 missing values generated)
. egen Hc = phernes(agem) if age==39, a(0.7321) r(.1875) p(.8296)
(89 missing values generated)
. twoway (scatter F agem if obs) (scatter F agem if !obs, color(ltblue)) ///
>     (line CMc Hc agem, lp(solid dash) ) , ///
>     title(Proportion Ever Married) subtitle(Fitted to Experience at 20-24) ///
>     legend(order(3 "Coale-McNeil" 4 "Hernes") ring(0) pos(5) col(1)) ///
>     xtitle(age) name(Fc,replace)
And here are the calculations for the second:
. gen cmc = (CMc[_n+1]-CMc)/(agem[_n+1] - agem)
(90 missing values generated)
. gen hc = (Hc[_n+1] - Hc)/(agem[_n+1] - agem)
(90 missing values generated)
. twoway (scatter f agep if obs) (scatter f agep if !obs, color(ltblue)) ///
>     (line cmc hc agep, lp(solid dash)) ///
>     , title(First Marriage Frequencies) subtitle(Fitted to Experience at 20-24) ///
>     xtitle(age) legend(off) name(fc,replace)
. graph combine Fc fc, xsize(6) ysize(3)
. graph export co3539c.png, replace
(file co3539c.png written in PNG format)

It seems clear that the Coale-McNeil does a better job predicting in this particular case. The Hernes model appears to adapt too well to the data at the younger ages and as a result doesn't predict as well for the older ages.
Things don't always work this well. As shown in my paper with Trussell, the data for the oldest cohort, 45-49, are a bit more noisy and harder to predict from the cohort's early experience. Trying to predict the future course of nuptiality is a risky business, and all models rely on the basic assumption that the past says something about the future.
