![]() |
Eco572: Research Methods in Demography | ![]() |
![]() | ||
(a) Calculate age-specific period mortality rates for 1992 for blacks and whites
. insheet using http://data.princeton.edu/eco572/datasets/us1992a.dat
(5 vars, 19 obs)
. gen mw = deathsw/expow
. gen mb = deathsb/expob
. list age mw mb
+--------------------------+
| age mw mb |
|--------------------------|
1. | 0 .007809 .019579 |
2. | 1 .000426 .000776 |
3. | 5 .000213 .000375 |
4. | 10 .000282 .000449 |
5. | 15 .00106 .002184 |
|--------------------------|
6. | 20 .001354 .00321 |
7. | 25 .001533 .003617 |
8. | 30 .001958 .004644 |
9. | 35 .002455 .006096 |
10. | 40 .003122 .008032 |
|--------------------------|
11. | 45 .004325 .010657 |
12. | 50 .006634 .014193 |
13. | 55 .010715 .021036 |
14. | 60 .017297 .029243 |
15. | 65 .026885 .040291 |
|--------------------------|
16. | 70 .040124 .0572491 |
17. | 75 .061488 .07502 |
18. | 80 .097005 .1096976 |
19. | 85 .179562 .1671715 |
+--------------------------+
(b) Plot the rates against age on a log scale. Exclude the last group, which is actually 85+.
. gen agem = (age+age[_n+1])/2 // excludes 85+ (1 missing value generated) . line mw mb agem, title(U.S. Males 1992) /// > yscale(log) ytitle(asmr) xtitle(age) /// > legend(order(1 "White" 2 "Black") ring(0) pos(5) cols(1)) . graph export ps2fig1.png, replace (file ps2fig1.png written in PNG format)

(c) Fit a Gompertz model over an appropriate age range.
Seems clear that black mortality rates are Gompertz over a wider range than those for whites, which seeem to show a change of slope around age 50. To avoid biases I decided to start at 40-44. To simplify interpretation of the intercept I subtract 40 from age. (You could plot the fits but this was not required.)
. gen lmw=log(mw)
. gen lmb=log(mb)
. gen lb40 = agem-40
(1 missing value generated)
. reg lmw lb40 if age >= 40
Source | SS df MS Number of obs = 9
-------------+------------------------------ F( 1, 7) = 6841.82
Model | 11.4650463 1 11.4650463 Prob > F = 0.0000
Residual | .011730118 7 .001675731 R-squared = 0.9990
-------------+------------------------------ Adj R-squared = 0.9988
Total | 11.4767764 8 1.43459705 Root MSE = .04094
------------------------------------------------------------------------------
lmw | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lb40 | .0874263 .001057 82.72 0.000 .084927 .0899256
_cons | -6.053246 .0274181 -220.78 0.000 -6.118079 -5.988412
------------------------------------------------------------------------------
. di exp(_b[_cons]) ", " exp(_b[lb40])
.00235022, 1.0913619
. reg lmb lb40 if age >= 40
Source | SS df MS Number of obs = 9
-------------+------------------------------ F( 1, 7) = 6962.81
Model | 6.50170578 1 6.50170578 Prob > F = 0.0000
Residual | .006536431 7 .000933776 R-squared = 0.9990
-------------+------------------------------ Adj R-squared = 0.9989
Total | 6.50824221 8 .813530276 Root MSE = .03056
------------------------------------------------------------------------------
lmb | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lb40 | .0658367 .000789 83.44 0.000 .063971 .0677024
_cons | -5.02427 .0204671 -245.48 0.000 -5.072667 -4.975873
------------------------------------------------------------------------------
. di exp(_b[_cons]) ", " exp(_b[lb40])
.00657639, 1.0680523
(d) Interpret the resulting parameters in terms of the underlying hazard rate and the rate of aging.
The hazard for blacks is almost triple that of whites at age 40 (.00658 versus .00235, from exponentiating the intercepts), but blacks "age" more slowly as the hazard increases only 6.8% per year, compared to 9.1% for whites (from exponentiating the slopes).
(e) Could one include the 85+ group? What assumption(s) might be needed?
Yes, one could, but assumptions are required. We attributed other hazards to the midpoint of the age group. The question is what to do with 85+. (1) Use 87.5, as if the group was 85-90, or maybe 90. (2) Assume constant hazard after 85. Then expectation of life at 85 is the reciprocal of the death rate, and adding 85 gives us midpoints of 90.6 for whites and 91 for blacks. (3) Assume Gompertz hazards. Use the model fitted to ages below 85 to see what age would correspond to the observed mortality rate. This gives 102 for whites and 95.3 for blacks. Given the uncertainty, the choice of omitting 85+ seems prudent. (Also, there is some evidence that after age 80 the hazard increases more slowly than in a Gompertz model, but that's a different issue.)
Consider the life table of South African females you estimated in problem set 1. (I reestimate the table here using the latest data, which include an extra column for AIDS deaths to be used in [5].)
. drop _all
. set type double
. infile age expo deaths aids using ///
> http://data.princeton.edu/eco572/datasets/safeda.dat
(18 observations read)
. gen m = deaths/expo
. // interval widths
. gen n = age[_n+1]-age // interval width
(1 missing value generated)
. // time lived by deaths
. gen a = 2.5
. replace a = .200 in 1 // or .16 if you prefer
(1 real change made)
. replace a = 1.442 in 2 // or 1.5
(1 real change made)
. replace a = 1/m in -1
(1 real change made)
. // life table
. gen q = n*m/(1+(n-a)*m)
(1 missing value generated)
. replace q = 1 in -1
(1 real change made)
. gen lx = 100000
. replace lx = lx[_n-1] * (1-q[_n-1]) in 2/-1
(17 real changes made)
. gen d = lx * q
. gen Lx = lx*n - d*a
(1 missing value generated)
. replace L = lx/m in -1
(1 real change made)
. quietly summarize L
. gen T = r(sum) - sum(L) + L
. gen e = T/lx
. format %9.0fc lx d L T
. format %6.3f a e
. list age a lx d L T e
+---------------------------------------------------------------+
| age a lx d Lx T e |
|---------------------------------------------------------------|
1. | 0 0.200 100,000 5,051 98,990 5,686,750 56.867 |
2. | 1 1.442 94,949 5,246 372,233 5,587,760 58.850 |
3. | 5 2.500 89,703 674 446,831 5,215,527 58.142 |
4. | 10 2.500 89,029 327 444,329 4,768,696 53.563 |
5. | 15 2.500 88,703 1,036 440,922 4,324,366 48.751 |
|---------------------------------------------------------------|
6. | 20 2.500 87,666 2,753 431,449 3,883,445 44.298 |
7. | 25 2.500 84,913 4,579 413,119 3,451,996 40.653 |
8. | 30 2.500 80,334 5,421 388,119 3,038,877 37.828 |
9. | 35 2.500 74,913 5,413 361,036 2,650,758 35.384 |
10. | 40 2.500 69,501 4,556 336,115 2,289,722 32.945 |
|---------------------------------------------------------------|
11. | 45 2.500 64,945 3,389 316,253 1,953,607 30.081 |
12. | 50 2.500 61,556 3,022 300,226 1,637,354 26.599 |
13. | 55 2.500 58,534 3,349 284,299 1,337,128 22.844 |
14. | 60 2.500 55,185 4,342 265,072 1,052,829 19.078 |
15. | 65 2.500 50,844 5,989 239,247 787,757 15.494 |
|---------------------------------------------------------------|
16. | 70 2.500 44,855 9,109 201,502 548,510 12.228 |
17. | 75 2.500 35,746 10,690 152,004 347,008 9.708 |
18. | 80 7.783 25,056 25,056 195,003 195,003 7.783 |
+---------------------------------------------------------------+
. save safeda, replace // for use in question 5
file safeda.dta saved
(a) Compute Brass's logit transformation and plot both the data and a linear fit against the general standard
. sort age
. merge age using http://data.princeton.edu/eco572/datasets/brassrlm5
. list age _merge if _merge != 3
+--------------+
| age _merge |
|--------------|
1. | 0 1 |
19. | 85 2 |
20. | 90 2 |
21. | 95 2 |
22. | 100 2 |
+--------------+
. drop if _merge != 3 // we don't want age 0 or ages above 80
(5 observations deleted)
. gen yobs = 0.5*log( (100000-lx)/lx )
. reg yobs ys
Source | SS df MS Number of obs = 17
-------------+------------------------------ F( 1, 15) = 204.49
Model | 4.61296171 1 4.61296171 Prob > F = 0.0000
Residual | .338378129 15 .022558542 R-squared = 0.9317
-------------+------------------------------ Adj R-squared = 0.9271
Total | 4.95133984 16 .30945874 Root MSE = .1502
------------------------------------------------------------------------------
yobs | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ys | .9615473 .0672413 14.30 0.000 .8182258 1.104869
_cons | -.4211604 .0366016 -11.51 0.000 -.4991748 -.3431459
------------------------------------------------------------------------------
. predict yfit
(option xb assumed; fitted values)
. twoway scatter yobs ys || line yfit ys, xtitle(Standard) ///
> title(Brass Logit of Survival) legend(off) name(A, replace)
I will also compute and plot the qx's for comparison with the Coale-Demeny fit below. This was not required.
. gen qobs = d/lx
. gen lfit = 100000*exp(-2*yfit)/(1+exp(-2*yfit))
. gen qfit = (lfit-lfit[_n+1])/lfit
(1 missing value generated)
. gen agem=(age+age[_n+1])/2
(1 missing value generated)
. scatter qobs agem || line qfit agem, yscale(log) xtitle(age) ///
> title(Conditional Probability of Dying) legend(off) name(B, replace)
. graph combine A B, xsize(6) ysize(3) ///
> title("Relational Logit Fit") subtitle(South African Females)
. graph export ps2fig2.png, replace
(file ps2fig2.png written in PNG format)

The model doesn't fit very well, it overestimates mortality at low and high ages and underestimates it in the middle.
(b) Can you find a table in the Coale-Demeny family that does a better job? (Your search doesn't have to be exhaustive, just pick one or two indicators to match in each family.)
The idea here is to pick one table from each family and see which one does best. Because these are one-parameter families, all we do is match one characteristic. The best one to pick is life expectancy at birth because it summarizes all age-specific rates. (It's also how the Coale-Demeny tables are indexed.) For a female life expectancy of 56.9 you should be looking at levels 15 and 16.
My own search suggested that the West family matched the rates a bit better (well, less badly) than the others, although North was also a possibility. The best choice among the published levels was 16, although a table with a life expectancy a bit above that would be marginally better. Here are the values for level 16
. input cdqx
cdqx
1. .082306
2. .041572
3. .013368
4. .010379
5. .014964
6. .019634
7. .022534
8. .025731
9. .029521
10. .034348
11. .041981
12. .057336
13. .078602
14. .117346
15. .172555
16. .261811
17. .384501
The right-hand-side panel of the figure below shows the observed and fitted nqx's. I also computed the fitted lx and took Brass logits so we could compare the fit on that scale as well.
. scatter qobs agem || line cdqx agem , yscale(log) xtitle(age) ///
> title(Conditional Probability of Dying) legend(off) name(B, replace)
. gen cdlx = 1
. qui replace cdlx = cdlx[_n-1]*(1-cdqx[_n-1]) if _n>1
. gen cdy = 0.5*log((1-cdlx)/cdlx)
(1 missing value generated)
. scatter yobs cdy || line cdy cdy, xtitle(Logit of Model) ///
> title(Brass Logit of Survival) legend(off) name(A, replace)
. graph combine A B, xsize(6) ysize(3) ///
> title("Coale-Demeny Fit") subtitle(South African Females)
. graph export ps2fig3.png, replace
(file ps2fig3.png written in PNG format)

We see once again that the model life table underestimates mortality in the middle of the range. Obviously the South African pattern doesn't correspond to the structures represented in this model because of its excess middle age mortality. The reason will become apparent in part [5].
Refer to the Pebley-Stupp data in the problem set. I will enter the age groups and hazards
. drop _all
. input age h
age h
1. 0 .0457006
2. 1 .0033105
3. 6 .0040912
4. 12 .0024781
5. 24 .0004648
6. end
(a) What's the probability of infant death (by 12.0 months) for the reference group?
. gen n = age[_n+1]-age // widths
(1 missing value generated)
. gen H = sum(n*h) // applies to END of each age group
. gen S = exp(-H) // ditto
. gen agep = age[_n+1] // to avoid confusion
(1 missing value generated)
. list age agep S
+------------------------+
| age agep S |
|------------------------|
1. | 0 1 .95532794 |
2. | 1 6 .93964503 |
3. | 6 12 .91686017 |
4. | 12 24 .88999672 |
5. | 24 . .88999672 |
+------------------------+
. di 1 - S[3]
.08313983
The probability of infant death is 1 - 0.91686 = .08314 or 8.31%. (This is usually expressed as 83 per thousand.)
(b) Describe the effect of age of mother on child survival
There is a U-shaped relationship. Children whose mothers are under 20 have 20% higher risk, and those those mothers are 40 or older have 53% higher risk, than children whose mothers are 20-39. This applies at every age in the first couple of years of life.
(c) What's the probability of infant death for a 6th child of a 44 year old mother?
We introduce both risk factors (which effectively doubles the hazard, as 1.5327 times 1.3094 = 2.007) and note that multiplying the hazard by a proportionality factor is equivalent to raising the survival function to that factor.
. di 1 - S[3]^( 1.532668 * 1.309283) .15985588
The probability of infant death for high-parity births to mothers age 40 or older is 0.16 (or 160 per thousand), almost double that of the reference group of lower parity births to younger mothers.
Consider two groups of individuals who face a constant hazard from age 20 onwards but are heterogenous, with gamma distributed frailty that averages 0.02 in group one and 0.03 in group two at age 20, and has a variance of one in both groups. Note that we only work from age 20 onwards.
(a) Group two has 50% higher risk at age 20. What's the risk ratio at age 40? At age 60? At age 80?
I'll create observations for ages 20, 40, 60, and 80. The population risk depends on the expected frailty of survivors as shown in Equation 3 in the handout:
. clear
. set obs 4
obs was 0, now 4
. gen age = 20*_n
. gen h1 = .02/(1+0.02*(age-20))
. gen h2 = .03/(1+0.03*(age-20))
. gen hratio = h2/h1
. list age h1 h2 hratio
+-----------------------------------------+
| age h1 h2 hratio |
|-----------------------------------------|
1. | 20 .02 .03 1.5 |
2. | 40 .01428571 .01875 1.3125 |
3. | 60 .01111111 .01363636 1.2272727 |
4. | 80 .00909091 .01071429 1.1785714 |
+-----------------------------------------+
The risk start 50% higher but by age 80 it's only 18% higher.
(b) Suppose the high risk group represented half the total population at age 20. What fraction of the total would it be at age 40? 60? 80?
The population survival is given on Equation 2 in the handout
. gen S1 = 1/(1+0.02*(age-20))
. gen S2 = 1/(1+0.03*(age-20))
. gen prop2 = S2/(S1+S2)
. list age S1 S2 prop2
+-----------------------------------------+
| age S1 S2 prop2 |
|-----------------------------------------|
1. | 20 1 1 .5 |
2. | 40 .71428571 .625 .46666667 |
3. | 60 .55555556 .45454545 .45 |
4. | 80 .45454545 .35714286 .44 |
+-----------------------------------------+
The high risk group starts at 50% but by age 80 it represents only 44% of the total, as the frail are selected out more quickly in higher risk groups.
Challenge: Can you answer these questions under the more realistic assumption that the baseline hazard is Gompertz from age 20 onwards, with a log hazard of -7 at age 20 and an aging rate of 0.08 for group one?
To answer this question we need to apply Equations 2 and 3 again, but with a Gompertz hazard. This can be done analytically or numerically.
We now reuse the file we saved in question 2 above.
. use safeda, clear
(a) Construct a multiple decrement life table distinguishing deaths due to AIDS and to other causes.
. gen di = d * aids/deaths
. gen qi = di / lx
. format %9.0fc di
. format %6.4f q qi
. list age lx d di q qi
+--------------------------------------------------+
| age lx d di q qi |
|--------------------------------------------------|
1. | 0 100,000 5,051 1,271 0.0505 0.0127 |
2. | 1 94,949 5,246 2,788 0.0553 0.0294 |
3. | 5 89,703 674 0 0.0075 0.0000 |
4. | 10 89,029 327 116 0.0037 0.0013 |
5. | 15 88,703 1,036 754 0.0117 0.0085 |
|--------------------------------------------------|
6. | 20 87,666 2,753 2,347 0.0314 0.0268 |
7. | 25 84,913 4,579 4,007 0.0539 0.0472 |
8. | 30 80,334 5,421 4,727 0.0675 0.0588 |
9. | 35 74,913 5,413 4,509 0.0723 0.0602 |
10. | 40 69,501 4,556 3,358 0.0656 0.0483 |
|--------------------------------------------------|
11. | 45 64,945 3,389 1,790 0.0522 0.0276 |
12. | 50 61,556 3,022 862 0.0491 0.0140 |
13. | 55 58,534 3,349 438 0.0572 0.0075 |
14. | 60 55,185 4,342 220 0.0787 0.0040 |
15. | 65 50,844 5,989 98 0.1178 0.0019 |
|--------------------------------------------------|
16. | 70 44,855 9,109 33 0.2031 0.0007 |
17. | 75 35,746 10,690 5 0.2991 0.0001 |
18. | 80 25,056 25,056 0 1.0000 0.0000 |
+--------------------------------------------------+
I list the conditional and unconditional probabilities of dying of AIDS at each age.
(b) What's the probability that a woman who dies before age 50 dies of AIDS?
. quietly summarize d if age < 50 . scalar db50 = r(sum) . quietly summarize di if age < 50 . di r(sum)/db50 .66764641
So two-thirds of female deaths before age 50 are due to AIDS.
(c) Construct the cause-deleted table in the absence of AIDS.
There are different ways of doing this, depending on how you handle nax. You could use the same factors as for the overall lifetable, assume constant risk in each interval, or use the more elaborate procedure described in the text. I'll use the first approach for consistency with the table in Q. 2.
. gen Rd=(deaths-aids)/deaths
. gen pd = (1-q)^Rd // Chiang's method
. gen ld = 100000
. replace ld = ld[_n-1] * pd[_n-1] in 2/-1
(17 real changes made)
. gen dd = ld*(1-pd)
. replace dd = ld in -1
(0 real changes made)
. gen Ld = ld * n - dd * a
(1 missing value generated)
. replace Ld = ld/(m*Rd) in -1
(1 real change made)
. quietly sum Ld
. gen Td = r(sum) - sum(Ld) + Ld
. gen ed = Td/ld
. format %9.0fc ld dd Ld Td
. format %6.3f ed
. list age a ld dd Ld Td ed
+---------------------------------------------------------------+
| age a ld dd Ld Td ed |
|---------------------------------------------------------------|
1. | 0 0.200 100,000 3,805 99,239 6,905,118 69.051 |
2. | 1 1.442 96,195 2,528 381,136 6,805,879 70.751 |
3. | 5 2.500 93,668 704 466,578 6,424,743 68.591 |
4. | 10 2.500 92,964 221 464,267 5,958,165 64.091 |
5. | 15 2.500 92,743 297 462,974 5,493,898 59.238 |
|---------------------------------------------------------------|
6. | 20 2.500 92,447 434 461,148 5,030,924 54.420 |
7. | 25 2.500 92,013 635 458,477 4,569,776 49.665 |
8. | 30 2.500 91,378 813 454,857 4,111,299 44.992 |
9. | 35 2.500 90,565 1,126 450,009 3,656,442 40.374 |
10. | 40 2.500 89,439 1,581 443,242 3,206,433 35.851 |
|---------------------------------------------------------------|
11. | 45 2.500 87,858 2,194 433,807 2,763,190 31.451 |
12. | 50 2.500 85,665 3,028 420,753 2,329,384 27.192 |
13. | 55 2.500 82,637 4,126 402,869 1,908,631 23.097 |
14. | 60 2.500 78,511 5,875 377,866 1,505,762 19.179 |
15. | 65 2.500 72,635 8,424 342,118 1,127,896 15.528 |
|---------------------------------------------------------------|
16. | 70 2.500 64,212 12,999 288,562 785,778 12.237 |
17. | 75 2.500 51,213 15,310 217,791 497,216 9.709 |
18. | 80 7.783 35,903 35,903 279,426 279,426 7.783 |
+---------------------------------------------------------------+
(d) How many years of life expectancy are lost to AIDS?
. di ed[1]-e[1] 12.183684
Life expectancy would increase from 56.8 to 69.1, a gain of 12.2 years (a 21.4% increase).
(e) What assumption underlies the calculations in c and d? How do you think the true impact of AIDS on life expectancy differs from your estimate in d?
The key assumption is that the underlying risks are independent. If women who die of AIDS are also at high risk of dying due to other causes, as seems likely, the gain in life expectancy that would obtain from eliminating AIDS would be less than 12 years.
Copyright © 2006, Germán Rodríguez, Office of Population Research, Princeton University