Interpolation
Splines can also be used for interpolation. Here we reproduce the results in the article by Mc Neil, Trussell and Turner listed in the readings.
The Data
The data represent cumulative fertility at ages 20(5)50, which we can just type into Stata.
. clear
. input age F
age F
1. 15 0
2. 20 .080
3. 25 .593
4. 30 1.297
5. 35 1.840
6. 40 2.171
7. 45 2.296
8. 50 2.306
9. end
Polynomial Interpolation
With 8 data points we can get an exact fit using a 7-th degree polynomial. Let us reproduce Figure 1 in the article, showing that polynomials don't work very well in this case. We need age^2 to age^7:
. forvalues p=2/7 {
2. gen age`p' = age^`p'
3. }
. quietly regress F age*
Because the fit is exact the residual sum of squares is 0 and the standard errors are undefined, so we wont print the results.
To do the interpolation we create a new data set with age
in single years and just predict. How that
for easy? We'll go from 13 to 51 to the extremes.
. drop _all
. set obs 39
obs was 0, now 39
. gen age = 12 + _n
. forvalues p=2/7 {
2. gen age`p' = age^`p'
3. }
. predict Fit
(option xb assumed; fitted values)
. line Fit age, xlabel(15(5)50) name(Fp)
Now that we have cumulative fertility at ages 13 to 51 we can compute differences to obtain age-specific fertility rates centered at the midpoints and plot them
. gen fit = Fit - Fit[_n-1] (1 missing value generated) . gen agem = (age + age[_n-1])/2 (1 missing value generated) . line fit agem, xtitle(age) xlabel(15(5)50) name(fp) . graph combine Fp fp, xsize(6) ysize(3) title(Polynomial Fit) . graph export iasfrp.png, replace (file iasfrp.png written in PNG format)

You can see that the polynomial is not very well behaved at the extremes. This is not unusual with polynomials.
(An alternative way to do the plots is to use Stata's
function plot type, which results in smoother
curves suitable for publication. The method used here has
the advantage that you get the single-year rates.)
Restricted Splines
McNeil et al. use a spline of degree 5 constrained so that the spline and its first two derivatives are 0 at the extreme ages 15 and 50. This means that fertility itself is zero and not changing at the extreme ages, and ensures good behavior in the tails.
The restricted spline uses 12 parameters: 6 for a fifth-degree polynomial, plus 6 corresponding to the knots at ages 20(5)45, but imposes 6 restrictions, 3 each at ages 0 and 50, two of which ensure that we reproduce the extreme, so the effective number of parameters is 8, just as the polynomial.
We enter ages 20(5)45 as the internal values, and then create three entries each for ages 15 and 50 to accomodate the six constraints.
. drop _all
. input age F
age F
1. 20 .080
2. 25 .593
3. 30 1.297
4. 35 1.840
5. 40 2.171
6. 45 2.296
7. 15 0
8. 15 0
9. 15 0
10. 50 2.306
11. 50 0
12. 50 0
13. end
The article shows how to set up a system of linear equations to compute the coefficients (see page 247). We could do the matrix calculations in Mata, but it is probably easier to set things up as a regression. To check our results against the original paper we will rescale age so we work with 0(1)7 rather than 15(5)50, so we get exactly the same coefficients.
We now need to create terms representing powers of age as well as the knots. We will use local macros to identify rows 8 and 11, which will have the first derivative constraints, and 9 and 12, corresponding to the second derivative contraints:
. local isd1 _n==8 | _n==11
. local isd2 _n==9 | _n==12
. gen one = 1
. replace one = 0 if `isd1' | `isd2'
(4 real changes made)
. // powers: a, a^2, ..., a^5
. gen a = (age-15)/5
. forvalues p=2/5 {
2. gen a`p' = a^`p'
3. qui replace a`p' = `p' * a^(`p'-1) if `isd1'
4. qui replace a`p' = (`p'-1)*`p'*a^(`p'-2) if `isd2'
5. }
. // knots: (a-1)^5+, ..., (a-6)^5+
. forvalues i=1/6 {
2. gen knot`i' = cond(a > `i', (a-`i')^5, 0)
3. qui replace knot`i' = cond(a > `i', 5*(a-`i')^4,0) if `isd1'
4. qui replace knot`i' = cond(a > `i', 20*(a-`i')^3,0) if `isd2'
5. }
. // derivatives of a:
. replace a = 1 if `isd1'
(2 real changes made)
. replace a = 0 if `isd2'
(1 real change made)
We can list the data to make sure everything looks OK
. list one age a*, sep(6)
+-----------------------------------------------+
| one age age a a2 a3 a4 a5 |
|-----------------------------------------------|
1. | 1 20 20 1 1 1 1 1 |
2. | 1 25 25 2 4 8 16 32 |
3. | 1 30 30 3 9 27 81 243 |
4. | 1 35 35 4 16 64 256 1024 |
5. | 1 40 40 5 25 125 625 3125 |
6. | 1 45 45 6 36 216 1296 7776 |
|-----------------------------------------------|
7. | 1 15 15 0 0 0 0 0 |
8. | 0 15 15 1 0 0 0 0 |
9. | 0 15 15 0 2 0 0 0 |
10. | 1 50 50 7 49 343 2401 16807 |
11. | 0 50 50 1 14 147 1372 12005 |
12. | 0 50 50 0 2 42 588 6860 |
+-----------------------------------------------+
. list age knot*, sep(6)
+-----------------------------------------------------+
| age knot1 knot2 knot3 knot4 knot5 knot6 |
|-----------------------------------------------------|
1. | 20 0 0 0 0 0 0 |
2. | 25 1 0 0 0 0 0 |
3. | 30 32 1 0 0 0 0 |
4. | 35 243 32 1 0 0 0 |
5. | 40 1024 243 32 1 0 0 |
6. | 45 3125 1024 243 32 1 0 |
|-----------------------------------------------------|
7. | 15 0 0 0 0 0 0 |
8. | 15 0 0 0 0 0 0 |
9. | 15 0 0 0 0 0 0 |
10. | 50 7776 3125 1024 243 32 1 |
11. | 50 6480 3125 1280 405 80 5 |
12. | 50 4320 2500 1280 540 160 20 |
+-----------------------------------------------------+
We are now ready to run a regression. The fit is perfect so the standard errors are undefined. We are interested in the coefficients, which are exactly the same as in the paper (see page 252):
. quietly regress F one a a2-a5 knot*, noconstant
. mat list e(b)
e(b)[1,12]
one a a2 a3 a4 a5 knot1 knot2 knot3
y1 1.853e-10 2.042e-10 4.158e-11 .02094163 .09502961 -.03597124 .05607291 -.01914399 -.00640362
knot4 knot5 knot6
y1 .00816153 -.00289059 -.00586751
Now that we have the coefficients of our spline, we can predict for single years of age, just like we did before. So we create a new dataset and set age and scaled age. In order to predict we need to recreate the power series and the knots, but we don't need the derivatives.
. drop _all
. set obs 39
obs was 0, now 39
. gen age = 13 + _n
. gen a = (age-15)/5
. gen one = 1
. // powers: a, a^2, ..., a^5
. forvalues p=2/5 {
2. gen a`p' = a^`p'
3. }
. // knots: (a-1)^5+, ..., (a-6)^5+
. forvalues i=1/6 {
2. gen knot`i' = cond(a > `i', (a-`i')^5, 0)
3. }
. predict Fit
(option xb assumed; fitted values)
. line Fit age, xlabel(15(5)50) ytitle(F) name(Fs)
Now we difference, get the age midpoints in the original scale, and plot
. gen fit = Fit-Fit[_n-1] (1 missing value generated) . gen agem = (age + age[_n-1])/2 (1 missing value generated) . line (fit agem), xlabel(15(5)50) xtitle(age) ytitle(f) name(fs) . graph combine Fs fs, xsize(6) ysize(3) title(Spline Interpolation) . graph export iasfrs.png, replace (file iasfrs.png written in PNG format)

The restricted spline is much better behaved at the extremes.
(The plot could also be done using the function
command, which results in a nice curve. Just remember to
differentiate the fitted model and to work in the right scale.)
