Home | GLMs | Multilevel | Survival | Demography | Stata | R

Interpolation

Splines can also be used for interpolation. Here we reproduce the results in the article by Mc Neil, Trussell and Turner listed in the readings.

The Data

The data represent cumulative fertility at ages 20(5)50, which we can just type into Stata.

. clear

. input age F

           age          F
  1. 15 0
  2. 20  .080
  3. 25  .593
  4. 30 1.297
  5. 35 1.840
  6. 40 2.171
  7. 45 2.296
  8. 50 2.306
  9. end

Polynomial Interpolation

With 8 data points we can get an exact fit using a 7-th degree polynomial. Let us reproduce Figure 1 in the article, showing that polynomials don't work very well in this case. We need age^2 to age^7:

. forvalues p=2/7 {
  2.         gen age`p' = age^`p'
  3. }

. quietly regress F age*

Because the fit is exact the residual sum of squares is 0 and the standard errors are undefined, so we wont print the results.

To do the interpolation we create a new data set with age in single years and just predict. How that for easy? We'll go from 13 to 51 to the extremes.

. drop _all

. set obs 39
obs was 0, now 39

. gen age = 12 + _n

. forvalues p=2/7 {
  2.         gen age`p' = age^`p'
  3. }

. predict Fit
(option xb assumed; fitted values)

. line Fit age,  xlabel(15(5)50) name(Fp)

Now that we have cumulative fertility at ages 13 to 51 we can compute differences to obtain age-specific fertility rates centered at the midpoints and plot them

. gen fit = Fit - Fit[_n-1]
(1 missing value generated)

. gen agem = (age + age[_n-1])/2
(1 missing value generated)

. line fit agem, xtitle(age) xlabel(15(5)50) name(fp)

. graph combine Fp fp, xsize(6) ysize(3) title(Polynomial Fit) 

. graph export iasfrp.png, replace
(file iasfrp.png written in PNG format)

You can see that the polynomial is not very well behaved at the extremes. This is not unusual with polynomials.

(An alternative way to do the plots is to use Stata's function plot type, which results in smoother curves suitable for publication. The method used here has the advantage that you get the single-year rates.)

Restricted Splines

McNeil et al. use a spline of degree 5 constrained so that the spline and its first two derivatives are 0 at the extreme ages 15 and 50. This means that fertility itself is zero and not changing at the extreme ages, and ensures good behavior in the tails.

The restricted spline uses 12 parameters: 6 for a fifth-degree polynomial, plus 6 corresponding to the knots at ages 20(5)45, but imposes 6 restrictions, 3 each at ages 0 and 50, two of which ensure that we reproduce the extreme, so the effective number of parameters is 8, just as the polynomial.

We enter ages 20(5)45 as the internal values, and then create three entries each for ages 15 and 50 to accomodate the six constraints.

. drop _all

. input age F

           age          F
  1. 20  .080
  2. 25  .593
  3. 30 1.297
  4. 35 1.840
  5. 40 2.171
  6. 45 2.296
  7. 15 0
  8. 15 0
  9. 15 0
 10. 50 2.306
 11. 50 0
 12. 50 0
 13. end

The article shows how to set up a system of linear equations to compute the coefficients (see page 247). We could do the matrix calculations in Mata, but it is probably easier to set things up as a regression. To check our results against the original paper we will rescale age so we work with 0(1)7 rather than 15(5)50, so we get exactly the same coefficients.

We now need to create terms representing powers of age as well as the knots. We will use local macros to identify rows 8 and 11, which will have the first derivative constraints, and 9 and 12, corresponding to the second derivative contraints:

. local isd1 _n==8 | _n==11

. local isd2 _n==9 | _n==12

. gen one = 1

. replace one = 0 if `isd1' | `isd2'
(4 real changes made)

. // powers: a, a^2, ..., a^5
. gen a = (age-15)/5

. forvalues p=2/5 {
  2.         gen a`p' = a^`p'
  3.         qui replace a`p' = `p' * a^(`p'-1) if `isd1'
  4.         qui replace a`p' = (`p'-1)*`p'*a^(`p'-2) if `isd2'
  5. }

. // knots: (a-1)^5+, ..., (a-6)^5+
. forvalues i=1/6 {
  2.         gen knot`i' = cond(a > `i', (a-`i')^5, 0)
  3.         qui replace knot`i' = cond(a > `i', 5*(a-`i')^4,0) if `isd1'
  4.         qui replace knot`i' = cond(a > `i', 20*(a-`i')^3,0) if `isd2'
  5. }

. // derivatives of a:
. replace a = 1 if `isd1'
(2 real changes made)

. replace a = 0 if `isd2'
(1 real change made)

We can list the data to make sure everything looks OK

. list one age a*, sep(6)

     +-----------------------------------------------+
     | one   age   age   a   a2    a3     a4      a5 |
     |-----------------------------------------------|
  1. |   1    20    20   1    1     1      1       1 |
  2. |   1    25    25   2    4     8     16      32 |
  3. |   1    30    30   3    9    27     81     243 |
  4. |   1    35    35   4   16    64    256    1024 |
  5. |   1    40    40   5   25   125    625    3125 |
  6. |   1    45    45   6   36   216   1296    7776 |
     |-----------------------------------------------|
  7. |   1    15    15   0    0     0      0       0 |
  8. |   0    15    15   1    0     0      0       0 |
  9. |   0    15    15   0    2     0      0       0 |
 10. |   1    50    50   7   49   343   2401   16807 |
 11. |   0    50    50   1   14   147   1372   12005 |
 12. |   0    50    50   0    2    42    588    6860 |
     +-----------------------------------------------+

. list age knot*, sep(6)

     +-----------------------------------------------------+
     | age   knot1   knot2   knot3   knot4   knot5   knot6 |
     |-----------------------------------------------------|
  1. |  20       0       0       0       0       0       0 |
  2. |  25       1       0       0       0       0       0 |
  3. |  30      32       1       0       0       0       0 |
  4. |  35     243      32       1       0       0       0 |
  5. |  40    1024     243      32       1       0       0 |
  6. |  45    3125    1024     243      32       1       0 |
     |-----------------------------------------------------|
  7. |  15       0       0       0       0       0       0 |
  8. |  15       0       0       0       0       0       0 |
  9. |  15       0       0       0       0       0       0 |
 10. |  50    7776    3125    1024     243      32       1 |
 11. |  50    6480    3125    1280     405      80       5 |
 12. |  50    4320    2500    1280     540     160      20 |
     +-----------------------------------------------------+

We are now ready to run a regression. The fit is perfect so the standard errors are undefined. We are interested in the coefficients, which are exactly the same as in the paper (see page 252):

. quietly regress F one a a2-a5 knot*, noconstant

. mat list e(b)

e(b)[1,12]
           one           a          a2          a3          a4          a5       knot1       knot2       knot3
y1   1.853e-10   2.042e-10   4.158e-11   .02094163   .09502961  -.03597124   .05607291  -.01914399  -.00640362

         knot4       knot5       knot6
y1   .00816153  -.00289059  -.00586751

Now that we have the coefficients of our spline, we can predict for single years of age, just like we did before. So we create a new dataset and set age and scaled age. In order to predict we need to recreate the power series and the knots, but we don't need the derivatives.

. drop _all

. set obs 39
obs was 0, now 39

. gen age = 13 + _n

. gen a = (age-15)/5

. gen one = 1

. // powers: a, a^2, ..., a^5
. forvalues p=2/5 {
  2.         gen a`p' = a^`p'
  3. }

. // knots: (a-1)^5+, ..., (a-6)^5+
. forvalues i=1/6 {
  2.         gen knot`i' = cond(a > `i', (a-`i')^5, 0)
  3. }

. predict Fit
(option xb assumed; fitted values)

. line Fit age, xlabel(15(5)50) ytitle(F) name(Fs)

Now we difference, get the age midpoints in the original scale, and plot

. gen fit = Fit-Fit[_n-1]
(1 missing value generated)

. gen agem = (age + age[_n-1])/2
(1 missing value generated)

. line (fit agem), xlabel(15(5)50) xtitle(age) ytitle(f) name(fs)

. graph combine Fs fs, xsize(6) ysize(3) title(Spline Interpolation)

. graph export iasfrs.png, replace
(file iasfrs.png written in PNG format)

The restricted spline is much better behaved at the extremes. (The plot could also be done using the function command, which results in a nice curve. Just remember to differentiate the fitted model and to work in the right scale.)