Germán Rodríguez

Survival Analysis
Princeton University
We continue our competing risk analysis of supreme court tenure by introducing two covariates: `age`

, how old the justice was when appointed, and `year`

, the calendar year of appointment. We obviously expect survival to improve over time and decline with the age of the justice at appointment, but it is not clear whether the trend over time is to retire earlier or later.

We use the file saved in the previous step. Alternatively, you may rebuild the dataset using the first block of code in that page.

. use http://data.princeton.edu/pop509/justices2, clear

> justices <- read.csv("http://data.princeton.edu/pop509/justices2.csv")

We can now run a Cox regression looking at overall tenure in the court

. stset tenure, fail(event) failure event: event != 0 & event < . obs. time interval: (0, tenure] exit on or before: failure ────────────────────────────────────────────────────────────────────────────── 113 total observations 0 exclusions ────────────────────────────────────────────────────────────────────────────── 113 observations remaining, representing 104 failures in single-record/single-failure data 1,887.365 total analysis time at risk and under observation at risk from t = 0 earliest observed entry t = 0 last observed exit t = 36.57221 . stcox age year, efron failure _d: event analysis time _t: tenure Iteration 0: log likelihood = -389.72292 Iteration 1: log likelihood = -375.63169 Iteration 2: log likelihood = -375.56405 Iteration 3: log likelihood = -375.56403 Refining estimates: Iteration 0: log likelihood = -375.56403 Cox regression -- Efron method for ties No. of subjects = 113 Number of obs = 113 No. of failures = 104 Time at risk = 1887.364823 LR chi2(2) = 28.32 Log likelihood = -375.56403 Prob > chi2 = 0.0000 ─────────────┬──────────────────────────────────────────────────────────────── _t │ Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval] ─────────────┼──────────────────────────────────────────────────────────────── age │ 1.084647 .0187453 4.70 0.000 1.048522 1.122017 year │ .9941362 .0017051 -3.43 0.001 .9907998 .9974839 ─────────────┴────────────────────────────────────────────────────────────────

> library(survival) > coxb <- coxph(Surv(tenure, event > 0) ~ age + year, data=justices) > coef(summary(coxb)) coef exp(coef) se(coef) z Pr(>|z|) age 0.081254650 1.0846471 0.017282397 4.701585 2.581503e-06 year -0.005881029 0.9941362 0.001715201 -3.428770 6.063235e-04

We see that the risk of leaving the court at any length of service is 8.5% higher for every year of age at appointment, and just over half a percent lower per calendar year. This, of course, confounds trends in mortality with trends in retirement.

To look at the risk of death we use death as the event indicator, treating everything else as censoring

. streset, fail(event == 1) // just death -> stset tenure, failure(event==1) failure event: event == 1 obs. time interval: (0, tenure] exit on or before: failure ────────────────────────────────────────────────────────────────────────────── 113 total observations 0 exclusions ────────────────────────────────────────────────────────────────────────────── 113 observations remaining, representing 50 failures in single-record/single-failure data 1,887.365 total analysis time at risk and under observation at risk from t = 0 earliest observed entry t = 0 last observed exit t = 36.57221 . stcox age year, efron failure _d: event == 1 analysis time _t: tenure Iteration 0: log likelihood = -187.36741 Iteration 1: log likelihood = -176.31427 Iteration 2: log likelihood = -176.28608 Iteration 3: log likelihood = -176.28608 Refining estimates: Iteration 0: log likelihood = -176.28608 Cox regression -- no ties No. of subjects = 113 Number of obs = 113 No. of failures = 50 Time at risk = 1887.364823 LR chi2(2) = 22.16 Log likelihood = -176.28608 Prob > chi2 = 0.0000 ─────────────┬──────────────────────────────────────────────────────────────── _t │ Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval] ─────────────┼──────────────────────────────────────────────────────────────── age │ 1.070161 .0251142 2.89 0.004 1.022053 1.120533 year │ .988923 .0025736 -4.28 0.000 .9838917 .9939801 ─────────────┴────────────────────────────────────────────────────────────────

> coxd <- coxph(Surv(tenure, event == 1) ~ age + year, data=justices) > coef(summary(coxd)) coef exp(coef) se(coef) z Pr(>|z|) age 0.06780911 1.070161 0.023467640 2.889473 3.858885e-03 year -0.01113878 0.988923 0.002602433 -4.280141 1.867747e-05

We see that the risk of death while in the court is about 7% higher per year of age at appointment, and declines just over one percent per calendar year of appointment, reflecting general improvements in mortality over time. Note that using dates in days we had no ties.

To look at the risk of retirement we change the event of interest

. streset , fail(event == 2) // just retirement -> stset tenure, failure(event==2) failure event: event == 2 obs. time interval: (0, tenure] exit on or before: failure ────────────────────────────────────────────────────────────────────────────── 113 total observations 0 exclusions ────────────────────────────────────────────────────────────────────────────── 113 observations remaining, representing 54 failures in single-record/single-failure data 1,887.365 total analysis time at risk and under observation at risk from t = 0 earliest observed entry t = 0 last observed exit t = 36.57221 . stcox age year, efron failure _d: event == 2 analysis time _t: tenure Iteration 0: log likelihood = -202.36566 Iteration 1: log likelihood = -193.54943 Iteration 2: log likelihood = -193.28902 Iteration 3: log likelihood = -193.28867 Refining estimates: Iteration 0: log likelihood = -193.28867 Cox regression -- no ties No. of subjects = 113 Number of obs = 113 No. of failures = 54 Time at risk = 1887.364823 LR chi2(2) = 18.15 Log likelihood = -193.28867 Prob > chi2 = 0.0001 ─────────────┬──────────────────────────────────────────────────────────────── _t │ Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval] ─────────────┼──────────────────────────────────────────────────────────────── age │ 1.105598 .0281334 3.95 0.000 1.051811 1.162137 year │ .999077 .0024063 -0.38 0.701 .9943719 1.003804 ─────────────┴────────────────────────────────────────────────────────────────

> coxr <- coxph(Surv(tenure, event == 2) ~ age + year, data=justices) > coef(summary(coxr)) coef exp(coef) se(coef) z Pr(>|z|) age 0.1003868036 1.105598 0.025446280 3.9450482 7.978392e-05 year -0.0009234354 0.999077 0.002408505 -0.3834061 7.014186e-01

We see an increase in the risk of retirement of 10.6% per year of age at appointment, and essentially no trend over time, so the risk of retirement depends only on starting age and not on calendar year of appointment.

The next step will be to calculate cumulative incidence functions based on the Cox models for cause-specific hazards. In the next section we will use the Fine and Gray model.

The aim is to compare the cumulative incidence of death for justices appointed at age 55 in 1950 and 2000. For convenience I will center age on 55 and calendar year on 1950. In both cases I add 0.5 to convert to exact age and the middle of the year. We start by fitting a Cox model to the risk of death

. gen agec = age - 55.5 . gen yearc = year - 1950.5 . streset, fail(event == 1) -> stset tenure, failure(event==1) failure event: event == 1 obs. time interval: (0, tenure] exit on or before: failure ────────────────────────────────────────────────────────────────────────────── 113 total observations 0 exclusions ────────────────────────────────────────────────────────────────────────────── 113 observations remaining, representing 50 failures in single-record/single-failure data 1,887.365 total analysis time at risk and under observation at risk from t = 0 earliest observed entry t = 0 last observed exit t = 36.57221 . quietly stcox agec yearc

> library(dplyr) > justices <- mutate(justices, + agec = age - 55.5, + yearc = year - 1950.5 + ) > coxd <- coxph(Surv(tenure, event == 1) ~ agec + yearc, data=justices)

The next task is to calculate the baseline hazard. I will do this by calculating the Nelson-Aalen estimate of the cumulative hazard and then differencing it. I think this is conceptually clearer, and yields results that are very similar to an alternative calculation based on the Kalbfleisch-Prentice estimate of the survival function. I also calculate the hazard for a judge appointed 50 years later using the estimated coefficient from the Cox regression.

. predict H0d, basech . gsort +_t -_d . gen h0d = H0d - H0d[_n - 1] (1 missing value generated) . replace h0d = H0d in 1 (1 real change made) . gen h1d = h0d*exp(50 * _b[yearc])

> sfd <- survfit(coxd, newdata=list(agec = 0, yearc = 0)) > cif <- data.frame(time = sfd$time, + h0d = diff(c(0, sfd$cumhaz))) > cif <- mutate(cif, h1d = h0d * exp(50 * coef(coxd)["yearc"]))

The next step is to do exactly the same calculations for the risk of retirement

. streset, fail(event == 2) -> stset tenure, failure(event==2) failure event: event == 2 obs. time interval: (0, tenure] exit on or before: failure ────────────────────────────────────────────────────────────────────────────── 113 total observations 0 exclusions ────────────────────────────────────────────────────────────────────────────── 113 observations remaining, representing 54 failures in single-record/single-failure data 1,887.365 total analysis time at risk and under observation at risk from t = 0 earliest observed entry t = 0 last observed exit t = 36.57221 . quietly stcox agec yearc . predict H0r, basech . gsort +_t -_d . gen h0r = H0r - H0r[_n - 1] (1 missing value generated) . replace h0r = H0r in 1 (1 real change made) . gen h1r = h0r * exp(50 * _b[yearc])

> coxr <- coxph(Surv(tenure, event == 2) ~ agec + yearc, data=justices) > sfr <- survfit(coxr, newdata=list(agec = 0, yearc = 0)) > if(!all(sfr$time == sfd$time)) stop("times differ") > cif <- mutate(cif, + h0r = diff(c(0, sfr$cumhaz)), + h1r = h0r * exp(50 * coef(coxr)["yearc"]) + )

We can now add the hazards of death and retirement and calculate the overall survival probability as well as the incidence functions, all from first principles. For these calculations we need to retain just one observation per distinct failure time and make sure the hazards for all causes are attached to the same observation, which we do by taking the maximum for each value of `_t`

. Hat tip to Shawna Metzger, who alerted me to this issue and provided the solution. In this dataset there is only one tie between a death and a retirement at 1832 days or 5.0157 years. In the original dataset there were many ties, as dates were coded in years only.

. foreach hazard in h0d h0r h1d h1r { 2. capture drop temp 3. egen temp = max(`hazard'), by(_t) 4. quietly replace `hazard' = temp 5. } . egen first = tag(_t) . drop if !first (1 observation deleted)

We then calculate the incidence functions of death for 1950 and 2000, summing the product of the probabilities of surviving all causes of attrition up to each time and then failing due to each cause:

. forvalues y = 0/1 { 2. gen h`y' = h`y'd + h`y'r 3. gen S`y' = exp(-sum(h`y')) // Aalen 4. gen c`y' = S`y'[_n-1] * h`y'd 5. replace c`y' = h`y'd in 1 6. gen I`y' = sum(c`y') 7. } (1 missing value generated) (1 real change made) (1 missing value generated) (1 real change made) . line I0 I1 _t, connect(J J) /// > title("Cumulative incidence of death") xtitle(Tenure) /// > subtitle("Justice appointed at age 55 in 1950 and 2000") /// > note(Based on Cox model of cause-specific hazards) /// > text(.226 38 "2000") text(.328 38 "1950") legend(off) . graph export justices-cifcox.png, width(500) replace (note: file justices-cifcox.png not found) (file justices-cifcox.png written in PNG format) . list I0 I1 in -1 ┌─────────────────────┐ │ I0 I1 │ ├─────────────────────┤ 112. │ .3281087 .2261896 │ └─────────────────────┘

> n <- nrow(cif) > cif <- mutate(cif, + S0 = exp(-cumsum(h0d + h0r)), + S1 = exp(-cumsum(h1d + h1r)), + I0d = cumsum(c(1, S0[-n]) * h0d), + I1d = cumsum(c(1, S1[-n]) * h1d) + ) > cif[n, c("time", "I0d", "I1d")] time I0d I1d 112 36.57221 0.3281087 0.2261896 > library(ggplot2) > g <- ggplot(cif, aes(time, I0d)) + geom_line() + geom_line(aes(time, I1d)) + + ggtitle("Incidence of Death For Justices Appointed at Age 55 in 1950 and 2000") > g + annotate("text", x = 38, y=max(cif$I0d), label="1950") + + annotate("text", x = 38, y=max(cif$I1d), label="2000") > ggsave("justices-cifcoxr.png", width = 500/72, height = 400/72, dpi = 72)

We see that the probability of dying while serving in the supreme court for a judge appointed at age 55 has fallen from 32.8% in 1950 to 22.6% in 2000, assuming of course that the trends continue so the model still applies.

*Note*. You may be interested to know that using an approach analogous to the Kaplan-Meier estimate yields the same probabilities quoted above to three decimal places.