Child Mortality in Guatemala: Model Interpretation
We continue our discussion of shared frailty with a focus on interpretation of the results using various calculations based on the parameter estimates.
The Distribution of Frailty
The estimated variance of frailty at the family level is 0.2142. In terms of Clayton's model this means that the hazard given that a sibling died at age a is 21.4% higher than the hazard given that the sibling survived age a.
Using the result of Oakes, we can also say that the Tau correlation between sibling survival times is estimated to be 0.097, so the probability of a concordant pair is about 10 percentage points higher than the probability of a discordant pair. (The "pair" here refers to a pair of mothers, say A and B, each with two children, say 1 and 2. The pair is concordant if both children of mother A die at younger (or older) ages than the children of mother B. The pair is discordant if child 1 of A lives longer than child 1 of B but child 2 of A dies younger than child 2 of B. The interpretation is not terribly useful in this application because most observations are censored and thus one can't establish concordance or discordance.)
A more direct approach is to look at the actual distribution of
frailty. Stata has a function gammaden(a,b,g,x) to
compute the density of the gamma distribution with shape a,
scale b (which is 1/β in our notation) and location g (here 0).
Recall that the variance, using Stata notation, is b=1/a.
Here's a plot of the density
. scalar v = exp( _b[ln_the:_cons] )
. twoway function y=gammaden(1/v,v,0,x), range(0 3) ///
> title("Gamma density with mean 1 and variance 0.214")
. graph export gfr.png, width(500) replace
(file gfr.png written in PNG format)

We can also compute quantiles. Stata has a function
invgammap(a,p) to compute quantiles of the standard
gamma distribution with shape a, which has scale 1 and location 0.
A gamma with shape a and scale b is just b times a standard
gamma with shape a. So here are the quartiles of the
distribution depicted above
. mata v=st_numscalar("v")
. mata q=invgammap(1/v, (0.25,.50,.75) )*v
. mata q
1 2 3
+-------------------------------------------+
1 | .6618944858 .9295688957 1.261736474 |
+-------------------------------------------+
. mata q :/ q[2] :-1
1 2 3
+----------------------------------------------+
1 | -.2879554288 0 .3573350829 |
+----------------------------------------------+
So the quartiles are 0.66, 0.93 and 1.26. Computing the ratios Q1/Q2 and Q3/Q2 we see that families with frailty at Q1 have 29% lower risk, and families with frailty at Q3 have 36% higher risk, than families with median frailty. Clearly, unobserved family characteristics have a very substantial effect on survival.
Distribution of Observed Risks
It may be interesting to contrast the above results with
the risks hat can be attributed to observed characteristics.
Here we will simply use predict with the
haz option to predict the hazard holding
frailty constant at one. We then summarize the hazard.
In doing this we need to remember that each child contributes one or more pseudo observations. To make sure we compute averages that apply to the sample of children rather than the pseudo observations we select the first interval for each child:
. predict haz, haz
(option alpha1 assumed)
. sum haz if _t0==0, detail
Predicted hazard
-------------------------------------------------------------
Percentiles Smallest
1% .0309073 .0280617
5% .033402 .0297215
10% .0361795 .029956 Obs 3120
25% .0413038 .029956 Sum of Wgt. 3120
50% .0470627 Mean .0516725
Largest Std. Dev. .0182111
75% .0552813 .1704806
90% .0726199 .1706315 Variance .0003316
95% .0934537 .171702 Skewness 2.365153
99% .1179419 .187827 Kurtosis 10.73898
. di r(p25), r(p50), r(p75)
.04130384 .04706271 .05528133
. di r(p25)/r(p50)-1, r(p75)/r(p50)-1
-.12236582 .17463127
So children who are at Q1 in terms of the observed risk factors have 12% lower risk, and children at Q3 of observed risk factors have 17% higher risk, than children at the median. Looks like characteristics that are observed at birth have a smaller impact on survival than unobserved family characteristcs. (We have not considered time-varying covariates, such as the birth of another child, but one can easily calculate their impact for any given child by specifying a trajectory.)
Subject-Specific Probabilities
Another approach to presenting results is to calculate survival probabilities. I will calculate probabilities of surviving to ages one and five, the complements of the infant and child mortality "rates". First let us get the baseline hazard, the width of the intervals, and thus the cumulative baseline hazard. Recall that we did not include a constant, so the relevant parameters are the first five:
. mata
------------------------------------------------- mata (type end to exit) -------------------------
: b = st_matrix("e(b)")
: h = exp(b[1..5])
: w = (1,5,6,12,36)
: H = runningsum(w :* h)
: end
---------------------------------------------------------------------------------------------------
Next I will focus on a mother who is 26 years old, pretty close to the mean age, is having a second child (so we can include the length of the first interval as a predictor), and has not experienced a child death before giving birth to the second child. We consider first the case where the preceding birth interval is 3 years or longer.
The relevant parameters for age, age-squared, birth order and previous birth interval are in positions 6, 7, 8, and 13 of the vector of parameters:
. mata xb = b[6]*26 + b[7]*26^2 + b[8]*2 + b[13]
We can now compute the survival function for the average family with these characteristics. We will also change the previous birth interval to one year, removing the coefficient in position 13 and adding the one in position 10:
. mata 1 :- exp(-H * exp(xb) )
1 2 3 4 5
+-----------------------------------------------------------------------+
1 | .0304345123 .0413444931 .0573727829 .0762983366 .0863384417 |
+-----------------------------------------------------------------------+
. mata 1 :- exp(-H * exp(xb - b[13] + b[10] ) )
1 2 3 4 5
+-----------------------------------------------------------------------+
1 | .0763993311 .1028877058 .1409521526 .1846057143 .2072016182 |
+-----------------------------------------------------------------------+
So the probabilities of infant and child death for the average 26-year old mother having a second child are 5.7 and 8.6% with a three-year interval and 14.1% and 20.7% with a one-year interval.
If we consider instead mothers at the first and third quartiles of frailty we obtain
. mata
------------------------------------------------- mata (type end to exit) -------------------------
: 1 :- exp(-H * q[1] * exp(xb) )
1 2 3 4 5
+-----------------------------------------------------------------------+
1 | .0202495123 .0275605766 .0383528006 .0511760483 .058014883 |
+-----------------------------------------------------------------------+
: 1 :- exp(-H * q[1] * exp(xb - b[13] + b[10] ) )
1 2 3 4 5
+-----------------------------------------------------------------------+
1 | .0512447154 .0693431838 .0956710158 .1263555039 .1424560573 |
+-----------------------------------------------------------------------+
: 1 :- exp(-H * q[3] * exp(xb) )
1 2 3 4 5
+-----------------------------------------------------------------------+
1 | .0382462283 .0518806776 .0718379424 .0952884784 .1076783111 |
+-----------------------------------------------------------------------+
: 1 :- exp(-H * q[3] * exp(xb - b[13] + b[10] ) )
1 2 3 4 5
+-----------------------------------------------------------------------+
1 | .0954132854 .1280228624 .174442573 .2270180579 .2539465035 |
+-----------------------------------------------------------------------+
: end
---------------------------------------------------------------------------------------------------
So for a low-risk (Q1) family the probability of child death goes from 5.8 to 14.2 when we compare long and short intervals. For a high-risk (Q3) family the corresponding probability goes from 10.8 to 25.4%.
Population-Average Probabilities
We can also calculate the average probabilities in the population of 26-year old mothers having a second birth. From the results in the notes, the (complements of) the survival probabilities under gamma frailty are
. mata 1 :- (1 :/ (1 :+ v * H :* exp(xb))) :^(1/v)
1 2 3 4 5
+-----------------------------------------------------------------------+
1 | .0303357356 .0411625024 .0570231867 .0756818763 .0855503244 |
+-----------------------------------------------------------------------+
. mata 1 :- (1 :/ (1 :+ v * H :* exp(xb - b[13] + b[10]))) :^(1/v)
1 2 3 4 5
+-----------------------------------------------------------------------+
1 | .0757812476 .1017714938 .1388706264 .181062986 .202757504 |
+-----------------------------------------------------------------------+
So on average the probabilities of infant and child deaths are 5.7 and 8.6% with three-year intervals and 13.9 and 20.3% with one-year birth intervals. These are a bit lower than the corresponding probabilities for the average family with the given characteristics. (The difference is modest because by ages one and five there hasn't been a lot of time for selection to operate.)
Marginal, Joint and Conditional Probabilities
One last calculation we can do explicitly under gamma heterogeneity involves the marginal and joint probabilities of infant and child death for two children in the same family.
We could do these calculations for 26 year-old mothers having a second birth three or more years after the first, but unless they have twins the calculation of bivariate probabilities doesn't make a lot of sense. So I will do the calculations for a second birth at age 26 and a third at age 29, everything else being the same. (A simpler approach would be to do the calculations for mothers whose observed risk factors put them at the median.)
Applying the results in the notes
. mata xb2 = b[6]*29 + b[7]*29^2 + b[8]*3 + b[13]
. mata S1 = (1 :/ (1 :+ v * H :* exp(xb))) :^(1/v)
. mata S2 = (1 :/ (1 :+ v * H :* exp(xb2))) :^(1/v)
. mata S12 = (1 :/ (1 :+ v * H :* (exp(xb) + exp(xb2)))) :^(1/v)
. mata S1 :* S2 \ S12
1 2 3 4 5
+-----------------------------------------------------------------------+
1 | .9392544348 .9180445945 .8874193438 .852069807 .833669972 |
2 | .9394506446 .9184008084 .8880888478 .8532200758 .835120237 |
+-----------------------------------------------------------------------+
So the probability of two children surviving to age five is slightly higher than the product of the two marginals. A better way to see the correlation is to calculate a two by two table with survival to age five for two children in the same family
. mata M = ( 1-S1[5]-S2[5]+S12[5] , S2[5]-S12[5] \
> S1[5]-S12[5], S12[5])
. mata M
1 2
+-----------------------------+
1 | .0090075205 .0765428038 |
2 | .0793294387 .835120237 |
+-----------------------------+
. mata (M[1,1]/M[1,2] ) / (M[2,1]/M[2,2] )
1.238840858
The last calculation is an odds ratio: the odds of one child dying by age five are 23.9% higher if the other child died by age five than if the other didn't!
Log-Normal Frailty
If we analyze the data using log-normal frailty rather
than gamma frailty the results are very similar.
One slightdrawback, however, is that Stata's xtreg
command allows only for gamma and inverse Gaussian frailty.
What can we do?
We can take advantage of the 'Poisson trick' to fit the model using Poisson regression with a normally distributed random effect. As a simple check I first use a gamma distributed random effect to show you that I get the same results as before
. local age "a0 a1to5 a6to11 a12to23 a24up"
. local fixed "mage mage2 borde pdead"
. local previous "p0014 p1523 p2435 p36up"
. local follow "i011a1223 i011a24p i1223a24p"
. gen expo = _t-_t0
. xtset momid
panel variable: momid (unbalanced)
. xtpoisson _d `age' `fixed' `previous' `follow', noconstant nolog ///
> exposure(expo) // default is gamma
Random-effects Poisson regression Number of obs = 13594
Group variable: momid Number of groups = 851
Random effects u_i ~ Gamma Obs per group: min = 1
avg = 16.0
max = 40
Wald chi2(16) = 8796.54
Log likelihood = -2107.0551 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_d | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a0 | -.9914825 .7746654 -1.28 0.201 -2.509799 .5268338
a1to5 | -3.605673 .7831759 -4.60 0.000 -5.14067 -2.070677
a6to11 | -3.389236 .7806303 -4.34 0.000 -4.919243 -1.859228
a12to23 | -3.897659 .7812767 -4.99 0.000 -5.428933 -2.366385
a24up | -5.614579 .8021054 -7.00 0.000 -7.186677 -4.042482
mage | -.155537 .0600573 -2.59 0.010 -.273247 -.0378269
mage2 | .002685 .001066 2.52 0.012 .0005957 .0047743
borde | .0574649 .0355913 1.61 0.106 -.0122928 .1272226
pdead | -.0754952 .1785745 -0.42 0.672 -.4254948 .2745045
p0014 | .5731241 .2175683 2.63 0.008 .1466981 .9995501
p1523 | -.0969716 .1893874 -0.51 0.609 -.4681641 .274221
p2435 | -.2280708 .1881447 -1.21 0.225 -.5968277 .1406861
p36up | -.3713334 .2122237 -1.75 0.080 -.7872841 .0446174
i011a1223 | .7931767 .7209825 1.10 0.271 -.6199231 2.206276
i011a24p | 1.601311 .7415929 2.16 0.031 .1478161 3.054807
i1223a24p | .0741753 .3776798 0.20 0.844 -.6660634 .8144141
expo | (exposure)
-------------+----------------------------------------------------------------
/lnalpha | -1.540714 .632364 -2.780125 -.3013038
-------------+----------------------------------------------------------------
alpha | .214228 .1354701 .0620307 .739853
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 3.29 Prob>=chibar2 = 0.035
. xtpoisson _d `age' `fixed' `previous' `follow', noconstant nolog ///
> exposure(expo) normal
Random-effects Poisson regression Number of obs = 13594
Group variable: momid Number of groups = 851
Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 16.0
max = 40
Wald chi2(16) = 5811.12
Log likelihood = -2107.0852 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_d | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a0 | -1.084553 .773087 -1.40 0.161 -2.599775 .4306701
a1to5 | -3.698709 .781317 -4.73 0.000 -5.230062 -2.167356
a6to11 | -3.482367 .7786008 -4.47 0.000 -5.008396 -1.956337
a12to23 | -3.990986 .7789156 -5.12 0.000 -5.517633 -2.46434
a24up | -5.708019 .799513 -7.14 0.000 -7.275036 -4.141003
mage | -.1560088 .0601027 -2.60 0.009 -.2738079 -.0382097
mage2 | .0026957 .0010667 2.53 0.011 .0006051 .0047863
borde | .0575021 .0355654 1.62 0.106 -.0122048 .1272091
pdead | -.0736851 .1795495 -0.41 0.682 -.4255958 .2782255
p0014 | .5717255 .2175485 2.63 0.009 .1453384 .9981127
p1523 | -.0971671 .189422 -0.51 0.608 -.4684274 .2740932
p2435 | -.2285237 .1880919 -1.21 0.224 -.597177 .1401297
p36up | -.3714952 .2122264 -1.75 0.080 -.7874513 .0444608
i011a1223 | .7944906 .7210032 1.10 0.270 -.6186497 2.207631
i011a24p | 1.603746 .7414477 2.16 0.031 .1505347 3.056956
i1223a24p | .0737546 .3776714 0.20 0.845 -.6664677 .8139769
expo | (exposure)
-------------+----------------------------------------------------------------
/lnsig2u | -1.631103 .6089455 -2.68 0.007 -2.824614 -.4375917
-------------+----------------------------------------------------------------
sigma_u | .4423953 .1346973 .2435807 .8034857
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01) = 3.23 Pr>=chibar2 = 0.036
One advantage of log-normal frailty is that we can view the log of frailty as σ z where z ∼ N(0,1) is a standard normal random variable, so the standard deviation of frailty can be interpreted as a regression coefficient for a normally-distributed latent variable representing unobserved family characteristics, just like the β's are coefficients for observed characteristics.
Here σ is estimated as 0.442, so a one-standard difference in unobserved characteristics is associated with 55.6% higher risk [exp(0.442)=1.556]. We can also look at the interquartile difference in risks, using the quartiles of the standard normal distribution
. mata sigma = exp(0.5*st_matrix("e(b)")[17])
. mata exp(sigma)
1.556430878
. mata exp( invnormal( (0.25,0.75) ) * sigma ) :- 1
1 2
+-------------------------------+
1 | -.2579889149 .3476887611 |
+-------------------------------+
So we see that families in the lower quartile have 26% lower risk, and families in the upper quartile have 35% higher risk, than families at the median. These results are very similar to those obtained under gamma frailty.
Another important advantage of log-normal frailty is that it extends easily to more than two levels and to more general random-intercept and random-slope models, as we will see in the multilevel course.
A disadvantage, however, is that the unconditional survival distribution does not have a closed form, unlike the case of gamma frailty. To do calculations similar to those of the previous sections we would have to use Gaussian quadrature to integrate out the random effect.
