Contraceptive Use in Bangladesh
The Stata manual [XT, p.249] analyzes data from the 1989 Bangladesh fertility survey, previously analyzed by Huq and Cleland, and by Ng et al.
. webuse bangladesh, clear
(Bangladesh Fertility Survey, 1989)
. desc
Contains data from http://www.stata-press.com/data/r11/bangladesh.dta
obs: 1,934 Bangladesh Fertility Survey, 1989
vars: 7 28 May 2009 20:27
size: 27,076 (99.9% of memory free) (_dta has notes)
---------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
---------------------------------------------------------------------------------------------------------------
district byte %9.0g District
c_use byte %9.0g yesno Use contraception
urban byte %9.0g urban Urban or rural
age float %6.2f Age, mean centered
child1 byte %9.0g 1 child
child2 byte %9.0g 2 children
child3 byte %9.0g 3 or more children
---------------------------------------------------------------------------------------------------------------
Sorted by: district
A Random-Intercept Model
We will fit a logit model predicting contraceptive use by urban residence, age and number of children, with a random intercept by district. We use xtmelogit to be able to obtain estimates of the district effect.
. xtmelogit c_use urban age child1 child2 child3 || district: , mle
Refining starting values:
Iteration 0: log likelihood = -1219.2682
Iteration 1: log likelihood = -1209.3544
Iteration 2: log likelihood = -1207.1895
Performing gradient-based optimization:
Iteration 0: log likelihood = -1207.1895
Iteration 1: log likelihood = -1206.8323
Iteration 2: log likelihood = -1206.8322
Iteration 3: log likelihood = -1206.8322
Mixed-effects logistic regression Number of obs = 1934
Group variable: district Number of groups = 60
Obs per group: min = 2
avg = 32.2
max = 118
Integration points = 7 Wald chi2(5) = 109.60
Log likelihood = -1206.8322 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
c_use | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
urban | .7322764 .1194857 6.13 0.000 .4980887 .9664641
age | -.0264982 .0078916 -3.36 0.001 -.0419654 -.0110309
child1 | 1.116002 .1580921 7.06 0.000 .8061466 1.425856
child2 | 1.365895 .174669 7.82 0.000 1.02355 1.70824
child3 | 1.344031 .1796549 7.48 0.000 .9919141 1.696148
_cons | -1.68929 .1477592 -11.43 0.000 -1.978892 -1.399687
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
district: Identity |
sd(_cons) | .4643477 .0789531 .3327464 .6479975
------------------------------------------------------------------------------
LR test vs. logistic regression: chibar2(01) = 43.39 Prob>=chibar2 = 0.0000
. estimates store xtme
. mata exp(st_matrix("e(b)"))
1 2 3 4 5 6 7
+---------------------------------------------------------------------------------------------------+
1 | 2.07980972 .9738498451 3.052623893 3.919229511 3.834469892 .1846506578 .4643477362 |
+---------------------------------------------------------------------------------------------------+
The estimates show that when we compare women with the same age and family size, contraceptive use is higher in urban than in rural areas, with double the odds of using contraception [exp(0.732)=2.08]. Within an area and family size, contraceptive use declines with age, with 2.6% lower odds per year of age [exp(-.026)-1 = -.026]. Within an area and age, contraceptive use is higher among women with a child, and much higher among women with two or more children, than among those with no children, with odds ratios of three and almost four [exp(1.116)=3.05, exp(1.366)=3.92 and exp(1.344)=3.834].
There is very subtantial variation in contraceptive use across districts. The standard deviation of 0.46 indicates that women in a district which is one standard deviation above the mean have odds of using contraception that are 59% higher than comparable women in an average district [exp(0.464) = 1.59]. The standard deviation is also equivalent to a correlation of 0.06 in the latent propensities to use contraception of comparable women in the same district [0.464^2/(0.464^2+ π^2/3) = 0.06].
ML and Empirical Bayes Estimates
We can identify districts where women are more or less likely to
use contraception by "estimating" the random effects. We can
obtain maximum likelihood estimates by treating the estimated
linear predictor from this model as an offset and then fitting
a separate regression model in each district, something easy to
do with statsby as shown by Rabe-Hesketh and
Skrondal(2008, p. 265). So here we go:
. predict xb, xb
. statsby mle=_b[_cons], by(district) saving(ml,replace): ///
> logit c_use, offset(xb)
(running logit on estimation sample)
command: logit c_use, offset(xb)
mle: _b[_cons]
by: district
Statsby groups
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..x.......x.....................................x. 50
..........
. merge m:1 district using ml
Result # of obs.
-----------------------------------------
not matched 0
matched 1,934 (_merge==3)
-----------------------------------------
The mle is not defined when all women in a district have the same outcome, so everybody (or nobody) uses contraception. This occurs for 3 districts.
The empirical Bayes estimate, based on the mode of the posterior
distribution of the random effects, can be obtained with
predict, ref
. estimates restore xtme (results xtme are active now) . predict re, ref . bysort district: gen first=_n==1 . scatter re mle if first || function y=x, range(-1 1) /// > ,title(ML and EB Estimates of District Effects) /// > xtitle(Maximum likleihood) ytitle(Empirical Bayes) /// > legend(off) . graph export bd1.png, width(400) replace (file bd1.png written in PNG format)

Comparing the EB and ML estimates we see the usual shrinkage towards zero, which is particularly noticeable in four districts:
. egen ni = count(1), by(district)
. egen pi = mean(c_use), by(district)
. list district ni pi mle re if first & mle < -1, clean
district ni pi mle re
352. 10 13 .0769231 -1.563583 -.3736235
764. 24 14 .0714286 -1.790981 -.4534406
1740. 55 6 .1666667 -1.609306 -.3196343
1851. 59 10 .1 -1.700366 -.3979423
These are all fairly small districts with less than 15 women each. However, there are five other districts with small sizes that do not exhibit this phenomenon. The reason is that in those cases the estimated random effect was close to zero anyway. (There are also two small districts with all zeroes or all ones.)
. list district ni pi mle re if first & ni < 15 & mle > -1, clean
district ni pi mle re
138. 3 2 1 . .2074448
845. 26 13 .3846154 .0580263 .0228793
1101. 33 14 .4285714 -.1926461 -.0801968
1215. 37 13 .5384616 .7878244 .3173832
1228. 38 14 .2857143 -.4925295 -.1871244
1335. 42 11 .5454546 .5789949 .2067375
1600. 49 4 0 . -.1864481
One final word on EB estimates. If you use gllamm to
fit this model you can use gllapred to obtain
EB estimates. That procedure computes posterior means, whereas
predict, ref computes posterior modes. Because the
posterior distributions are not symmetric these two estimates can
differ, but usually they are quite close.
Predicted Probabilities
Subject-specific probabilities can be computed easily from first principles by setting the fixed covariates and the random effects to illustrative values. In this dataset age is centered at the sample mean. Here are probabilities for a woman with one child and average age in urban and rural areas of the average district:
. di invlogit(_b[_cons] + _b[child1]) .36047847 . di invlogit(_b[_cons] + _b[child1] + _b[urban]) .53966357
We see that the odds ratio of about two translates to probabilities of 36.0 and 54.0%, or 18 percentage points.
Population-average probabilities can also be computed, but they
require integration over the distribution of the random effect.
At this time these probabilities can only be computed by using
gllapred after gllamm, as they have not
(yet) been added to predict after xtmelogit.
They can, however, be computed from first principles using Mata,
as illustrated below.
I wrote a Mata function to compute abscissas and weights for
integrating over a standard normal random variable by translating
the C code for Gauss-Hermite quadrature in Numerical Recipes in C.
The values for 12 quadrature points are also available for general
use in a Stata file called gauher12.dta right here.
(Gauss-Hermite integrates over exp(-t^2), but I changed variables
to a standard normal, multipying the abcsissas by sqrt(2) and
dividing the weights by sqrt(π).)
Here's the calculation of the rural and urban probabilities averaged over the distribution of the district effects. We list the estimates to find the location of the coefficients of interest: the constant is 6th, the urban coef is first, the one-child coef is 3rd, and the log of sigma is 7th:
. mat list e(b)
e(b)[1,7]
eq1: eq1: eq1: eq1: eq1: eq1: lns1_1_1:
urban age child1 child2 child3 _cons _cons
y1 .73227641 -.02649815 1.1160015 1.3658951 1.3440312 -1.6892896 -.76712158
. mata
------------------------------------------------- mata (type end to exit) -------------------------------------
: gh12 = gauher(12) // first column has abcissas, second weights
: b = st_matrix("e(b)")
: sig = exp(b[7])
: u = r = 0
: for (i=1; i <= 12; i++) {
> r = r + invlogit(b[6] + b[3] + sig*gh12[i,1]) * gh12[i,2]
> u = u + invlogit(b[6] + b[3] + sig*gh12[i,1] + b[1]) * gh12[i,2]
> }
: r, u
1 2
+-----------------------------+
1 | .3668268221 .537737042 |
+-----------------------------+
: end
---------------------------------------------------------------------------------------------------------------
The population-average effect of urban residence is a difference from 36.7 to 53.8 or 17.1 percentage points, averaged across all districts.
A Random Slope Model
The next model will treat the urban-rural difference as random at the district level. (Most districts have urban and rural areas.) This is somewhat equivalent to allowing an interation between urban residence and district, but instead of estimating a separate urban-rural difference for each district, we assume that the differences are drawn from a normal distribution:
. xtmelogit c_use urban age child1 child2 child3 ///
> || district: urban, covariance(unstructured) mle
Refining starting values:
Iteration 0: log likelihood = -1215.8594 (not concave)
Iteration 1: log likelihood = -1204.0802
Iteration 2: log likelihood = -1199.7994
Performing gradient-based optimization:
Iteration 0: log likelihood = -1199.7994
Iteration 1: log likelihood = -1199.4784
Iteration 2: log likelihood = -1199.3158
Iteration 3: log likelihood = -1199.315
Iteration 4: log likelihood = -1199.315
Mixed-effects logistic regression Number of obs = 1934
Group variable: district Number of groups = 60
Obs per group: min = 2
avg = 32.2
max = 118
Integration points = 7 Wald chi2(5) = 97.50
Log likelihood = -1199.315 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
c_use | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
urban | .8157872 .1715519 4.76 0.000 .4795516 1.152023
age | -.026415 .008023 -3.29 0.001 -.0421398 -.0106902
child1 | 1.13252 .1603285 7.06 0.000 .818282 1.446758
child2 | 1.357739 .1770522 7.67 0.000 1.010724 1.704755
child3 | 1.353827 .1828801 7.40 0.000 .9953882 1.712265
_cons | -1.71165 .1605617 -10.66 0.000 -2.026345 -1.396954
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
district: Unstructured |
sd(urban) | .8162856 .1975237 .5080068 1.31164
sd(_cons) | .6242943 .1035136 .451079 .8640247
corr(urban,_cons) | -.7964729 .1151556 -.9361775 -.4394904
------------------------------------------------------------------------------
LR test vs. logistic regression: chi2(3) = 58.42 Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.
. mata exp(st_matrix("e(b)")[1])
2.260954838
It is very important in this model to specify the covariance matrix of the random effects as unstructured, so we allow correlation between the level of contraceptive use and the urban-rural difference in each district.
The estimates show that the odds of using contraception on the average district are about double in urban compared to urban areas, but the difference varies substantially by district, as shown by the standard deviation of the urban coefficient, 0.82. For example the urban-rural difference in logits is essentially zero in a district with differential one standard deviation below the mean (0.8158-0.8163 = -0.001), and equivalent to an odds ratio of about five in a district with differential one standard deviation above the mean (0.8158+0.8163 = 1.632).
EB Estimates of the Intercept and Slope
We now compute Empirical Bayes (EB) estimates of the random intercept and random slop, representing the estimated district effects on the leel of contraceptive use and the urban-rural difference. (We could compute ML estimates as well, using a procedure similar to the previous section, but we will focus on the EB estimates.)
. predict ri rs, reffects . scatter rs ri, title(District Effects on Contraceptive Use) /// > xtitle(Effect on Level of Use) /// > ytitle(Effect on Urban-Rural Differential) . graph export bd2.png, width(400) replace (file bd2.png written in PNG format)

We see a clear negative correlation. Districts where contraceptive use is higher than average, after considering the age and number of children of the residents, tend to show a much smaller urban-rural difference, reflecting the correlation of -0.796 estimated above.
We could also estimate separate random effects for urban and rural areas in each district by including dummies for both urban and rural and omitting the constant in the fixed and random parts. If you do that you will find that the two effects are nearly independent.
