Example 3: Union Membership
This dataset is used in the Stata manuals and in my own paper on intra-class correlation for binary data. It is a subsample of the National Longitudinal Survey of Youth (NLSY) with union membership information from 1970-88 for 4,434 women aged 14-26 in 1968. The data are available from the Stata and OPR websites.
. clear
. use http://opr.princeton.edu/stata/union
(NLS Women 14-24 in 1968)
Logits
We start with an ordinary logit model, which ignores the fact that the observations are clustered within women:
. logit union age grade not_smsa south southXt
Iteration 0: log likelihood = -13864.23
Iteration 1: log likelihood = -13550.511
Iteration 2: log likelihood = -13545.74
Iteration 3: log likelihood = -13545.736
Logit estimates Number of obs = 26200
LR chi2(5) = 636.99
Prob > chi2 = 0.0000
Log likelihood = -13545.736 Pseudo R2 = 0.0230
------------------------------------------------------------------------------
union | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0099931 .0026737 3.74 0.000 .0047527 .0152335
grade | .0483487 .0064259 7.52 0.000 .0357541 .0609432
not_smsa | -.2214908 .0355831 -6.22 0.000 -.2912324 -.1517493
south | -.7144461 .0612145 -11.67 0.000 -.8344244 -.5944678
southXt | .0068356 .0052258 1.31 0.191 -.0034067 .0170779
_cons | -1.888256 .113141 -16.69 0.000 -2.110009 -1.666504
------------------------------------------------------------------------------
. estimates store logit
Fixed-Effects
Next we fit a fixed-effects (conditional) logit model:
. xtlogit union age grade not_smsa south southXt, i(id) fe
note: multiple positive outcomes within groups encountered.
note: 2744 groups (14165 obs) dropped due to all positive or
all negative outcomes.
Iteration 0: log likelihood = -4541.9044
Iteration 1: log likelihood = -4511.1353
Iteration 2: log likelihood = -4511.1042
Conditional fixed-effects logistic regression Number of obs = 12035
Group variable (i): idcode Number of groups = 1690
Obs per group: min = 2
avg = 7.1
max = 12
LR chi2(5) = 78.16
Log likelihood = -4511.1042 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
union | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0079706 .0050283 1.59 0.113 -.0018848 .0178259
grade | .0811808 .0419137 1.94 0.053 -.0009686 .1633302
not_smsa | .0210368 .113154 0.19 0.853 -.2007411 .2428146
south | -1.007318 .1500491 -6.71 0.000 -1.301409 -.7132271
southXt | .0263495 .0083244 3.17 0.002 .010034 .0426649
------------------------------------------------------------------------------
. estimates store fixed
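Note how we lost 62% of the women in our sample (2744 out of 4434), who account for 54% of the observations (14165 out of 26200). These are women who didn't have variation in union membership, so they contribute nothing to the conditional likelihood. We will compare the estimates later.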
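The dropped groups can be checked directly in the data. The following is a minimal sketch, assuming the panel identifier is idcode (as the output header reports) and that union has no missing values; the variable names pbar and onewoman are made up for illustration.

```stata
* Proportion of years in which each woman is a union member
bysort idcode: egen double pbar = mean(union)

* Tag exactly one record per woman so we count women, not observations
egen byte onewoman = tag(idcode)

* Women who are never or always union members have no within-woman variation
count if onewoman & (pbar == 0 | pbar == 1)

* Shares of women and of observations dropped, from the counts in the log above
display 2744/4434
display 14165/26200
```

If the assumptions hold, the count should agree with the 2744 groups that xtlogit reports as dropped.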
Random-Effects
Now we fit a random-effects model:
. xtlogit union age grade not_smsa south southXt, i(id)
Fitting comparison model:
Iteration 0: log likelihood = -13864.23
Iteration 1: log likelihood = -13550.511
Iteration 2: log likelihood = -13545.74
Iteration 3: log likelihood = -13545.736
Fitting full model:
tau = 0.0 log likelihood = -13545.736
tau = 0.1 log likelihood = -12926.225
tau = 0.2 log likelihood = -12419.526
tau = 0.3 log likelihood = -12003.162
tau = 0.4 log likelihood = -11656.844
tau = 0.5 log likelihood = -11367.53
tau = 0.6 log likelihood = -11129.716
tau = 0.7 log likelihood = -10947.266
tau = 0.8 log likelihood = -10845.532
Iteration 0: log likelihood = -10947.266
Iteration 1: log likelihood = -10604.628
Iteration 2: log likelihood = -10557.905
Iteration 3: log likelihood = -10556.297
Iteration 4: log likelihood = -10556.294
Random-effects logistic regression Number of obs = 26200
Group variable (i): idcode Number of groups = 4434
Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 5.9
max = 12
Wald chi2(5) = 221.95
Log likelihood = -10556.294 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
union | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0092401 .0044368 2.08 0.037 .0005441 .0179361
grade | .0840066 .0181622 4.63 0.000 .0484094 .1196038
not_smsa | -.2574574 .0844771 -3.05 0.002 -.4230294 -.0918854
south | -1.152854 .1108294 -10.40 0.000 -1.370075 -.9356323
southXt | .0237933 .0078548 3.03 0.002 .0083982 .0391884
_cons | -3.25016 .2622898 -12.39 0.000 -3.764238 -2.736081
-------------+----------------------------------------------------------------
/lnsig2u | 1.669888 .0430016 1.585607 1.75417
-------------+----------------------------------------------------------------
sigma_u | 2.304685 .0495526 2.209582 2.403882
rho | .6175213 .0101565 .5974278 .6372209
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 5978.89 Prob >= chibar2 = 0.000
. estimates store random
Comparisons
Here's a table comparing the estimates (we use the
equation option so Stata can find the correct estimates).
. estimates table logit random fixed, equation(1)
-----------------------------------------------------
Variable | logit random fixed
-------------+---------------------------------------
#1 |
age | .00999311 .00924011 .00797058
grade | .04834865 .08400659 .08118077
not_smsa | -.22149081 -.25745741 .02103677
south | -.71444608 -1.1528539 -1.0073178
southXt | .0068356 .02379331 .02634948
_cons | -1.8882564 -3.2501596
-------------+---------------------------------------
lnsig2u |
_cons | 1.6698883
-----------------------------------------------------
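The main change is in the coefficient of not_smsa, which is negative in the logit and random-effects models (-0.22 and -0.26) but close to zero (0.02) in the fixed-effects model.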
You might think this indicates something wrong with the logit and
random-effects models, but note that only women who have moved
between standard metropolitan statistical areas and other places contribute
to the fixed-effects estimate. It seems reasonable to believe that these
women differ from the rest.
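To see how many women supply information on not_smsa, one could count those whose residence status changes over the survey years. This is only a sketch: the names ns_min, ns_max and onepw are hypothetical, and the count covers all women, whereas the fixed-effects fit also requires variation in union membership.

```stata
* Lowest and highest value of not_smsa observed for each woman
bysort idcode: egen byte ns_min = min(not_smsa)
bysort idcode: egen byte ns_max = max(not_smsa)

* Tag one record per woman
egen byte onepw = tag(idcode)

* Women who moved between SMSA and non-SMSA residence at least once
count if onepw & ns_min != ns_max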
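With the exception of age, the random-effects coefficients are larger in magnitude than the ordinary logit coefficients. This is almost always the case. The logit coefficients are population-averaged effects, while the random-effects coefficients are subject-specific, and omission of the random effect biases the coefficients towards zero.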
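The size of the attenuation can be gauged with a well-known approximation due to Zeger, Liang and Albert (1988), which is not part of the analysis above: a subject-specific logit coefficient divided by sqrt(1 + c^2 sigma_u^2), with c = 16 sqrt(3)/(15 pi), is roughly the corresponding population-averaged coefficient. The sketch below plugs in sigma_u and two coefficients from the random-effects output; the scalar names c2 and shrink are made up for illustration.

```stata
* Squared constant in the logistic-normal approximation
scalar c2 = (16*sqrt(3)/(15*_pi))^2

* Attenuation factor implied by sigma_u = 2.304685
scalar shrink = sqrt(1 + c2*2.304685^2)
display shrink

* Random-effects coefficients of south and grade, shrunk towards zero
display -1.152854/shrink
display .0840066/shrink
```

The factor comes out at about 1.68, which brings the shrunken coefficients of south and grade close to the ordinary logit estimates of -0.714 and 0.048.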
Intra-class correlation
The random-effects estimate shows an intra-class correlation of 0.6175 (rho in the output above), indicating a high correlation between a woman's latent propensity to be a union member in different years after controlling for age, education and residence.
My paper with Elo in the Stata Journal, 2003, shows how this can be
interpreted in terms of an odds ratio and translated into measures of
manifest correlation using xtrho. The command is available from
the Stata Journal website; in Stata type findit xtrho, or
net describe st0031, from(http://www.stata-journal.com/software/sj3-1).
For the average woman the correlation between actual union membership in
any two years is 0.408 using Pearson's r and 0.769 using Yule's Q.
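The intra-class correlation of 0.6175 reported by xtlogit refers to the latent scale, where the individual-level error has a standard logistic distribution with variance pi^2/3, so that rho = sigma_u^2/(sigma_u^2 + pi^2/3). As a quick check, the sketch below reproduces sigma_u and rho from the values printed in the random-effects output.

```stata
* sigma_u from the estimated log-variance /lnsig2u
display exp(1.669888/2)

* Latent intra-class correlation from sigma_u
display 2.304685^2 / (2.304685^2 + _pi^2/3)
```

The manifest correlations given above (Pearson's r and Yule's Q) are not obtained from this formula; they require the xtrho command.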

