Germán Rodríguez
Multilevel Models Princeton University

## Random Effects Logit Models

The Stata manual has data on union membership from the NLS for 4434 women who were 14-24 in 1968 and were observed between 1 and 12 times.

We read the data from the web and compute southXt, an interaction term between south and year centered on 70.

. webuse union, clear
(NLS Women 14-24 in 1968)

. gen southXt = south * (year-70)

#### Logit Estimates

We first compute logit estimates for later comparison, fitting the same model as in [R] xtlogit, first with conventional and then with clustered standard errors:

. logit union age grade not_smsa south southXt

Iteration 0:   log likelihood =  -13864.23
Iteration 1:   log likelihood = -13550.511
Iteration 2:   log likelihood =  -13545.74
Iteration 3:   log likelihood = -13545.736

Logistic regression                               Number of obs   =      26200
                                                  LR chi2(5)      =     636.99
                                                  Prob > chi2     =     0.0000
Log likelihood = -13545.736                       Pseudo R2       =     0.0230

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0099931   .0026737     3.74   0.000     .0047527    .0152335
       grade |   .0483487   .0064259     7.52   0.000     .0357541    .0609432
    not_smsa |  -.2214908   .0355831    -6.22   0.000    -.2912324   -.1517493
       south |  -.7144461   .0612145   -11.67   0.000    -.8344244   -.5944678
     southXt |   .0068356   .0052258     1.31   0.191    -.0034067    .0170779
       _cons |  -1.888256    .113141   -16.69   0.000    -2.110009   -1.666504
------------------------------------------------------------------------------

. estimates store logit

. logit union age grade not_smsa south southXt, cluster(idcode)

Iteration 0:   log pseudolikelihood =  -13864.23
Iteration 1:   log pseudolikelihood = -13550.511
Iteration 2:   log pseudolikelihood =  -13545.74
Iteration 3:   log pseudolikelihood = -13545.736

Logistic regression                               Number of obs   =      26200
                                                  Wald chi2(5)    =     161.37
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -13545.736                 Pseudo R2       =     0.0230

                              (Std. Err. adjusted for 4434 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0099931   .0039563     2.53   0.012     .0022389    .0177473
       grade |   .0483487    .013936     3.47   0.001     .0210346    .0756628
    not_smsa |  -.2214908   .0713568    -3.10   0.002    -.3613475   -.0816341
       south |  -.7144461   .0919865    -7.77   0.000    -.8947362   -.5341559
     southXt |   .0068356   .0070241     0.97   0.330    -.0069314    .0206026
       _cons |  -1.888256    .207871    -9.08   0.000    -2.295676   -1.480837
------------------------------------------------------------------------------

. estimates store cluster

#### Random Intercepts

The next step is to fit a random-intercepts model and compare results.

. xtlogit union age grade not_smsa south southXt, i(idcode)

Fitting comparison model:

Iteration 0:   log likelihood =  -13864.23
Iteration 1:   log likelihood = -13550.511
Iteration 2:   log likelihood =  -13545.74
Iteration 3:   log likelihood = -13545.736

Fitting full model:

tau =  0.0     log likelihood = -13545.736
tau =  0.1     log likelihood = -12926.225
tau =  0.2     log likelihood = -12419.526
tau =  0.3     log likelihood = -12003.162
tau =  0.4     log likelihood = -11656.844
tau =  0.5     log likelihood =  -11367.53
tau =  0.6     log likelihood = -11129.716
tau =  0.7     log likelihood = -10947.266
tau =  0.8     log likelihood = -10845.532

Iteration 0:   log likelihood = -10947.312
Iteration 1:   log likelihood = -10557.296
Iteration 2:   log likelihood = -10540.582
Iteration 3:   log likelihood = -10540.367
Iteration 4:   log likelihood = -10540.367
Iteration 5:   log likelihood = -10540.366

Random-effects logistic regression              Number of obs      =     26200
Group variable: idcode                          Number of groups   =      4434

Random effects u_i ~ Gaussian                   Obs per group: min =         1
                                                               avg =       5.9
                                                               max =        12

                                                Wald chi2(5)       =    227.30
Log likelihood  = -10540.366                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0093936    .004454     2.11   0.035      .000664    .0181232
       grade |   .0867878   .0176345     4.92   0.000     .0522247    .1213508
    not_smsa |  -.2519379    .082334    -3.06   0.002    -.4133095   -.0905663
       south |  -1.163769   .1114164   -10.45   0.000    -1.382141    -.945397
     southXt |    .023245   .0078497     2.96   0.003     .0078599    .0386302
       _cons |  -3.360131   .2586306   -12.99   0.000    -3.867038   -2.853225
-------------+----------------------------------------------------------------
    /lnsig2u |   1.749534   .0469964                      1.657423    1.841645
-------------+----------------------------------------------------------------
     sigma_u |   2.398317   .0563561                      2.290366    2.511356
         rho |   .6361486   .0108779                      .6145729    .6571902
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =  6010.74 Prob >= chibar2 = 0.000

. estimates store xtlogit

. estimates table  logit xtlogit, eq(1:1)

----------------------------------------
    Variable |   logit       xtlogit
-------------+--------------------------
#1           |
         age |  .00999311    .00939361
    not_smsa | -.22149081   -.25193788
       south | -.71444608   -1.1637691
     southXt |   .0068356    .02324502
       _cons | -1.8882564   -3.3601312
-------------+--------------------------
lnsig2u      |
       _cons |               1.7495341
----------------------------------------

Except for age, we see that the subject-specific estimates are larger in magnitude than in the marginal logit model.

The odds of being in a union in 1970 are 69% lower for a woman who lives in the south than for one who doesn't, everything else being equal. The effect declined over time, and by 1988 the odds of belonging to a union were 53% lower in the south than elsewhere.

In contrast, the logit model estimates the effect as 51% lower odds in 1970 and 45% lower in 1988. These estimates can be interpreted as population average effects.
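The percentages quoted above come from exponentiating the coefficient of south plus (year - 70) times the coefficient of southXt. The following is a minimal sketch of one way to reproduce them, assuming the estimates are still stored under the names logit and xtlogit and that the lines are run from a do-file (so the trailing comments are allowed). Using lincom with the or option is a choice made here for illustration, not necessarily how the figures in the text were obtained.

```stata
* Subject-specific odds ratios for south from the random-intercept model
estimates restore xtlogit
lincom south, or                  // 1970, when southXt = 0
lincom south + 18*southXt, or     // 1988, since 88 - 70 = 18

* Population-average counterparts from the ordinary logit model
estimates restore logit
lincom south, or
lincom south + 18*southXt, or

* The same arithmetic by hand, as proportional reductions in the odds,
* using the coefficients printed in the output above
display 1 - exp(-1.163769)                  // xtlogit, 1970
display 1 - exp(-1.163769 + 18*.023245)     // xtlogit, 1988
display 1 - exp(-.7144461)                  // logit, 1970
display 1 - exp(-.7144461 + 18*.0068356)    // logit, 1988
```

The lincom calls also report confidence intervals for each odds ratio, which the hand calculation does not.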

Given how different the coefficients are, it doesn't make much sense to compare standard errors. Generally speaking, though, they increase as one moves from the logit model to robust standard errors to the estimates based on the random intercept model.
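To see the three sets of standard errors side by side, one option is to extend the earlier estimates table call. This is a sketch under the assumption that all three sets of estimates are still stored as logit, cluster and xtlogit; the eq(1:1:1) matching simply extends the eq(1:1) used above to three models, and the display formats are arbitrary.

```stata
* Coefficients with standard errors underneath, one column per model
estimates table logit cluster xtlogit, eq(1:1:1) b(%9.4f) se(%9.4f)
```

The logit and cluster columns share the same coefficients and differ only in their standard errors, which isolates the effect of the robust adjustment.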

#### Intra-class Correlation

Stata reports the intraclass correlation as 0.636. This coefficient pertains to a latent variable reflecting propensity to belong to a union, rather than manifest union membership. The correlation between this propensity in any two years for the same individual is 0.64. We can also say that 64% of the variance in the propensity to belong to a union can be attributed to individuals.
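The reported value can be checked by hand. On the latent scale the individual-level residual follows a standard logistic distribution, whose variance is pi^2/3, so the intraclass correlation is sigma_u^2 / (sigma_u^2 + pi^2/3). A minimal sketch using the estimates printed in the xtlogit output:

```stata
* Latent-scale intraclass correlation: sigma_u^2 / (sigma_u^2 + pi^2/3)
display 2.398317^2 / (2.398317^2 + _pi^2/3)

* Equivalent form starting from /lnsig2u, since sigma_u^2 = exp(lnsig2u)
display exp(1.749534) / (exp(1.749534) + _pi^2/3)
```

Both lines should reproduce the rho of about 0.636 shown in the output.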

Using the xtrho command we can compute the correlation in actual union membership in any two years for a woman with a median linear predictor:

. xtrho

Measures of intra-class manifest association in random-effects logit
Evaluated at median linear predictor

Measure          |    Estimate     [95% Conf.Interval]
-----------------+------------------------------------
Marginal prob.   |     .225847     .218973     .232833
Joint prob.      |     .125072     .117026     .133375
Odds ratio       |     8.29305     7.64627     9.00295
Pearson's r      |     .423617       .4039     .443193
Yule's Q         |     .784785     .768686     .800059

We estimate a probability of 23% of belonging to a union in any given year and 12.5% of belonging in both of two years, much more than the 5% one would expect under independence. The correlation is reflected in an odds ratio of 8.3, so for women at the median the odds of belonging to a union at t_2 are 8.3 times as high for those who belonged to a union at t_1 as for those who didn't. Pearson's r is 0.42 and Yule's Q is 0.78.
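The odds ratio, Pearson's r and Yule's Q in the xtrho table are all functions of the marginal and joint probabilities, so they can be recovered from the first two rows. The sketch below copies those two values from the output and fills in the 2x2 table for membership at two occasions; the scalar names are illustrative, and the lines are meant to be run from a do-file.

```stata
* Marginal and joint probabilities at the median linear predictor,
* copied from the xtrho output (scalar names are illustrative)
scalar pm = .225847
scalar pj = .125072

* Remaining cells of the 2x2 table for two occasions
scalar p10 = pm - pj            // member at one given occasion only
scalar p00 = 1 - 2*pm + pj      // member at neither occasion

display "Joint prob. under independence = " pm^2
scalar oratio = pj*p00/p10^2
display "Odds ratio  = " oratio
display "Pearson's r = " (pj - pm^2)/(pm*(1 - pm))
display "Yule's Q    = " (oratio - 1)/(oratio + 1)
```

The results should agree with the corresponding rows of the table up to rounding of the two input probabilities.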

These measures can be computed for women whose observed characteristics make them more or less likely to belong to a union by using the detail option:

. xtrho, detail

Measures of intra-class manifest association in random-effects logit
Evaluated with linear predictor set at selected percentiles

Measure          |          p1         p25         p50         p75         p99
-----------------+------------------------------------------------------------
Marginal prob.   |      .10807     .161144     .225847     .251047     .309243
Joint prob.      |     .046724     .079682     .125072      .14408     .190535
Odds ratio       |     10.3122     9.09452     8.29305     8.08413     7.73468
Pearson's r      |      .36357      .39737     .423617     .431096     .444279
Yule's Q         |       .8232     .801873     .784785     .779836     .771028

The correlation as measured by the odds ratio or Yule's Q is higher when women are less likely to belong to a union, but the opposite is true if one uses Pearson's r.

For a more detailed discussion of this post-estimation command see my paper with Elo in the Stata Journal 3(1):32-46 (2003), available here.

Another example using data on whether births are delivered at hospitals or elsewhere may be found here.