![]() |
|
![]() | |||
|
|
|||||
Stata has several commands that can be used to fit logistic regression
models by maximum likelihood.
The basic commands are logit for individual data and
blogit for grouped data.
There is also a logistic command that presents the results
in terms of odd-ratios instead of log-odds and can produce a variety of
summary and diagnostic statistics.
Finally, one can fit a logistic regression model as a special case
of a generalized linear model with Bernoulli or binomial errors and
link logit, using the glm command.
We will have occassion to use all commands, but we will emphasize the
first two which are simpler.
Following the lecture notes we will consider comparing two groups and then move on to more than two.
Consider the data on contraceptive use by desire for more children on Table 3.2 (page 14 of the notes). We can read these data into Stata as 2 observations. To make life easier I will enter desire for more children as a dummy variable that takes the value 1 for women who want no more children
. input nomore users n
nomore users n
1. 0 219 972
2. 1 288 635
3. end
Fitting the null model is easy. Note that with blogit
you specify the outcome in terms of
the number of 'successes' and the binomial denominator,
here users and n.
. blogit users n
Logit estimates Number of obs = 1607
LR chi2(0) = 0.00
Prob > chi2 = .
Log likelihood = -1001.8468 Pseudo R2 = 0.0000
------------------------------------------------------------------------------
_outcome | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | -.7745545 .0536794 -14.43 0.000 -.8797641 -.6693448
------------------------------------------------------------------------------
The estimate of the constant is simply the logit of the overall proportion using contraception, say y/n, and the standard error is the square root of 1/y + 1/(n-y). You may want to check these results by hand.
Stata's blogit does not calculate the model deviance,
but we can obtain it 'by hand' using predict to obtain
fitted counts:
. predict yhat (option n assumed; predicted no. of cases) . gen di = 2*( users*log(users/yhat) + (n-users)*log((n-users)/(n-yhat)) ) . gen DI = sum(di) . display "Deviance = " DI[_N] Deviance = 91.674393
Try a similar method to calculate Pearson's chi-squared. It's 92.64.
Alternatively, you can fit the model using glm, which
reports both the deviance and Pearson's chi-squared by default.
Try glm users, family(binomial n).
Let us now fit the (saturated) model with want no more children as a predictor.
. blogit users n nomore
Logit estimates Number of obs = 1607
LR chi2(1) = 91.67
Prob > chi2 = 0.0000
Log likelihood = -956.00957 Pseudo R2 = 0.0458
------------------------------------------------------------------------------
_outcome | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nomore | 1.048629 .110672 9.48 0.000 .831716 1.265542
_cons | -1.234993 .0767739 -16.09 0.000 -1.385468 -1.084519
------------------------------------------------------------------------------
Note the estimate of the effect of wanting no more children, about one in the logit scale. Thus, the odds of using contraception among women who want no more children are triple those of women who want more.
The z-statistic is as reported on page 16 of the notes. Let us square it:
. di (_b[nomore]/_se[nomore])^2 89.777635
This is Wald's chi-squared statistic.
Another way to get this value is using Stata's test command.
Try test nomore.
The chi2 statistic reported by Stata on the second line
of output is the likelihood ratio chi-squared comparing the current
model with the null model.
Can you explain why we get the same 91.67 we had before?
Hint: What's the deviance for this model?
A third test of the effect of want no more is given by Pearson's chi-squared, which is 92.64. Note that all three statistics are different, although close in value.
Stata is kind enough to give us a 95% confidence interval for the log-odds. Can you convert that into a 95% CI for the odds-ratio? Simple, we just exponentiate the bounds:
. di exp(0.8315523) "-" exp(1.265706) 2.2968814-3.545595
An even easier way is to type blogit, or.
The blogit command without any variables
retrieves the results of the last fit.
The option or is short for odds-ratios.
Either way, the odds of using contraception
among women who want no more children are 2.3 to 3.5 times
as high as for women who desire more children.
Continue with The Comparison of Several Groups
Copyright © Germán Rodríguez, 1993-2002.
Please send feedback to grodri@princeton.edu