Home | GLMs | Multilevel | Survival | Demography | Stata | R
Home Lecture Notes Stata Logs Datasets Problem Sets

3.5 Models with Two Predictors

We now turn to logistic regression models with two or more predictors.

The Contraceptive use Data

The rest of the analyses in this chapter will use a small dataset showing contraceptive use by age (<25, 35-29, 30-39 and 40-49), education (none, some) and desire for more children (yes, no). The dataset is described on page 31 of the notes and is available in the datasets section of the course website, so we'll read it directly from the web

. clear

. use http://data.princeton.edu/wws509/datasets/cuse
(Contraceptive Use Data (Fiji, 1976))

. list, clean nolabel

       age   educ   desire   cuse     n  
  1.     1      0        0      0    53  
  2.     1      0        0      1     6  
  3.     1      0        1      0    10  
  4.     1      0        1      1     4  
  5.     1      1        0      0   212  
  6.     1      1        0      1    52  
  7.     1      1        1      0    50  
  8.     1      1        1      1    10  
  9.     2      0        0      0    60  
 10.     2      0        0      1    14  
 11.     2      0        1      0    19  
 12.     2      0        1      1    10  
 13.     2      1        0      0   155  
 14.     2      1        0      1    54  
 15.     2      1        1      0    65  
 16.     2      1        1      1    27  
 17.     3      0        0      0   112  
 18.     3      0        0      1    33  
 19.     3      0        1      0    77  
 20.     3      0        1      1    80  
 21.     3      1        0      0   118  
 22.     3      1        0      1    46  
 23.     3      1        1      0    68  
 24.     3      1        1      1    78  
 25.     4      0        0      0    35  
 26.     4      0        0      1     6  
 27.     4      0        1      0    46  
 28.     4      0        1      1    48  
 29.     4      1        0      0     8  
 30.     4      1        0      1     8  
 31.     4      1        1      0    12  
 32.     4      1        1      1    31  

I omitted the labels so you can see the actual numeric codes used. Age is coded 1-4 for the four groups, education is coded as a dummy variable with 1 for 'Some', desire is coded as a dummy variable with 1 for 'Wants no more', and contraceptive use is coded 1 for 'Yes'.

The layout simulates individual data, with one row for each combination of values of the predictors and outcome. The last column is the number of women in that combination of categories, and can be used as a frequency weight in Stata.

For clarity and consistency with earlier work I will create a new variable called nomore.

. gen nomore = desire == 1

The Age Model

Let us start by fitting a model treating age as a factor, building the dummies 'by hand', and saving the estimates for later use under the name "age"

. gen age2529 = age == 2

. gen age30s  = age == 3

. gen age40s  = age == 4

. logit cuse age2529 age30s age40s [fw=n]

Iteration 0:   log likelihood = -1001.8468  
Iteration 1:   log likelihood = -962.76243  
Iteration 2:   log likelihood = -962.25132  
Iteration 3:   log likelihood = -962.25091  
Iteration 4:   log likelihood = -962.25091  

Logistic regression                               Number of obs   =       1607
                                                  LR chi2(3)      =      79.19
                                                  Prob > chi2     =     0.0000
Log likelihood = -962.25091                       Pseudo R2       =     0.0395

------------------------------------------------------------------------------
        cuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     age2529 |   .4606758   .1727255     2.67   0.008     .1221401    .7992116
      age30s |   1.048293   .1544406     6.79   0.000     .7455952    1.350991
      age40s |   1.424638   .1939574     7.35   0.000     1.044488    1.804788
       _cons |  -1.507159   .1302529   -11.57   0.000     -1.76245   -1.251868
------------------------------------------------------------------------------

. estimates store age

. display -2*e(ll)
1924.5018

Note that the parameter estimates, standard errors, and model chi-squared are exactly the same as in Section 3.4, when we worked with only four binomial observations, showing that grouping the data does not affect any of these statistics.

The model deviance, however, is different. In Section 3.4 the deviance was zero because the model fitted exactly the four groups. In the lecture notes I work with eight groups and get a deviance of 66.48. Here we are working essentially with individual data and the deviance is simply negative twice the log-likelihood, in this case 1924.5 on 1603 d.f. With individual data the deviance can no longer be used as a goodness of fit test, but we will consider alternative approaches later.

You may want to verify that we also get the same results for the model with 'wants no more childre' as the sole predictor.

The Additive Model

We are now ready to consider a model with both age and desire for no more children:

. logit cuse age2529 age30s age40s nomore [fw=n]

Iteration 0:   log likelihood = -1001.8468  
Iteration 1:   log likelihood = -938.32767  
Iteration 2:   log likelihood = -937.40556  
Iteration 3:   log likelihood = -937.40449  
Iteration 4:   log likelihood = -937.40449  

Logistic regression                               Number of obs   =       1607
                                                  LR chi2(4)      =     128.88
                                                  Prob > chi2     =     0.0000
Log likelihood = -937.40449                       Pseudo R2       =     0.0643

------------------------------------------------------------------------------
        cuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     age2529 |   .3678306   .1753677     2.10   0.036     .0241163     .711545
      age30s |   .8077888   .1597537     5.06   0.000     .4946773      1.1209
      age40s |   1.022618   .2039339     5.01   0.000     .6229153    1.422322
      nomore |    .824092   .1171128     7.04   0.000     .5945551    1.053629
       _cons |  -1.693933   .1352307   -12.53   0.000    -1.958981   -1.428886
------------------------------------------------------------------------------

. di exp(_b[nomore])
2.2798098

. estimates store additive

Compare the results with Table 3.9 on page 26 of the notes. Exponentiating the coefficient of "nomore" we get an odds ratio of 2.28. This means that the odds of using contraception among women who want no more children are double the odds among women in the same age group who do want more children. The model assumes that the odds ratio is the same for every age group, an assumption we will need to test.

To test the significance of the odds ratio we can use the Wald test given in the output, a z-statistic of 7.04 (which can be squared to obtain a chi-squared statistic on one d.f.) The likelihood ratio test would compare the additive model with the age model, which we saved just so we could do this test. Here are the two tests:

. test nomore

 ( 1)  [cuse]nomore = 0

           chi2(  1) =   49.52
         Prob > chi2 =    0.0000

. lrtest age .

Likelihood-ratio test                                  LR chi2(1)  =     49.69
(Assumption: age nested in additive)                   Prob > chi2 =    0.0000

We obtain chi-squared statistics of 49.5 and 49.7 on one d.f., so there is no doubt that the odds of using contraception in any given age group vary by whether the women want more children.

We verify quickly that we could obtain exactly the same results using factor variables:

. logit cuse i.age nomore [fw=n]

Iteration 0:   log likelihood = -1001.8468  
Iteration 1:   log likelihood = -938.32767  
Iteration 2:   log likelihood = -937.40556  
Iteration 3:   log likelihood = -937.40449  
Iteration 4:   log likelihood = -937.40449  

Logistic regression                               Number of obs   =       1607
                                                  LR chi2(4)      =     128.88
                                                  Prob > chi2     =     0.0000
Log likelihood = -937.40449                       Pseudo R2       =     0.0643

------------------------------------------------------------------------------
        cuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
          2  |   .3678306   .1753677     2.10   0.036     .0241163     .711545
          3  |   .8077888   .1597537     5.06   0.000     .4946773      1.1209
          4  |   1.022618   .2039339     5.01   0.000     .6229153    1.422322
             |
      nomore |    .824092   .1171128     7.04   0.000     .5945551    1.053629
       _cons |  -1.693933   .1352307   -12.53   0.000    -1.958981   -1.428886
------------------------------------------------------------------------------

You could use i.nomore, and Stata would recognize that 'no more' is a dummy variable and report the coefficient as 1.more, but leaving out the i. produces cleaner output.

A Model with an Interaction

We now add an interaction between age and desire for no more children. As usual we compute our own dummies and then fit the model:

. gen nom_age2529 = nomore * age2529

. gen nom_age30s  = nomore * age30s

. gen nom_age40s  = nomore * age40s

. logit cuse age2529 age30s age40s nomore ///
>    nom_age2529 nom_age30s nom_age40s [fw=n]

Iteration 0:   log likelihood = -1001.8468  
Iteration 1:   log likelihood = -930.01061  
Iteration 2:   log likelihood = -929.01056  
Iteration 3:   log likelihood = -929.01009  
Iteration 4:   log likelihood = -929.01009  

Logistic regression                               Number of obs   =       1607
                                                  LR chi2(7)      =     145.67
                                                  Prob > chi2     =     0.0000
Log likelihood = -929.01009                       Pseudo R2       =     0.0727

------------------------------------------------------------------------------
        cuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     age2529 |   .3681565   .2009279     1.83   0.067    -.0256549    .7619679
      age30s |   .4506554   .1949898     2.31   0.021     .0684824    .8328283
      age40s |    .397144   .3401461     1.17   0.243      -.26953    1.063818
      nomore |   .0639996   .3303183     0.19   0.846    -.5834125    .7114116
 nom_age2529 |   .2672319   .4091443     0.65   0.514    -.5346762     1.06914
  nom_age30s |   1.090493   .3732853     2.92   0.003     .3588674    1.822119
  nom_age40s |   1.367148   .4834193     2.83   0.005     .4196637    2.314632
       _cons |  -1.519287   .1449654   -10.48   0.000    -1.803414    -1.23516
------------------------------------------------------------------------------

I could have used nom_age* as short-hand for the three dummies, but using wild cards can be dangerous if we later add a variable with a smilar prefix.

The estimates show a difference by desire for more children of 0.06 in the logit scale for women under age 25, with additional differences of 0.27 for women 25-29, 1.09 for women in their thirties and 1.38 for women in their forties. Clearly the difference is larger the older the age group. (You may want to be careful about saying "increases with age" as this is a cross-section, but maybe that is clear enough from the context.)

Exponentiating these numbers we get an odds ratio of 1.07 (or 7% higher odds of using contraception for those who want no more children) among women under 25, which gets multiplied by 1.3 for women 25-29, by about 3 for women in their thirties and by about 4 for women in their forties. Thus, the odds ratio among women in their forties is four times the odds ratio for women under 25.

We can test the significance of the interaction term using a Wald test or computing a likelihood ratio test that compares this model with the additive model saved earlier:

. test nom_age2529 nom_age30s nom_age40s

 ( 1)  [cuse]nom_age2529 = 0
 ( 2)  [cuse]nom_age30s = 0
 ( 3)  [cuse]nom_age40s = 0

           chi2(  3) =   16.03
         Prob > chi2 =    0.0011

. lrtest additive .

Likelihood-ratio test                                  LR chi2(3)  =     16.79
(Assumption: additive nested in .)                     Prob > chi2 =    0.0008

we find the interaction to be significant, with a P-value < 0.001, so the ratio of the odds of using contraception among women who do and do not want another child varies by age.

This is not the same as testing for the effect of preferences for more children. A test for the "main" effect of nomore would just test for women under age 25 and would find no difference. To test at all ages together we can compare this model with the age model, or do a Wald test for all terms involving preferences:

. test nomore nom_age2529 nom_age30s nom_age40s

 ( 1)  [cuse]nomore = 0
 ( 2)  [cuse]nom_age2529 = 0
 ( 3)  [cuse]nom_age30s = 0
 ( 4)  [cuse]nom_age40s = 0

           chi2(  4) =   62.45
         Prob > chi2 =    0.0000

. lrtest age .

Likelihood-ratio test                                  LR chi2(4)  =     66.48
(Assumption: age nested in .)                          Prob > chi2 =    0.0000

Before leaving this section we note that we can obtain exacly the same results using factor variables. Obviously we need to use the i. prefix for age. If you don't say anything about 'no more' but include the hash for the interaction Stata treats it as a factor. I think the output looks a bit cleaner if we remind Stata that this is a dummy using the c. prefix:

. logit cuse i.age##c.nomore [fw=n]

Iteration 0:   log likelihood = -1001.8468  
Iteration 1:   log likelihood = -930.01061  
Iteration 2:   log likelihood = -929.01056  
Iteration 3:   log likelihood = -929.01009  
Iteration 4:   log likelihood = -929.01009  

Logistic regression                               Number of obs   =       1607
                                                  LR chi2(7)      =     145.67
                                                  Prob > chi2     =     0.0000
Log likelihood = -929.01009                       Pseudo R2       =     0.0727

------------------------------------------------------------------------------
        cuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
          2  |   .3681565   .2009279     1.83   0.067    -.0256549    .7619679
          3  |   .4506554   .1949898     2.31   0.021     .0684824    .8328283
          4  |    .397144   .3401461     1.17   0.243      -.26953    1.063818
             |
      nomore |   .0639996   .3303183     0.19   0.846    -.5834125    .7114116
             |
age#c.nomore |
          2  |   .2672319   .4091443     0.65   0.514    -.5346762     1.06914
          3  |   1.090493   .3732853     2.92   0.003     .3588674    1.822119
          4  |   1.367148   .4834193     2.83   0.005     .4196637    2.314632
             |
       _cons |  -1.519287   .1449654   -10.48   0.000    -1.803414    -1.23516
------------------------------------------------------------------------------

We can test the significance of the interaction using our old friend testparm with just one hash:

. testparm i.age#c.nomore

 ( 1)  [cuse]2.age#c.nomore = 0
 ( 2)  [cuse]3.age#c.nomore = 0
 ( 3)  [cuse]4.age#c.nomore = 0

           chi2(  3) =   16.03
         Prob > chi2 =    0.0011

Can you reproduce the Wald test for all terms involving preferences?

Reparametrizing Interactions

It may be easier to present the result for this model in terms of odds ratios by desire for more children in the different age groups, as discussed on page 27 of the notes, see also Table 3.10.

To this end we add a dummy variable for the interaction at age < 25, and omit the 'main' effect of wanting no more:

. gen nom_agelt25 = nomore * (age==1) 

. logit cuse age2529 age30s age40s ///
>    nom_agelt25 nom_age2529 nom_age30s nom_age40s [fw=n]

Iteration 0:   log likelihood = -1001.8468  
Iteration 1:   log likelihood = -930.01061  
Iteration 2:   log likelihood = -929.01056  
Iteration 3:   log likelihood = -929.01009  
Iteration 4:   log likelihood = -929.01009  

Logistic regression                               Number of obs   =       1607
                                                  LR chi2(7)      =     145.67
                                                  Prob > chi2     =     0.0000
Log likelihood = -929.01009                       Pseudo R2       =     0.0727

------------------------------------------------------------------------------
        cuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     age2529 |   .3681565   .2009279     1.83   0.067    -.0256549    .7619679
      age30s |   .4506554   .1949898     2.31   0.021     .0684824    .8328283
      age40s |    .397144   .3401461     1.17   0.243      -.26953    1.063818
 nom_agelt25 |   .0639996   .3303183     0.19   0.846    -.5834125    .7114116
 nom_age2529 |   .3312314   .2414308     1.37   0.170    -.1419642    .8044271
  nom_age30s |   1.154493   .1738727     6.64   0.000     .8137085    1.495277
  nom_age40s |   1.431148   .3529646     4.05   0.000     .7393498    2.122945
       _cons |  -1.519287   .1449654   -10.48   0.000    -1.803414    -1.23516
------------------------------------------------------------------------------

. mata exp( st_matrix("e(b)")[4..7] )  
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  1.066091954   1.392682073   3.172413793   4.183497537  |
    +---------------------------------------------------------+

The parameters now represent differences in the logit scale between women who do not want more children and those who do in each of the four age groups. Exponentiating the coefficients (which you can also do with the or option) we find that the odds of using contraception for women who want no more children, compared to those who do, are 7% higher at age under 25, 39% higher at ages 25-29, three times as high at age 30-39, and four times as high at age 40-49. The parametrization used here effectively combines the 'main' effect of wanting no more children with the 'additional' effect as one moves to older age groups in a single number for each age group. This leads to a more direct presentation of the results.

A Deviance Table

The deviances in Table 3.8 in the lecture notes are based on a tabulation of contraceptive use by age and desire which has eight groups. The analyses in this log are based on a table that also breaks use by education, with a total of sixteen groups. However, we can easily compute those deviances by comparing each model of interest with the model we just fit.

. estimates store ageXnom

. quietly logit cuse              [fw=n]

. lrtest . ageXnom

Likelihood-ratio test                                  LR chi2(7)  =    145.67
(Assumption: . nested in ageXnom)                      Prob > chi2 =    0.0000

. quietly logit cuse i.age        [fw=n]

. lrtest . ageXnom

Likelihood-ratio test                                  LR chi2(4)  =     66.48
(Assumption: . nested in ageXnom)                      Prob > chi2 =    0.0000

. quietly logit cuse       nomore [fw=n]

. lrtest . ageXnom

Likelihood-ratio test                                  LR chi2(6)  =     54.00
(Assumption: . nested in ageXnom)                      Prob > chi2 =    0.0000

. quietly logit cuse i.age nomore [fw=n]

. lrtest . ageXnom

Likelihood-ratio test                                  LR chi2(3)  =     16.79
(Assumption: . nested in ageXnom)                      Prob > chi2 =    0.0008

You can also verify these results from the log-likelihoods printed for the various models. You should be able to reproduce the tests for the gross effect of age, the net effect of wanting no more children given age, and the interaction term, from these deviances.

Analysis of Covariance Models

We now treat age as a covariate, using the mid-points of the age groups just as we did before:

. recode age 1=20 2=27.5 3=35 4=45, gen(agem)
(32 differences between age and agem)

The first model of interest has a linear effect of age, and is analogous to simple linear regression:

. logit cuse agem [fw=n]

Iteration 0:   log likelihood = -1001.8468  
Iteration 1:   log likelihood =  -963.7238  
Iteration 2:   log likelihood = -963.45262  
Iteration 3:   log likelihood = -963.45258  

Logistic regression                               Number of obs   =       1607
                                                  LR chi2(1)      =      76.79
                                                  Prob > chi2     =     0.0000
Log likelihood = -963.45258                       Pseudo R2       =     0.0383

------------------------------------------------------------------------------
        cuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        agem |    .060671   .0071034     8.54   0.000     .0467486    .0745934
       _cons |  -2.672667   .2332492   -11.46   0.000    -3.129827   -2.215507
------------------------------------------------------------------------------

. lrtest . ageXnom

Likelihood-ratio test                                  LR chi2(6)  =     68.88
(Assumption: . nested in ageXnom)                      Prob > chi2 =    0.0000

. estimates store agem

The estimated slope shows that the odds of using contraception increase about six percent per year of age.

The second model of interest includes additive effects of age and desire for no more children, and is analogous to an analysis of covariance model:

. logit cuse agem nomore [fw=n]

Iteration 0:   log likelihood = -1001.8468  
Iteration 1:   log likelihood = -939.19023  
Iteration 2:   log likelihood = -938.50429  
Iteration 3:   log likelihood = -938.50406  
Iteration 4:   log likelihood = -938.50406  

Logistic regression                               Number of obs   =       1607
                                                  LR chi2(2)      =     126.69
                                                  Prob > chi2     =     0.0000
Log likelihood = -938.50406                       Pseudo R2       =     0.0632

------------------------------------------------------------------------------
        cuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        agem |   .0441062    .007529     5.86   0.000     .0293497    .0588627
      nomore |   .8258978     .11711     7.05   0.000     .5963664    1.055429
       _cons |  -2.516654   .2365293   -10.64   0.000    -2.980243   -2.053065
------------------------------------------------------------------------------

. di exp(_b[nomore])
2.2839303

. lrtest . ageXnom

Likelihood-ratio test                                  LR chi2(5)  =     18.99
(Assumption: . nested in ageXnom)                      Prob > chi2 =    0.0019

We see that the odds of using contraception are 128% higher among women who want no more children than among women who want more and have the same age. The estimated difference in log-odds when we adjust linearly for age is very similar to that obtained by treating age as a factor (0.826 vs. 0.824).

None of these models fits the data very well, so we add an interaction between age and desire for no more children. As usual, we center age before constructing the interaction

. gen agec = agem - 30.6

. gen nomXagec = nomore * agec

. logit cuse agec nomore nomXagec [fw=n]

Iteration 0:   log likelihood = -1001.8468  
Iteration 1:   log likelihood =   -934.311  
Iteration 2:   log likelihood = -933.57774  
Iteration 3:   log likelihood = -933.57756  
Iteration 4:   log likelihood = -933.57756  

Logistic regression                               Number of obs   =       1607
                                                  LR chi2(3)      =     136.54
                                                  Prob > chi2     =     0.0000
Log likelihood = -933.57756                       Pseudo R2       =     0.0681

------------------------------------------------------------------------------
        cuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        agec |   .0218229   .0103662     2.11   0.035     .0015055    .0421403
      nomore |    .757509    .121842     6.22   0.000      .518703     .996315
    nomXagec |   .0479913    .015438     3.11   0.002     .0177334    .0782493
       _cons |  -1.194371   .0785969   -15.20   0.000    -1.348418   -1.040323
------------------------------------------------------------------------------

. mata exp(st_matrix("e(b)"))[1..3]
                 1             2             3
    +-------------------------------------------+
  1 |  1.022062801   2.132956375   1.049161578  |
    +-------------------------------------------+

. test nomXagec

 ( 1)  [cuse]nomXagec = 0

           chi2(  1) =    9.66
         Prob > chi2 =    0.0019

The estimates agree with the results in Table 3.12. We see that the odds of using contraception among women who want more children are about two percent higher per year of age. The odds among women who want no more children are double those of women who want more at the mean age (30.6). This ratio is about five percent higher per year of age above the mean (and of course five percent lower per year below the mean).

Another way to look at this result is to note that the odds of using contraception among women who want no more children are about 7 percent higher per year of age. This may be seen more clearly if we parametrize the model using two constants and two slopes

. gen more = 1-nomore

. gen moreXagec = more*agec

. logit cuse more moreXagec nomore nomXagec [fw=n], noconstant

Iteration 0:   log likelihood = -1113.8875  
Iteration 1:   log likelihood = -933.72582  
Iteration 2:   log likelihood = -933.57757  
Iteration 3:   log likelihood = -933.57756  

Logistic regression                               Number of obs   =       1607
                                                  Wald chi2(4)    =     300.69
Log likelihood = -933.57756                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
        cuse |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        more |  -1.194371   .0785969   -15.20   0.000    -1.348418   -1.040323
   moreXagec |   .0218229   .0103662     2.11   0.035     .0015055    .0421403
      nomore |  -.4368615   .0931021    -4.69   0.000    -.6193383   -.2543847
    nomXagec |   .0698143     .01144     6.10   0.000     .0473923    .0922362
------------------------------------------------------------------------------

which is the main body of Table 3.12. Testing for equality of slopes is equivalent to testing the interaction term in the previous specification

. test moreXagec = nomXagec

 ( 1)  [cuse]moreXagec - [cuse]nomXagec = 0

           chi2(  1) =    9.66
         Prob > chi2 =    0.0019

Plotting Observed and Fitted Logits

Time for a plot. We will reproduce Figure 3.3 in the notes, comparing the analysis of covariance model we just fitted, a model with a quadratic effect of age where the curvature is assumed to be the same for the two groups defined by preferences, and the anova type model which was saturated for the age by preferences table and thus represents observed logits.

. predict lfit, xb

. gen agecsq = agec^2

. quietly logit cuse agec agecsq nomore nomXagec[fw=n]

. predict qfit, xb

. estimates restore nomXage
(results nomXage are active now)

. predict obs, xb

. graph twoway (scatter obs agem) ///
>         (line lfit agem if more) (lin lfit agem if nomore) ///
>         (mspline qfit agem if more, bands(4)) ///
>         (mspline qfit agem if nomore, bands(4)) ///
>   ,     title(Figure 3.3: Contraceptive Use by Age and Desire) ///
>     xtitle(age) ytitle(logit) legend(off)

. graph export fig33.png, width(500) replace
(file fig33.png written in PNG format)

Visually the model with some curvature provides a better fit, but we have no evidence that it is in fact better than the model with two straight lines with different slopes.

For purposes of presentation you may consider doing an equivalent plot in the probability scale. Don't forget that a linear relationship in the logit scale will be nonlinear in the probability scale, so you should probably use the function or mspline plot types to reflect the curvature.


Continue with 3.6 Multi-factor Models: Model Selection