Home | GLMs | Multilevel | Survival | Demography | Stata | R
Home Lecture Notes Stata Logs Datasets Problem Sets

Problem Set 3: Refugee Appeals in Canada
Due Friday October 28, 2011

Greene and Shaffer (1992), cited in Fox (1997), analyzed decisions by the Canadian Federal Court of Appeals on cases filed by refugee applicants who had been turned down by the Immigration and Refugee Board. We are interested in the fact that there are large differences among judges.

It is possible, of course, that the judges get to hear very different cases. In our analysis we will control for an expert assessment of whether the case had merit, the city where the original application was filed (Toronto, Montreal or other) and the language in which it was filed (English, French). An additional predictor is the logit of the success rate for all cases from the applicant's country. The country itself is also available.

Following Fox we restrict attention to the 10 judges who were present on the Court the entire period of the study and to countries of origin that produced at least 20 cases The data are available in the datasets section of the course website as the Stata system file greene.dta. There are 384 observations on 8 variables, most of which are strings.

. use http://data.princeton.edu/wws509/datasets/greene
(Refugee Appeals in Canada)

. desc, short

Contains data from http://data.princeton.edu/wws509/datasets/greene.dta
  obs:           384                          Refugee Appeals in Canada
 vars:             8                          17 Oct 2006 20:50
 size:        19,584                          
Sorted by:  

[1] Differences by Judge

(a) Create a variable granted to indicate whether leave to appeal was granted. This will be our outcome. (Decision is coded 'yes' or 'no'.)

. gen granted = decision == "yes"

. sum granted

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     granted |       384     .296875     .457477          0          1

(b) Tabulate the decisions by judge with row percents to verify that the proportion of cases where appeal is granted ranges from 10.3 for Iacobucci to 60% for Marceau.

You can use tab judge decision, row, but I think the following command produces nicer output:

. table judge, contents(mean granted) 

--------------------------
     judge | mean(granted)
-----------+--------------
Desjardins |      .5217391
     Heald |      .3055556
  Hugessen |      .1935484
 Iacobucci |      .1034483
 MacGuigan |      .2428571
   Mahoney |      .4333333
   Marceau |            .6
    Pratte |      .1428571
     Stone |      .2424242
      Urie |      .4545455
--------------------------

(c) Generate dummy variables for the judges and fit a logistic regression model of granted on the judge dummies. Use Iacobucci as the reference judge to facilitate the comparisons that follow.

I created the dummies using the levelsof command to get a list of judges and Stata's foreach to loop, but a series of commands such as gen Desjardins = judge == "Desjardins" would do just as well. Note that I removed Iacobucci from the list to reuse the macro in the model.

. levelsof judge, local(judges) clean
Desjardins Heald Hugessen Iacobucci MacGuigan Mahoney Marceau Pratte Stone Urie

. foreach judge of local judges {
  2.     gen byte `judge' = judge == "`judge'"
  3. }

. local judges = subinstr("`judges'","Iacobucci","",1)

. display `"`judges'"'
Desjardins Heald Hugessen  MacGuigan Mahoney Marceau Pratte Stone Urie

. logit granted `judges'

Iteration 0:   log likelihood = -233.54619  
Iteration 1:   log likelihood = -213.88943  
Iteration 2:   log likelihood = -213.34827  
Iteration 3:   log likelihood = -213.34487  
Iteration 4:   log likelihood = -213.34487  

Logistic regression                               Number of obs   =        384
                                                  LR chi2(9)      =      40.40
                                                  Prob > chi2     =     0.0000
Log likelihood = -213.34487                       Pseudo R2       =     0.0865

------------------------------------------------------------------------------
     granted |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  Desjardins |   2.246495   .6774334     3.32   0.001     .9187503     3.57424
       Heald |   1.338503   .7090161     1.89   0.059    -.0511427     2.72815
    Hugessen |   .7323676   .6892954     1.06   0.288    -.6186265    2.083362
   MacGuigan |   1.022405   .6704374     1.52   0.127    -.2916277    2.336438
     Mahoney |    1.89122   .7124194     2.65   0.008     .4949036    3.287536
     Marceau |   2.564949   .7337993     3.50   0.000     1.126729    4.003169
      Pratte |   .3677245   .7524887     0.49   0.625    -1.107126    1.842575
       Stone |    1.02005   .7326628     1.39   0.164     -.415943    2.456042
        Urie |   1.977162   .8593378     2.30   0.021     .2928913    3.661434
       _cons |  -2.159484   .6097498    -3.54   0.000    -3.354572   -.9643964
------------------------------------------------------------------------------

(d) Interpret the coefficient for judge Marceau. Verify that the predicted probability for this judge is 60%, in agreement with the crosstab.

. di exp(_b[Marceau])
12.999996

. di invlogit(_b[_cons] + _b[Marceau])
 .6

The odds that judge Marceau will grant appeal are 13 times the odds that judge Iacobucci will. (The odds ratio is 13.) The predicted probability is indeed 60%.

(e) Test the hypothesis that the probabiity of granting leave to appeal is the same for all judges using (i) a likelihood ratio test, (ii) a Wald test.

The likelihood ratio test requires comparing this model with the null model, which is precisely what the model chi-squared does, so we get a LR chi-squared of 40.4 on 9 d.f., which is highly significant. The Wald statistic, shown below, is 36.84 on the same 9 d.f., also highly significant. We reject the hypothesis that the judges have the same probability of granting appeal.

. test `judges' 

 ( 1)  [granted]Desjardins = 0
 ( 2)  [granted]Heald = 0
 ( 3)  [granted]Hugessen = 0
 ( 4)  [granted]MacGuigan = 0
 ( 5)  [granted]Mahoney = 0
 ( 6)  [granted]Marceau = 0
 ( 7)  [granted]Pratte = 0
 ( 8)  [granted]Stone = 0
 ( 9)  [granted]Urie = 0

           chi2(  9) =   36.84
         Prob > chi2 =    0.0000

[2] Adjusting for Merit, City, Language and Country

(a) Fit a model with the judge dummies and the following controls: (i) an indicator for whether the expert thought the case had merit, (ii) city indicators using 'Other City' as the reference cell, (iii) a language dummy with French as omitted category, and (iv) the logit of the country's success rate, as a way to control for country short of using 16 dummies.

We build the variables and fit the model

. gen hasMerit = merit == "yes"

. gen Montreal = city == "Montreal"

. gen Toronto = city == "Toronto"

. gen english = language == "English"

. logit granted hasMerit Montreal Toronto english lsr `judges'

Iteration 0:   log likelihood = -233.54619  
Iteration 1:   log likelihood = -182.00686  
Iteration 2:   log likelihood = -177.96049  
Iteration 3:   log likelihood = -177.91526  
Iteration 4:   log likelihood = -177.91522  
Iteration 5:   log likelihood = -177.91522  

Logistic regression                               Number of obs   =        384
                                                  LR chi2(14)     =     111.26
                                                  Prob > chi2     =     0.0000
Log likelihood = -177.91522                       Pseudo R2       =     0.2382

------------------------------------------------------------------------------
     granted |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hasMerit |   1.404937   .2747502     5.11   0.000     .8664363    1.943437
    Montreal |  -1.194297   .6776086    -1.76   0.078    -2.522385    .1337916
     Toronto |  -.2451519    .381211    -0.64   0.520    -.9923117    .5020079
     english |   .1938402   .6028116     0.32   0.748    -.9876489    1.375329
         lsr |   1.608784   .3015521     5.34   0.000     1.017753    2.199815
  Desjardins |   2.700312   .7272995     3.71   0.000     1.274831    4.125793
       Heald |   1.337069   .7478763     1.79   0.074    -.1287419    2.802879
    Hugessen |    1.20252    .778368     1.54   0.122    -.3230534    2.728093
   MacGuigan |   1.412501   .7154485     1.97   0.048     .0102478    2.814754
     Mahoney |   1.858221    .748769     2.48   0.013     .3906604    3.325781
     Marceau |   3.772253   .8416708     4.48   0.000     2.122609    5.421898
      Pratte |   .6992399   .8134871     0.86   0.390    -.8951656    2.293645
       Stone |   1.038858     .76319     1.36   0.173    -.4569668    2.534683
        Urie |   2.628744   .9290055     2.83   0.005     .8079263    4.449561
       _cons |    -1.1807    .959169    -1.23   0.218    -3.060637    .6992365
------------------------------------------------------------------------------

(b) How would you justify the use of the logit of the country's success rate as a control, as opposed to country dummies? No need to fit any models, we want to keep this short. If you do, you'll have to figure out what to do with Fiji, India and Poland.

The simple way would be to compare this model with a model that uses country dummies instead, and in fact we could do a test that the single d.f. captures all the country variation. In case you are curious it doesn't, but it gets the gist of it. One complication is that some countries have 0 or 100% success rates, so Stata drops them from the regression!

(c) Interpret briefly the coefficients of merit and of Montreal in this model in terms of odds ratios.

. di exp(_b[hasMerit])
4.0752689

. di exp(_b[Montreal])
.30291687

The odds ratio is about four, so the odds that a judge will grant appeal are three times higher when the expert thinks the case has merit than when it does not, everything else being equal. Cases heard in Montreal have 70% lower odds of success than comparable cases heard in places other than Toronto and Montreal.

(d) Interpret the coefficient for judge Marceau in this model, explaining carefully what it means to adjust for the other predictors. Has the adjustment reduced the contrast with Iacobucci?

. di exp(_b[Marceau])
43.47793

No, in fact it has increased the discrepancy. Jugde Marceau's odds of granting appeal are 43 times judge Iacobucci's odds when we compare cases with the same merit, city, language and success rate of the country of origin. (The adjustment relies on the technical assumption that these variables have additive and, in the case of the logit of the success rate linear, effects on the logit of the probability of granting appeal.)

(e) Test the hypothesis that the probability of granting leave is the same for all judges after adjusting for the control variables using (i) a likelihood ratio test, and (ii) a Wald test.

The likelihood ratio test requires comparing this model with a model that excludes the judge dummies. We save our estimates before fitting the simpler model, and restore them afterwards

. estimates store additive

. quietly logit granted hasMerit Montreal Toronto english lsr

. di e(chi2)
61.603101

. lrtest . additive

Likelihood-ratio test                                 LR chi2(9)  =     49.66
(Assumption: . nested in additive)                    Prob > chi2 =    0.0000

. estimates restore additive
(results additive are active now)

. test `judges'

 ( 1)  [granted]Desjardins = 0
 ( 2)  [granted]Heald = 0
 ( 3)  [granted]Hugessen = 0
 ( 4)  [granted]MacGuigan = 0
 ( 5)  [granted]Mahoney = 0
 ( 6)  [granted]Marceau = 0
 ( 7)  [granted]Pratte = 0
 ( 8)  [granted]Stone = 0
 ( 9)  [granted]Urie = 0

           chi2(  9) =   41.96
         Prob > chi2 =    0.0000

The LR test gives a chi-squared of 49.7 on 9 d.f. The Wald test is 42.0 on the same 9 d.f. Both tests are significant, so we reject the hypothesis that the probability of granting appeal when we compare equivalent cases is the same for all judges. (I displayed the model chi-squared, so you can verify that the test can also be obtained as a difference in model chi-squareds.

[3] Adjusted Probabilities

(a) One way to present the results to a non-technical audience is to compute predicted probabilities for each judge setting all other variables in the model to their means. Do this calculation from first principles for Iacobucci and for Marceau and interpret the results.

I will run a simple loop by predictor to compute the mean and multiply it by the coefficient. The inverse logit is the probability for the reference judge, Iacobucci. Adding the Marceau coefficient gives us the probability for Marceau.

. scalar eta = _b[_cons]

. foreach x in hasMerit Montreal Toronto english lsr {
  2.     quietly sum `x'
  3.         scalar eta = eta + _b[`x'] * r(mean)
  4. }

. display invlogit(eta)
.058958

. display invlogit(eta + _b[Marceau])
.73146949

We predict that for the average case, the probablity that judge Iacobucci will grant appeal is 5.9%, whereas the probability for judge Marceau is 73%, a difference of 67 percentage points. (The approximation based on the derivative evaluated at the overall probability of 29.7% is 79 percentage points.)

(b) One problem with the previous approach is that setting a dummy to its mean is not very meaningful (0.36 Montreal?) Here's an alternative approach. Save your data. Set all judge dummies to zero, predict, and compute the mean probability. This is what the model predicts if Iacobucci ruled in all cases. Now set the Marceau dummy to one, predict and average the probabilities. This is what the model predicts if Marceau ruled in all cases. Compare results.

. local judges Desjardins Heald Hugessen  MacGuigan Mahoney Marceau Pratte Stone Urie

. foreach judge of local judges {
  2.     quietly replace `judge' = 0
  3. }

. predict pIacobucci
(option pr assumed; Pr(granted))

. quietly replace Marceau = 1

. predict pMarceau
(option pr assumed; Pr(granted))

. sum pIacobucci pMarceau

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  pIacobucci |       384    .0965598    .0959264   .0032091    .469063
    pMarceau |       384    .6887541    .2285615   .1227876   .9746264

The model implies that if all cases were heard by Iacobucci 9.7% would be granted leave to appeal, compared to 68.9% if they were heard by Marceau, a difference of 59 percentage points.

(c) Stata's powerful margin command can do these calculations for you. Treat the judge as a factor variable (this requires integer codes), and fit the model. Estimate the judge margins with the option at(means) and compare with 3.a. Estimate the judge margins with the default option and compare with 3.b. Now you know what this command does.

. encode judge, gen(jcode)

. table judge, contents(mean jcode) // a key to judge codes

------------------------
     judge | mean(jcode)
-----------+------------
Desjardins |           1
     Heald |           2
  Hugessen |           3
 Iacobucci |           4
 MacGuigan |           5
   Mahoney |           6
   Marceau |           7
    Pratte |           8
     Stone |           9
      Urie |          10
------------------------

. logit granted hasMerit Montreal Toronto english lsr i.jcode

Iteration 0:   log likelihood = -233.54619  
Iteration 1:   log likelihood = -182.00686  
Iteration 2:   log likelihood = -177.96049  
Iteration 3:   log likelihood = -177.91526  
Iteration 4:   log likelihood = -177.91522  
Iteration 5:   log likelihood = -177.91522  

Logistic regression                               Number of obs   =        384
                                                  LR chi2(14)     =     111.26
                                                  Prob > chi2     =     0.0000
Log likelihood = -177.91522                       Pseudo R2       =     0.2382

------------------------------------------------------------------------------
     granted |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    hasMerit |   1.404937   .2747502     5.11   0.000     .8664363    1.943437
    Montreal |  -1.194297   .6776086    -1.76   0.078    -2.522385    .1337916
     Toronto |  -.2451519    .381211    -0.64   0.520    -.9923117    .5020079
     english |   .1938402   .6028116     0.32   0.748    -.9876489    1.375329
         lsr |   1.608784   .3015521     5.34   0.000     1.017753    2.199815
             |
       jcode |
          2  |  -1.363243   .5365546    -2.54   0.011    -2.414871   -.3116155
          3  |  -1.497792   .5289346    -2.83   0.005    -2.534485   -.4610995
          4  |  -2.700312   .7272995    -3.71   0.000    -4.125793   -1.274831
          5  |  -1.287811   .4616725    -2.79   0.005    -2.192672   -.3829496
          6  |  -.8420913   .5348913    -1.57   0.115    -1.890459    .2062763
          7  |   1.071941    .596727     1.80   0.072    -.0976221    2.241505
          8  |  -2.001072   .5955593    -3.36   0.001    -3.168347   -.8337974
          9  |  -1.661454   .5565166    -2.99   0.003    -2.752207   -.5707016
         10  |  -.0715685   .7539317    -0.09   0.924    -1.549248    1.406111
             |
       _cons |   1.519612   .8277227     1.84   0.066    -.1026948    3.141919
------------------------------------------------------------------------------

. margin jcode, atmeans

Adjusted predictions                              Number of obs   =        384
Model VCE    : OIM

Expression   : Pr(granted), predict()
at           : hasMerit        =    .3385417 (mean)
               Montreal        =     .359375 (mean)
               Toronto         =    .4973958 (mean)
               english         =    .6588542 (mean)
               lsr             =   -1.020439 (mean)
               1.jcode         =    .1197917 (mean)
               2.jcode         =      .09375 (mean)
               3.jcode         =    .1614583 (mean)
               4.jcode         =    .0755208 (mean)
               5.jcode         =    .1822917 (mean)
               6.jcode         =     .078125 (mean)
               7.jcode         =    .0651042 (mean)
               8.jcode         =     .109375 (mean)
               9.jcode         =    .0859375 (mean)
               10.jcode        =    .0286458 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       jcode |
          1  |   .4825445   .0836218     5.77   0.000     .3186487    .6464402
          2  |   .1926171   .0667174     2.89   0.004     .0618533    .3233808
          3  |   .1725527   .0582151     2.96   0.003     .0584532    .2866522
          4  |    .058958   .0362154     1.63   0.104     -.012023     .129939
          5  |   .2046206   .0524187     3.90   0.000     .1018819    .3073594
          6  |   .2866026   .0869682     3.30   0.001     .1161481    .4570572
          7  |   .7314695    .095734     7.64   0.000     .5438343    .9191047
          8  |   .1119553    .049242     2.27   0.023     .0154427    .2084678
          9  |   .1504208   .0579656     2.59   0.009     .0368102    .2640314
         10  |   .4647041   .1691983     2.75   0.006     .1330815    .7963267
------------------------------------------------------------------------------

. margin jcode

Predictive margins                                Number of obs   =        384
Model VCE    : OIM

Expression   : Pr(granted), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       jcode |
          1  |   .4948775   .0641441     7.72   0.000     .3691573    .6205977
          2  |   .2507893   .0626393     4.00   0.000     .1280185    .3735601
          3  |   .2306667   .0599959     3.84   0.000     .1130768    .3482565
          4  |   .0965598   .0499085     1.93   0.053     -.001259    .1943786
          5  |   .2624906   .0493004     5.32   0.000     .1658635    .3591177
          6  |   .3371647   .0726959     4.64   0.000     .1946834    .4796459
          7  |   .6887541    .077986     8.83   0.000     .5359044    .8416038
          8  |   .1644451   .0572912     2.87   0.004     .0521565    .2767338
          9  |    .207536   .0597046     3.48   0.001     .0905171     .324555
         10  |   .4811424    .129872     3.70   0.000     .2265979    .7356869
------------------------------------------------------------------------------

The margins command reproduces the results of 3.a with the atmeans option and those of 3.b with the default. Our code could easily be extended to do all the judges, but margins is easier. (I just wish it would use labels in the output.)

[4] Goodness of Fit

(a) Test the goodness of fit of the model of part [2] and [3] using Stata's default setting for the post-estimation command estat gof. Can we trust the asymptotics given the number of distinct covariate patterns relative to the number of observations?

. estat gof

Logistic model for granted, goodness-of-fit test

       number of observations =       384
 number of covariate patterns =       211
            Pearson chi2(196) =       243.51
                  Prob > chi2 =         0.0118

The chi-squared of 243.5 on 196 d.f. is significant, with a p-value just above one percent, indicatin gthat the model does not fit the data. Note, however, that we have 211 covariate patters with 384 observations, and average of less than two observations per pattern, hardly a large sample, so we can't trust the asymptotics.

(b) Compute the Hosmer-Lemeshow goodness of fit test using 10 groups based on deciles of predicted probabilities. Feel better now?

. estat gof, group(10)

Logistic model for granted, goodness-of-fit test

  (Table collapsed on quantiles of estimated probabilities)

       number of observations =       384
             number of groups =        10
      Hosmer-Lemeshow chi2(8) =         4.35
                  Prob > chi2 =         0.8240

With ten groups, which by default are constructed on the basis of quantiles of predicted probabilities, we get a chi-squared of 4.35 with a p-value of 0.82, so we have no evidence of lack of fit. Similar results are obtained with 20 grous. I do feel better now.

Another approach to examining goodness of fit, which we skipped in the interest of time, is to consider interactions. I though merit might interact with other variables, but it doesn't. The largest interaction I found involved the city dummies and the logit of the country's success rate. With a LR chi-squared of 1.30 on 2 d.f. this was hardly significant. All other interactions were smaller than this, so we have no evidence against the additive model.