Problem Set 3: Refugee Appeals in Canada
Due Friday October 28, 2011
Greene and Shaffer (1992), cited in Fox (1997), analyzed decisions by the Canadian Federal Court of Appeals on cases filed by refugee applicants who had been turned down by the Immigration and Refugee Board. We are interested in the fact that there are large differences among judges.
It is possible, of course, that the judges get to hear very different cases. In our analysis we will control for an expert assessment of whether the case had merit, the city where the original application was filed (Toronto, Montreal or other) and the language in which it was filed (English, French). An additional predictor is the logit of the success rate for all cases from the applicant's country. The country itself is also available.
Following Fox we restrict attention to the 10 judges who were present on the
Court the entire period of the study and to countries of origin that produced
at least 20 cases
The data are available in the datasets section of the course website as the
Stata system file greene.dta.
There are 384 observations on 8 variables, most of which are strings.
. use http://data.princeton.edu/wws509/datasets/greene (Refugee Appeals in Canada) . desc, short Contains data from http://data.princeton.edu/wws509/datasets/greene.dta obs: 384 Refugee Appeals in Canada vars: 8 17 Oct 2006 20:50 size: 19,584 Sorted by:
[1] Differences by Judge
(a) Create a variable granted to indicate whether leave to appeal
was granted. This will be our outcome. (Decision is coded 'yes' or 'no'.)
. gen granted = decision == "yes"
. sum granted
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
granted | 384 .296875 .457477 0 1
(b) Tabulate the decisions by judge with row percents to verify that the proportion of cases where appeal is granted ranges from 10.3 for Iacobucci to 60% for Marceau.
You can use tab judge decision, row, but I think the following
command produces nicer output:
. table judge, contents(mean granted)
--------------------------
judge | mean(granted)
-----------+--------------
Desjardins | .5217391
Heald | .3055556
Hugessen | .1935484
Iacobucci | .1034483
MacGuigan | .2428571
Mahoney | .4333333
Marceau | .6
Pratte | .1428571
Stone | .2424242
Urie | .4545455
--------------------------
(c) Generate dummy variables for the judges and fit a logistic regression model of granted on the judge dummies. Use Iacobucci as the reference judge to facilitate the comparisons that follow.
I created the dummies using the levelsof command to get a list
of judges and Stata's foreach to loop, but a series of commands
such as gen Desjardins = judge == "Desjardins" would do just as
well. Note that I removed Iacobucci from the list to reuse the macro in
the model.
. levelsof judge, local(judges) clean
Desjardins Heald Hugessen Iacobucci MacGuigan Mahoney Marceau Pratte Stone Urie
. foreach judge of local judges {
2. gen byte `judge' = judge == "`judge'"
3. }
. local judges = subinstr("`judges'","Iacobucci","",1)
. display `"`judges'"'
Desjardins Heald Hugessen MacGuigan Mahoney Marceau Pratte Stone Urie
. logit granted `judges'
Iteration 0: log likelihood = -233.54619
Iteration 1: log likelihood = -213.88943
Iteration 2: log likelihood = -213.34827
Iteration 3: log likelihood = -213.34487
Iteration 4: log likelihood = -213.34487
Logistic regression Number of obs = 384
LR chi2(9) = 40.40
Prob > chi2 = 0.0000
Log likelihood = -213.34487 Pseudo R2 = 0.0865
------------------------------------------------------------------------------
granted | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Desjardins | 2.246495 .6774334 3.32 0.001 .9187503 3.57424
Heald | 1.338503 .7090161 1.89 0.059 -.0511427 2.72815
Hugessen | .7323676 .6892954 1.06 0.288 -.6186265 2.083362
MacGuigan | 1.022405 .6704374 1.52 0.127 -.2916277 2.336438
Mahoney | 1.89122 .7124194 2.65 0.008 .4949036 3.287536
Marceau | 2.564949 .7337993 3.50 0.000 1.126729 4.003169
Pratte | .3677245 .7524887 0.49 0.625 -1.107126 1.842575
Stone | 1.02005 .7326628 1.39 0.164 -.415943 2.456042
Urie | 1.977162 .8593378 2.30 0.021 .2928913 3.661434
_cons | -2.159484 .6097498 -3.54 0.000 -3.354572 -.9643964
------------------------------------------------------------------------------
(d) Interpret the coefficient for judge Marceau. Verify that the predicted probability for this judge is 60%, in agreement with the crosstab.
. di exp(_b[Marceau]) 12.999996 . di invlogit(_b[_cons] + _b[Marceau]) .6
The odds that judge Marceau will grant appeal are 13 times the odds that judge Iacobucci will. (The odds ratio is 13.) The predicted probability is indeed 60%.
(e) Test the hypothesis that the probabiity of granting leave to appeal is the same for all judges using (i) a likelihood ratio test, (ii) a Wald test.
The likelihood ratio test requires comparing this model with the null model, which is precisely what the model chi-squared does, so we get a LR chi-squared of 40.4 on 9 d.f., which is highly significant. The Wald statistic, shown below, is 36.84 on the same 9 d.f., also highly significant. We reject the hypothesis that the judges have the same probability of granting appeal.
. test `judges'
( 1) [granted]Desjardins = 0
( 2) [granted]Heald = 0
( 3) [granted]Hugessen = 0
( 4) [granted]MacGuigan = 0
( 5) [granted]Mahoney = 0
( 6) [granted]Marceau = 0
( 7) [granted]Pratte = 0
( 8) [granted]Stone = 0
( 9) [granted]Urie = 0
chi2( 9) = 36.84
Prob > chi2 = 0.0000
[2] Adjusting for Merit, City, Language and Country
(a) Fit a model with the judge dummies and the following controls: (i) an indicator for whether the expert thought the case had merit, (ii) city indicators using 'Other City' as the reference cell, (iii) a language dummy with French as omitted category, and (iv) the logit of the country's success rate, as a way to control for country short of using 16 dummies.
We build the variables and fit the model
. gen hasMerit = merit == "yes"
. gen Montreal = city == "Montreal"
. gen Toronto = city == "Toronto"
. gen english = language == "English"
. logit granted hasMerit Montreal Toronto english lsr `judges'
Iteration 0: log likelihood = -233.54619
Iteration 1: log likelihood = -182.00686
Iteration 2: log likelihood = -177.96049
Iteration 3: log likelihood = -177.91526
Iteration 4: log likelihood = -177.91522
Iteration 5: log likelihood = -177.91522
Logistic regression Number of obs = 384
LR chi2(14) = 111.26
Prob > chi2 = 0.0000
Log likelihood = -177.91522 Pseudo R2 = 0.2382
------------------------------------------------------------------------------
granted | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hasMerit | 1.404937 .2747502 5.11 0.000 .8664363 1.943437
Montreal | -1.194297 .6776086 -1.76 0.078 -2.522385 .1337916
Toronto | -.2451519 .381211 -0.64 0.520 -.9923117 .5020079
english | .1938402 .6028116 0.32 0.748 -.9876489 1.375329
lsr | 1.608784 .3015521 5.34 0.000 1.017753 2.199815
Desjardins | 2.700312 .7272995 3.71 0.000 1.274831 4.125793
Heald | 1.337069 .7478763 1.79 0.074 -.1287419 2.802879
Hugessen | 1.20252 .778368 1.54 0.122 -.3230534 2.728093
MacGuigan | 1.412501 .7154485 1.97 0.048 .0102478 2.814754
Mahoney | 1.858221 .748769 2.48 0.013 .3906604 3.325781
Marceau | 3.772253 .8416708 4.48 0.000 2.122609 5.421898
Pratte | .6992399 .8134871 0.86 0.390 -.8951656 2.293645
Stone | 1.038858 .76319 1.36 0.173 -.4569668 2.534683
Urie | 2.628744 .9290055 2.83 0.005 .8079263 4.449561
_cons | -1.1807 .959169 -1.23 0.218 -3.060637 .6992365
------------------------------------------------------------------------------
(b) How would you justify the use of the logit of the country's success rate as a control, as opposed to country dummies? No need to fit any models, we want to keep this short. If you do, you'll have to figure out what to do with Fiji, India and Poland.
The simple way would be to compare this model with a model that uses country dummies instead, and in fact we could do a test that the single d.f. captures all the country variation. In case you are curious it doesn't, but it gets the gist of it. One complication is that some countries have 0 or 100% success rates, so Stata drops them from the regression!
(c) Interpret briefly the coefficients of merit and of Montreal in this model in terms of odds ratios.
. di exp(_b[hasMerit]) 4.0752689 . di exp(_b[Montreal]) .30291687
The odds ratio is about four, so the odds that a judge will grant appeal are three times higher when the expert thinks the case has merit than when it does not, everything else being equal. Cases heard in Montreal have 70% lower odds of success than comparable cases heard in places other than Toronto and Montreal.
(d) Interpret the coefficient for judge Marceau in this model, explaining carefully what it means to adjust for the other predictors. Has the adjustment reduced the contrast with Iacobucci?
. di exp(_b[Marceau]) 43.47793
No, in fact it has increased the discrepancy. Jugde Marceau's odds of granting appeal are 43 times judge Iacobucci's odds when we compare cases with the same merit, city, language and success rate of the country of origin. (The adjustment relies on the technical assumption that these variables have additive and, in the case of the logit of the success rate linear, effects on the logit of the probability of granting appeal.)
(e) Test the hypothesis that the probability of granting leave is the same for all judges after adjusting for the control variables using (i) a likelihood ratio test, and (ii) a Wald test.
The likelihood ratio test requires comparing this model with a model that excludes the judge dummies. We save our estimates before fitting the simpler model, and restore them afterwards
. estimates store additive
. quietly logit granted hasMerit Montreal Toronto english lsr
. di e(chi2)
61.603101
. lrtest . additive
Likelihood-ratio test LR chi2(9) = 49.66
(Assumption: . nested in additive) Prob > chi2 = 0.0000
. estimates restore additive
(results additive are active now)
. test `judges'
( 1) [granted]Desjardins = 0
( 2) [granted]Heald = 0
( 3) [granted]Hugessen = 0
( 4) [granted]MacGuigan = 0
( 5) [granted]Mahoney = 0
( 6) [granted]Marceau = 0
( 7) [granted]Pratte = 0
( 8) [granted]Stone = 0
( 9) [granted]Urie = 0
chi2( 9) = 41.96
Prob > chi2 = 0.0000
The LR test gives a chi-squared of 49.7 on 9 d.f. The Wald test is 42.0 on the same 9 d.f. Both tests are significant, so we reject the hypothesis that the probability of granting appeal when we compare equivalent cases is the same for all judges. (I displayed the model chi-squared, so you can verify that the test can also be obtained as a difference in model chi-squareds.
[3] Adjusted Probabilities
(a) One way to present the results to a non-technical audience is to compute predicted probabilities for each judge setting all other variables in the model to their means. Do this calculation from first principles for Iacobucci and for Marceau and interpret the results.I will run a simple loop by predictor to compute the mean and multiply it by the coefficient. The inverse logit is the probability for the reference judge, Iacobucci. Adding the Marceau coefficient gives us the probability for Marceau.
. scalar eta = _b[_cons]
. foreach x in hasMerit Montreal Toronto english lsr {
2. quietly sum `x'
3. scalar eta = eta + _b[`x'] * r(mean)
4. }
. display invlogit(eta)
.058958
. display invlogit(eta + _b[Marceau])
.73146949
We predict that for the average case, the probablity that judge Iacobucci will grant appeal is 5.9%, whereas the probability for judge Marceau is 73%, a difference of 67 percentage points. (The approximation based on the derivative evaluated at the overall probability of 29.7% is 79 percentage points.)
(b) One problem with the previous approach is that setting a dummy to its mean is not very meaningful (0.36 Montreal?) Here's an alternative approach. Save your data. Set all judge dummies to zero, predict, and compute the mean probability. This is what the model predicts if Iacobucci ruled in all cases. Now set the Marceau dummy to one, predict and average the probabilities. This is what the model predicts if Marceau ruled in all cases. Compare results.
. local judges Desjardins Heald Hugessen MacGuigan Mahoney Marceau Pratte Stone Urie
. foreach judge of local judges {
2. quietly replace `judge' = 0
3. }
. predict pIacobucci
(option pr assumed; Pr(granted))
. quietly replace Marceau = 1
. predict pMarceau
(option pr assumed; Pr(granted))
. sum pIacobucci pMarceau
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
pIacobucci | 384 .0965598 .0959264 .0032091 .469063
pMarceau | 384 .6887541 .2285615 .1227876 .9746264
The model implies that if all cases were heard by Iacobucci 9.7% would be granted leave to appeal, compared to 68.9% if they were heard by Marceau, a difference of 59 percentage points.
(c) Stata's powerful margin command can do these calculations
for you. Treat the judge as a factor variable (this requires integer codes),
and fit the model. Estimate the judge margins with the option at(means) and
compare with 3.a.
Estimate the judge margins with the default option and compare with 3.b.
Now you know what this command does.
. encode judge, gen(jcode)
. table judge, contents(mean jcode) // a key to judge codes
------------------------
judge | mean(jcode)
-----------+------------
Desjardins | 1
Heald | 2
Hugessen | 3
Iacobucci | 4
MacGuigan | 5
Mahoney | 6
Marceau | 7
Pratte | 8
Stone | 9
Urie | 10
------------------------
. logit granted hasMerit Montreal Toronto english lsr i.jcode
Iteration 0: log likelihood = -233.54619
Iteration 1: log likelihood = -182.00686
Iteration 2: log likelihood = -177.96049
Iteration 3: log likelihood = -177.91526
Iteration 4: log likelihood = -177.91522
Iteration 5: log likelihood = -177.91522
Logistic regression Number of obs = 384
LR chi2(14) = 111.26
Prob > chi2 = 0.0000
Log likelihood = -177.91522 Pseudo R2 = 0.2382
------------------------------------------------------------------------------
granted | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hasMerit | 1.404937 .2747502 5.11 0.000 .8664363 1.943437
Montreal | -1.194297 .6776086 -1.76 0.078 -2.522385 .1337916
Toronto | -.2451519 .381211 -0.64 0.520 -.9923117 .5020079
english | .1938402 .6028116 0.32 0.748 -.9876489 1.375329
lsr | 1.608784 .3015521 5.34 0.000 1.017753 2.199815
|
jcode |
2 | -1.363243 .5365546 -2.54 0.011 -2.414871 -.3116155
3 | -1.497792 .5289346 -2.83 0.005 -2.534485 -.4610995
4 | -2.700312 .7272995 -3.71 0.000 -4.125793 -1.274831
5 | -1.287811 .4616725 -2.79 0.005 -2.192672 -.3829496
6 | -.8420913 .5348913 -1.57 0.115 -1.890459 .2062763
7 | 1.071941 .596727 1.80 0.072 -.0976221 2.241505
8 | -2.001072 .5955593 -3.36 0.001 -3.168347 -.8337974
9 | -1.661454 .5565166 -2.99 0.003 -2.752207 -.5707016
10 | -.0715685 .7539317 -0.09 0.924 -1.549248 1.406111
|
_cons | 1.519612 .8277227 1.84 0.066 -.1026948 3.141919
------------------------------------------------------------------------------
. margin jcode, atmeans
Adjusted predictions Number of obs = 384
Model VCE : OIM
Expression : Pr(granted), predict()
at : hasMerit = .3385417 (mean)
Montreal = .359375 (mean)
Toronto = .4973958 (mean)
english = .6588542 (mean)
lsr = -1.020439 (mean)
1.jcode = .1197917 (mean)
2.jcode = .09375 (mean)
3.jcode = .1614583 (mean)
4.jcode = .0755208 (mean)
5.jcode = .1822917 (mean)
6.jcode = .078125 (mean)
7.jcode = .0651042 (mean)
8.jcode = .109375 (mean)
9.jcode = .0859375 (mean)
10.jcode = .0286458 (mean)
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
jcode |
1 | .4825445 .0836218 5.77 0.000 .3186487 .6464402
2 | .1926171 .0667174 2.89 0.004 .0618533 .3233808
3 | .1725527 .0582151 2.96 0.003 .0584532 .2866522
4 | .058958 .0362154 1.63 0.104 -.012023 .129939
5 | .2046206 .0524187 3.90 0.000 .1018819 .3073594
6 | .2866026 .0869682 3.30 0.001 .1161481 .4570572
7 | .7314695 .095734 7.64 0.000 .5438343 .9191047
8 | .1119553 .049242 2.27 0.023 .0154427 .2084678
9 | .1504208 .0579656 2.59 0.009 .0368102 .2640314
10 | .4647041 .1691983 2.75 0.006 .1330815 .7963267
------------------------------------------------------------------------------
. margin jcode
Predictive margins Number of obs = 384
Model VCE : OIM
Expression : Pr(granted), predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
jcode |
1 | .4948775 .0641441 7.72 0.000 .3691573 .6205977
2 | .2507893 .0626393 4.00 0.000 .1280185 .3735601
3 | .2306667 .0599959 3.84 0.000 .1130768 .3482565
4 | .0965598 .0499085 1.93 0.053 -.001259 .1943786
5 | .2624906 .0493004 5.32 0.000 .1658635 .3591177
6 | .3371647 .0726959 4.64 0.000 .1946834 .4796459
7 | .6887541 .077986 8.83 0.000 .5359044 .8416038
8 | .1644451 .0572912 2.87 0.004 .0521565 .2767338
9 | .207536 .0597046 3.48 0.001 .0905171 .324555
10 | .4811424 .129872 3.70 0.000 .2265979 .7356869
------------------------------------------------------------------------------
The margins command reproduces the results of 3.a with the
atmeans option and those of 3.b with the default. Our code
could easily be extended to do all the judges, but margins
is easier. (I just wish it would use labels in the output.)
[4] Goodness of Fit
(a) Test the goodness of fit of the model of part [2] and [3] using Stata's
default setting for the post-estimation command estat gof.
Can we trust the asymptotics given the number of distinct covariate patterns
relative to the number of observations?
. estat gof
Logistic model for granted, goodness-of-fit test
number of observations = 384
number of covariate patterns = 211
Pearson chi2(196) = 243.51
Prob > chi2 = 0.0118
The chi-squared of 243.5 on 196 d.f. is significant, with a p-value just above one percent, indicatin gthat the model does not fit the data. Note, however, that we have 211 covariate patters with 384 observations, and average of less than two observations per pattern, hardly a large sample, so we can't trust the asymptotics.
(b) Compute the Hosmer-Lemeshow goodness of fit test using 10 groups based on deciles of predicted probabilities. Feel better now?
. estat gof, group(10)
Logistic model for granted, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)
number of observations = 384
number of groups = 10
Hosmer-Lemeshow chi2(8) = 4.35
Prob > chi2 = 0.8240
With ten groups, which by default are constructed on the basis of quantiles of predicted probabilities, we get a chi-squared of 4.35 with a p-value of 0.82, so we have no evidence of lack of fit. Similar results are obtained with 20 grous. I do feel better now.
Another approach to examining goodness of fit, which we skipped in the interest of time, is to consider interactions. I though merit might interact with other variables, but it doesn't. The largest interaction I found involved the city dummies and the logit of the country's success rate. With a LR chi-squared of 1.30 on 2 d.f. this was hardly significant. All other interactions were smaller than this, so we have no evidence against the additive model.
