Guatemalan Data

This page makes available the two datasets used in our 2001 JRSS-A paper. These data are also part of the analysis in our 1996 Demography paper.

The datasets are available in a zipped archive rggudat.zip (40 KB). They are also available individually in uncompressed form as guImmun.dat (115 KB) and guPrenat.dat (160 KB). Descriptions of these datasets follow:

Immunization

The first dataset (guImmun.dat) refers to complete immunization among children receiving any immunization. It has 2159 observations on 19 variables. The first line is a header with variable names, so the file can be read into R or S-Plus using read.table(filename,header=T). The variables include child, family and community id numbers, the outcome coded 0-1, and a set of individual, family and community variables used as predictors. These appear in exactly the same order as Table 2 in the JRSS-A paper:

Column Variable Notes
1 kid child id (2159 kids)
2 mom family id (1595 families)
3 cluster cluster id (161 communities)
4 immun whether fully immunized (1=yes, 0=no)
5 kid2p child aged 2+ years
6 mom25p mother aged 25+ years
7 order23 birth order 2-3
8 order46 birth order 4-6
9 order7p birth order 7+
10 indNoSpa indigenous, speaks no Spanish
11 indSpa indigenous, speaks Spanish
12 momEdPri mother's education primary
13 momEdSec mother's education secondary+
14 husEdPri husband's education primary
15 husEdSec husband's education secondary+
16 husEdDK husband's education missing
17 momWork mother ever worked
18 rural rural residence
19 pcInd81 proportion indigenous in 1981

The last predictor is a continuous variable. All others are 0-1 dummy variables, representing discrete factors coded using the reference cell method. The omitted categories are child aged 1 year, mother's age less than 25, birth order 1, ladino, mother with no education, husband with no education, mother never worked, and urban residence.
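For readers working outside R, the reference cell coding described above can be undone with a few lines of Python. The following is a minimal sketch, not part of the original analysis: the names read_table, decode and FACTORS are illustrative, the sample rows are invented placeholders rather than values from guImmun.dat, and only a subset of the 19 columns is shown. It assumes the file is whitespace-delimited with a one-line header, as the read.table call suggests.

```python
import io

# Invented rows for illustration only; these are NOT values from guImmun.dat,
# and only a subset of the 19 columns is shown.
SAMPLE = """kid mom cluster immun order23 order46 order7p indNoSpa indSpa
1 1 1 1 0 0 0 0 0
2 1 1 0 1 0 0 0 0
3 2 1 1 0 0 1 0 1
"""

def read_table(handle):
    """Read a whitespace-delimited table whose first line holds variable names."""
    header = handle.readline().split()
    # Values are kept as strings; blank lines are skipped.
    return [dict(zip(header, line.split())) for line in handle if line.strip()]

# For each factor: the omitted (reference) category, then the dummy columns
# and the category each one flags. Labels paraphrase the variable notes.
FACTORS = {
    "birth_order": ("1", {"order23": "2-3", "order46": "4-6", "order7p": "7+"}),
    "ethnicity": ("ladino", {"indNoSpa": "indigenous, speaks no Spanish",
                             "indSpa": "indigenous, speaks Spanish"}),
}

def decode(row, factor):
    """Return the category of `factor` for one row of 0-1 dummies."""
    reference, dummies = FACTORS[factor]
    for column, label in dummies.items():
        if float(row[column]) == 1:
            return label
    # All dummies are 0, so the row falls in the omitted category.
    return reference

rows = read_table(io.StringIO(SAMPLE))
print([decode(r, "birth_order") for r in rows])  # ['1', '2-3', '7+']
print(decode(rows[2], "ethnicity"))              # indigenous, speaks Spanish
```

To try this on the real data, one could replace io.StringIO(SAMPLE) with an open file handle, for example open("guImmun.dat"), assuming the file has been downloaded to the working directory.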

Prenatal Care

The second dataset (guPrenat.dat) refers to use of modern prenatal care among women using some form of prenatal care. It has 2449 observations on 25 variables. The first line is a header with variable names, so the file can be read into R or S-Plus using read.table(filename,header=T). The variables include level ids, the outcome, and individual, family and community-level predictors. These appear in the same order as Table 3 in the JRSS-A paper.

Column Variable Notes
1 kid child id (2449 kids)
2 mom family id (1558 families)
3 cluster cluster id (161 communities)
4 prenat used modern prenatal care (1=yes, 0=no)
5 kid3p child aged 3-4 years
6 mom25p mother aged 25+ years
7 order23 birth order 2-3
8 order46 birth order 4-6
9 order7p birth order 7+
10 indNoSpa indigenous, speaks no Spanish
11 inSpa indigenous, speaks Spanish
12 momEdPri mother's education primary
13 momEdSec mother's education secondary+
14 husEdPri husband's education primary
15 husEdSec husband's education secondary+
16 husEdDK husband's education missing
17 husProf husband professional, sales, clerical
18 husAgrSelf husband agricultural self-employed
19 husAgrEmp husband agricultural employee
20 husSkilled husband skilled service
21 toilet modern toilet in household
22 tvNotDaily television not watched daily
23 tvDaily television watched daily
24 pcInd81 proportion indigenous in 1981
25 ssDist distance to nearest clinic

All predictors are either continuous variables (numbers 24 and 25) or 0-1 dummy variables (all others) representing discrete factors coded using the reference cell method. Omitted categories are child aged 0-2, mother aged <25, birth order 1, ladino, mother with no education, husband with no education, husband not working or in unskilled occupation, no modern toilet in household, and no television in the household.
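A quick sanity check after loading the file is to count the distinct ids at each level and to confirm which columns are not 0-1. The sketch below shows one way to do this in Python. The function names are illustrative, the sample rows are invented placeholders rather than values from guPrenat.dat, and only a few of the 25 columns are included. Counting distinct family ids assumes those ids are unique across communities, which this page does not state.

```python
import io

# Invented rows for illustration only; these are NOT values from guPrenat.dat,
# and only a subset of the 25 columns is shown.
SAMPLE = """kid mom cluster prenat kid3p toilet pcInd81 ssDist
1 1 1 1 0 1 0.35 2.5
2 1 1 1 1 1 0.35 2.5
3 2 1 0 0 0 0.35 2.5
4 3 2 1 1 0 0.80 7.0
"""

ID_COLUMNS = ("kid", "mom", "cluster")

def read_table(handle):
    """Read a whitespace-delimited table whose first line holds variable names."""
    header = handle.readline().split()
    return [dict(zip(header, line.split())) for line in handle if line.strip()]

def level_sizes(rows):
    """Number of distinct ids at each level (child, family, community)."""
    return {name: len({row[name] for row in rows}) for name in ID_COLUMNS}

def non_binary_columns(rows):
    """Names of non-id columns holding any value other than 0 or 1."""
    names = [n for n in rows[0] if n not in ID_COLUMNS]
    return [n for n in names
            if any(float(row[n]) not in (0.0, 1.0) for row in rows)]

rows = read_table(io.StringIO(SAMPLE))
print(level_sizes(rows))         # {'kid': 4, 'mom': 3, 'cluster': 2}
print(non_binary_columns(rows))  # ['pcInd81', 'ssDist']
```

On the full file, given the description above, one would expect the level counts to match the figures in the table notes and only pcInd81 and ssDist to be reported as non-binary.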

 

If you have any questions or comments, please send e-mail to
grodri@princeton.edu