3. Stata Graphics
Stata has excellent graphic facilities, accessible through the graph command; see help graph for an overview. The most common graphs in statistics are X-Y plots showing points or lines. These are available in Stata through the twoway subcommand, which in turn has 31 sub-subcommands or plot types, the most important of which are scatter and line. We will also briefly describe bar plots, available through the bar subcommand, and other plot types.
Stata 10 introduced a graphics editor that can be used to modify a graph interactively. We do not recommend this practice, however, because it conflicts with the goals of documenting and ensuring reproducibility of all the steps in your research.
In this section we will illustrate a few plots using the data on fertility decline first used in Section 2.1. To read the data from net-aware Stata type
infile str14 country setting effort change ///
    using http://data.princeton.edu/wws509/datasets/effort.raw
To whet your appetite, here's the plot that we will produce in this section:
3.1.1 A Simple Scatterplot
To produce a simple scatterplot of fertility change by social setting you use the command
graph twoway scatter change setting
Note that you specify y first, then x. Stata labels the axes using the
variable labels, if they are defined, or variable names if not.
The command may be abbreviated to
twoway scatter, or just
scatter if that is the only plot on the graph.
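As a quick check, the three forms below should draw the same graph. This is a minimal sketch that assumes the effort data read above are still in memory.

```stata
* Full form: graph command, twoway subcommand, scatter plot type
graph twoway scatter change setting

* Abbreviated form: drop the graph prefix
twoway scatter change setting

* Shortest form, valid here because scatter is the only plot on the graph
scatter change setting
```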
We will now add a few bells and whistles.
3.1.2 Fitted Lines
Suppose we want to show the fitted regression line as well.
In some packages you would need to run a regression, compute the fitted line, and then plot it. Stata can do all that in one step using the
lfit plot type. (There is also a
qfit plot for quadratic fits.) This can be combined with the scatter plot by enclosing each
sub-plot in parentheses. (One can also combine plots using two vertical bars
||, but I find the method using parentheses clearer.)
graph twoway (scatter change setting) ///
             (lfit change setting)
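For comparison, here is a sketch of the same kind of overlay written with the || separator instead of parentheses. It assumes the effort data are in memory and plots change against setting, as in the first scatterplot.

```stata
* Scatterplot and fitted line combined with || rather than parentheses
twoway scatter change setting || lfit change setting
```

Both notations should produce the same graph, so the choice is a matter of readability.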
Now suppose we wanted to put confidence bands around the regression line.
Stata can do this with the
lfitci plot type, which draws the confidence region as a gray band. (There is also a
qfitci band for quadratic fits.)
Because the confidence band can obscure some points we draw the region first and the points later:
graph twoway (lfitci change setting) ///
             (scatter change setting)
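The quadratic variant mentioned above follows the same pattern. The sketch below simply swaps in qfitci. It assumes the effort data are in memory, and it is not meant to suggest that a quadratic fit is appropriate for these data.

```stata
* Quadratic fit with its confidence band drawn first, points on top
graph twoway (qfitci change setting) ///
             (scatter change setting)
```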
Note that this command doesn't label the y-axis but uses a legend instead. You could specify a label for the y-axis using the ytitle() option, and omit the (rather obvious) legend using legend(off). Here we specify both as options to the twoway command. To make the options more obvious to the reader I put the comma at the start of a new line:
graph twoway (lfitci change setting) ///
             (scatter change setting) ///
    , ytitle("Fertility Decline") legend(off)
3.1.3 Labeling Points
There are many options that allow you to control the markers used for the points, including their shape and color; see help marker_options.
It is also possible to label the points using text included in another variable, using the
mlabel(varname) option. In the next step we add the country names to the plot:
graph twoway (lfitci change setting) /// (scatter change setting, mlabel(country) )
One slight problem with the labels is the overlap of Costa Rica and Trinidad Tobago (and to a lesser extent Panama and Nicaragua). We can solve this problem by specifying the position of the label relative to the marker using a 12-hour clock (so 12 is above, 3 is to the right, 6 is below and 9 is to the left) and the mlabv() option. We create a variable to hold the position set by default to 3 o'clock and then move Costa Rica to 9 o'clock and Trinidad Tobago to just a bit above that at 11 o'clock (we can also move Nicaragua and Panama up a bit, say to 2 o'clock):
gen pos = 3
replace pos = 11 if country == "TrinidadTobago"
replace pos = 9 if country == "CostaRica"
replace pos = 2 if country == "Panama" | country == "Nicaragua"
The graph command then looks as follows:
graph twoway (lfitci change setting) ///
             (scatter change setting, mlabel(country) mlabv(pos) )
3.1.4 Titles, Legends and Captions
There are options that apply to all two-way graphs, including titles, labels, and legends. Stata graphs can have a title() and subtitle(), usually at the top, and a caption(), usually at the bottom; type help title_options to learn more. Usually a title is all you need. Stata 11 allows text in graphs to include bold, italics, Greek letters, mathematical symbols, and a choice of fonts; type help graph text to learn more.
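To make the title options concrete, here is a small sketch using the effort data. The subtitle and caption strings are placeholders of my own, and the italic markup in ytitle() assumes Stata 11 or later, as noted above.

```stata
* Placeholder text shows where each element lands on the graph
graph twoway scatter change setting ///
    , title("Fertility Decline by Social Setting") ///
      subtitle("A subtitle goes here") ///
      caption("A caption goes here") ///
      ytitle("Fertility decline ({it:y})")
```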
Our final tweak to the graph will be to add a legend to specify the linear fit and 95% confidence interval, but not fertility decline itself. We do this using the order(2 "linear fit" 1 "95% CI") option of the legend to label the second and first items in that order. We also use ring(0) to move the legend inside the plotting area, and pos(5) to place the legend box near the 5 o'clock position.
Our complete command is then
graph twoway (lfitci change setting) ///
             (scatter change setting, mlabel(country) mlabv(pos) ) ///
    , title("Fertility Decline by Social Setting") ///
      ytitle("Fertility Decline") ///
      legend(ring(0) pos(5) order(2 "linear fit" 1 "95% CI"))
The result is the graph shown at the beginning of this section.
(If your graph looks slightly different it's probably because we used
different color schemes. I used a customized scheme.
See 3.2.5 below for more information about schemes.)
3.1.5 Axis Scales and Labels
There are options that control the scaling and range of the axes, including xscale() and yscale(), which can be arithmetic, log, or reversed; type help axis_scale_options to learn more. Other options control the placing and labeling of major and minor ticks and labels, such as xlabel(), xtick() and xmtick(), and similarly for the y-axis; see help axis_label_options. Usually the defaults are acceptable, but it's nice to know that you can change them.
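As an illustration of the tick and label options, the sketch below sets major labels every 10 units and minor ticks halfway between them on the x-axis. The numeric ranges are guesses of mine chosen to cover the effort data, so adjust them to your own variables.

```stata
* Illustrative ranges only: major labels at 30, 40, ..., 100 and minor ticks between them
graph twoway scatter change setting ///
    , xlabel(30(10)100) xmtick(35(10)95) ///
      ylabel(0(5)30)
```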
3.2 Line Plots
I will illustrate line plots using data on U.S. life expectancy, available as one of the datasets shipped with Stata. (Try
sysuse dir to see what else is available.)
The idea is to plot life expectancy for white and black males over the 20th century. Again, to whet your appetite I'll start by showing you the final product, and then we will build the graph bit by bit.
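To follow along you first need the data in memory. The sketch below assumes the dataset in question is uslifeexp, the shipped file that contains the le_wmale and le_bmale variables used in the commands that follow.

```stata
* Load the life expectancy data shipped with Stata, replacing the data in memory
sysuse uslifeexp, clear

* List the variables; the plots below use year, le_wmale and le_bmale
describe
```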
3.2.1 A Simple Line Plot
The simplest plot uses all the defaults:
graph twoway line le_wmale le_bmale year
If you are puzzled by the dip before 1920, Google "US life expectancy 1918".
We could abbreviate the command to twoway line, or even line if that's all we are plotting. (This shortcut only works for scatter and line.) The line plot allows you to specify more than one "y" variable; the order is y1, y2, ..., ym, x. In our example we specified two, corresponding to white and black life expectancy. Alternatively, we could have used two line plots:
(line le_wmale year) (line le_bmale year).
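Written out in full, the two-plot alternative would look like the sketch below. It assumes the life expectancy data are in memory.

```stata
* One line plot per series, overlaid on the same graph
graph twoway (line le_wmale year) (line le_bmale year)
```

Separate plots are handy when each line needs its own options, since each set of parentheses can carry its own comma and option list.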
3.2.2 Titles and Legends
The default graph is quite good, but the legend seems too wordy. We will move most of the information to the title and keep only ethnicity in the legend:
graph twoway line le_wmale le_bmale year ///
    , title("U.S. Life Expectancy") subtitle("Males") ///
      legend( order(1 "white" 2 "black") )
Here we used three options, which as usual in Stata go after a comma: title, subtitle and legend. The legend option has many sub-options; we used order to list the keys and their labels, saying that the first line represented whites and the second blacks. To omit a key you just leave it out of the list. To add text without a matching key use a hyphen (or minus sign) for the key. There are many other legend options; see
help legend_option to learn more.
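Here is a sketch of the hyphen trick just described, which adds a text-only entry to the legend. The "Males:" heading is my own example and replaces the subtitle used above.

```stata
* The hyphen adds text with no matching key, followed by the two labeled keys
graph twoway line le_wmale le_bmale year ///
    , title("U.S. Life Expectancy") ///
      legend( order(- "Males:" 1 "white" 2 "black") )
```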
We would like to use space a bit better by moving the legend inside the plot area, say around the 5 o'clock position, where improving life expectancy has left some spare room.
As noted earlier we can move the legend inside the plotting area by using ring(0), the "inner circle", and place it near the 5 o'clock position using pos(5). Because these are legend sub-options they have to go inside legend():
graph twoway line le_wmale le_bmale year ///
    , title("U.S. Life Expectancy") subtitle("Males") ///
      legend( order(1 "white" 2 "black") ring(0) pos(5) )
3.2.3 Line Styles
I don't know about you, but I find it hard to distinguish the default lines on the plot. Stata lets you control the line style in different ways. The
clstyle() option lets you use a named style, such as
p1-p15 for the styles used by lines 1 to 15, see
help linestyle. This is useful if you want to pick your style elements from
a scheme, as noted further below.
Alternatively, you can specify the three components of a style: the line pattern, width and color:
- Patterns are specified using the clpattern() option. The most common patterns are solid, dash, and dot; see help linepatternstyle for more information.
- Line width is specified using clwidth(); the available options include thin, medium and thick; see help linewidthstyle for more.
- Colors can be specified using the clcolor() option using color names (such as red, blue, teal, sienna, and many others) or RGB values; see help colorstyle.
Here's how to specify blue for whites and red for blacks:
graph twoway (line le_wmale le_bmale year , clcolor(blue red) ) ///
    , title("U.S. Life Expectancy") subtitle("Males") ///
      legend( order(1 "white" 2 "black") ring(0) pos(5))
clcolor() is an option of the line plot, so we put parentheses round the
line command and inserted it there.
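Patterns and widths work the same way as colors, with one value per line. The sketch below is a variation of my own that would suit a black and white printout: a solid medium line for whites and a thick dashed line for blacks.

```stata
* One pattern and one width per y variable, listed in the same order as the variables
graph twoway (line le_wmale le_bmale year , clpattern(solid dash) clwidth(medium thick) ) ///
    , title("U.S. Life Expectancy") subtitle("Males") ///
      legend( order(1 "white" 2 "black") ring(0) pos(5))
```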
3.2.4 Scale Options
It looks as if improvements in life expectancy slowed down a bit in the second half of the century. This can be better appreciated using a log scale, where a straight line would indicate a constant percent improvement. This is easily done using the axis options of the two-way command, see
help axis_options, and in particular
yscale(), which lets you choose arithmetic, log, or reversed scales. There's also a suboption
range() to control the plotting range. Here I will specify the y-range as 25 to 80 to move the curves a bit up:
graph twoway (line le_wmale le_bmale year , clcolor(blue red) ) ///
    , title("U.S. Life Expectancy") subtitle("Males") ///
      legend( order(1 "white" 2 "black") ring(0) pos(5)) ///
      yscale(log range(25 80))
3.2.5 Graph Schemes
Stata uses schemes to control the appearance of graphs, see
help scheme. You can set the default scheme
to be used in all graphs with
set scheme scheme_name. You can
also redisplay the (last) graph using a different scheme with
graph display, scheme(scheme_name).
To see a list of available schemes type
graph query, schemes. Try
s2color for screen graphs,
s1manual for the style used in the Stata manuals, and
economist for the style used in The Economist. Using the latter we obtain the graph shown at the
start of this section.
graph display, scheme(economist)
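A short session trying out schemes might look like the sketch below. The choice of s1color as a session default is just an example, and the plot is the simple line plot from 3.2.1.

```stata
* List the schemes installed on this copy of Stata
graph query, schemes

* Make s1color the default for graphs drawn from now on
set scheme s1color

* Draw a graph with the new default, then redisplay it in the manual style
sysuse uslifeexp, clear
line le_wmale le_bmale year
graph display, scheme(s1manual)
```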
3.3 Managing Graphs
Stata keeps track of the last graph you have drawn, which is stored in memory,
and calls it "Graph". You can actually keep more than one graph in memory
if you use the
name() option to name the graph when you create it.
This is useful for combining graphs, type
help graph combine to learn more.
Note that graphs kept in memory disappear when you exit Stata, even if you save the data,
unless you save the graph itself.
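As a sketch of how named graphs and graph combine fit together, the commands below keep two graphs in memory and then place them side by side. The graph names males and females are hypothetical, and the female series come from the same shipped life expectancy data.

```stata
sysuse uslifeexp, clear

* Draw and name two graphs; replace lets you rerun the commands in one session
line le_wmale le_bmale year, name(males, replace)
line le_wfemale le_bfemale year, name(females, replace)

* Combine the two named graphs into a single figure
graph combine males females
```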
To save the current graph on disk using Stata's own format, type
graph save filename.
This command has two options,
replace, which you need to use if the file already exists, and
asis, which freezes the graph (including its current style)
and then saves it.
The default is to save the graph in a live format that can be edited in
future sessions, for example by changing the scheme.
After saving a graph in Stata format you can load it from the disk
with the command
graph use filename.
(Note that graph save and graph use are analogous to save and use for Stata files.)
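A minimal round trip might look like the sketch below. The file name lifeexp is hypothetical, and the file is written to the current working directory.

```stata
sysuse uslifeexp, clear
line le_wmale le_bmale year

* Save the current graph in Stata's live format, overwriting any earlier copy
graph save lifeexp, replace

* Later, possibly in another session, load and display it again
graph use lifeexp
```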
Any graph stored in memory can be displayed using graph display name.
(You can also list, describe, rename, copy, or drop graphs stored in memory; type
help graph_manipulation to learn more.)
If you plan to incorporate the graph in another document you will probably
need to save it in a more portable format.
The command graph export filename can export the graph
in a wide variety of vector or raster formats; the format is usually inferred from
the file extension.
Vector formats such as Windows metafile (wmf or emf) or
Adobe's PostScript and its variants (ps, eps, pdf) contain
essentially drawing instructions and are thus resolution independent, so they
are best for inclusion in other documents where they may be resized.
Raster formats such as Portable Network Graphics (png) save the image
pixel by pixel using the current display resolution, and are best for inclusion
in web pages.
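The sketch below exports the current graph once in a vector format and once in a raster format. The file names are hypothetical, and the width() value is an arbitrary pixel width I picked for illustration.

```stata
* Vector output for documents that may be resized; format taken from the extension
graph export lifeexp.eps, replace

* Raster output for a web page
graph export lifeexp.png, replace

* Raster output with an explicit width in pixels instead of the display default
graph export lifeexp.png, width(1200) replace
```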
You can also print a graph using
graph print, or copy and paste
it into a document using the Windows clipboard;
to do this right click on the window containing the graph and then
select copy from the context menu.