This section is a gentle introduction to programming Stata. We will discuss macros and loops, and will illustrate writing your own (simple) programs. This is a large subject and all we can hope to do here is provide a few tips that hopefully will spark your interest in further study. However, the material covered will help you use Stata more effectively.
Stata 9 has a new and extremely powerful matrix programming language called Mata. This extends the programmer's tools well beyond the macro substitution tools discussed here, but Mata is a subject that deserves separate treatment. Your efforts here will not be wasted, however, because Mata is complementary to, not a complete substitute for, classic Stata programming.
To learn more about programming Stata you should read Chapter 18 in the
User's Guide and then refer to the Programming volume and/or the online
help as needed.
Nick Cox's regular columns in the Stata Journal are a wonderful resource
for learning about Stata.
Other resources were listed in Section 1 of this tutorial.
4.1 Macros
A macro is simply a name associated with some text. Macros can be local or global in scope.
Local macros have names of up to 31 characters and are known only in the current context (the console, a do file, or a program).
You define a local macro using local name [=] text
and you evaluate it using `name'. (Note the use of a backtick or left quote.)
The first variant, without an equal sign, is used to store arbitrary text of up to ~64k characters (up to a million in Stata SE). The text is often enclosed in quotes, but it doesn't have to be.
Suppose you need to run a bunch of regression equations that include a standard set of
control variables, say age, agesq, education, and income.
You could, of course, type these names in each equation,
or you could cut and paste the names,
but these alternatives are tedious and error prone.
The smart way is to define a macro
local controls age agesq education income
You then type commands such as
regress outcome treatment `controls'
which in this case is exactly equivalent to typing
regress outcome treatment age agesq education income.
If there's only one regression to run you haven't saved anything, but if you have to run several models with different outcomes or treatments, the macro saves work and ensures consistency.
This approach also has the advantage that if later on your advisor insists
that you should have used log-income rather than income as a control,
all you need to do is change the macro definition at the top of your do file,
say to read logincome instead of income,
and all subsequent models will be run with income properly logged
(assuming the variable logincome exists).
Warning: Evaluating a macro that doesn't exist is not an error;
it just returns an empty string.
So be careful to spell macro names correctly.
If you type regress outcome treatment `contrls',
Stata will read regress outcome treatment, because the macro contrls
does not exist.
The same would happen if you type `control' because macro names cannot
be abbreviated the way variable names can.
Either way, the regression will run without any controls.
But you always check your output, right?
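Here is a small sketch of the trap, meant to be run from a do file. The brackets are there only to make the empty result visible, and the guard at the end is just one possible check that I made up for illustration, not a Stata convention.

```stata
local controls age agesq education income
display "[`controls']"     // the list of controls, in brackets
display "[`contrls']"      // misspelled name: nothing between the brackets
// one possible guard: stop with an error if the macro turns out to be empty
if "`controls'" == "" {
    display as error "macro controls is empty"
    exit 198
}
```

A check like this costs a few lines but turns a silent mistake into a loud one.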
Suppose you are working with a demographic survey where age has been grouped
in five-year groups and ends up being represented by seven dummies, say
age15to19 to age45to49, six of which will be used in your regressions.
Define a macro
local age "age20to24 age25to29 age30to34 age35to39 age40to44 age45to49"
and then in your regression models use something like
regress ceb `age' urban
which is not only shorter and more readable, but also closer to what you
intend, which is to regress ceb on "age",
which happens to be a bunch of dummies.
This also makes it easier to change the representation of age;
if you later decide to use linear and quadratic terms instead of the
six dummies all you do is define local age "age agesq" and rerun your models.
Note that the first occurrence of age here is the name of the macro and the
second is the name of a variable. I used quotes to make the code clearer.
Stata never gets confused.
For syntax lawyers only: if a macro includes macro evaluations, these are resolved at the time the macro is created, not when it is evaluated.
local initials My initials are `myinitials'
local myinitials GR
display "`initials'"
You might expect the output to be "My initials are GR", but it won't be,
because the local macro myinitials was empty when initials was
created. Changing the order fixes the problem but defeats the purpose.
The second type of macro definition, local name = text with an equal sign,
is used to store results.
It instructs Stata to treat the text on the right hand side as an expression,
evaluate it, and store a text representation of the result under the given name.
Suppose you have just run a regression and want to store the resulting R-squared,
for comparison with a later regression. You know that regress stores R-squared in
e(r2), so you think local rsq e(r2) would do the trick.
But it doesn't. Your macro stored the formula e(r2), as
you can see by typing display "`rsq'". What you needed to store was the value.
The solution is to type local rsq = e(r2), with an equal sign.
This causes Stata to evaluate the expression and store the result.
To see the difference try this
sysuse auto, clear
regress mpg weight
local rsqf e(r2)
local rsqv = e(r2)
di `rsqf' // this has the current R-squared
di `rsqv' // as does this
regress mpg weight foreign
di `rsqf' // the formula has the new R-squared
di `rsqv' // this guy has the old one
Another way to force evaluation is to enclose e(r2) in macro quotes
(a backtick and an apostrophe) when you define the macro, as in
local rsq `e(r2)'. This is called a macro expression, and is
also useful when you want to display results. It allows us to type
display "R-squared=`rsqv'" instead of display "R-squared=" `rsqv'.
(What do you think would happen if you type display "``rsqf''"?)
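The following sketch puts the ways of displaying a stored result side by side. It uses the auto dataset from the earlier example. The form with an equal sign inside the macro quotes is what the Stata manuals call a macro expression; I am assuming a reasonably recent Stata here, so check help macro if it does not work in yours.

```stata
sysuse auto, clear
regress mpg weight
// macro quotes around e(r2): the value is substituted when the line is read
local rsq `e(r2)'
display "R-squared=`rsq'"
// an equal sign inside the macro quotes evaluates an expression on the spot
display "R-squared=`=e(r2)'"
// for comparison: a string followed by an expression, with a display format
display "R-squared=" %6.4f e(r2)
```

The last form is the one to use when you care about how many digits are shown.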
An alternative way to store results for later use is to use scalars
(type help scalars to learn more). This has the advantage that Stata
stores the result in binary form without loss of precision. A macro
stores a text representation that is good only for about 8 digits.
The downside is that scalars are in the global namespace, so there is
a potential for name conflicts, particularly in programs (unless you
use temporary names, which we discuss later).
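As a minimal sketch, here is the R-squared comparison done with a scalar instead of a macro. The name rsq1 is made up for this example; pick one that does not clash with a variable in your dataset.

```stata
sysuse auto, clear
regress mpg weight
scalar rsq1 = e(r2)          // rsq1 is a made-up name; the value is kept in binary
regress mpg weight foreign
display "old R-squared = " rsq1
display "new R-squared = " e(r2)
display "change = " e(r2)-rsq1
```

When you are done, scalar drop rsq1 removes the scalar so it cannot collide with anything later in the session.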
You can use an equal sign when you are storing text, but let me tell you
why this is not always a good idea.
We could have said local controls = "age agesq education income" and
this would have worked. Note the use of quotes to ensure that the
right hand side is an expression, in this case a string.
And therein lies the problem: strings are limited to 244 characters
(used to be 80 in Intercooled Stata before 9.1),
whereas macro text can be much longer as noted above
(type help limits to be reminded).
Global macros have names of up to 32 characters and, as the name indicates, have global scope.
You define a global macro using global name [=] text and evaluate it
using $name. (You may need to use ${name} to clarify where the name ends.)
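To see when the braces matter, consider this short sketch. The global name outdir and the folder names are hypothetical, chosen only to show where Stata thinks the macro name ends.

```stata
// outdir is a hypothetical global holding a folder name
global outdir results
display "$outdir/table1"         // the slash ends the name, so no braces are needed
display "${outdir}_v2/table1"    // braces mark the name as outdir
display "$outdir_v2/table1"      // looks up a global named outdir_v2, which is empty
```

The last line is the global version of the misspelling trap: there is no error, just missing text.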
I suggest you avoid global macros because of the potential for name conflicts. A useful application, however, is to map the function keys on your keyboard. If you work on a shared network folder with a long name try something like this
global F5 \\server\shared\research\project\subproject\
Then when you hit F5 Stata will substitute the full name. And your do files
can use commands like do ${F5}dofile. (We need the braces to indicate that
the macro is called F5, not F5dofile.)
Obviously you don't want to type this macro each time you use Stata.
Solution? Enter it in your profile.do file, a set of commands that is
executed each time you run Stata. Your profile is best stored in Stata's
start-up directory, usually C:\data. Type help profilew to learn more.
Macros can also be used to obtain and store information about the system or the variables in your dataset using extended macro functions. For example you can retrieve variable and value labels, a feature that can come in handy in programming.
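As a quick illustration, this sketch pulls labels out of the auto dataset that ships with Stata. The macro names lbl, vl and txt are my own choices; the functions after the colon are the extended macro functions.

```stata
sysuse auto, clear
// text of the variable label attached to mpg
local lbl : variable label mpg
// name of the value label attached to foreign
local vl : value label foreign
// text that this value label assigns to the code 1
local txt : label `vl' 1
display "`lbl'"
display "`vl'"
display "`txt'"
```

Note the colon in place of the equal sign; that is what tells Stata you are calling an extended macro function.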
There are also commands to manage your collection of macros, including
macro list and macro drop. Type help macro to learn more.
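A minimal sketch of these commands in action follows. The global outdir is hypothetical and is defined only so there is something to drop.

```stata
local controls age agesq education income
// outdir is a hypothetical global, defined only to have something to drop
global outdir results
macro list                // lists every macro; locals show up with a leading underscore
macro list _controls      // lists just the local macro controls
macro drop outdir         // deletes the global macro outdir
```

The leading underscore is how these commands tell a local macro from a global one with the same name.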