While it is fun to type commands interactively and see the results straightaway, serious work requires that you save your results and keep track of the commands that you have used, so that you can document your work and reproduce it later if needed. Here are some practical recommendations.
Stata reads and saves data from the working directory, usually
C:\DATA, unless you specify otherwise.
You can change directory using the command cd [drive:]directory_name,
and print the (name of the) working directory using pwd;
type help cd for details.
I recommend that you create a separate directory for each course or research project
you are involved in,
and start your Stata session by changing to that directory.
Stata understands nested directory structures
and doesn't care if you use \ or / to separate directories. Version 9 also
understands the double slash used in Windows to refer to a computer,
so you can cd \\opr\shares\research\myProject to
access a shared project folder. An alternative approach, which also works
in earlier versions, is to use Windows Explorer to assign a drive letter to
the project folder,
for example assign P: to \\opr\shares\research\myProject
and then in Stata use cd p:.
Alternatively, you may assign R: to \\opr\shares\research and then
use cd R:\myProject,
a more convenient solution if you work in several projects.
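As a minimal sketch of the start of a session, assuming a hypothetical project folder C:\DATA\myProject that already exists on your machine, you might type:

```stata
* Hypothetical folder name; substitute a directory that exists on your machine
cd "C:\DATA\myProject"

* Print the working directory to confirm that the change took effect
pwd
```

The quotes around the path are only required when the directory name contains spaces, but they do no harm otherwise.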
Stata has other commands for interacting with the operating system, including
mkdir to create a directory,
dir to list the names of the files in a directory,
type to list their contents,
copy to copy files,
and erase to delete a file. You can (and probably should) do these tasks using the operating system directly,
but the Stata commands may come in handy if you want to write a program to perform repetitive tasks.
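Here is a rough sketch of how these commands might be combined. All file and folder names are hypothetical, and the example assumes that a file called QuickTour.do exists in the working directory and that there is no folder called backup yet:

```stata
* Create a subfolder of the working directory (mkdir fails if it already exists)
mkdir backup

* List the do files in the working directory
dir *.do

* Show the contents of one of them in the Results window
type QuickTour.do

* Copy it into the new folder, overwriting any earlier copy
copy QuickTour.do backup/QuickTour.do, replace

* Delete the copy; erase needs the full file name, including the extension
erase backup/QuickTour.do
```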
So far all our output has gone to the Results window,
where it can be viewed but eventually disappears.
(You can control how far you can scroll back; type help scrollbufsize
to learn more.)
To keep a permanent record of your results, however,
you should log your session.
When you open a log, Stata writes all results to both the Results window and
to the file you specify. To open a log file use the command
log using filename, text replace
where filename is the name of your log file.
Note the use of two recommended options: text and replace.
By default the log is written using SMCL, Stata Markup and Control Language (pronounced "smicle"),
which provides some formatting facilities but can only be viewed using Stata's Viewer.
Fortunately, there is a text option to create logs in plain text (ASCII) format,
which can be viewed in an editor such as Notepad or a word processor such as Word.
(An alternative is to create your log in SMCL and then use the translate command
to convert it to plain text, PostScript, or even PDF if you are a Mac user;
type help translate to learn more about this option.)
The replace option specifies that the file is to be overwritten if
it already exists. This will often be the case if (like me) you need to run your
commands several times to get them right. In fact, if an earlier run has failed
it is likely that you have a log file open,
in which case the log command will fail.
The solution is to close any open logs using the log close command.
The problem with this solution is that it will not work if there is no log open!
The way out of the catch-22 is to use
capture log close
The capture keyword tells Stata to run the command that follows and
ignore any errors.
Use judiciously!
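Putting the pieces together, the logging part of a do file might look like the sketch below. The log name mysession is made up for illustration. The second half shows the SMCL route mentioned above, and assumes that translate picks the plain-text translator from the .smcl and .log file extensions:

```stata
* Plain-text log: close any log left open by a failed run, then open a new one
capture log close
log using mysession, text replace
display "this line goes to the Results window and to mysession.log"
log close

* Alternative: log in the default SMCL format and convert to plain text later
log using mysession, replace
display "this line goes to mysession.smcl"
log close
translate mysession.smcl mysession.log, replace
```

Note that capture is applied only to log close, the one command we expect to fail harmlessly, and not to the commands that do the real work.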
A do file is just a set of Stata commands typed in a plain text file. You can use Stata's own built-in do-file Editor, which has the great advantage that you can run your program directly from the editor by clicking on the run icon or selecting Tools|Run from the menu. You can also select just a few commands and run them by selecting Tools|Run Selection in the menu.
Alternatively, you can use an editor such as Notepad. Save the file using
extension .do and then execute it using the do filename
command.
For a thorough discussion of alternative text editors see
http://fmwww.bc.edu/repec/bocode/t/textEditors.html,
a page maintained by Nicholas J. Cox, of the University of Durham.
You could even use a word processor such as Word, but you would have to remember to save the file in plain text format, not in Word document format. Also, you may find Word's insistence on capitalizing the first word on each line annoying when you are trying to type Stata commands that must be in lowercase. You can, of course, turn auto-correct off. But it's a lot easier to just use a plain-text editor.
Code that looks obvious to you may not be so obvious to a co-worker, or even to you a few months later. It is always a good idea to annotate your do files with explanatory comments that provide the gist of what you are trying to do.
In the Stata command window you can start a line with a * to indicate that it is a comment, not a command. This can be useful to annotate your output.
In a do file you can also use two other types of comments: // and /* */
// is used to indicate that everything that follows to the end of the line
is a comment and should be ignored by Stata. For example you could write
gen one = 1 // this will serve as a constant in the model
/* */ is used to indicate that all the text between the opening
/* and the closing */, which may be a few characters or may span several lines,
is a comment to be ignored by Stata. This type of comment can be used anywhere,
even in the middle of a line, and is sometimes used to "comment out" code.
There is a third type of comment used to break very long lines, as explained in the
next subsection. Type help comments to learn more about comments.
It is always a good idea to start every do file with comments that include at least a title, the name of the programmer who wrote the file, and the date. Assumptions about required files should also be noted.
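As an illustration, the top of a do file might look something like this sketch. The title and the author placeholder are invented, and the commands assume the lifeexp dataset used in our Quick Tour:

```stata
// Title:    Life expectancy and GNP per capita (hypothetical example)
// Author:   your name here
// Date:     date of the last revision
// Requires: lifeexp.dta, which ships with Stata and is loaded with sysuse

sysuse lifeexp, clear
gen loggnppc = log(gnppc)   // natural log of GNP per capita

/* The regression below is "commented out" for now
regress lexp loggnppc
*/
```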
When you are typing in the Command window a command can be as long as needed. In a do file you will probably want to break long commands into several lines to improve readability.
To indicate to Stata that a command continues on the next line you use ///,
which says everything else to the end of the line is a comment and
the command itself continues on the next line. For example you could write
graph twoway (scatter lexp loggnppc) ///
(lfit lexp loggnppc)
Old hands might write
graph twoway (scatter lexp loggnppc) /*
*/ (lfit lexp loggnppc)
which "comments out" the end of the line.
An alternative is to tell Stata to use a semi-colon instead of the carriage return
at the end of the line to mark the end of a command, using #delimit ;,
as in this example:
#delimit ;
graph twoway (scatter lexp loggnppc)
(lfit lexp loggnppc) ;
Now all commands need to terminate with a semi-colon. To return to using carriage return as the delimiter use
#delimit cr
The delimiter can only be changed in do files. But then you always use do files, right?
Here's a simple do file that can reproduce all the results in our Quick Tour. The file doesn't have many comments because it refers to a web page for more details. Following the listing we comment on a couple of lines that require explanation.
// A Quick Tour of Stata
// German Rodriguez - Fall 2005
// See http://data.princeton.edu/stata/s11.html

version 9.1 // boilerplate
clear
capture log close
log using QuickTour, text replace

display 2+2
display 2 * ttail(20,2.1)

sysuse lifeexp
desc
summarize lexp gnppc
list country gnppc if missing(gnppc)

graph twoway scatter lexp gnppc
graph export scatter.png, replace // save the graph in PNG format

gen loggnppc = log(gnppc)
regress lexp loggnppc
predict plexp

graph twoway (scatter lexp loggnppc) (lfit lexp loggnppc)
graph export fit.png, replace

list country lexp plexp if lexp < 55, clean
list gnppc loggnppc lexp plexp if country == "United States", clean

log close // make sure you hit enter for the last line
We start the do file by specifying the version of Stata we are using, in this case
9.1.
This helps ensure that future versions of Stata will continue to interpret the
commands correctly, even if Stata has changed; see help version for details.
(Last year this file read version 8.2 and I could have left that in place
to run under version control; the results would be the same because none of the commands
used has changed.)
The clear statement deletes (practically) everything in memory.
It is there to make sure we start with a clean slate, and of course you will not
always want to include it. In this case if we had to
rerun the program the sysuse command would fail because we already
have a dataset in memory and it has not been saved, so we clear the memory. An alternative would be to say sysuse lifeexp, clear.
Note also that we use a graph export command to convert the
graph in memory to Portable Network Graphics (PNG) format, ready for inclusion
in a web page. To include a graph in a Word document you are better off cutting and
pasting a graph in Windows Metafile format, as explained in Section 3.
The note on the last line is to remind you that by default Stata uses the (invisible) carriage return at the end of the line as the command delimiter. If you haven't pressed return after the last line, the entire line will usually be ignored by Stata.
Having used a few Stata commands it may be time to comment briefly on their structure, which usually follows the syntax below, where bold indicates keywords and square brackets indicate optional elements:
[by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [using filename] [, options]
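We now describe each syntax element:
by varlist: repeats the command separately for each group of observations defined by the values of the variables in varlist.
command is the name of the command. Command names are case sensitive: describe and Describe are different,
and only the former will work. Commands can usually be abbreviated as noted earlier.
When we introduce a command we underline the letters that are required.
For example, regress with its first three letters underlined indicates that the regress command
can be abbreviated to reg.
varlist is the list of variables to which the command applies, as in describe lexp or regress lexp loggnppc.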
Variable names are case sensitive. lexp and LEXP are
different variables. A variable name can be abbreviated to the minimum number of letters
that makes it unique in a dataset. For example in our quick tour we could refer to
loggnppc as log because it is the only variable that begins
with those three letters, but this is a really bad idea.
Abbreviations that are unique may become ambiguous as you create new variables,
so you have to be very careful.
You can also use wildcards such as v* or
name ranges, such as v101-v105 to refer
to several variables. Type help varlist to learn more about variable lists.
=exp is an expression. Commands that create new variables, such as generate log_gnp = log(gnp),
include an arithmetic expression, basically a formula using the standard operators
(+ - * and / for the four basic operations and ^ for exponentiation, so 3^2 is three squared),
functions, and parentheses. We discuss expressions in Section 2.
if exp restricts the command to the observations that satisfy a logical condition, for example lexp < 55. Relational
operators are <, <=, ==, >= and >, and logical negation is expressed using ! or
~, as we will see in Section 2.
Alternatively, you can specify a range of the data, for example in 1/10 will restrict
the command's action to the first 10 observations. Type help numlist to learn
more about lists of numbers.
weight: some commands allow the use of weights; type help weights to learn more.
using introduces a file name; this can be a file in your computer,
on the network, or on the internet, as you will see when we discuss data input in Section 2.
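options follow a comma and are specific to each command. To see the options that a command accepts, type
help command
where command is the actual command name.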
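The following sketch runs through several of these elements using the lifeexp dataset from our Quick Tour. It assumes that the dataset includes a variable called region, and it uses bysort, a variant of the by prefix that sorts the data first, which has not been discussed above:

```stata
sysuse lifeexp, clear

* by varlist: repeat the command within each region
bysort region: summarize lexp

* if exp: restrict the command to observations that satisfy a condition
list country lexp if lexp < 55

* in range: restrict the command to the first 10 observations
list country lexp in 1/10

* =exp and options: create a variable, then ask for a detailed summary
generate loggnppc = log(gnppc)
summarize loggnppc, detail
```

The using element appears in commands that read or write files, such as the log using command shown earlier.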