Germán Rodríguez
Stata Markdown Princeton University

Documentation

How It Works

The basic idea is very simple. You type a script that contains a narrative written in Markdown, and include Stata code that is indented one tab or four spaces. For example, you could say:

Let's read the fuel efficiency data that comes with Stata,
compute gallons per 100 miles, and regress that on weight

    sysuse auto, clear
    gen gphm = 100/mpg
    regress gphm weight
	
We see that heavier cars use more fuel, with 1,000 pounds requiring
an extra 1.6 gallons to travel 100 miles.

This simple rule is all it takes. The markdown command will read this script and generate a web page or PDF document that combines the code with the output and your narrative.

Here's how it's done. The command goes through the script and separates the Markdown code, which goes in a .md file with all Stata code removed, and the Stata code, which goes in a .do file with all annotations removed. We call this the tangle step.

The .md file is just plain Markdown. We use a placeholder of the form {{n}}, where n is a chunk number, to mark where the Stata code block used to be, so please don't use double braces in your narrative. We run this file through the Pandoc converter to obtain either HTML or TeX output in a file with the custom extension .pdx.

The .do file is just plain Stata. We insert a comment of the form //_n to mark the beginning of the n-th code chunk, and //_^ to mark the end of the last chunk, so please avoid these patterns if you include comments. We run this file through Stata to obtain a .smcl log.
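
To make the tangle step concrete, here is a minimal Python sketch of the split under the simple indentation rule. It is an illustration only, not the markdown command's actual implementation: the function name `tangle` is hypothetical, and it assumes that a blank line always ends a code chunk.

```python
def tangle(script):
    """Split a script into Markdown text and Stata code (illustrative only)."""
    md_lines, do_lines = [], []
    chunk = 0        # number of code chunks seen so far
    in_code = False  # True while we are inside a code chunk
    for line in script.splitlines():
        indented = line.startswith("\t") or line.startswith("    ")
        if indented and line.strip():
            if not in_code:
                # A new chunk starts: leave a placeholder in the narrative
                # and a marker comment in the do-file.
                chunk += 1
                in_code = True
                md_lines.append("{{%d}}" % chunk)
                do_lines.append("//_%d" % chunk)
            # Drop the leading tab or four spaces.
            do_lines.append(line[1:] if line.startswith("\t") else line[4:])
        else:
            # Assumption: any blank or unindented line closes the chunk.
            in_code = False
            md_lines.append(line)
    if chunk:
        do_lines.append("//_^")  # marks the end of the last chunk
    return "\n".join(md_lines), "\n".join(do_lines)
```

Applied to the fuel efficiency script above, this would yield a narrative with a single {{1}} placeholder and a do-file whose three commands sit between //_1 and //_^.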

The next step is to weave together the Markdown and Stata output, using the information in our placeholders and markers. Along the way we call Stata commands to help translate SMCL blocks to TeX or to plain text, which we encode as HTML.
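
The weave step can be sketched in the same spirit. The Python below is a simplified illustration, not the command's implementation: the function name `weave` is hypothetical, and it assumes the log has already been translated to plain text in which each marker comment appears on a line of its own.

```python
import html
import re

def weave(converted, log_text):
    """Replace each {{n}} placeholder with the output of chunk n (illustrative only)."""
    chunks = {}     # chunk number -> list of output lines
    current = None  # chunk currently being collected
    for line in log_text.splitlines():
        marker = re.fullmatch(r"//_(\d+|\^)", line.strip())
        if marker:
            # //_n opens chunk n; //_^ closes the last chunk.
            current = None if marker.group(1) == "^" else int(marker.group(1))
            if current is not None:
                chunks[current] = []
        elif current is not None:
            chunks[current].append(line)

    def fill(match):
        body = "\n".join(chunks.get(int(match.group(1)), []))
        # Encode the plain-text output as HTML.
        return "<pre>" + html.escape(body) + "</pre>"

    return re.sub(r"\{\{(\d+)\}\}", fill, converted)
```

The call to `html.escape` stands in for the encoding step, so that characters such as < in Stata output do not break the page.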

If you are generating HTML you are done! If you want to produce PDF output the command will then run a LaTeX-to-PDF converter, as explained in Getting Started.

The script file may be edited using Stata's code editor, which has the advantage that you can select and run your Stata code to check that it works or examine results while you are authoring the narrative.

Command Syntax

The syntax of the command is quite simple:

markdown using filename [, pdf mathjax strict ]

The only required argument is the name of the script file. This must have extension .stmd, but the extension does not have to be typed.

If you are producing HTML and do not have complex mathematical equations you don't need any of the options, so let me give you just a brief summary here.

pdf is used if you want to generate a PDF document, which we do via LaTeX, so this option requires additional tooling.

mathjax is used to activate the MathJax JavaScript library, which does an excellent job of rendering mathematical equations on the web.

strict is used to select an alternative strict syntax to distinguish Stata from Markdown code, using code fences instead of the "one tab or four spaces" indentation rule, as explained under Strict Code Blocks below.

To facilitate troubleshooting, the key files generated by the command are kept around for examination.

Markdown

Markdown is a lightweight markup language invented by John Gruber. It is easy to write and, more importantly, it was designed to be readable "as is", without intrusive markings.

Gruber's Markdown: Basics has a quick introduction to the notation. There is an ongoing effort to standardize Markdown under the name CommonMark, with reference implementations in C and JavaScript; see commonmark.org for details.

The markdown command uses John MacFarlane's Pandoc to convert Markdown to HTML or LaTeX, so you first need to install this converter as explained in Getting Started.

In Markdown you create a heading by "underlining" the title with === for level 1 and --- for level 2. You can also define a heading at levels one to six by starting a line with one to six hashmarks, as in ### a level 3 heading.

You define a paragraph break by leaving a blank line. If you need a line break leave two or more spaces at the end of the line, or end the line with a backslash \, a Pandoc extension.

To indicate emphasis using italics wrap the text using an asterisk or underscore, as in *italics*. For strong emphasis using a bold font wrap the text using two asterisks or underscores, as in **bold**. For a monospace font suitable for code use backticks, for example to refer to the regress command type `regress`.

To create a list you start a line with an asterisk *, plus + or minus - sign for a bulleted list, or with a number followed by a period, for example 1., for a numbered list. You add items to a list by starting a line with the same symbol or with a number. Items in ordered lists are numbered consecutively regardless of the numbers you use. To end the list enter a blank line.

You can link to another document by putting the anchor in square brackets and the link in parentheses, as in [GR's website](http://data.princeton.edu). To link to an image start with a bang, type a title in square brackets and the source in parentheses, as in ![Fuel Efficiency](auto.png).

Markdown lets you include HTML as well, so we could have coded the image as <img src='auto.png' alt='Fuel Efficiency'/> and a line break as <br/>. This is not recommended if the aim is to generate a PDF file.

Strict Code Blocks

The simple "one tab or four spaces" rule to distinguish Stata and Markdown code works well, but precludes some advanced Markdown options such as nested lists. The strict option uses code fences instead, with Stata code blocks defined as

```{s}
  // Stata code goes here
 ```

The braces around the s are optional, so the opening fence can be coded ```s.

You can suppress echoing the commands in a Stata block using the syntax

```{s/}
  // Commands here are not echoed
 ```

Of course you can always suppress output using quietly.

Code inside fences may be indented to improve readability. The markdown command will remove one level of indentation if present.
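
As a rough illustration of how the opening fences above could be told apart, here is a small Python sketch. The pattern and the function name `parse_fence` are hypothetical and cover only the three forms documented in this section; the command itself may well recognize them differently.

```python
import re

# Accepts an opening fence followed by {s}, s, or {s/} (the forms shown above).
FENCE = re.compile(r"^```(?:\{s(/)?\}|s)\s*$")

def parse_fence(line):
    """Return None if the line does not open a Stata block,
    True if commands are echoed, False if echoing is suppressed."""
    match = FENCE.match(line)
    if match is None:
        return None
    # Group 1 holds the slash, which is present only in the {s/} form.
    return match.group(1) is None
```

A fence tagged with any other language fails the match and would be left for Pandoc to handle as ordinary Markdown.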

Inline Code

You can quote results by including inline code as part of your narrative using the syntax `s [fmt] expression`, where fmt is an optional format, followed by an expression.

The markdown command will generate code to evaluate the expression using Stata's display command, and will splice the output inline with the text.

For example, after running a regression you can cite R-squared by coding `s e(r2)`. If you prefer to display the value with only two decimal places, use `s %5.2f e(r2)`.
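
The translation from inline code to display commands can be sketched as follows. This Python is only an illustration of the idea: the regular expression and the function name `display_commands` are hypothetical, and it assumes that a format, when present, is the first token and starts with a percent sign.

```python
import re

# A backtick, the letter s, an optional %fmt token, then the expression.
INLINE = re.compile(r"`s\s+(?:(%\S+)\s+)?([^`]+)`")

def display_commands(text):
    """Return one Stata display command per inline code span (illustrative only)."""
    commands = []
    for fmt, expression in INLINE.findall(text):
        parts = ["display"]
        if fmt:
            parts.append(fmt)  # for example %5.2f
        parts.append(expression.strip())
        commands.append(" ".join(parts))
    return commands
```

Ordinary code spans such as `regress` do not start with s followed by a space, so they are left alone.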

Markdown Tables

Markdown does not have a syntax for tables, but Pandoc provides a simple syntax, best explained through an example. The code below shows average fuel efficiency in gallons per 100 miles for foreign and domestic cars before and after adjusting for weight:

Car Type    Unadjusted   Adjusted
--------- ------------ ----------
Foreign        4.31         5.46
Domestic       5.32         4.83

Basically you type the column headers, some underlining, and then the table, lining up the columns yourself. The cell alignment is determined by the position of the header relative to the underlining. Our first column is left aligned and the other two are right aligned.

Unfortunately this syntax will not work with inline code because the expressions, the placeholders and the final output may all have different widths. Fortunately Pandoc has an alternative syntax using pipe tables, where columns are separated by the pipe character |, and alignment is indicated by placing a colon in the header underlining. The previous table would be coded as follows:

| Car Type | Unadjusted |  Adjusted|
|:---------|-----------:|---------:|
| Foreign  |   4.31     |   5.46   |
| Domestic |   5.32     |   4.83   |

I lined up the pipe characters for readability but this is not required. Both tables render the same way.

Combining inline code with pipe tables lets us produce dynamic reports. An example generating the above table from scratch may be found here. Another example generating a table of estimates using Jann's esttab command may be found here.
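
Because pipe tables do not depend on column widths, they are easy to assemble programmatically. The Python sketch below builds the table shown earlier; the helper name `pipe_table` and its arguments are hypothetical and are not part of the markdown command.

```python
def pipe_table(headers, aligns, rows):
    """Build a pipe table; aligns has one 'l' or 'r' per column (illustrative only)."""
    # The colon in the underlining sets the alignment of each column.
    rule = "|" + "|".join(":---" if a == "l" else "---:" for a in aligns) + "|"

    def format_row(cells):
        return "| " + " | ".join(str(cell) for cell in cells) + " |"

    return "\n".join([format_row(headers), rule] + [format_row(r) for r in rows])

table = pipe_table(
    ["Car Type", "Unadjusted", "Adjusted"],
    "lrr",
    [["Foreign", "4.31", "5.46"], ["Domestic", "5.32", "4.83"]],
)
print(table)
```

The cells here are fixed strings, but they could equally hold inline code, since the pipes do not need to line up.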