Germán Rodríguez
Stata Markdown Princeton University

Documentation

The markstat command comes with a help file, so you can type in Stata help markstat.

Input and Output

The basic idea is very simple. You type a script that contains a narrative written in Markdown, and include Stata code that is indented one tab or four spaces, as in the following example

Let's read the fuel efficiency data that comes with Stata,
compute gallons per 100 miles, and regress that on weight

    sysuse auto, clear
    gen gphm = 100/mpg
    regress gphm weight

We see that heavier cars use more fuel, with 1,000 pounds requiring
an extra 1.4 gallons to travel 100 miles.

This simple rule is all it takes. The markstat command will read this script and generate a web page or PDF document that combines the code with the output and your narrative. Here's the HTML output:


Let's read the fuel efficiency data that comes with Stata, compute gallons per 100 miles, and regress that on weight

. sysuse auto, clear
(1978 Automobile Data)

. gen gphm = 100/mpg

. regress gphm weight

      Source │       SS           df       MS      Number of obs   =        74
─────────────┼──────────────────────────────────   F(1, 72)        =    194.71
       Model │  87.2964969         1  87.2964969   Prob > F        =    0.0000
    Residual │  32.2797639        72  .448330054   R-squared       =    0.7300
─────────────┼──────────────────────────────────   Adj R-squared   =    0.7263
       Total │  119.576261        73  1.63803097   Root MSE        =    .66957

─────────────┬────────────────────────────────────────────────────────────────
        gphm │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
      weight │    .001407   .0001008    13.95   0.000      .001206    .0016081
       _cons │   .7707669   .3142571     2.45   0.017     .1443069    1.397227
─────────────┴────────────────────────────────────────────────────────────────

We see that heavier cars use more fuel, with 1,000 pounds requiring an extra 1.4 gallons to travel 100 miles.


How It Works

The command goes through the script and separates the Markdown narrative, which goes in a .md file with all Stata code removed, and the Stata code, which goes in a .do file with all annotations removed. We call this the tangle step.

The .md file is just plain Markdown. We use a placeholder of the form {{n}}, where n is a chunk number, to mark where the Stata code block used to be, so please don't use double braces in your narrative. The command runs this file through the Pandoc converter to obtain either HTML or LaTeX output in a file with the custom extension .pdx.

The .do file is just plain Stata. We insert a comment of the form //_n to mark the beginning of the n-th code chunk, and //_^ to mark the end of the last chunk, so please avoid these patterns if you include comments. The command runs this file through Stata to obtain a .smcl log.

The next step is to weave together the Markdown and Stata output, using the information in the placeholders and markers, calling Stata commands to help translate SMCL blocks to TeX or plain text, which we then encode as HTML.

If you are generating HTML you are done! If you want to produce PDF output the command will then run a LaTeX-to-PDF converter, as explained in Getting Started.

The script file may be edited using Stata's code editor, which has the advantage that you can select and run your Stata code to check that it works, or to examine results while you are authoring the narrative.

Command Syntax

The syntax of the command is quite simple:

markstat using filename [, pdf mathjax strict bibliography]

The only required argument is the name of the script file. This must have extension .stmd, but the extension does not have to be typed.

If you are producing HTML and do not have complex mathematical equations you don't need any of the options, so let me give you just a brief summary here.

  • pdf is used if you want to generate a PDF document, which we do via LaTeX, so this option requires additional tooling.

  • mathjax is use to activate the MathJax JavaScript library, which does an excellent job of rendering mathematical equations on the web.

  • strict is used to select an alternative strict syntax to distinguish Stata from Markdown code, using code fences instead of the "one tab or four spaces" indentation rule, as explained here.

  • bibliography is used to resolve citations and generate a list of references using a BibTeX database.

To facilitate troubleshooting, the key files generated by the command are kept around for examination.

Markdown

Markdown is a lightweight markup language invented by John Gruber. It is easy to write and, more importantly, it was designed to be readable "as is", without intrusive markings.

Gruber's Markdown: Basics has a quick introduction to the notation. There is an ongoing effort to standardize Common Markdown with reference implementations in C and JavaScript, see commonmark.org for details.

The markstat command uses John MacFarlane's Pandoc to convert Markdown to HTML or LaTeX, so you first need to install this converter as explained in Getting Started.

In Markdown you create a heading by "underlining" the title with === for level 1 and --- for level 2. You can also define a heading at levels one to six by starting a line with one to six hashmarks, as in ### a level 3 heading.

You define a paragraph break by leaving a blank line. If you need a line break leave two or more spaces at the end of the line, or end the line with a backslash \, a Pandoc extension that makes the line break clearer.

To indicate emphasis using italics wrap the text using an asterisk or underscore, as in *italics*. For strong emphasis using a bold font wrap the text using two asterisks or underscores, as in **bold**. For a monospace font suitable for code use backticks, for example to refer to the regress command type `regress`.

To create a list you start a line with an asterisk *, plus +, or minus - sign for a bulleted list, or a number followed by a period, for example 1., for a numbered list. You add items to a list by starting a line with the same symbol or with a number. Items in ordered lists are numbered consecutively starting with the first number, regardeless of the numbers actually used for the other items. To end the list enter a blank line.

You can link to another document by putting the anchor in square brackets and the link in parentheses, as in [GR's website](http://data.princeton.edu) to link to my website.

To link to an image start with a bang, type a title in square brackets and the source in parentheses, as in ![Fuel Efficiency](auto.png).

Markdown lets you include HTML as well, so we could have coded the image as <img src='auto.png' alt='Fuel Efficiency'/> and a line break as <br/>. This is not recommended if the aim is to generate a PDF file.

Mathematical Equations

Pandoc interprets any text between dollar signs as a LaTeX formula, so you may write a regression equation as $y = \alpha + beta x + e$. If you are generating HTML this will be rendered by default using Unicode characters. For best results, however, use the mathjax option to link to the MathJax JavaScript library, which does a an excellent job rendering LaTeX equations. If you are rendering PDF via LaTeX the equations are rendered natively by LaTeX.

In addition to inline equations you can include displayed equations by using double dollar signs as in the following example

$$
    y = \alpha + \beta x + e
$$

I think indenting the equation as above improves readability, and markstat ensures that equations in display blocks are not confused with Stata code.

Metadata

Pandoc allows including the document's title, author and date as metadata, which you do by starting the document with the lines that begin with % and contain the relevant information

% Stata Markdown
% Germán Rodríguez
% 26 October 2016

Alternatively, you may use the YAML format, see the note at the end of here.

Strict Code Blocks

The simple "one tab or four spaces" rule to distinguish Stata and Markdown code works well, but precludes some advanced Markdown options such as nested lists.

The strict option of markstat uses code fences instead, with Stata code blocks defined as

```{s}
  // Stata code goes here
```

The braces around the s are optional, so the opening fence can be coded ```s.

You can supress echoing the commands in a Stata block using the syntax

```{s/}
  // Commands here are not echoed
```

Of course you can always supress output using quietly.

Code inside fences may be indented to improve readability. The markstat command will remove one level of indentation if present.

A Stata block can always enter Mata using mata: and exit using end, but you can also code a Mata block directly using an m instead of an s. For an example see Mata Matters.

Inline Code

You can quote results by including inline code as part of your narrative, using the syntax `s [fmt] expression`, where fmt is an optional format, followed by an expression.

The markstat command will generate code to evaluate the expression using Stata's display command, and will splice the output inline with the text.

For example after running a regression you can cite R-squared by coding `s e(r2)`. If you prefer to display the value with 2 decimal places only use `s %5.2f e(r2)`.

Consistent withy our treatment of Mata as a first class citizen, you can also include inline Mata code using the same syntax, but with an m instead of an s. The markstat command will generate a printf() function call to display the expression with the given format. If the format is omitted it defaults to %s, so the expression should yield a string.

Markdown Tables

Markdown does not have a syntax for tables, but Pandoc provides a simple syntax, best explained through an example. The code below shows average fuel efficiency in gallons per 100 miles for foreign and domestic cars before and after adjusting for weight:

Car Type    Unadjusted   Adjusted
--------- ------------ ----------
Foreign        4.31         5.46
Domestic       5.32         4.83

Basically you type the column headers, some underlining, and then the table lining up the columns yourself. The cell alignment is determining by the position of the header relative to the underlining. Our first column is left aligned and the other two are right aligned.

Unfortunately this syntax will not work with inline code because the expressions, the placeholders and the final output may all have different widths. Fortunately Pandoc has an alternative syntax using pipe tables, where columns are separated by the pipe character |, and alignment is indicated by placing a colon in the header underlining. The previous table would be coded as follows:

| Car Type | Unadjusted |  Adjusted|
|:---------|-----------:|---------:|
| Foreign  |   4.31     |   5.46   |
| Domestic |   5.32     |   4.83   |

I lined up the pipe characters for readability but this is not required. Both tables render the same way.

Combining inline code with pipe tables lets us produce dynamic reports. An example generating the above table from scratch may be found here.

Another example generating a table of estimates using Jann's esttab command may be found here.

Citations

Thanks to the amazing Pandoc, markstat can also handle bibliographic references.

In a nutshell, you create a BibTeX database with the literature you intend to cite, and include a reference to this file in the documents YAML metadata.

Each reference has a unique key, for example knuth92, and you can cite it in the text using an ampersand and the key, as in @knuth92, with options to include page numbers and other information.

Using the bibliography option coordinates with Pandoc's cite-proc to resolve the references and list them at the end of the document. More information here.

Updated for version 1.6