Germán Rodríguez
Stata Markdown Princeton University

Stata 15

Stata 15 was released on June 6, 2017 and I got my copy three days later. The big news is that it includes support for Markdown and dynamic documents through three new commands: dyndoc, putpdf and putdocx. It also has a markdown command to convert Markdown to HTML.

How do these tools compare with markstat? Obviously the main difference is that the new commands are part of official Stata, whereas markstat relies on Pandoc and, for PDF targets, on a LaTeX installation. On the other hand, markstat has a simpler syntax and provides additional functionality via Pandoc, the most important of which is the ability to generate both HTML and PDF output from the same input script.

Here are my first impressions.

1. Cleaner Scripts

Stata's dyndoc uses the following syntax for code blocks:

~~~~
<<dd_do>>
sysuse auto, clear
summarize mpg
<</dd_do>>
~~~~

Note the use of both a Markdown code fence ~~~~ and a dynamic tag <<dd_do>>. In contrast, markstat relies on a simple "one tab or four spaces" indentation rule

    sysuse auto, clear
    summarize mpg

For more control, such as hiding Stata code, you can instead specify the strict option and use code fences

```s
    sysuse auto, clear
    summarize mpg
```

I believe this leads to more readable input scripts, much in the spirit of Markdown itself. Check out this comparison with Stata's dyndoc example. The difference is more noticeable in complex documents with lots of code.

Also, markstat lets you introduce Mata code blocks using an m instead of an s in the code fence. For an example see Mata matters.
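As a rough sketch of how the two kinds of script would be converted to HTML from Stata: the file names below are hypothetical, and the sketch assumes the dyndoc script is saved as plain text and the markstat script uses markstat's default .stmd extension.

```stata
* Hypothetical file names, used only for illustration.

* Official Stata 15: convert a script that uses <<dd_do>> tags to HTML.
dyndoc dyn_example.txt, replace

* markstat: convert mark_example.stmd using the indentation rule.
markstat using mark_example

* markstat with the strict option: code must be in fenced blocks.
markstat using mark_example, strict
```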

2. Nicer Output

Compare the HTML output of dyndoc for the previous two commands

. sysuse auto, clear
(1978 Automobile Data)

. summarize mpg

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         74     21.2973    5.785503         12         41

with the output from markstat

. sysuse auto, clear
(1978 Automobile Data)

. summarize mpg

    Variable │        Obs        Mean    Std. Dev.       Min        Max
─────────────┼─────────────────────────────────────────────────────────
         mpg │         74     21.2973    5.785503         12         41

This is just a cosmetic issue, but markstat draws the table rules with line-drawing characters, which brings its HTML output more in line with PDF output.

3. Inline Code

Inline code in dyndoc uses a dynamic tag:

The average fuel efficiency is <<dd_display: %4.2f `r(mean)'>>.

The equivalent markstat code is a bit less obtrusive and easier to type

The average fuel efficiency is `s %4.2f r(mean)`.

Moreover, markstat supports inline Mata code using an m instead of an s. (This is also the syntax of R markdown, which uses an r.)

4. Metadata

markstat takes advantage of Pandoc's support for metadata, using a simple three-line syntax for author, title and date (which may be inline code):

% Literate Data Analysis
% Germán Rodríguez
% `s c(current_date)`

markstat also supports more general YAML blocks. For more information see metadata.

5. Bibliographies

Thanks again to the amazing Pandoc, markstat supports citations. The basic idea is to prepare a BibTeX file with the references. You can then cite them in the text, for example typing @knuth84 to refer to Knuth's literate programming paper. The bib option of markstat will arrange for Pandoc to format the citations, look them up in the BibTeX database, and generate a list of references at the end of your document, in a style of your choice. For example, Knuth's paper will appear in the default Chicago style as

Knuth, Donald. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.

For a quick example see citations. A more extended example is provided by my Stata Journal paper introducing markstat (forthcoming). Check out this page for access to the source code of the paper and the BibTeX database used to resolve the references, as well as the resulting HTML and PDF versions.
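On the Stata side, the citation step comes down to one option. A minimal sketch, assuming a hypothetical script named paper.stmd that cites @knuth84 and points to its BibTeX database in the way the markstat documentation describes:

```stata
* paper is a hypothetical file name; bib is the option mentioned above.
markstat using paper, bib
```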

6. PDF Output

I think a big advantage of markstat is that it can generate a PDF file from the same input script, admittedly at the expense of needing a LaTeX distribution. But once you have cleared the installation hurdle, all you do is add the pdf option, as explained in the original paper.
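A minimal sketch of producing both targets from one script, assuming a hypothetical mark_example.stmd and working Pandoc and LaTeX installations:

```stata
* mark_example is a hypothetical file name.

* HTML is the default target.
markstat using mark_example

* The same script, converted to PDF via LaTeX.
markstat using mark_example, pdf
```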

The dyndoc command generates HTML only. There is a new putpdf command, but this is really a lower-level command; it provides a lot of control, but seems aimed more at programmers than regular users. Compare typing the text

You can *italicize*, ~~strikeout~~, <u>underline</u>, sub/super script~2~

with writing the commands

putpdf text ("You can ")
putpdf text ("italicize, "), italic
putpdf text ("strikeout, "), strikeout
putpdf text ("underline"), underline
putpdf text (", sub/super script")
putpdf text ("2 "), script(sub)
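These putpdf text commands are a fragment: they append to a paragraph in a document that must already be open. A minimal sketch of the surrounding commands, with example.pdf as a hypothetical output name:

```stata
* Open a new PDF document in memory.
putpdf begin

* Start the paragraph that the text commands append to.
putpdf paragraph
putpdf text ("You can ")
putpdf text ("italicize"), italic

* Write the document to disk (example.pdf is a hypothetical name).
putpdf save example.pdf, replace
```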

A comparison of markstat with putpdf using the example in the Stata announcement may be found here.

For longer examples, you can see both the input script and the HTML and PDF output for my papers on the wfs and markstat commands, as well as the latest update of my Stata tutorial.