Germán Rodríguez
Stata Markdown Princeton University

Citations

Thanks to the amazing Pandoc, markstat now supports bibliographic references. The short script below illustrates the main ideas.

citations.stmd
---
title: Literate Data Analysis
author: Germán Rodríguez
date: 1 June 2017
bibliography: markstat.bib
---

Donald Knuth [-@knuth84] is a strong believer in documenting computer
programs and originated the term *literate programming*. This concept
is even more important in data analysis, where documenting each step
in data collection, processing and analysis is crucial [see @leisch02;
@rossini01]. 

## References
---

The script uses a YAML metadata block with a bibliography entry referring to a BibTeX database, in this case markstat.bib, which has the references for my markstat paper (forthcoming). For example, the entry for Knuth's paper appears below. The complete file is here.

@article{knuth84,
    author  = "Donald Knuth",
    title   = "Literate Programming",
    journal = "The Computer Journal",
    volume  = 27,
    number  = 2,
    pages   = "97--111",
    year    = 1984
}

Each entry has a unique key, and we can cite it using the syntax [@key]. The citation may include a prefix [see @key] and/or a locator [@key, page 101]. If the author's name has already been mentioned in the text, use [-@key] to suppress it in the citation. You may also use just @key with author-year formats (read about styles below).
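As a quick reference, the forms just described can be sketched with keys from markstat.bib. The sentences around each citation are placeholders written for illustration; they are not part of the script above, and the exact rendering of each form depends on the citation style in use.

```markdown
A plain citation goes in brackets [@knuth84].

A citation may carry a prefix and list several keys [see @leisch02; @rossini01].

A locator follows the key after a comma [@knuth84, page 101].

When the name is already in the text, as with Donald Knuth [-@knuth84],
the minus sign suppresses the author.

@knuth84 on its own is an in-text citation, best kept for author-date styles.
```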

When you run markstat, add the bibliography option. The command will coordinate with Pandoc to run the pandoc-citeproc filter, resolve all the citations, and include them in your dynamic document.

The HTML output for this example, generated with markstat using citations, bib, appears below.

citations.html

Literate Data Analysis

Germán Rodríguez

1 June 2017

Donald Knuth (1984) is a strong believer in documenting computer programs and originated the term literate programming. This concept is even more important in data analysis, where documenting each step in data collection, processing and analysis is crucial (see Leisch 2002; Rossini 2001).

References

Knuth, Donald. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.

Leisch, F. 2002. “Sweave: Dynamic Generation of Statistical Reports Using Literate Data Analysis.” In Compstat 2002. Proceedings in Computational Statistics, edited by Wolfgang Härdle and Bernd Rönz, 575–80. Physika Verlag, Heidelberg, Germany.

Rossini, A. J. 2001. “Literate Statistical Practice.” In DSC 2001. Proceedings of the 2nd International Workshop on Distributed Statistical Computing, edited by K. Hornik and F. Leisch. http://www.R-project.org/conferences/DSC-2001.

 

Citation Styles

The default style is the Chicago Manual of Style author-date format, but you can use any style available in Citation Style Language (CSL), of which there are more than 8,000 listed in the Zotero Style Repository. To change the style, download the .csl file and add a reference to it in the YAML block. For example, to change to the IEEE style I downloaded proceedings-of-the-ieee.csl from the repository and edited the metadata to read

---
title: Literate Data Analysis
author: Germán Rodríguez
date: 1 June 2017
bibliography: markstat.bib
csl: proceedings-of-the-ieee.csl
---

Saving the file as citations2.stmd and running markstat using citations2, bib results in the output below.

citations2.html

Literate Data Analysis

Germán Rodríguez

1 June 2017

Donald Knuth [1] is a strong believer in documenting computer programs and originated the term literate programming. This concept is even more important in data analysis, where documenting each step in data collection, processing and analysis is crucial [2], [3].

References

[1] D. Knuth, “Literate programming,” The Computer Journal, vol. 27, no. 2, p. 97–111, 1984.

[2] F. Leisch, “Sweave: Dynamic generation of statistical reports using literate data analysis,” in Compstat 2002. proceedings in computational statistics, 2002, pp. 575–580.

[3] A. J. Rossini, “Literate statistical practice,” in DSC 2001. proceedings of the 2nd international workshop on distributed statistical computing, 2001.

 

Note that the only change in the script was the addition of the csl line in the metadata.

When citing entries you may take a shortcut such as @knuth84 if you know you will be using an author-date style, but Knuth [-@knuth84] is better. It yields the same output for author-date styles, but works better with numeric formats, rendering Knuth [1] for the IEEE style and Knuth¹ for the AMA style instead of just [1] or ¹.
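The two ways of writing the same sentence are shown side by side below, as a minimal sketch. The sentence text is adapted from the example script; the comments only restate the behavior described above.

```markdown
<!-- Shortcut: fine for author-date styles, but a numeric style
     leaves only the number and the author's name is lost -->
@knuth84 is a strong believer in documenting computer programs.

<!-- Preferred: the name is ordinary text, so it appears in every style,
     and the minus sign keeps it from being repeated in the citation -->
Knuth [-@knuth84] is a strong believer in documenting computer programs.
```

Writing the name out yourself means the script does not need to be revised if you later switch the csl entry to a numeric style.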