Econ Life Hack: Using Notepad++ and texdoc to write reproducible LaTeX documents with Stata


Fair warning: this post is off the nerd deep end. I’m going to describe how to use the Notepad++ text editor and the Stata package texdoc to edit Stata and LaTeX code from within the same document. So this post will assume a basic level of interest in and knowledge of Stata and LaTeX. Here goes!

During my coursework at UIUC I got acquainted with R and was particularly taken with the knitr package for writing reproducible reports.

Brief digression on why this matters: reproducible research is a growing preoccupation in the social sciences, especially with recent high-profile spats about p-hacking and reproducibility in the psychology literature. Reproducibility actually encompasses a number of questions, the most obvious being, “if I did the same experiment over again, would I get the same result?” But even before you get to attempting to re-run an experiment (which in economics is tricky, since we don’t exactly work in labs…) there is a lower bar to pass: “if I ran the same code on the same data, would I get the same result?” That might seem like an obvious and easy test, but there are a number of documented cases where papers submitted to peer-reviewed journals have failed it. So there is a push within economics to publish the code used in the statistical analysis along with the data and the analysis itself.

Reproducible documents make that task a lot easier. Using a tool like knitr, you can write a report or paper that includes snippets of R, so you edit the code alongside the analysis instead of copying and pasting tables back and forth. This eliminates some possibilities for errors and keeps you from including old tables when the model or the data changes. It also simplifies and organizes the task of providing code and data, because everything is in one place. Most importantly for the lazy grad student in all of us: it saves a lot of time. Moving to RStudio with knitr saved me considerable time, because each table was automatically regenerated inside my document any time the code changed.

Which brings me to my major beef with Stata: where’s the equivalent of knitr? Enter texdoc. This is a Stata package that allows you to write LaTeX (the markup language used to format papers for publication) and Stata code within the same document. This means you can write a report, format it, and run the Stata analysis all in one place, no copying and pasting needed. So…

What You Need:

  1. texdoc and sjlatex Stata packages
  2. Notepad++ text editor
  3. A distribution of LaTeX (I used MiKTeX v. 2.9)

Step 1: Install texdoc and sjlatex

First, to get Stata to allow LaTeX code inside your .do files, install texdoc by typing ssc install texdoc into the Stata command line. Then consult the texdoc help file and this paper by Ben Jann for how to include LaTeX code inside your .do files.

Next, in order to get LaTeX to properly read your Stata output, install sjlatex. Instructions are available here for installing sjlatex and getting your installation of MiKTeX to find it.

Note: the file “stata.sty” included in the sjlatex package is essential for telling MiKTeX how to format Stata output. I’m sure there is a more elegant way to do this, but I had to put that file into my LaTeX project’s working directory, i.e. the folder where the PDF files are generated, in order to get MiKTeX to recognize it.
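
For reference, here is a rough sketch of what the two installs look like at the Stata command line. Only the ssc line comes from the steps above. The Stata Journal URL and the sjlatex install step are assumptions based on the usual sjlatex instructions, so check them against the texdoc help file, and the target folder is just a placeholder.

```stata
* Install texdoc from SSC (as described above)
ssc install texdoc, replace

* Install sjlatex from the Stata Journal site
* (URL is an assumption; confirm it in the texdoc help file)
net install sjlatex, from(http://www.stata-journal.com/production)

* Copy stata.sty and the related files into a folder of your choice
* (placeholder path; I pointed this at my LaTeX project's working directory)
sjlatex install using "C:/path/to/latex/project"
```

If the last step works for you, it should save you from copying stata.sty around by hand.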

Now play around with writing LaTeX and Stata code, placing LaTeX code inside /*tex and tex*/ tags. This will allow you to write a document complete with a LaTeX header, sections of text, and a bibliography, while spitting out your Stata output from within the same file.

[Screenshot: texdoc example]

As you can see above, my LaTeX header is treated as a comment (shout-out to Mani at UIUC for his primer on LaTeX). So Stata ignores the LaTeX, and texdoc writes a separate .tex file, which you can run MiKTeX on to produce the final formatted document.
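
In case the screenshot is hard to read, a minimal .texdoc file might look something like the sketch below. The file names, the section text, and the choice of the auto dataset are all made up for illustration. The texdoc init and texdoc stlog commands are my reading of the texdoc help file, so double-check the syntax against your installed version.

```stata
* example.texdoc: a minimal sketch, run with "texdoc do example.texdoc"
* Tell texdoc which .tex file to write (hypothetical file name)
texdoc init example.tex, replace

/*tex
\documentclass{article}
\usepackage{stata}
\begin{document}

\section{Summary statistics}
The output below is regenerated every time this file is run.
tex*/

* Commands between "texdoc stlog" and "texdoc stlog close" are run
* and their output is logged into the .tex file
texdoc stlog
sysuse auto, clear
summarize price mpg weight
texdoc stlog close

/*tex
\end{document}
tex*/
```

The \usepackage{stata} line is where stata.sty from sjlatex comes in, which is why MiKTeX has to be able to find that file.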

End of story?

Unfortunately no. If you try this out for yourself, you’ll find a few flaws in the workflow, mainly stemming from the limitations of Stata’s native .do file editor. It is just not set up for writing an entire document. It doesn’t have spell-check, it doesn’t wrap text (so a paragraph is one long line), and it will definitely not help you with your LaTeX syntax.

So it’s time to ante up and use a real text editor to work with the .texdoc files you’ll be writing with this package. There are two very good options for this task: Sublime Text and Notepad++. You can check out how to use Sublime Text with Stata here, but I’ll be focusing on Notepad++ because I have a bias for open-source tools, and I already had it installed when I was working this out.

Step 2: Configure Notepad++ to work with Stata

Install Notepad++, then follow the instructions here to install Friedrich Huebler’s extension rundolines. This will allow you to write Stata code in Notepad++, then execute it in Stata.

Note: This requires some fussing with the code in the programs Huebler provides. Make sure you get the file path to Stata entered correctly, and edit the code so it refers to your version of Stata. I futzed with this for a while before realizing the code pointed to Stata 14.0 and my version is 14.1.

Next, to get Notepad++ to recognize Stata commands and provide syntax highlighting, follow these instructions from Konstantin Golyaev to set Stata code as a user-defined language.

[Screenshot: Notepad++ language menu]

Now you can choose Stata as one of the languages. Note that TeX comes pre-installed under “T” in the menu pictured above, so you can toggle between the two languages as you write.

Step 3: Configure Notepad++ to work with LaTeX

John Bruer has thorough instructions here for setting up Notepad++ so that you can run LaTeX to generate documents and even search back and forth between the final document and the code that produced it.

Note on references: The code in Bruer’s instructions above uses BibTeX to generate references. If you prefer Biber (which is more recent and has more options), you’ll need to substitute biber.exe for bibtex.exe in the pdf_latex.bat file. And again, pay close attention to the paths to the various programs that are being called.

Step 4: Put it all together

Now that Notepad++, Stata and LaTeX can all talk to each other, it’s just a matter of settling on a workflow that works for you. I tend to keep three files going in Notepad++ at a time: the .texdoc file, the .tex file, and a master .do file that gets everything started. The master .do file just has a few lines of code like this:

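clear
* set the working directory (replace with your own path)
cd "example working directory"
set more off
* run the .texdoc file, which writes the .tex file
texdoc do example.texdoc
* save a copy containing only the Stata code
texdoc strip example.texdoc example.do, replace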

This sets the working directory, runs the .texdoc file, and creates a separate .do file with just the Stata code, in case I want to look at or share the code without all the LaTeX stuff.

So to run the whole thing, I execute the master .do file, which runs all the code in the .texdoc file, generating both my stripped .do file and a .tex file in the process.

[Screenshot: master .do file in Notepad++]

Then I click over to the .tex file’s tab and run MiKTeX using the pdflatex_build command (which I’ve set as keyboard shortcut F8), and there it is: a nicely formatted PDF with paragraphs of text and Stata-generated tables all included.

[Screenshot: .tex file in Notepad++]
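
If you would rather not switch tabs, one possible variation (not what I use) is to call pdflatex from the end of the master .do file with Stata’s shell command. This is only a sketch: it assumes pdflatex is on your system PATH and that example.tex was written to the working directory, and it skips the bibliography step entirely.

```stata
* Optional: build the PDF straight from Stata after texdoc has run
* Assumes pdflatex is on the PATH and example.tex is in the working directory
shell pdflatex example.tex
```

You lose the F8 shortcut and the back-and-forth search that the Notepad++ setup gives you, so treat this as a convenience for quick rebuilds.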

So that’s that! Using texdoc with Notepad++ you can write reproducible papers and reports and look like a boss doing it. If anyone reading this has any additional improvements or modifications they’ve made, please share them.

 

 
