
Scientific Programming

Editors: Konstantin Läufer, [email protected]


Konrad Hinsen, [email protected]

Active Documents
with Org-Mode
By Eric Schulte and Dan Davison

Org-mode is a simple, plain-text markup language for hierarchical documents that allows the intermingling
of data, code, and prose.

Org-mode is implemented as part of the Emacs text editor [1]. It was initially developed as a simple outlining tool intended for note taking and brainstorming. It was later augmented with task management tools, letting researchers transform notes into tasks with deadlines and priorities, and with syntax for the inclusion of tables, data blocks, and active code blocks. Users new to Org-mode often start with its simple plain-text note-taking system, then move on to increasingly sophisticated features as their comfort level permits.

In reproducible research (RR), researchers publish scientific results along with the software environment and data required to reproduce all computational analyses in the publication [2]. Reproducibility is essential to peer-reviewed research, but scientific publications often lack the information required for reviewers to reproduce the analysis described in the work. As Jonathan Buckheit and David L. Donoho noted [3],

    An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and complete set of instructions, which generated the figures.

Org-mode supports RR with syntax for including inline data and code, mechanisms for evaluating embedded code, and publishing functionality that might be used to automate the computational analysis and generation of figures. Here, we focus on the Org-mode features that support the practice of RR; information on other aspects of Org-mode can be found in the manual (http://orgmode.org/manual) [4] and in the community wiki (http://orgmode.org/worg).

The plain text Org-mode source of this article is available for download at https://github.com/eschulte/CiSE/raw/master/org-mode-active-doc.org (see the sidebar “How to Export this Document” for more information). Readers with the requisite open-source software can execute the source code examples, which analyze a dataset and create graphics, as well as export the complete paper to one of several output formats.

Syntax
Org-mode documents are plain text files organized using a hierarchical outline defined by a number of simple syntactical rules.

Outlines
The outline can be folded and expanded, hiding or exposing as much of the document as wanted. Using this facility, even very large documents can be comfortably navigated in a manner similar to that of a file system. Headlines are indicated by leading *’s, as in the folded view of this article in Figure 1. The ellipses at the end of each line indicate that the heading’s content is hidden from view. Notice that the heading beginning with the keyword COMMENT is not included in the exported document. Org-mode uses many such keywords for associating information with headlines.

Code and Data
Using a simple block syntax, both code and data can be embedded in Org-mode documents as follows:

First a data block.
#+begin_example
raw textual data
#+end_example

Second a code block.
#+begin_src sh
echo "shell script code"
#+end_src

Code and data blocks can be named, allowing their contents to be referenced from elsewhere in the Org-mode file. Figure 2 shows an example, in which the shell script references the data block’s content.

Cross references between an Org-mode file’s code and data elements turn Org-mode into a powerful, multilingual programming environment in which data and code expressed in many different programming languages can interact.

Evaluation
Code and data references make chained evaluation strings possible.
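
To make the naming and reference syntax concrete, the following Python sketch scans Org-mode text for named source blocks and collects the :var references that each block declares. It is an illustration only and not how Org-mode itself parses files: the function name parse_blocks, the regular expressions, and the restriction to the #+source: and #+headers: keywords used in this article's listings are all assumptions.

```python
import re

# Hypothetical patterns covering only the keywords used in this article.
NAME_RE = re.compile(r"^#\+source:\s*(\S+)\s*$", re.IGNORECASE)
BEGIN_RE = re.compile(r"^#\+begin_src\s+(\S+)(.*)$", re.IGNORECASE)
END_RE = re.compile(r"^#\+end_src\s*$", re.IGNORECASE)
VAR_RE = re.compile(r":var\s+(\w+)=(\S+)")
HEADERS = "#+headers:"


def parse_blocks(text):
    """Return {name: {"lang", "vars", "body"}} for every named src block."""
    blocks = {}
    name = None      # name taken from the most recent #+source: line
    extra = ""       # header arguments gathered from #+headers: lines
    current = None   # block whose body is currently being read
    for line in text.splitlines():
        stripped = line.strip()
        if current is not None:
            if END_RE.match(stripped):
                current["body"] = "\n".join(current["body"])
                current = None
            else:
                current["body"].append(line)
        elif NAME_RE.match(stripped):
            name = NAME_RE.match(stripped).group(1)
            extra = ""
        elif stripped.lower().startswith(HEADERS):
            extra += " " + stripped[len(HEADERS):]
        elif BEGIN_RE.match(stripped):
            begin = BEGIN_RE.match(stripped)
            args = extra + " " + begin.group(2)
            current = {
                "lang": begin.group(1),
                "vars": dict(VAR_RE.findall(args)),  # variable -> referenced name
                "body": [],
            }
            if name is not None:       # unnamed blocks are read but not stored
                blocks[name] = current
            name, extra = None, ""
    return blocks
```

Applied to a listing such as the analysis block shown later in Figure 9, the sketch would report the language R and three references, one per :var argument; following those references from block to block is what produces a chain of evaluations.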

2 Copublished by the IEEE CS and the AIP 


1521-9615/11/$26.00 © 2011 IEEE Computing in Science & Engineering



How to Export this Document

This article was originally composed as an Org-mode document; the raw plain-text version is available for download at https://github.com/eschulte/CiSE/raw/master/org-mode-active-doc.org. All of the examples presented in this article can be interactively recreated from the original document.

Requirements
The first step is to ensure that you have installed on your system recent versions of Emacs (www.gnu.org/software/emacs), version 23 or greater, and Org-mode (http://orgmode.org), version 7.5 or greater. To evaluate the code blocks in our article, you also need the following programming languages and supporting packages installed on your system:

• Python (www.python.org),
• R (www.r-project.org),
• Emacs Speaks Statistics (ESS; http://ess.r-project.org),
• gnuplot (www.gnuplot.info), and
• gnuplot-mode (www.emacswiki.org/emacs/GnuplotMode).

Configuration
Next, you can evaluate the emacs-lisp code block to configure Org-mode to export our article (see Figure A).

Export
After installing all required software, you can export the article to several different back ends in three steps. First, open this document in Emacs. Second, evaluate the “Configuration” emacs-lisp code block immediately previous in this document. This can be done with C-c C-v p to jump to the previous code block, then C-c C-c to evaluate the code block, where C-c means press “c” while holding the control key, C-v means press “v” while holding the control key, and so forth.

Finally, use C-c C-e to open the Org-mode export dialog, which displays a number of back-end options and the key that exports to each back end. For example, press “d” to export this document to a .pdf and open the resulting file in your document reader, or press “b” to export this document to .html and open the resulting file in your Web browser.

#+source: configuration
#+begin_src emacs-lisp :results silent
  ;; first it is necessary to ensure that Org-mode loads support for the
  ;; languages used by code blocks in this article
  (org-babel-do-load-languages
   'org-babel-load-languages
   '((sh . t)
     (org . t)
     (emacs-lisp . t)
     (python . t)
     (R . t)
     (gnuplot . t)))
  ;; then we'll remove the need to confirm evaluation of each code
  ;; block, NOTE: if you are concerned about execution of malicious code
  ;; through code blocks, then comment out the following line
  (setq org-confirm-babel-evaluate nil)
  ;; finally we'll customize the default behavior of Org-mode code blocks
  ;; so that they can be used to display examples of Org-mode syntax
  (setf org-babel-default-header-args:org '((:exports . "code")))
#+end_src

Figure A. The emacs-lisp code block to configure Org-mode to export this article.

Figure 3 shows the series of actions that result when the analyze code block is evaluated interactively or during export.

1. The analyze code block is evaluated. The :var data=data header argument causes Org-mode to evaluate the data reference.
2. To resolve this reference, the data code block is located in the Org-mode file and is evaluated.
3. The :var raw=raw header argument causes Org-mode to resolve the raw reference.
4. The raw code block is evaluated, causing the :var url=http://data.org header argument to be evaluated as a literal value that’s assigned to the url variable and passed to the shell script. The shell script then downloads data from the external url and makes these data available to Org-mode.
5. The results of the shell script are assigned to the raw variable, which is passed to the Python code in the body of the data code block.
6. This code is passed to an external Python interpreter, which evaluates the Python code and returns its result to Org-mode.
7. The data code block’s results are then assigned to the data variable and passed to the R code in the body of the analyze code block.
8. This code is then passed to an external R interpreter, which generates a figure that is written to the file specified in :file fig.pdf.
9. A reference to this figure is then passed from the analyze code block back to Org-mode, which inserts a link marked by double square brackets into the body of the Org-mode document. On export to HTML, ASCII, LaTeX, or another Org-mode supported format, the linked figure will be embedded into the exported document.

* Introduction...
* Syntax...
** Outlines...
** Code and Data...
* Evaluation...
* Example Application...
** Download External Data...
** Parsing...
** Analysis...
** Display...
* Conclusion...
* COMMENT How to Export this Document...
* Footnotes...

Figure 1. The folded view of this article. Headlines are indicated by leading *’s.

First a data block.

#+results: raw-data
#+begin_example
raw textual data
#+end_example

Second a code block.
#+begin_src sh :var text=raw-data
echo $text|wc
#+end_src

#+results:
: 1 3 17

Figure 2. The shell script referencing the data block’s content. By naming code and data blocks, you can reference their contents from elsewhere in the Org-mode file.

Example Application
To illustrate Org-mode’s application to RR, we use an analysis of baseball statistics. The ordered nature of baseball games makes them particularly amenable to statistical analysis. Baseball players’ performance and the course of baseball games are routinely captured in a few statistics that are comparable across games and seasons.

In this example, we analyze the correlation of several common offensive statistics with the attendance at Major League Baseball (MLB) games in the 2010 season. We hypothesize what every baseball fan wants to believe: that large crowds spur the home team to superior performance levels. We found and report on the offensive statistic that has the largest correlation with high attendance.

Download External Data
Our example correlates home team offensive statistics with attendance for the 2010 MLB season (see Figure 4). As Figure 5 shows, this first code block, named url, translates the numerical 2010 season into the url for the retrosheet.org website (we obtained this copyrighted information free of charge; see www.retrosheet.org). The website is devoted to collecting and curating MLB statistics.

As Figure 6 shows, with the raw-data shell code block, the zip file of statistics located at the specified url is downloaded and its contents are unpacked into a local text file named 2010.csv. The :cache yes header argument ensures that this code block is run only once and the data aren’t downloaded again every time the code block’s results are referenced.

Next, the stat-headers Python code block returns a list of the names of the offensive statistics that we’ll test for correlation with attendance (see Figure 7).

Parsing
The next two shell code blocks, offensive-stats and attendance, collect the offensive statistics and the attendance from the raw data file produced by the raw-data code block.

Analysis
The analysis code block uses the R statistical programming language to calculate correlations between the
outputs of the offensive-stats and attendance code blocks, whose values are saved into the stats and attendance variables respectively. The most correlated column, intentional walk, can be mentioned in the text using an inline code block. The code below shows the Org-mode syntax for an inline block. The results indicate that the fans’ belief in the effects of a large crowd is shared by the visiting team, which chooses to walk a dangerous home-team hitter rather than take the chance that the crowd will spur him to a potentially damaging performance.

Display
Using gnuplot, we can plot the number of forced walks and the attendance for the five games with the most forced walks (see Figures 10 and 11).

#+source: season
#+begin_src emacs-lisp :exports none
2010
#+end_src

Figure 4. The example application will correlate home team offensive statistics with attendance for the 2010 Major League Baseball season.

[Figure 3 diagram. An external data source, a Python interpreter, and an R interpreter exchange the raw data, the formatted data, and the figure with the Org-mode file listed below; the numbers 1 to 9 in the diagram mark the steps described in the text. Excerpts of the HTML, ASCII, and LaTeX/PDF exports of the same file follow the listing.]

#+Title: Example Org-mode File

* Data Source
Data was gathered from... prose ...

#+source: raw
#+begin_src sh :var url=http://data.org
curl url...
#+end_src

* Data Processing
Format data by... prose ...

#+source: data
#+begin_src python :var raw=raw
def format
...
#+end_src

* Data Analysis
Analyze and graph data ... prose ...

#+source: analyze
#+begin_src R :var data=data :file fig.pdf
names(data)
...
hist(data)
#+end_src

#+Caption: Histogram of Data
#+results: analyze
[[file:fig.pdf]]

Export to HTML:
<title>Example Org-mode File</title>
<h1>Data Source</h1>
<p>Data was gathered from...

Export to ASCII:
Example Org-mode File
=====================

Data Source
-----------
Data was gathered from...

Export to LaTeX/PDF:
\title{Example Org-mode File}
\section{Data Source}
Data was gathered from...

Figure 3. Active Org-mode document. Variables of the analyze code block reference the results of previous code blocks (shown on the right); in resolving these references, the referenced code blocks are evaluated, and their results are passed back to the analyze code block (on the left side).
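
The chain of evaluations that Figure 3 illustrates amounts to a recursive resolution of references: before a block runs, every block named in its :var header arguments is evaluated first, and each result is bound to the corresponding variable. The Python sketch below mimics that order for the raw, data, and analyze blocks. It is a simplified model and not Org-mode's implementation: the Block class, the evaluate function, and the small Python callables that stand in for the shell, Python, and R interpreters are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Block:
    """Hypothetical model of a named code block."""
    run: Callable[..., object]                                 # stand-in interpreter
    refs: Dict[str, str] = field(default_factory=dict)         # variable -> block name
    literals: Dict[str, object] = field(default_factory=dict)  # variable -> literal


def evaluate(name, blocks, trace):
    """Evaluate block `name`, resolving each of its references first."""
    block = blocks[name]
    trace.append(f"start {name}")
    args = dict(block.literals)
    for var, ref in block.refs.items():
        args[var] = evaluate(ref, blocks, trace)   # resolve the reference
    result = block.run(**args)
    trace.append(f"finish {name}")
    return result


# Stand-ins for the three blocks of Figure 3 (no network access, no R).
blocks = {
    "raw": Block(run=lambda url: "3,1,2",
                 literals={"url": "http://data.org"}),
    "data": Block(run=lambda raw: sorted(int(x) for x in raw.split(",")),
                  refs={"raw": "raw"}),
    "analyze": Block(run=lambda data: f"[[file:fig.pdf]] ({len(data)} values)",
                     refs={"data": "data"}),
}

trace = []
link = evaluate("analyze", blocks, trace)
```

The recorded trace starts analyze, data, and raw in that order and finishes them in the reverse order, which is the nesting described in steps 1 through 9. Unlike this sketch, Org-mode hands each block body to an external interpreter for the block's language.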


#+source: url
#+begin_src sh :var season=season :exports none
echo "http://www.retrosheet.org/gamelogs/gl$season.zip"
#+end_src

Figure 5. The URL code block. This block translates the numerical 2010 season into the URL for the website that collects
Major League Baseball statistics.

#+source: raw-data
#+headers: :exports none
#+begin_src sh :cache yes :var url=url :file 2010.csv
wget $url && \
unzip -p gl2010.zip > 2010.csv && \
rm gl2010.zip
#+end_src

Figure 6. The raw-data shell code block. The zip file of statistics located at the specified url is downloaded and its contents
are unpacked into a local text file named 2010.csv.
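
For readers without wget and unzip, roughly the same step can be sketched with Python's standard library. This is an assumption-laden alternative rather than part of the article's workflow: the function names gamelog_url, unpack, and fetch_season are hypothetical, the download helper has not been checked against the live site, and unpack concatenates every archive member in archive order, which is what unzip -p prints.

```python
import io
import urllib.request
import zipfile


def gamelog_url(season):
    """Build the game-log URL the same way the url block in Figure 5 does."""
    return f"http://www.retrosheet.org/gamelogs/gl{season}.zip"


def unpack(zip_bytes):
    """Concatenate the contents of all archive members, like `unzip -p`."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return b"".join(archive.read(name) for name in archive.namelist())


def fetch_season(season, target):
    """Download one season's archive and write its contents to `target`."""
    with urllib.request.urlopen(gamelog_url(season)) as response:
        payload = response.read()
    with open(target, "wb") as out:
        out.write(unpack(payload))
```

Because the archive is unpacked in memory, no intermediate zip file has to be removed afterwards, which replaces the rm step of the shell version.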

#+source: stat-headers
#+headers: :exports none
#+begin_src python :results list :cache yes :return fields
import urllib2
url = 'http://www.retrosheet.org/gamelogs/glfields.txt'
fp = urllib2.urlopen(url)
fields = []
for line in fp:
    if line.find('Visiting team offensive statistics') != -1:
        line = fp.readline()
        while line.find('Visiting team pitching statistics') == -1:
            if line[13] != ' ':
                fields.append(line.strip().split('.')[0].split('(')[0])
            line = fp.readline()
#+end_src

#+results[97fdb2368b66e48faa6afb8b6eff34e00f05633b]: stat-headers
- at-bats
- hits
- doubles
- triples
- homeruns
- RBI
- sacrifice hits
- sacrifice flies
- hit-by-pitch
- walks
- intentional walks
- strikeouts
- stolen bases
- caught stealing
- grounded into double plays
- awarded first on catcher's interference
- left on base

Figure 7. The stat-headers Python code block. This block returns a list of the names of the offensive statistics to test
for correlation with attendance.
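
The 40-character hexadecimal string in the #+results[...] line of the listing above is the hash that :cache yes stores alongside the results; a cached block is run again only when the hash of its current definition no longer matches the stored one. A minimal Python sketch of that idea follows. It assumes, purely for illustration, that the hash is a SHA-1 digest over the block's language, header arguments, and body; the exact input that Org-mode hashes is not described in this article, and block_hash and CachedRunner are hypothetical names.

```python
import hashlib


def block_hash(lang, header_args, body):
    """SHA-1 digest of a block definition (the hash input is an assumption)."""
    text = "\n".join([lang, header_args, body])
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class CachedRunner:
    """Re-run a block only when its definition has changed."""

    def __init__(self):
        self._cache = {}   # block name -> (hash, result)
        self.runs = 0      # number of times a block body was actually executed

    def result(self, name, lang, header_args, body, run):
        digest = block_hash(lang, header_args, body)
        cached = self._cache.get(name)
        if cached is not None and cached[0] == digest:
            return cached[1]            # unchanged block: reuse stored result
        self.runs += 1
        value = run(body)               # changed or new block: execute it
        self._cache[name] = (digest, value)
        return value
```

Requesting the result of the same unchanged block twice executes it once; editing the body or the header arguments changes the digest and forces a fresh run, which is the behavior the text relies on to avoid downloading the data repeatedly.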



#+source: offensive-stats
#+headers: :exports none
#+begin_src sh :var file=raw-data
awk '{for (x=50; x<=66; x++) { printf "%s ", $x } printf "\n" }' FS="," \
< $file
#+end_src

#+source: attendance
#+headers: :exports none
#+begin_src sh :var file=raw-data
awk '{ print $18 }' FS="," < $file
#+end_src

Figure 8. The offensive-stats and attendance shell code blocks. These blocks collect the offensive statistics and
attendance from the raw data file produced by the raw-data code block (see Figure 6).
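
The two awk commands select comma-separated fields by position: fields 50 through 66 for the offensive statistics and field 18 for the attendance. The Python sketch below performs the same selection with the standard csv module. The function names are hypothetical, and there is one deliberate difference from the awk version: the csv module interprets double quotes around fields, whereas awk with FS="," leaves the quote characters in place.

```python
import csv
import io


def select_columns(csv_text, first, last):
    """Return the 1-based columns first..last (inclusive) of every row."""
    rows = csv.reader(io.StringIO(csv_text))
    return [row[first - 1:last] for row in rows]


def offensive_stats(csv_text):
    """Columns 50 to 66, as selected by the offensive-stats block."""
    return select_columns(csv_text, 50, 66)


def attendance(csv_text):
    """Column 18, as selected by the attendance block.

    Assumes every row has at least 18 fields.
    """
    return [row[0] for row in select_columns(csv_text, 18, 18)]
```

Each row of offensive_stats then holds 17 values, one for each name returned by the stat-headers block in Figure 7.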

#+source: analysis
#+headers: :var headers=stat-headers :var stats=offensive-stats
#+begin_src R :var attendance=attendance :exports none
# apply the headers to the list
colnames(stats) <- headers

## The following lines are required because parsing bugs are causing
## corrupt data in these two rows.
badrows <- c(141, 674)
stats <- stats[-badrows,]
attendance <- attendance[-badrows,]
attendance <- as.integer(attendance)

# perform a simple correlation of each column with the attendance
corrln <- cor(stats, attendance)

# return the name of the most correlated column
rownames(corrln)[which.max(corrln)]
#+end_src

Figure 9. The analysis code block. This block uses the R statistical programming language to calculate correlations between the outputs of the offensive-stats and attendance code blocks (see Figure 8), whose values are saved into the stats and attendance variables respectively.
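
The core of the analysis, correlating each statistic with attendance and reporting the name of the column with the largest coefficient, can also be sketched in plain Python. This is not the article's R code: pearson and most_correlated are hypothetical names, the sketch skips the removal of the two corrupt rows, and it assumes numeric input in which no column is constant (a constant column would cause a division by zero here, where R's cor would return NA).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def most_correlated(headers, rows, target):
    """Name of the column whose values correlate most strongly with `target`.

    Like which.max in the R block, this picks the largest signed
    coefficient, not the largest absolute value.
    """
    columns = list(zip(*rows))   # transpose rows into columns
    scores = {name: pearson(col, target)
              for name, col in zip(headers, columns)}
    return max(scores, key=scores.get)
```

Given the 17 statistic names as headers, the per-game statistics as rows, and the per-game attendance as target, most_correlated returns a single column name, mirroring the last line of the R block.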

As this example demonstrates, commingling code and prose lets authors collect all relevant information into a single place. This practice benefits readers, who can reproduce the calculations performed in the work and also extend the analysis, possibly within Org-mode itself. For example, readers of this article can rerun the analysis for another season by simply changing the value of the season code block above and re-exporting the file.

[Figure 10 plot: clustered histogram with one pair of bars per game; left axis “Forced walks” (0 to 6), right axis “Attendance” (0 to 50,000).]

Figure 10. Forced walks and attendance for the top five games by forced walks. Results indicate that the visiting team shares the fans’ belief in the effects of a large crowd.

Org-mode has many features that make it a good choice for reproducible research; some of these are essential for any RR tool, and others alleviate common burdens of practicing RR.

Of the essential properties, arguably the most important is that, as part of Emacs, the Org-mode copyright is owned by the Free Software Foundation [5]. This ensures that Org-mode is now and always will be free and open source software. This directly relates to two RR goals. First, Org-mode
is available free of charge to install by any user on any system, which ensures access to the software environment required for reproduction. Second, the source code specifying Org-mode’s inner workings is open to inspection, ensuring that the mechanisms through which Org-mode generates scientific results are open to review and verification.

In addition to its open source pedigree, Org-mode benefits in other ways from its Emacs relationship. Emacs is one of the world’s most widely ported pieces of software, with versions that run on all major operating systems. This ensures that Org-mode documents can be incorporated into almost any computer work environment. Emacs is also widely used by the scientific community for editing both prose documents and source code. By leveraging existing Emacs editing support, Org-mode can offer its users a comfortable and familiar editing environment for all content types. Finally, given Org-mode’s implementation in the Emacs extension language, Emacs Lisp [6], users can customize Org-mode’s behavior to their particular needs and support arbitrary new programming languages; Org-mode currently supports more than 30 programming languages.

Org-mode addresses many common problems in RR practice. Given that a single Org-mode document can be used for every stage of a research project (from brainstorming, software development, and experimentation to publication), Org-mode largely relieves authors of the burden of tracking resources required for reproducing their work. Although this information volume can result in extremely large files, Org-mode documents’ hierarchical folding lets users comfortably read and edit such files. The files themselves are encoded in plain text, which enhances their portability and makes them easy to integrate with version control systems, allowing for revision tracking and collaboration [7].

Org-mode documents run the gamut from simple collections of plain-text notes, to complex laboratories housing data and analysis mechanisms, to publishing desks with facilities for displaying and exporting scientific results. There’s a friendly community of Org-mode users and developers who communicate on the Org-mode mailing list (http://lists.gnu.org/mailman/listinfo/emacs-orgmode). By answering questions and helping each other master Org-mode’s many features, this community helps to solve one of the largest hurdles posed by any RR tool: learning how to use it.

#+source: top-8
#+begin_src sh :var data=raw-data :exports none
cat $data|awk '{print $60,$18,$7"-"$4}' \
  FS=","|sed 's/"//g'|sort -rn |head -5
#+end_src

#+source: figure
#+begin_src gnuplot :var data=top-8 :file plot.png :exports results
# set term tikz
# set output 'plot.tex'
set yrange [0:6]
set y2range [0:50000]
set key above
set y2tics border
set ylabel 'forced walks'
set y2label 'attendance'
set style fill pattern
set style data histogram
set style histogram clustered
set auto x
set xtic rotate by -45 scale 0
plot data using 1:xtic(3) title 'forced walks', \
     data using 2 axes x1y2 title 'attendance'
#+end_src

#+label: fig:top-5
#+attr_latex: width=0.8\textwidth
#+Caption: Top 5 games by forced walks, with forced walks and attendance shown.
#+results: figure
[[file:plot.png]]

Figure 11. The code for the number of forced walks and the attendance for the five games with the most forced walks.

References
1. R.M. Stallman, “Emacs the Extensible, Customizable Self-Documenting Display Editor,” ACM Sigplan Notices, vol. 16, no. 6, 1981, pp. 147–156.
2. S. Fomel and J.F. Claerbout, “Reproducible Research,” Computing in Science & Eng., vol. 11, no. 1, 2009, pp. 5–7.
3. J.B. Buckheit and D.L. Donoho, “Wave-Lab and Reproducible Research,” Wavelets and Statistics, Springer-Verlag, 1995.
4. C. Dominik et al., The Org Mode 7 Reference Manual, Free Software Foundation, 2010.
5. R. Stallman, “Free Software Foundation,” Encyclopedia of Computer Science, John Wiley & Sons, 2003, pp. 732–733.
6. B. Lewis, D. LaLiberte, and R. Stallman, GNU Emacs Lisp Reference Manual, 3rd ed., Free Software Foundation, 2010.
7. K. Hinsen, K. Läufer, and G.K. Thiruvathukal, “Essential Tools: Version Control Systems,” Computing in Science & Eng., vol. 11, no. 6, 2009, pp. 84–91.

Eric Schulte is a doctoral student at the University of New Mexico, where he is a research assistant in the Adaptive Systems Lab. His research interests include the naturalization of computer software systems, both in exploring novel distributed architectures that avoid privileged points in space and time and automated program repair using evolutionary techniques. Schulte has a BA in mathematics from Kenyon College. Contact him at [email protected].

Dan Davison is a senior scientist at Counsyl. His research interests include computational biology, population genomics, reproducible research, and machine learning. Davison has a PhD in population genetics from the University of Chicago. Contact him at [email protected].

Selected articles and columns from IEEE Computer Society publications are also available for free at http://ComputingNow.computer.org.
