pandoc

Erik Hetzner

What is pandoc?

  • “a universal document converter”
  • at its core, a tool to transform documents to and from an internal document tree
  • so many useful features
    • math
    • complex citations
    • tables

Try it now!

http://johnmacfarlane.net/pandoc/try/

Input formats

  • docbook
  • docx
  • epub
  • html
  • latex
  • markdown
  • mediawiki
  • reStructuredText
  • … and more

Output formats

  • docbook
  • docx
  • epub, epub3
  • many html formats & templates
  • latex
  • man
  • markdown
  • mediawiki
  • odt
  • opendocument
  • … and much more

Usage

Basic

$ pandoc slides.md # outputs HTML

Advanced

$ pandoc -t revealjs -c extra.css -c reveal.js/css/theme/sky.css --mathjax \
  --section-divs --bibliography=slides.bibtex --slide-level=2 \
  --csl=plos.csl -s slides.md -o slides.html

Figures

Input

![Nine-Banded Armadillo, John Woodhouse Audubon](armadillo.jpeg)

Output

Footnotes

Input

This is a note[^1] and another^[Inline style].

[^1]: Here is a note.

Output

Code highlighting

Input

~~~ {.ruby}
def fibonacci( n )
  return n if (n <= 1)
  fibonacci(n - 1) + fibonacci(n - 2)
end
~~~

Output

Supports many languages, from Ada to zsh.

\(M^{a}_{(th)}\)

  • inputs from:
    • \(\TeX\) math in markdown
    • docx native formulas
  • outputs to:
    • MathJax
    • MathML
    • docx native formulas

Example

Input

The probability of getting $k$ heads when flipping $n$ coins is

$P(E) = {n \choose k} p^k (1-p)^{n-k}$

Output

Citations

BibTeX file:

@article{moran:2014eyeless,
  title = {Eyeless Mexican Cavefish Save Energy
    by Eliminating the Circadian Rhythm in Metabolism},
  volume = {9},
  url = {http://dx.doi.org/10.1371%2Fjournal.pone.0107877},
  …
}

Input

See [@moran:2014eyeless] for more information.

Output

Tables

Input

  Right Left     Center   Default
------- ------ ---------- -------
     12 12        12           12
    123 123       123         123
      1 1          1            1

Table:  Demonstration of simple table syntax.

Output

Internals

Pandoc uses an internal tree structure: readers parse each input format into the tree, and writers render the tree into each output format.

Readers

<p><i>hello world</i></p>

or

*hello world*

or

\emph{hello world}

or docx, docbook, … becomes:

[Para
  [Emph
    [Str "hello", Space, Str "world"]]]

Writers

[Para
  [Emph
    [Str "hello", Space, Str "world"]]]

becomes:

.PP
\f[I]hello world\f[]

(man) or

{\pard \ql \f0 \sa180 \li0 \fi0 {\i hello world}\par}

(rtf) or

<para><emphasis>hello world</emphasis></para>

(docbook) or markdown, html, …
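The writer idea above can be sketched in a few lines of Python. This is only an illustration, not pandoc's implementation (pandoc itself is written in Haskell): the tuple-based `TREE`, `render_inlines`, `write_docbook`, and `write_man` are hypothetical names, and the sketch handles only the node types shown on this slide, with no escaping of special characters.

```python
# Hypothetical miniature of the tree from the slide, as (tag, content) tuples.
TREE = [("Para", [("Emph", [("Str", "hello"), ("Space", None), ("Str", "world")])])]


def render_inlines(nodes, emph):
    """Render inline nodes to text; `emph` wraps emphasized text for one format."""
    parts = []
    for tag, content in nodes:
        if tag == "Str":
            parts.append(content)
        elif tag == "Space":
            parts.append(" ")
        elif tag == "Emph":
            parts.append(emph(render_inlines(content, emph)))
        else:
            raise ValueError("unsupported inline: %s" % tag)
    return "".join(parts)


def write_docbook(blocks):
    """Toy DocBook writer: each Para becomes a <para> element."""
    def emph(text):
        return "<emphasis>" + text + "</emphasis>"
    out = []
    for tag, content in blocks:
        if tag != "Para":
            raise ValueError("unsupported block: %s" % tag)
        out.append("<para>" + render_inlines(content, emph) + "</para>")
    return "\n".join(out)


def write_man(blocks):
    """Toy man writer: each Para becomes a .PP paragraph."""
    def emph(text):
        return "\\f[I]" + text + "\\f[]"
    out = []
    for tag, content in blocks:
        if tag != "Para":
            raise ValueError("unsupported block: %s" % tag)
        out.append(".PP\n" + render_inlines(content, emph))
    return "\n".join(out)


print(write_docbook(TREE))
print(write_man(TREE))
```

Both writers consume the same tree, so supporting a new output format only means adding one more function like these.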

JSON

pandoc has a JSON reader/writer that encodes the native format:

[
  { "unMeta": null },
  [ { "t": "Para",
      "c": [ { "t": "Emph",
               "c": [ { "t": "Str",
                        "c": "hello"
                      },
                      { "t": "Space",
                        "c": []
                      },
                      { "t": "Str",
                        "c": "world"
                        } ] } ] } ] ]
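Because the tree is plain JSON, any language with a JSON parser can inspect it. The sketch below uses only Python's standard library and assumes the layout shown on this slide (a metadata object followed by a list of blocks, each node carrying a type `t` and content `c`); the helper name `plain_text` is hypothetical, and other pandoc versions may lay out the JSON differently.

```python
import json

# The JSON document from the slide, compacted.
DOC = json.loads("""
[ { "unMeta": null },
  [ { "t": "Para",
      "c": [ { "t": "Emph",
               "c": [ { "t": "Str", "c": "hello" },
                      { "t": "Space", "c": [] },
                      { "t": "Str", "c": "world" } ] } ] } ] ]
""")


def plain_text(node):
    """Collect the text of every Str and Space node below `node`."""
    if isinstance(node, list):
        return "".join(plain_text(child) for child in node)
    if isinstance(node, dict) and "t" in node:
        if node["t"] == "Str":
            return node["c"]
        if node["t"] == "Space":
            return " "
        # Any other node type: descend into its content.
        return plain_text(node.get("c", []))
    # Metadata objects, attribute strings, and nulls contribute no text.
    return ""


print(plain_text(DOC))
```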

Filters

The JSON format lets you write filters: programs that read one JSON tree as input and write another JSON tree as output.

Citation support is implemented as a filter.
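A filter can be sketched without any helper library. The example below is a minimal sketch, assuming the `t`/`c` node layout of pandoc's JSON output; `walk`, `emph_to_strong`, and `run_filter` are hypothetical names, not part of pandoc or pandocfilters.

```python
import json


def walk(node, action):
    """Return a copy of the tree with `action` applied to every typed node."""
    if isinstance(node, list):
        return [walk(child, action) for child in node]
    if isinstance(node, dict):
        if "t" in node:
            node = action(node)
        return {key: walk(value, action) for key, value in node.items()}
    return node


def emph_to_strong(node):
    """Example action: turn emphasis into strong emphasis."""
    if node["t"] == "Emph":
        return {"t": "Strong", "c": node["c"]}
    return node


def run_filter(json_text):
    """Parse a JSON tree, transform it, and serialize the result."""
    return json.dumps(walk(json.loads(json_text), emph_to_strong))


source = '[{"unMeta": null}, [{"t": "Para", "c": [{"t": "Emph", "c": [{"t": "Str", "c": "hello"}]}]}]]'
print(run_filter(source))
```

A real filter would read the tree from standard input and write the result to standard output, so that pandoc can pipe a document through it between the reader and the writer.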

Filter example

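#!/usr/bin/env python
# Pandoc filter: run each python code block and replace it with its plot.
import uuid

from matplotlib import rcParams
from matplotlib.pyplot import savefig
from pandocfilters import toJSONFilter, Image, Para, Str

rcParams['figure.figsize'] = 2, 2

def evalR(key, value, fmt, meta):
    # value is [[identifier, classes, attributes], code]
    if key == 'CodeBlock' and "python" in value[0][1]:
        figfile = "%s.png" % (uuid.uuid4())
        # Caution: this executes arbitrary code from the document.
        exec(value[1])
        # Save the current matplotlib figure drawn by the block.
        savefig(figfile)
        return Para([Image([Str("Output")], [figfile, "fig:"])])

if __name__ == "__main__":
    toJSONFilter(evalR)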

The markdown

Here is a plot:

~~~{.python}
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 5, 0.1)
y = np.sin(x)
plt.plot(x, y)
~~~~

The output

Custom writers

You can write custom output writers in Lua. This is a snippet from Martin Fenner’s proof-of-concept JATS writer:

function Para(s)
  return "<p>" .. s .. "</p>"
end

function RawBlock(s)
  return "<preformat>" .. s .. "</preformat>"
end

Since Lua is an interpreted language, these can be loaded without compilation.

End

http://github.org/egh/pandoc-techtalk/