Archive

Posts Tagged ‘markup’

Pandoc – an essential tool for Markdown users

March 23, 2011 4 comments

Pandoc is a great tool to convert between various text based formats. For instance, with a single input Markdown file, I can generate an HTML page of that document, a LaTeX document, and a beautifully typeset PDF.

I had troubles installing it on Mac OSX via MacPorts; a simpler solution for me was to download and install the Haskell package and then use the commands:

cabal update
cabal install pandoc

This assumes, of course, that the cabal program that the Haskell package installs is accessible from your path.

The next step for me was to install the excellent Pandoc TextMate bundle. This gives you the standard things like syntax highlighting of your document, as well as a variety of useful snippets. For instance, when I am in Pandoc mode and press ⌃ ⌥ ⌘ P, I get the following popup from which I can easily choose options via mouse or keyboard:

Easy way to preview your document in various output formats

Easy way to preview your document in various output formats

Before you can start using the Pandoc TextMate bundle, you must ensure that the Pandoc executable is on the PATH exposed to TextMate, which is different than your global system path. In other words, just because you can execute pandoc in a shell and have it work, this doesn’t mean it will work in TextMate. For instance, on my computer, Pandoc is located in:

$ which pandoc
/Users/ndunn/Library/Haskell/bin/pandoc

Go to TextMate -> Preferences -> Advanced -> PATH and append :/Users/ndunn/Library/Haskell/bin to the end of the PATH variable.

Appending the Pandoc path to the PATH variable

Appending the Pandoc path to the PATH variable

Pandoc makes a few extensions to the Markdown syntax, which I really like. For instance, you can designate a section of text to be interpreted literally by surrounding it with three ~ characters. Furthermore, you can specify what language the source code is in, and the Pandoc converter will syntax highlight it in the final document (assuming the correct extensions have been installed).

I like this setup because it allows you to specify the language of the block of text, which means that you can force TextMate to interpret it the same way. As I’ve blogged about previously, one can add source code syntax highlighting embedded in HTML documents. I added the following lines to my HTML language grammar in order to have a few different languages recognized and interpreted as source code within these delimited blocks.

Here is the relevant section:

    {   name = 'source.java';
            comment = 'Use Java grammar';
            begin = '~~~\s*{.java}';
            end = '~~~';
            patterns = ( { include = 'source.java'; } );
        },
        {   name = 'text.xml';
            comment = 'Use XML grammar';
            begin = '~~~\s*{.xml}';
            end = '~~~';
            patterns = ( { include = 'text.xml'; } );
        },
        {   name = 'source.shell';
            comment = 'Use Shell grammar';
            begin = '~~~\s*{.shell}';
            end = '~~~';
            patterns = ( { include = 'source.shell'; } );
        },
        {   name = 'source';
            begin = '~~~';
            end = '~~~';
            patterns = ( { include = 'source'; } );
        },

(One tricky bit to get used to is that you need to have at least one blank space between surrounding text and a ~~~ delimited block, or else the ~ characters are interpreted as strikeouts through the text.)

Here is a screenshot of this working in TextMate:

Syntax highlighting of sourcecode within the Pandoc document

Syntax highlighting of sourcecode within the Pandoc document

Finally, just to get really meta on you here’s a screenshot of the text of this document

Text version of the document

Text version of the document

followed by a screenshot of the HTML that Pandoc produces: HTML version of the document

followed by a screenshot of the PDF that LaTeX formatted via Pandoc: PDF version of the document

I hope this has piqued your interest in Pandoc. I love the beautiful output of LaTeX but hate working with its syntax. With Pandoc I’m free to compose in Markdown, a language with a very lightweight syntax, and then convert into TeX when and if I want to.