
Archive for the ‘unix’ Category

Pandoc – an essential tool for Markdown users

March 23, 2011

Pandoc is a great tool to convert between various text-based formats. For instance, with a single input Markdown file, I can generate an HTML page of that document, a LaTeX document, and a beautifully typeset PDF.

I had trouble installing it on Mac OS X via MacPorts; a simpler solution for me was to download and install the Haskell package and then use the commands:

cabal update
cabal install pandoc

This assumes, of course, that the cabal program installed by the Haskell package is on your PATH.

The next step for me was to install the excellent Pandoc TextMate bundle. This gives you the standard things like syntax highlighting of your document, as well as a variety of useful snippets. For instance, when I am in Pandoc mode and press ⌃ ⌥ ⌘ P, I get the following popup from which I can easily choose options via mouse or keyboard:

Easy way to preview your document in various output formats


Before you can start using the Pandoc TextMate bundle, you must ensure that the Pandoc executable is on the PATH exposed to TextMate, which is different from your global system path. In other words, just because you can execute pandoc in a shell and have it work doesn’t mean it will work in TextMate. For instance, on my computer, Pandoc is located in:

$ which pandoc
/Users/ndunn/Library/Haskell/bin/pandoc

Go to TextMate -> Preferences -> Advanced -> PATH and append :/Users/ndunn/Library/Haskell/bin to the end of the PATH variable.
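If you are not sure which directory to append, you can derive it from the location of the executable. This is a minimal sketch: bin_dir_of is a hypothetical helper name, and it relies only on the standard command -v and dirname utilities.

```shell
#!/bin/sh
# bin_dir_of is a hypothetical helper, not part of Pandoc or TextMate:
# it prints the directory containing a command found on the shell's PATH.
bin_dir_of() {
  # command -v prints the full path of an external command, or fails if absent
  cmd_path=$(command -v "$1") || {
    echo "$1 not found on PATH" >&2
    return 1
  }
  dirname "$cmd_path"
}

# On the machine above this would report /Users/ndunn/Library/Haskell/bin
if dir=$(bin_dir_of pandoc); then
  echo "Append :$dir to TextMate's PATH"
fi
```

Run it from the same shell where pandoc works; the directory it reports is the one TextMate needs to see.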

Appending the Pandoc path to the PATH variable


Pandoc makes a few extensions to the Markdown syntax, which I really like. For instance, you can designate a section of text to be interpreted literally by putting a line of three (or more) ~ characters above and below it. Furthermore, you can specify what language the source code is in (for example, by writing {.java} after the opening ~~~), and the Pandoc converter will syntax highlight it in the final document (assuming the correct extensions have been installed).
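As a rough illustration, the script below writes a small Markdown file containing a ~~~ delimited block tagged as Java, then asks Pandoc for standalone HTML and LaTeX versions. The file name example.md is made up for the example; -s (produce a standalone document) and -o (output file) are standard Pandoc options, and the conversion only runs if pandoc is found on your PATH.

```shell
#!/bin/sh
# Write a hypothetical sample document that uses a Pandoc delimited code block.
# Note the blank lines separating the block from the surrounding text.
cat > example.md <<'EOF'
Some introductory text.

~~~ {.java}
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello");
    }
}
~~~

Some closing text.
EOF

if command -v pandoc >/dev/null 2>&1; then
  # Pandoc picks the output format from the file extension given to -o.
  pandoc -s example.md -o example.html
  pandoc -s example.md -o example.tex
else
  echo "pandoc is not on your PATH" >&2
fi
```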

I like this setup because it allows you to specify the language of the block of text, which means that you can force TextMate to interpret it the same way. As I’ve blogged about previously, one can add syntax highlighting for source code embedded in HTML documents. I added the following lines to my HTML language grammar in order to have a few different languages recognized and interpreted as source code within these delimited blocks.

Here is the relevant section:

    {   name = 'source.java';
        comment = 'Use Java grammar';
        begin = '~~~\s*{.java}';
        end = '~~~';
        patterns = ( { include = 'source.java'; } );
    },
    {   name = 'text.xml';
        comment = 'Use XML grammar';
        begin = '~~~\s*{.xml}';
        end = '~~~';
        patterns = ( { include = 'text.xml'; } );
    },
    {   name = 'source.shell';
        comment = 'Use Shell grammar';
        begin = '~~~\s*{.shell}';
        end = '~~~';
        patterns = ( { include = 'source.shell'; } );
    },
    {   name = 'source';
        begin = '~~~';
        end = '~~~';
        patterns = ( { include = 'source'; } );
    },

(One tricky bit to get used to is that you need at least one blank line between the surrounding text and a ~~~ delimited block, or else the ~ characters are interpreted as strikeouts through the text.)

Here is a screenshot of this working in TextMate:

Syntax highlighting of source code within the Pandoc document


Finally, just to get really meta on you, here’s a screenshot of the text of this document,

Text version of the document


followed by a screenshot of the HTML that Pandoc produces:

HTML version of the document

and finally a screenshot of the PDF that Pandoc produces via LaTeX:

PDF version of the document

I hope this has piqued your interest in Pandoc. I love the beautiful output of LaTeX but hate working with its syntax. With Pandoc I’m free to compose in Markdown, a language with a very lightweight syntax, and then convert into TeX when and if I want to.


ack – Better than grep?

December 28, 2010

I stumbled onto a really nice command line tool named ack while reading a StackOverflow question yesterday. Living at the domain betterthangrep.com/, it purports to... be better than grep. Or, as they put it:

ack is a tool like grep, designed for programmers with large trees of heterogeneous source code

I’ve written previously about how to combine find and grep, and really, ack exists to obviate the use of find and grep.  It ignores commonly ignored directories by default (e.g. all those .svn metadata folders that SVN insists on creating), and with a simple command line flag you can tell ack what sort of files you want searched.  Furthermore, because it recurses by default, you don’t need to use the find command to traverse the tree.

Using the TODO example, a basic way of searching for the TODOs in all of our Java files is to use the following command:

find . -name "*.java" -exec grep -i -n TODO {} \;

With ack, this is much easier:

ack -i --java TODO

Furthermore, the matching results are highlighted right away, making it extremely apparent where the matches occur.
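For comparison, here is a sketch of the find/grep route on a small throwaway tree, alongside grep's own recursive mode. The directory layout and file contents are invented for the demo. The -r, --include and --exclude-dir options are supported by GNU grep and by the grep in recent versions of macOS (older greps may lack --exclude-dir); ack simply makes these choices for you by default.

```shell
#!/bin/sh
# Build a small, made-up source tree to search.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/.svn"
printf 'class A {\n  // TODO: refactor\n}\n' > "$demo/src/A.java"
printf 'TODO: not a .java file\n' > "$demo/src/notes.txt"
printf '// TODO: stale copy in SVN metadata\n' > "$demo/.svn/A.java"
cd "$demo" || exit 1

# find/grep: -H prints the file name even when grep sees a single file, and
# "{} +" hands many files to one grep call. Note that it also searches .svn.
find . -name "*.java" -exec grep -H -i -n TODO {} +

# grep alone: recurse, only look at *.java, and skip .svn directories.
grep -r -i -n --include='*.java' --exclude-dir=.svn TODO .
```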

I’m going to start using this at work and see if it can replace my grep/find hackery.  Will let you know.  Very impressed so far.

If you want to give it a try, the easiest way to install it is with MacPorts:

sudo port install p5-app-ack
Categories: unix

Excel 2008 for Mac’s CSV export bug

December 6, 2010
I ran into this at work a few weeks ago and thought I’d share.

Excel 2008’s CSV export feature is broken.  For instance, enter the following fake data into Excel:

Row  Name  Age
0    Nick  23
1    Bill  48
Then choose Save As -> CSV file.

Full list of choices

When you use standard Unix commands to view the output, the results are all garbled.

[Documents]$ cat Workbook1.csv
1,Bill,48[Documents]$
$ wc -l Workbook1.csv
0 Workbook1.csv
What is the issue?  The file command reveals the problem:
$ file Workbook1.csv
Workbook1.csv: ASCII text, with CR line terminators
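CR stands for carriage return, the ‘\r’ control character. Windows ends each line with a carriage return followed by a newline (‘\r\n’), while classic Mac OS (version 9 and earlier) ended lines with a lone ‘\r’, which is what Excel wrote here. Unix systems, including Mac OS X, expect a single ‘\n’ newline character to terminate lines.
How can we fix this?

dos2unix, or more precisely its mac2unix mode, which converts lone carriage returns (plain dos2unix only converts Windows-style ‘\r\n’ pairs).

# convert the Workbook1.csv file (CR line endings) into a Unix appropriate file;
# -n writes the result to a new file instead of converting in place
mac2unix -n Workbook1.csv WithUnixLineEndings.csv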
If you don’t have dos2unix on your Mac, and you don’t want to install it, you can fake it with the tr command:
tr '\15' '\n' < Workbook1.csv # remove the carriage returns, replace with a newline
Row,Name,Age
0,Nick,23
1,Bill,48
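To see the whole problem and the fix in one place, the script below fabricates a file with the same lone-CR line endings, counts its lines, converts it with tr, and counts again. The file names cr_only.csv and fixed.csv are invented for the demo.

```shell
#!/bin/sh
# Recreate a CSV with classic Mac line endings: every line ends in \r only.
printf 'Row,Name,Age\r0,Nick,23\r1,Bill,48\r' > cr_only.csv

# wc -l counts \n characters, so it reports 0 lines for this file.
wc -l < cr_only.csv

# Translate every carriage return into a newline ('\r' is the same as '\15').
tr '\r' '\n' < cr_only.csv > fixed.csv

# Now wc -l sees three lines, and cat prints them normally.
wc -l < fixed.csv
cat fixed.csv
```

Note that this translation suits lone-CR files only; on a Windows-style file with ‘\r\n’ endings it would produce an extra blank line after every row, and deleting the carriage returns with tr -d '\r' is the better fit there.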
It’s very annoying that Excel for Mac doesn’t use Unix line terminators. Interestingly, I found a post that talks about making sure you choose a CSV format intended for Mac, but that option seems to be missing from the Mac version itself.
If I’m missing something obvious, please correct me.

Bash: How to redirect standard error to standard out

November 9, 2010

Problem:

You have a program that writes information to standard error, and you wish to search through it. When commands are chained together in Unix via the pipe operator, the standard out of one command is connected to the standard in of the next; standard error is not sent through the pipe. Thus you cannot easily search the contents of standard error. How can you find what you’re looking for?

Solution:

The first solution is to save the standard error as a file, and search through the file.

command_producing_standard_error 2> stderr.txt; grep "search string" stderr.txt; rm stderr.txt

This works but you have to remember to remove the text file that’s created in the process.

A better solution, and one that allows you to use the standard error in an existing pipeline, is to redirect standard error to standard out instead.

command_producing_standard_error 2>&1 | grep "search string"

Recall that 2 refers to standard error and 1 refers to standard out, so 2>&1 means “make standard error a copy of standard out”. The ‘&’ tells the shell that the 1 is a file descriptor rather than a file named 1; those familiar with C/C++ may recognize ‘&’ from the address operator, and it serves a loosely similar role here. After this redirection, standard out and standard error are merged into one stream, standard out, which can be connected via the pipe (|) symbol to other programs, such as grep.
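Here is a small sketch you can paste into a shell to watch this happen. The noisy function is hypothetical; it simply writes one line to each stream. It also shows a common pitfall: redirections are processed left to right, so 2>&1 has to come after any redirection of standard out that you want standard error to follow.

```shell
#!/bin/bash
# Hypothetical command: writes one line to standard out and one to standard error.
noisy() {
  echo "normal output"
  echo "error: something went wrong" >&2
}

# Only standard out goes through the pipe, so wc counts 1 line
# and the error message still appears directly on the terminal.
noisy | wc -l

# 2>&1 makes standard error a copy of standard out (the pipe),
# so wc counts 2 lines and grep can search the error text.
noisy 2>&1 | wc -l
noisy 2>&1 | grep "error"

# Redirections are applied left to right, so order matters with files:
noisy > both.txt 2>&1       # both lines end up in both.txt
noisy 2>&1 > only_out.txt   # stderr goes to the terminal; only stdout is saved
```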

This tip is modified from information found in the Bash Cookbook, in the recipe “Saving Output When Redirect Doesn’t Seem To Work”.  Additional solutions and discussion can be found on unix.stackexchange.com.

Categories: unix

Quotes, quotes, quotes: A primer for the command line

October 25, 2010

In Bash programming, there are a lot of ways to get input into programs. In particular, there are a slew of different quoting methods you should understand. This article provides a quick reference to the differences between using no quotes, double quotes, single quotes, and backticks.

No quotes

Standard shell scripts assume arguments are separated by spaces. You can iterate over elements in this way:

for i in Hi how are you; do echo $i; done
Hi
how
are
you

This is why it is a problem to have spaces in your file names.  For instance,

$ ls
with spaces.txt

$ cat with spaces.txt
cat: with: No such file or directory
cat: spaces.txt: No such file or directory

Here I naively typed with spaces.txt, thinking the cat program could handle it. Instead, the shell split the name at the space, so cat saw two arguments: with and spaces.txt. To handle this, you can either escape the space,

$ cat with\ spaces.txt

or use the double quotes method. (Note that if you use tab autocompletion, the backslash escape is added automatically.)

Double quotes

Double quotes can be used when you want to group multiple space delimited words together as a single argument.  For instance

for i in "Hi how" "are you"; do echo $i; done
Hi how
are you

In the earlier file name example, I could instead type

$ cat "with spaces.txt"

and the filename would be passed as a single unit to cat.

An important thing to note is that shell variables are expanded within double quotes.

name=Frank; echo "Hello $name"
Hello Frank

This is crucial to understand.  It also allows you to solve problems caused by having spaces in file names, especially when combined with the * globbing behavior of the shell.  For instance, let’s say we wanted to iterate over all the text files in a directory and do something to them.

$ ls
with spaces.txt   withoutspaces.txt
$ for i in *.txt; do cat $i; done
cat: with: No such file or directory
cat: spaces.txt: No such file or directory
# Surround the $i with quotes and our space problem is solved.
$ for i in *.txt; do cat "$i"; done

(Yes, I know that iterating over the files and calling cat on each one is silly, as cat can accept a list of files (e.g. *.txt). But it illustrates the point that commands will be confused by unquoted names containing spaces, and that double quotes solve the problem.)
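One way to see exactly how the shell splits things is a tiny helper that reports how many arguments it received. count_args is a hypothetical name used only for this sketch; $# is the shell’s count of positional arguments.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments the shell passed in.
count_args() {
  echo "$#"
}

name="with spaces.txt"

count_args with spaces.txt      # 2: the shell splits on the space
count_args with\ spaces.txt     # 1: the escaped space is not a separator
count_args "with spaces.txt"    # 1: double quotes group the words
count_args $name                # 2: an unquoted variable is split after expansion
count_args "$name"              # 1: quoting the variable keeps it whole
```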

Double quotes are also good when you need to embed single quotes in a string (you do not need to escape them), whereas embedded double quotes must be escaped with a backslash:

$ echo "'Single quotes'"
'Single quotes'
$ echo "\"Escaped quotes\""
"Escaped quotes"

Double quotes are my default while I’m working in the terminal.

Single quotes

Single quotes act just like double quotes except that the text inside of them is interpreted literally; in other words, the shell does not perform any expansion or substitution. For instance,

$ name=Frank; echo 'Hello $name'
Hello $name

This can save you some of the backslash escaping you would normally have to do.

Use it when:

  • You need double quotes embedded in your string
$ echo '"How are you doing?", she said'
"How are you doing?", she said
  • You do not need any literal single quotes in your string (a single quote cannot appear inside a single-quoted string, not even with a backslash in front of it)
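If you do need an apostrophe, the usual workaround is to end the single-quoted string, add a backslash-escaped quote, and start a new single-quoted string; the shell joins the adjacent pieces into one argument. A minimal sketch:

```shell
#!/bin/sh
# 'It' + \' + 's literal: $HOME' are concatenated into a single word,
# and $HOME stays literal because it sits inside single quotes.
echo 'It'\''s literal: $HOME'

# Inside double quotes a single quote needs no escaping, but $HOME is expanded.
echo "It's expanded: $HOME"
```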

Back ticks

Back ticks (`, the key to the left of the 1 and above the Tab key on a standard US keyboard) allow you to substitute in the output of another command. For instance:

$ current_dir=`pwd`
$ echo $current_dir
/Users/nicholasdunn/Desktop/Scripts

Backticks are still expanded inside double quotes, but inside single quotes they are treated as literal characters:

$ echo "`pwd`"
/Users/nicholasdunn/Desktop/Scripts
$ echo '`pwd`'
`pwd`

Use when:

You want to capture the results of another command, usually for purposes of assigning a variable.
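Most shells, bash included, also accept $(command) as another spelling of command substitution. It behaves like backticks in simple cases but nests without extra escaping, which is why many people prefer it in scripts. A small sketch; the directory names printed will be whatever your system reports.

```shell
#!/bin/sh
# The two forms below capture the same output.
current_dir=`pwd`
current_dir2=$(pwd)

# $( ) nests cleanly: the name of the directory above the current one.
parent_name=$(basename "$(dirname "$(pwd)")")

echo "$current_dir"
echo "$current_dir2"
echo "$parent_name"
```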

Hopefully this brief tour through the different types of quotes in bash has been useful.

Categories: Uncategorized, unix

bpython – an excellent interpreter for python

October 12, 2010

If you use Python, you know that its interactive shell is a great way to test out ideas and iterate quickly. Unfortunately, the basic interactive shell is very barebones – there is no syntax highlighting, autocompletion, or any of the features we have come to expect from working in IDEs. Fortunately, if you’re on a Unix system, there is a great program called bpython that adds all of those missing features.

As you are typing, suggestions appear at the bottom. Press tab to take the suggestion

If you have easy_install, it’s the simplest thing in the world to install:

sudo easy_install bpython

I can’t recommend this product enough.  It’s free, so what’re you waiting for?

Categories: Python, unix

Mac OSX – copy terminal output to clipboard

October 12, 2010

Here’s a quick tip: If you want the results of some shell computation to be accessible to your clipboard (e.g. so you can paste the results into an e-mail or into some pastebin service), you can pipe the command into the `pbcopy` program.

echo "Hello world" | pbcopy
# "Hello world" is now in your clipboard

Apparently there is a way to do a similar thing on Ubuntu as well.
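If you switch between the two systems, a small wrapper can pick whichever tool exists. This is a sketch under assumptions: to_clipboard is a made-up name, and on Ubuntu it assumes the separately installed xclip utility, which is one common way to do this but not necessarily the approach referred to above.

```shell
#!/bin/sh
# Hypothetical wrapper: copy standard input to the clipboard.
to_clipboard() {
  if command -v pbcopy >/dev/null 2>&1; then
    pbcopy                         # Mac OS X
  elif command -v xclip >/dev/null 2>&1; then
    xclip -selection clipboard     # assumes xclip is installed (e.g. on Ubuntu)
  else
    echo "no clipboard tool found (pbcopy or xclip)" >&2
    return 1
  fi
}

echo "Hello world" | to_clipboard && echo "Copied to the clipboard"
```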

Categories: Apple, Uncategorized, unix