Posts Tagged ‘R’

My ten essential Mac programs

March 24, 2010
I recently had the good fortune of getting a new MacBook Pro at work and kept track of the first programs I downloaded and installed.  Here are the first ten applications I installed, along with why each one is so essential to my everyday work.


Firefox is my browser of choice.  No big surprise there.  Not much to say except tabbed browsing is great.


Quicksilver, if you are unfamiliar with it, is an application launcher for Mac OS X.  If you’re a fan of analogies:
Spotlight : Documents :: Quicksilver : Applications

That’s a bit simplistic, as Quicksilver can do more than just launch applications, but that’s 99% of what I use it for, so the analogy stands.


TinyGrab is an amazingly simple screenshot app for both Windows and Mac OS X (I use it on both platforms, and it works better on the Mac).  After registering for an account, you keep the app running in the background.  Any time you take a screenshot with Command-Shift-3 (full-screen capture) or Command-Shift-4 (capture an area of the screen or a window), the picture is automatically uploaded to the service and a short URL to it is copied to your clipboard.  All of the icons in this post are hosted on TinyGrab’s servers and were uploaded nearly instantaneously.  I say it works better on the Mac because there it merely hooks into the act of capturing a screenshot with the already excellent built-in Mac tools; on Windows, pressing the hotkey invokes TinyGrab’s own “clip this area of the screen” feature, which isn’t quite as seamless.


You may have already seen my previous R posts; R is a programming language for statistical computing.  It has a large collection of high-quality open source packages contributed by mathematicians and scientists from around the world, and it is a great tool for exploratory data analysis.

R can be used both interactively, through the R Console program, and through scripts.


OmniGraffle is a great program for creating vector graphics on the Mac.  It’s intended as a diagramming and charting tool, but it’s very versatile and I could imagine uses far outside those domains.  The interface is extremely slick and the quality of the output is second to none; you can instantly tell when something was created with OmniGraffle by its extensive use of drop shadows.  I used it frequently in college to create diagrams for computer science and math problem sets: for instance, I illustrated linked lists and other data structures, saved the output as PNGs, and then included the PNGs in my documents.


If you’re programming in Java and you’re not using an IDE, you are wasting your time.  NetBeans and Eclipse are the two biggies in the Java world; I prefer NetBeans for its great built-in code templates (keyboard abbreviations that expand into common code).  By memorizing a few of them, you can save dozens of keystrokes on commonly typed phrases.  For instance, declaring constants is usually quite verbose in Java:

public static final int BUFFER_SIZE = 1024;

With NetBeans you can shorten the 24 characters before the variable name to five keystrokes: type Psfi, then press TAB.  There is a whole raft of such shortcuts, and they are indispensable for easing the pain of Java’s verbosity.

Other great and essential features include the ability to automatically determine which classes need to be imported; this feature alone makes an IDE superior to a dumb text editor.  The other feature that immediately springs to mind is easy refactoring: you can rename a variable in one file and have the change propagate to every file that references it, rather than having to find and replace the string across all the files.


Unlike the other tools in this post, MacPorts is a command-line utility.  I use it when I need to install an open source library or project for which no installer is available for my platform.  If a port of the software exists, MacPorts handles all the dependency management, installs the libraries where they need to go, and updates the necessary environment variables.


TextMate is my text editor of choice for all things non-Java.  It makes it very easy to open a directory as a project and then jump between the files within it (with a very smart, intuitive search feature).  Just as NetBeans has tab completions, so does TextMate: common shortcuts (“snippets”) are bundled up and distributed with the software, and it is also easy to add your own.  It seems to be the de facto standard for web development (every Ruby on Rails developer I’ve ever met uses it).

Two main complaints:

  • Some strange default behavior: if I select a block of text and hit Tab, I would expect that to indent the text rather than replace it, and similarly for Shift-Tab.  Instead, you must hit Option-Tab and Option-Shift-Tab (a bit of a finger stretcher).
  • You cannot split a window to look at two sections of the same file at the same time.


WriteRoom is the antithesis of Microsoft Word or any other modern word processor.  Whereas most programs throw feature after feature at you, WriteRoom strips itself down to the barest feature set.  The minimalism extends to the presentation as well: when you start it up, you are staring at a blank, full-screen window.  Text is monochrome green by default, though both the background and foreground colors can be changed.  By stripping every user interface element out of the view, WriteRoom frees you to focus on writing without any distractions.

Obviously this is not well suited to all tasks; if you need to reference other materials while you work (e.g. look at a website or an Excel spreadsheet at the same time), this is not for you.  But if you need to brainstorm and get some thoughts down on paper, it’s a great choice.

There is a free Windows clone called Dark Room, and a similar Mac product called OmmWriter is in beta.

MacTeX LaTeX distribution

LaTeX (unfortunately named for Google searching) is a typesetting language and program.  It’s used extensively by college professors and others looking for beautifully typeset text and equations.  Unlike writing in Microsoft Word, composing a document in LaTeX is most certainly not WYSIWYG, but its creators see that as a feature and not a bug: they argue that people waste an inordinate amount of time fiddling with fonts and presentation rather than content.  Because a LaTeX document separates content from formatting, you can render it in several different ways just by changing a template.

The MacTeX package includes LaTeXiT, TeXShop, and BibDesk, as well as a few other programs I never touch.

LaTeXiT is a small program for creating equations and other snippets to embed in other documents.

TeXShop is a full-fledged editor for LaTeX documents; if you’re doing any sort of serious document creation, you’ll probably do it in TeXShop.  Nothing stops you from composing your documents in any plain-text editor, but then you have to manually run the scripts that convert your text into PDF; TeXShop automates some of that hassle.

BibDesk is a program for managing bibliographic entries.


Adium is an excellent chat/IM client for the Mac that supports all the major protocols.

Why install a chat program on a work computer?  IM and chat are a big part of collaborative software development.


Some of these programs are fairly well known (Firefox, Adium, NetBeans), but I hope I have introduced you to some new ones.

Categories: Apple, Uncategorized

R – Sorting a data frame by the contents of a column

February 12, 2010

Let’s examine how to sort the rows of a data frame by the values in one of its columns.  First, we generate some random sample data:

> numPeople = 10
> sex = sample(c("male","female"), numPeople, replace=TRUE)
> age = sample(14:102, numPeople, replace=TRUE)
> income = sample(20:150, numPeople, replace=TRUE)
> minor = age<18

This last statement might look surprising if you’re used to Java or another traditional programming language.  Rather than becoming a single boolean value, minor actually becomes a vector of truth values, one per element of the age vector.  It’s equivalent to this much more verbose Java code:

int[] age = ...;
boolean[] minor = new boolean[age.length];
for (int i = 0; i < age.length; i++) {
    minor[i] = age[i] < 18;
}
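For comparison, the same loop can also be written in R itself.  The sketch below uses a small made-up vector, ages, rather than the random data above, and checks that the explicit loop and the one-line comparison agree.  The vectorized form is shorter and generally faster.

```r
ages <- c(68, 48, 15, 27, 92)   # made-up ages for illustration

# Explicit loop, mirroring the Java version
isMinorLoop <- logical(length(ages))   # preallocate: all FALSE
for (i in seq_along(ages)) {
  isMinorLoop[i] <- ages[i] < 18
}

# Vectorized: < compares every element at once
isMinorVec <- ages < 18

identical(isMinorLoop, isMinorVec)   # [1] TRUE
isMinorVec                           # [1] FALSE FALSE  TRUE FALSE FALSE
```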

As expected, minor is a vector of logical values:

> mode(minor)
[1] "logical"
> minor
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
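Because R treats TRUE as 1 and FALSE as 0 in arithmetic, a logical vector like minor is easy to summarize.  Here is a short sketch on a made-up vector:

```r
isMinor <- c(FALSE, FALSE, TRUE, FALSE, TRUE)   # made-up values

length(isMinor)   # one entry per person: 5
sum(isMinor)      # number of TRUE entries (minors): 2
mean(isMinor)     # fraction of minors: 0.4
which(isMinor)    # positions of the TRUE entries: 3 5
```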

Next we create a data frame, which groups our vectors together as the columns of a single table-like structure:

> population = data.frame(sex=sex, age=age, income=income, minor=minor)
> population
      sex age income minor
1    male  68    150 FALSE
2    male  48     21 FALSE
3  female  68     58 FALSE
4  female  27    124 FALSE
5  female  84    103 FALSE
6    male  92    112 FALSE
7    male  35     65 FALSE
8  female  15    117  TRUE
9    male  89     95 FALSE
10   male  26     54 FALSE

The arguments (sex=sex, age=age, income=income, minor=minor) give the columns the same names as the original vectors; I could just as easily have called them anything.  For instance,

> data.frame(a=sex, b=age, c=income, minor=minor)
        a  b   c minor
1    male 68 150 FALSE
2    male 48  21 FALSE
3  female 68  58 FALSE
4  female 27 124 FALSE
5  female 84 103 FALSE
6    male 92 112 FALSE
7    male 35  65 FALSE
8  female 15 117  TRUE
9    male 89  95 FALSE
10   male 26  54 FALSE

But I prefer the more descriptive labels I gave previously.
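Column names can also be changed after the data frame exists, by assigning to names().  Here is a sketch with a small made-up frame, df:

```r
df <- data.frame(a = c("male", "female"), b = c(68, 27))

names(df)                       # "a" "b"
names(df) <- c("sex", "age")    # replace every column name at once
names(df)[2] <- "years"         # or replace a single name by position
names(df)                       # "sex" "years"
```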

> population
      sex age income minor
1    male  68    150 FALSE
2    male  48     21 FALSE
3  female  68     58 FALSE
4  female  27    124 FALSE
5  female  84    103 FALSE
6    male  92    112 FALSE
7    male  35     65 FALSE
8  female  15    117  TRUE
9    male  89     95 FALSE
10   male  26     54 FALSE
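Because sample() draws at random, the table you get will differ from mine.  If you want the same draws every time you run the script, one minimal approach is to fix the random number generator’s seed with set.seed() before sampling.  The seed value 42 below is arbitrary.

```r
set.seed(42)                               # arbitrary seed
first <- sample(14:102, 10, replace = TRUE)

set.seed(42)                               # same seed again...
second <- sample(14:102, 10, replace = TRUE)

identical(first, second)                   # ...same draws: [1] TRUE
```

The actual values drawn can differ between R versions, since the default sampling algorithm changed in R 3.6.0, but within one installation they repeat.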

Now let’s say we want to sort the rows by age.  That’s a one-liner:

> population[order(population$age),]
      sex age income minor
8  female  15    117  TRUE
10   male  26     54 FALSE
4  female  27    124 FALSE
7    male  35     65 FALSE
2    male  48     21 FALSE
1    male  68    150 FALSE
3  female  68     58 FALSE
5  female  84    103 FALSE
9    male  89     95 FALSE
6    male  92    112 FALSE

This is not magic; you can select arbitrary rows from any data frame with the same syntax:

> population[c(1,2,3),]
     sex age income minor
1   male  68    150 FALSE
2   male  48     21 FALSE
3 female  68     58 FALSE
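The row selector does not have to be a vector of numbers.  A logical vector with one entry per row keeps the rows where it is TRUE, so a logical column such as minor can be used directly.  Negative numbers drop rows instead of keeping them.  Here is a sketch on a small made-up frame, people:

```r
people <- data.frame(
  sex   = c("male", "female", "female"),
  age   = c(68, 15, 27),
  minor = c(FALSE, TRUE, FALSE)
)

people[people$minor, ]      # only the rows where minor is TRUE
people[people$age > 20, ]   # any logical condition works the same way
people[-1, ]                # every row except the first
```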

The order function merely returns the row indices in sorted order: first the position of the smallest age, then the next smallest, and so on.

> order(population$age)
 [1]  8 10  4  7  2  1  3  5  9  6
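One way to check this is to index a column by its own order(), which gives the same values as sort().  The ages below are copied from the table above into a small frame, people.

```r
people <- data.frame(age = c(68, 48, 68, 27, 84, 92, 35, 15, 89, 26))

idx <- order(people$age)
idx                                            # 8 10 4 7 2 1 3 5 9 6
identical(people$age[idx], sort(people$age))   # TRUE
```

Ties stay in their original order, which is why row 1 comes before row 3 even though both are 68.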

Note the $ syntax: you select a column of a data frame with a dollar sign followed by the column’s name.  You can retrieve the names of all the columns with the names function:

> names(population)
[1] "sex"    "age"    "income" "minor" 

> population$income
 [1] 150  21  58 124 103 112  65 117  95  54
> income
 [1] 150  21  58 124 103 112  65 117  95  54

As you can see, the income column and the original income vector are exactly the same.
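The $ operator is one of several equivalent ways to pull a column out of a data frame.  Double brackets and the [row, column] form also accept the name as a string, which helps when the name is stored in a variable.  Here is a sketch on a small made-up frame:

```r
people <- data.frame(age = c(68, 48), income = c(150, 21))
col <- "income"            # a column name held in a variable

a <- people$income         # dollar-sign syntax
b <- people[["income"]]    # double brackets with a string
d <- people[[col]]         # double brackets with a variable
e <- people[, col]         # all rows, one column

identical(a, b) && identical(b, d) && identical(d, e)   # TRUE
```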

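So what we’re really doing with the command

> population[order(population$age),]

is picking the rows in the order returned by order(population$age); it is the same as typing

> population[c(8,10,4,7,2,1,3,5,9,6),]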

Note the trailing comma: leaving the column position empty means “take all the columns”.  If we only wanted certain columns, we could list them after the comma:

> population[order(population$age),c(1,2)]
      sex age
8  female  15
10   male  26
4  female  27
7    male  35
2    male  48
1    male  68
3  female  68
5  female  84
9    male  89
6    male  92
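Columns can also be chosen by name, which keeps working if the column order ever changes.  In addition, order() accepts extra sort keys and a decreasing argument.  In the table above, rows 1 and 3 are both 68, and a second key decides how such ties are ordered.  Here is a sketch on a small made-up frame:

```r
people <- data.frame(
  sex    = c("male", "female", "male", "female"),
  age    = c(68, 68, 35, 15),
  income = c(150, 58, 65, 117)
)

# Sort by age, breaking ties by income, and keep two columns chosen by name
byAge <- people[order(people$age, people$income), c("age", "income")]
byAge$income                                   # 117 65 58 150

# Oldest first
people[order(people$age, decreasing = TRUE), ]

# Age ascending but income descending: negate the numeric key
people[order(people$age, -people$income), ]
```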
Categories: programming, R