Home > programming, R > R – Sorting a data frame by the contents of a column

R – Sorting a data frame by the contents of a column


Let’s examine how to sort the contents of a data frame by the value of a column

> numPeople = 10
> sex=sample(c("male","female"),numPeople,replace=T)
> age = sample(14:102, numPeople, replace=T)
> income = sample(20:150, numPeople, replace=T)
> minor = age<18

This last statement might look surprising if you’re used to Java or a traditional programming language. Rather than becoming a single boolean/truth value, minor actually becomes a vector of truth values, one per row in the age column.  It’s equivalent to the much more verbose code in Java:

int[] age= ...;
for (int i = 0; i < income.length; i++) {
   minor[i] = age[i] < 18;
}

Just as expected, the value of minor is a vector:

> mode(minor)
[1] "logical"
> minor
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE

Next we create a data frame, which groups together our various vectors into the columns of a data structure:

> population = data.frame(sex=sex, age=age, income=income, minor=minor)
> population
 sex age income minor
1    male  68    150 FALSE
2    male  48     21 FALSE
3  female  68     58 FALSE
4  female  27    124 FALSE
5  female  84    103 FALSE
6    male  92    112 FALSE
7    male  35     65 FALSE
8  female  15    117  TRUE
9    male  89     95 FALSE
10   male  26     54 FALSE

The arguments (sex=sex, age=age, income=income, minor=minor) assign the same names to the columns as I originally named the vectors; I could just as easily call them anything.  For instance,

> data.frame(a=sex, b=age, c=income, minor=minor)
 a  b   c minor
1    male 68 150 FALSE
2    male 48  21 FALSE
3  female 68  58 FALSE
4  female 27 124 FALSE
5  female 84 103 FALSE
6    male 92 112 FALSE
7    male 35  65 FALSE
8  female 15 117  TRUE
9    male 89  95 FALSE
10   male 26  54 FALSE

But I prefer the more descriptive labels I gave previously.

> population
     sex   age income minor
1    male  68    150 FALSE
2    male  48     21 FALSE
3  female  68     58 FALSE
4  female  27    124 FALSE
5  female  84    103 FALSE
6    male  92    112 FALSE
7    male  35     65 FALSE
8  female  15    117  TRUE
9    male  89     95 FALSE
10   male  26     54 FALSE

Now let’s say we want to order by the age of the people. To do that is a one liner:

> population[order(population$age),]
 sex age income minor
8  female  15    117  TRUE
10   male  26     54 FALSE
4  female  27    124 FALSE
7    male  35     65 FALSE
2    male  48     21 FALSE
1    male  68    150 FALSE
3  female  68     58 FALSE
5  female  84    103 FALSE
9    male  89     95 FALSE
6    male  92    112 FALSE

This is not magic; you can select arbitrary rows from any data frame  with the same syntax:

> population[c(1,2,3),]
 sex age income minor
1   male  68    150 FALSE
2   male  48     21 FALSE
3 female  68     58 FALSE

The order function merely returns the indices of the rows in sorted order.

> order(population$age)
 [1]  8 10  4  7  2  1  3  5  9  6

Note the $ syntax; you select columns of a data frame by using a dollar sign and the name of the column. You can retrieve the names of the columns of a data frame with the names function.

> names(population)
[1] "sex"    "age"    "income" "minor" 

> population$income
 [1] 150  21  58 124 103 112  65 117  95  54
> income
 [1] 150  21  58 124 103 112  65 117  95  54

As you can see, they are exactly the same.

So what we’re really doing with the command

population[order(population$age),]

is

population[c(8,10,4,7,2,1,3,5,9,6),]

Note the trailing comma; what this means is to take all the columns. If we only wanted certain columns, we could specify after this comma.

> population[order(population$age),c(1,2)]
 sex age
8  female  15
10   male  26
4  female  27
7    male  35
2    male  48
1    male  68
3  female  68
5  female  84
9    male  89
6    male  92
Categories: programming, R Tags: ,
  1. October 8, 2014 at 7:20 am

    Good day! This is my 1st comment here so I just wanted to give a quick shout out and tell you I really enjoy reading through your posts.
    Can you recommend any other blogs/websites/forums that go over the same topics?
    Thank you so much!

    • Yoyo
      August 11, 2015 at 8:36 pm

      No he can’t.

  2. Kyle Jones
    November 5, 2016 at 9:53 pm

    Fantastic – brilliant explanation

  3. August 3, 2017 at 8:00 am

    Thanks–i keep looking back to this

  4. ALOE
    May 24, 2018 at 7:09 am

    Thank You, did help me to fix a problem

  5. October 4, 2018 at 11:56 am

    Hello,nice share.

  6. November 27, 2018 at 3:47 am

    I have a IsHoliday, WeeklySales and DepartmnentCategory column and I want to know the total WeeklySales if my IsHoliday flag is true and DepartmentCategory is eadible.How to do it? Please suggest.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: