Archive

Archive for February, 2014

Go gotcha #0: Why taking the address of an iterated variable is wrong

February 25, 2014 6 comments

Golang mascot
Go mascot – by Renée French under Creative Commons Attribution-Share Alike 3.0 Unported from http://en.wikipedia.org/wiki/File:Golang.png

Disclaimer: Go is open source and developed by many Google employees. I work for Google, but the opinions expressed here are my own and do not necessarily represent that of Google.

Go is my new favorite programming language. It’s compact, garbage collected, terse, and very easy to read. There are some things that trip me up even now after I’ve been using it for awhile. Today I’m going to discuss the range construct and how it has a surprising feature that might violate your assumptions.

Range

First, the range keyword is a way to iterate through the various builtin data structures in Go. For instance,

a := map[string]int {
    "hello": 1,
    "world": 2,
}
// 2 element range gets key and value
for key, value := range a {
    fmt.Printf("key %s value %d\n", key, value)
}
// 1 element is just the key
for key := range a {
    fmt.Printf("key %s\n", key)
}

// Works for slices (think of them as vectors/lists) too
b := []string {"hello", "world"}
// 2 element range gets the index as well as the entry
for i, s := range b {
    fmt.Printf("entry %d: %s\n", i, s)
}
// 1 element gets just the index (notice the pattern?)
for i := range b {
    fmt.Printf("entry %d\n", i)
}

This outputs

key hello value 1
key world value 2
key hello
key world
entry 0: hello
entry 1: world
entry 0
entry 1

Try this code in the Go Playground

Solution search – pointers

Imagine the case where we have a struct as follows

type Solution struct {
    Name string
    Cost int
    Complete bool
}

Say that we’re doing some sort of optimization where we’re looking for the minimum cost solution that meets some criteria; for simplicity’s sake, I’ve put that as the ‘complete’ bool. It’s possible that no such Solution matches, in which case we return a nil solution.

A reasonable implementation would be as follows

func FindBestSolution(solutions []Solution) *Solution {
    var best *Solution
    for _, solution := range solutions {
        if solution.Complete {
            if best == nil || solution.Cost < best.Cost {
                best = &solution
            }
        }
    }
    return best
}

Do you see the bug? Don’t worry if you don’t – I’ve made this mistake a few times now.

Let’s add some tests to find the problem. This is an example of a table driven test, where the test cases are given as a slice of struct literals. This makes it very easy to add new test cases.

func TestFindBestSolution(t *testing.T) {
    tests := []struct {
        name      string
        solutions []Solution
        want      *Solution
    }{
        {
            name:      "Nil list",
            solutions: nil,
            want:      nil,
        },
        {
            name: "No complete solution",
            solutions: []Solution{
                {
                    Name:     "Foo",
                    Cost:     25,
                    Complete: false,
                },
            },
            want: nil,
        },
        {
            name: "Sole solution",
            solutions: []Solution{
                {
                    Name:     "Bar",
                    Cost:     12,
                    Complete: true,
                },
            },
            want: &Solution{
                Name:     "Bar",
                Cost:     12,
                Complete: true,
            },
        },
        {
            name: "Multiple complete solution",
            solutions: []Solution{
                {
                    Name:     "Foo",
                    Cost:     25,
                    Complete: false,
                },
                {
                    Name:     "Bar",
                    Cost:     12,
                    Complete: true,
                },
                {
                    Name:     "Baz",
                    Cost:     25,
                    Complete: true,
                },
            },
            want: &Solution{
                Name:     "Bar",
                Cost:     12,
                Complete: true,
            },
        },
    }
    for _, test := range tests {
        got := FindBestSolution(test.solutions)
        if got == nil && test.want != nil {
            t.Errorf("FindBestSolution(%q): got nil wanted %v", test.name, *test.want)
        } else if got != nil && test.want == nil {
            t.Errorf("FindBestSolution(%q): got %v wanted nil", test.name, *got)
        } else if got == nil && test.want == nil {
            // This is OK
        } else if *got != *test.want {
            t.Errorf("FindBestSolution(%q): got %v wanted %v", test.name, *got, *test.want)
        }
    }
}

If you run the tests you’ll find that the last test fails:

--- FAIL: TestFindBestSolution (0.00 seconds)
    prog.go:82: FindBestSolution("One complete solution"): got {Baz 25 true} wanted {Bar 12 true}
FAIL
 [process exited with non-zero status]

This is strange – it works fine in the single element case, but not with multiple values. Let’s try adding a case where the correct value is last in the list.

    {
        name: "Multiple - correct solution is last",
        solutions: []Solution{
            {
                Name:     "Baz",
                Cost:     25,
                Complete: true,
            },
            {
                Name:     "Bar",
                Cost:     12,
                Complete: true,
            },
        },
        want: &Solution{
            Name:     "Bar",
            Cost:     12,
            Complete: true,
        },
    },

Sure enough, this test passes. So somehow if the element is last the algorithm works. What’s going on?

From the go-wiki entry on Range:

When iterating over a slice or map of values, one might try this:

items := make([]map[int]int, 10)
for _, item := range items {
        item = make(map[int]int, 1) // Oops! item is only a copy of the slice element.
        item[1] = 2                 // This 'item' will be lost on the next iteration.
}

The make and assignment look like they might work, but the value property of range (stored here as item) is a copy of the value from items, not a pointer to the value in items.

This is exactly what’s happening in this case. The solution variable is getting a copy of each entry, not the entry itself. Thus when you take the address of the entry, you end up with a pointer pointing at the LAST element in the slice (since the iteration stops at that point). To illustrate:

package main

import "fmt"

func main() {
    strings := []string{"some","value"}
    for i, s := range strings {
        fmt.Printf("Element %d: %s Pointer %v\n", i, s, &s)
    }
}

Element 0: some Pointer 0x10500168
Element 1: value Pointer 0x10500168

Note that the same pointer is used in both cases. This explains why the Solution pointer ended up pointing at the last element of the slice.
Playground

So how do we work around this problem? The key is to introduce a new variable whose address it’s safe to take; its contents won’t change out from underneath you.

Broken:

if solution.Complete {
    if best == nil || solution.Cost < best.Cost {
        best = &solution
    }
}

Fixed:

if solution.Complete {
    if best == nil || solution.Cost < best.Cost {
        tmp := solution
        best = &tmp
    }
}

With this patch the tests pass:

PASS

Program exited.

Alternative design

A great feature of Go is that you can return multiple values from a single function. Here’s an alternative implementation that doesn’t suffer from the previous problem.

func FindBestSolution(solutions []Solution) (Solution, bool) {
    var best Solution
    found := false
    for _, solution := range solutions {
        if solution.Complete {
            if !found || solution.Cost < best.Cost {
                best = solution
                found = true
            }
        }
    }
    return best, found
}

Since best is copying the VALUE of the solution variable, this works correctly. You can play with this example and see how the tests change in the Playground.

This illustrates one other nice feature of Go – all types have a ‘zero’ value that is legal to use. For strings this is the empty string, for pointers it’s nil, for ints it’s 0, for structs all of types are set to zero values. The line var best Solution implicitly sets best to be the zero solution. If I wanted to I could get rid of the found bool altogether and just compare the returned solution with another zero valued Solution.

Conclusion

I introduced some basic features of Go, including maps, slices, range, structs, and functions. I provided links to the amazingly useful Go playground which lets you easily test out code, format it, and share it with others.

I showed two implementations of a function that searches through a slice of struct values, searching for a solution that meets some criteria.

The first example using pointers led to a subtle bug that’s hard to find and solve unless you know how range works. I showed how to write unit tests that exercise the function and helped flush out the bug. I also explained what the bug was and how to work around it.

Finally I showed a version of the same function that uses Go’s multiple return types to return a found boolean rather than using a nil pointer to signify that the value wasn’t found.

Advertisements

The Pomodoro technique

February 17, 2014 1 comment

Pomodoro (tomato) kitchen timer

Erato, via Wikipedia licensed under Creative Commons Attribution-Share Alike 2.5 Generic

The Pomodoro technique is a productivity tool based on two premises:

  1. Multitasking is inherently inefficient
  2. One cannot maintain consistent performance at tasks for prolonged periods of time

You work for 25 minutes straight on one task; this is one pomodoro. If you are interrupted, you deal with the interruption and start over – there are no partial pomodoros, so you have to discard the one in progress.

After a 25 minute work period, you take a 5 minute break. Really – you are not allowed to keep working. Check email, look over your task list, whatever. I find it’s a good time to get up and walk around and shift my visual attention so I’m not constantly staring at a screen.

After 4 pomodoros, you are permitted a longer (15-30 minute) break.

For an added layer of complexity you can estimate how many pomodoros each task will take, and then compare it to how many it actually takes. This is a good way to train yourself to more accurately estimate task complexity.

I like this technique for a few reasons.

  • It forces me to take breaks, walk around, stretch and otherwise avoid melting into my chair for 8 hours at a time.
  • It forces me to break down complex tasks into small, manageable chunks. If I can’t complete a task in a few pomodoros, it’s probably too big.
  • It lets me track my productivity over time. It’s easy to say that I was constantly interrupted on Monday, but it’s easier to quantify if I can show that I only got 6 pomodoros done instead of my normal 10-12.

My problem with this technique is that it takes too long to get back into the coding mindset after a break. Some estimate it takes between 10-15 minutes to resume coding after an interruption; if you are interrupted by a break every 25 minutes, you’re not going to get much accomplished on a complicated piece of code. I sometimes find the break comes at an inopportune time, when I’m just on the verge of finishing something. I usually have to quickly dump some thoughts into the file I’m editing as to what I was doing and what my next step was.

Do you use the pomodoro technique while programming? If so, do you find the recommended 25/5 breakdown sufficient for getting work done? Do you increase it, decrease it?

Chrome’s sound indicator – a welcomed innovation for lessening embarrasment

February 13, 2014 1 comment

Which tab is playing music? It's easy to tell now

I sat at my desk early in the morning while the office was mostly empty. A coworker arrived, said hello and sat down to work. She put on her headphones and started blasting music. The only problem? Her headphones were unplugged and the music was playing for the whole office to hear.

Accidental noise like this is very embarrassing. I welcome all innovations that make this less likely to occur. I was blown away when I used an iPod for the first time and unplugged the headphones. Unlike the CD players I had used before, the music paused. With the iPod it didn’t make sense to continue to play music without the headphones plugged in, since there was no internal speaker.

I am happy that both iPhone and Android adopted this convention as well. You’re much less likely to embarrass yourself if the sound stops as soon as headphones are removed rather than automatically switching over to the external speakers. I wish that laptops did the same, but alas.

I live most of my work day in a browser. I often end up with a whole slew of tabs. Suddenly I hear some sort of noise – one of the dozen pages I have open is blaring an ad. I usually have to search each tab one by one to try to find the culprit. Chrome recently launched a new feature to visually distinguish which tab is making noise. This is a brilliant innovation, and one I welcome in my quest not to make a fool out of myself in front of others.

Disclaimer: I work at Google. The opinions expressed are my own and do not represent that of my employer.

Categories: iPod, UI Tags: , , , , , ,

“Everyone should be able to pull and analyze data”

February 11, 2014 Leave a comment

“Data overload” Islam Elsedoudi via flickr- cc http://creativecommons.org/licenses/by-sa/2.0/

Everyone should be able to write spaghetti code, and everyone should be able to pull and analyze data. And I’m not just talking about business-folk here.


Look at what’s going on in the digital humanities. Now, even literature, history, and religious scholars can use data to shed new insight on old texts. How awesome is that? But you have to be able to actually analyze the data. That means being able to query and scrub; that means knowing a bit of probability and statistics. The difference between a median and mean would be a start.

So yes, it’s no longer acceptable to say, “I suck at math!” and then ignore that part of the world.

I suck at physical exercise, but that doesn’t mean it’s OK for me to melt into a chair all day. We all need to work at the important stuff in life, and understanding data has become terribly important.

I agree with the overall sentiment of the quote, that more people should be able to do basic data scraping and analysis. Unfortunately, I don’t see it happening anytime soon for two reasons – the tools to analyze data are complicated to non-engineers and most people do not receive training in programming (to script and pull the data in the first place) or statistics (to crunch the data and draw valid insights).

Even if everyone had the skills and tools necessary to pull and analyze the data, there would still be a need for skilled analysts / data scientists. Executives and product managers often don’t have the time to do analysis themselves; it’s not efficient for them to do so. Analysts fulfill an important role by distilling raw data into products and insights.