Archive
Go gotcha #0: Why taking the address of an iterated variable is wrong
Go mascot – by Renée French under Creative Commons Attribution-Share Alike 3.0 Unported from http://en.wikipedia.org/wiki/File:Golang.png
Disclaimer: Go is open source and developed by many Google employees. I work for Google, but the opinions expressed here are my own and do not necessarily represent that of Google.
Go is my new favorite programming language. It’s compact, garbage collected, terse, and very easy to read. There are some things that trip me up even now after I’ve been using it for awhile. Today I’m going to discuss the range
construct and how it has a surprising feature that might violate your assumptions.
Range
First, the range
keyword is a way to iterate through the various builtin data structures in Go. For instance,
a := map[string]int {
"hello": 1,
"world": 2,
}
// 2 element range gets key and value
for key, value := range a {
fmt.Printf("key %s value %d\n", key, value)
}
// 1 element is just the key
for key := range a {
fmt.Printf("key %s\n", key)
}
// Works for slices (think of them as vectors/lists) too
b := []string {"hello", "world"}
// 2 element range gets the index as well as the entry
for i, s := range b {
fmt.Printf("entry %d: %s\n", i, s)
}
// 1 element gets just the index (notice the pattern?)
for i := range b {
fmt.Printf("entry %d\n", i)
}
This outputs
key hello value 1
key world value 2
key hello
key world
entry 0: hello
entry 1: world
entry 0
entry 1
Try this code in the Go Playground
Solution search – pointers
Imagine the case where we have a struct as follows
type Solution struct {
Name string
Cost int
Complete bool
}
Say that we’re doing some sort of optimization where we’re looking for the minimum cost solution that meets some criteria; for simplicity’s sake, I’ve put that as the ‘complete’ bool. It’s possible that no such Solution matches, in which case we return a nil solution.
A reasonable implementation would be as follows
func FindBestSolution(solutions []Solution) *Solution {
var best *Solution
for _, solution := range solutions {
if solution.Complete {
if best == nil || solution.Cost < best.Cost {
best = &solution
}
}
}
return best
}
Do you see the bug? Don’t worry if you don’t – I’ve made this mistake a few times now.
Let’s add some tests to find the problem. This is an example of a table driven test, where the test cases are given as a slice of struct literals. This makes it very easy to add new test cases.
func TestFindBestSolution(t *testing.T) {
tests := []struct {
name string
solutions []Solution
want *Solution
}{
{
name: "Nil list",
solutions: nil,
want: nil,
},
{
name: "No complete solution",
solutions: []Solution{
{
Name: "Foo",
Cost: 25,
Complete: false,
},
},
want: nil,
},
{
name: "Sole solution",
solutions: []Solution{
{
Name: "Bar",
Cost: 12,
Complete: true,
},
},
want: &Solution{
Name: "Bar",
Cost: 12,
Complete: true,
},
},
{
name: "Multiple complete solution",
solutions: []Solution{
{
Name: "Foo",
Cost: 25,
Complete: false,
},
{
Name: "Bar",
Cost: 12,
Complete: true,
},
{
Name: "Baz",
Cost: 25,
Complete: true,
},
},
want: &Solution{
Name: "Bar",
Cost: 12,
Complete: true,
},
},
}
for _, test := range tests {
got := FindBestSolution(test.solutions)
if got == nil && test.want != nil {
t.Errorf("FindBestSolution(%q): got nil wanted %v", test.name, *test.want)
} else if got != nil && test.want == nil {
t.Errorf("FindBestSolution(%q): got %v wanted nil", test.name, *got)
} else if got == nil && test.want == nil {
// This is OK
} else if *got != *test.want {
t.Errorf("FindBestSolution(%q): got %v wanted %v", test.name, *got, *test.want)
}
}
}
If you run the tests you’ll find that the last test fails:
--- FAIL: TestFindBestSolution (0.00 seconds)
prog.go:82: FindBestSolution("One complete solution"): got {Baz 25 true} wanted {Bar 12 true}
FAIL
[process exited with non-zero status]
This is strange – it works fine in the single element case, but not with multiple values. Let’s try adding a case where the correct value is last in the list.
{
name: "Multiple - correct solution is last",
solutions: []Solution{
{
Name: "Baz",
Cost: 25,
Complete: true,
},
{
Name: "Bar",
Cost: 12,
Complete: true,
},
},
want: &Solution{
Name: "Bar",
Cost: 12,
Complete: true,
},
},
Sure enough, this test passes. So somehow if the element is last the algorithm works. What’s going on?
From the go-wiki entry on Range:
When iterating over a slice or map of values, one might try this:
items := make([]map[int]int, 10)
for _, item := range items {
item = make(map[int]int, 1) // Oops! item is only a copy of the slice element.
item[1] = 2 // This 'item' will be lost on the next iteration.
}
The make and assignment look like they might work, but the value property of range (stored here as item) is a copy of the value from items, not a pointer to the value in items.
This is exactly what’s happening in this case. The solution
variable is getting a copy of each entry, not the entry itself. Thus when you take the address of the entry, you end up with a pointer pointing at the LAST element in the slice (since the iteration stops at that point). To illustrate:
package main
import "fmt"
func main() {
strings := []string{"some","value"}
for i, s := range strings {
fmt.Printf("Element %d: %s Pointer %v\n", i, s, &s)
}
}
Element 0: some Pointer 0x10500168
Element 1: value Pointer 0x10500168
Note that the same pointer is used in both cases. This explains why the Solution pointer ended up pointing at the last element of the slice.
Playground
So how do we work around this problem? The key is to introduce a new variable whose address it’s safe to take; its contents won’t change out from underneath you.
Broken:
if solution.Complete {
if best == nil || solution.Cost < best.Cost {
best = &solution
}
}
Fixed:
if solution.Complete {
if best == nil || solution.Cost < best.Cost {
tmp := solution
best = &tmp
}
}
With this patch the tests pass:
PASS
Program exited.
Alternative design
A great feature of Go is that you can return multiple values from a single function. Here’s an alternative implementation that doesn’t suffer from the previous problem.
func FindBestSolution(solutions []Solution) (Solution, bool) {
var best Solution
found := false
for _, solution := range solutions {
if solution.Complete {
if !found || solution.Cost < best.Cost {
best = solution
found = true
}
}
}
return best, found
}
Since best
is copying the VALUE of the solution
variable, this works correctly. You can play with this example and see how the tests change in the Playground.
This illustrates one other nice feature of Go – all types have a ‘zero’ value that is legal to use. For strings this is the empty string, for pointers it’s nil, for ints it’s 0, for structs all of types are set to zero values. The line var best Solution
implicitly sets best to be the zero solution. If I wanted to I could get rid of the found
bool altogether and just compare the returned solution with another zero valued Solution
.
Conclusion
I introduced some basic features of Go, including maps, slices, range
, structs, and functions. I provided links to the amazingly useful Go playground which lets you easily test out code, format it, and share it with others.
I showed two implementations of a function that searches through a slice of struct values, searching for a solution that meets some criteria.
The first example using pointers led to a subtle bug that’s hard to find and solve unless you know how range
works. I showed how to write unit tests that exercise the function and helped flush out the bug. I also explained what the bug was and how to work around it.
Finally I showed a version of the same function that uses Go’s multiple return types to return a found boolean rather than using a nil pointer to signify that the value wasn’t found.
The Pomodoro technique
Erato, via Wikipedia licensed under Creative Commons Attribution-Share Alike 2.5 Generic
The Pomodoro technique is a productivity tool based on two premises:
- Multitasking is inherently inefficient
- One cannot maintain consistent performance at tasks for prolonged periods of time
You work for 25 minutes straight on one task; this is one pomodoro. If you are interrupted, you deal with the interruption and start over – there are no partial pomodoros, so you have to discard the one in progress.
After a 25 minute work period, you take a 5 minute break. Really – you are not allowed to keep working. Check email, look over your task list, whatever. I find it’s a good time to get up and walk around and shift my visual attention so I’m not constantly staring at a screen.
After 4 pomodoros, you are permitted a longer (15-30 minute) break.
For an added layer of complexity you can estimate how many pomodoros each task will take, and then compare it to how many it actually takes. This is a good way to train yourself to more accurately estimate task complexity.
I like this technique for a few reasons.
- It forces me to take breaks, walk around, stretch and otherwise avoid melting into my chair for 8 hours at a time.
- It forces me to break down complex tasks into small, manageable chunks. If I can’t complete a task in a few pomodoros, it’s probably too big.
- It lets me track my productivity over time. It’s easy to say that I was constantly interrupted on Monday, but it’s easier to quantify if I can show that I only got 6 pomodoros done instead of my normal 10-12.
My problem with this technique is that it takes too long to get back into the coding mindset after a break. Some estimate it takes between 10-15 minutes to resume coding after an interruption; if you are interrupted by a break every 25 minutes, you’re not going to get much accomplished on a complicated piece of code. I sometimes find the break comes at an inopportune time, when I’m just on the verge of finishing something. I usually have to quickly dump some thoughts into the file I’m editing as to what I was doing and what my next step was.
Do you use the pomodoro technique while programming? If so, do you find the recommended 25/5 breakdown sufficient for getting work done? Do you increase it, decrease it?
Chrome’s sound indicator – a welcomed innovation for lessening embarrasment
I sat at my desk early in the morning while the office was mostly empty. A coworker arrived, said hello and sat down to work. She put on her headphones and started blasting music. The only problem? Her headphones were unplugged and the music was playing for the whole office to hear.
Accidental noise like this is very embarrassing. I welcome all innovations that make this less likely to occur. I was blown away when I used an iPod for the first time and unplugged the headphones. Unlike the CD players I had used before, the music paused. With the iPod it didn’t make sense to continue to play music without the headphones plugged in, since there was no internal speaker.
I am happy that both iPhone and Android adopted this convention as well. You’re much less likely to embarrass yourself if the sound stops as soon as headphones are removed rather than automatically switching over to the external speakers. I wish that laptops did the same, but alas.
I live most of my work day in a browser. I often end up with a whole slew of tabs. Suddenly I hear some sort of noise – one of the dozen pages I have open is blaring an ad. I usually have to search each tab one by one to try to find the culprit. Chrome recently launched a new feature to visually distinguish which tab is making noise. This is a brilliant innovation, and one I welcome in my quest not to make a fool out of myself in front of others.
Disclaimer: I work at Google. The opinions expressed are my own and do not represent that of my employer.
“Everyone should be able to pull and analyze data”

“Data overload” Islam Elsedoudi via flickr- cc http://creativecommons.org/licenses/by-sa/2.0/
Everyone should be able to write spaghetti code, and everyone should be able to pull and analyze data. And I’m not just talking about business-folk here.
…
Look at what’s going on in the digital humanities. Now, even literature, history, and religious scholars can use data to shed new insight on old texts. How awesome is that? But you have to be able to actually analyze the data. That means being able to query and scrub; that means knowing a bit of probability and statistics. The difference between a median and mean would be a start.So yes, it’s no longer acceptable to say, “I suck at math!” and then ignore that part of the world.
I suck at physical exercise, but that doesn’t mean it’s OK for me to melt into a chair all day. We all need to work at the important stuff in life, and understanding data has become terribly important.
- John Foreman, chief data scientist at MailChimp. Read the full interview on chartio.com.
I agree with the overall sentiment of the quote, that more people should be able to do basic data scraping and analysis. Unfortunately, I don’t see it happening anytime soon for two reasons – the tools to analyze data are complicated to non-engineers and most people do not receive training in programming (to script and pull the data in the first place) or statistics (to crunch the data and draw valid insights).
Even if everyone had the skills and tools necessary to pull and analyze the data, there would still be a need for skilled analysts / data scientists. Executives and product managers often don’t have the time to do analysis themselves; it’s not efficient for them to do so. Analysts fulfill an important role by distilling raw data into products and insights.