Archive

Posts Tagged ‘garbage collection’

Go gotcha #0: Why taking the address of an iterated variable is wrong

February 25, 2014 6 comments

Golang mascot
Go mascot – by Renée French under Creative Commons Attribution-Share Alike 3.0 Unported from http://en.wikipedia.org/wiki/File:Golang.png

Disclaimer: Go is open source and developed by many Google employees. I work for Google, but the opinions expressed here are my own and do not necessarily represent that of Google.

Go is my new favorite programming language. It’s compact, garbage collected, terse, and very easy to read. There are some things that trip me up even now after I’ve been using it for awhile. Today I’m going to discuss the range construct and how it has a surprising feature that might violate your assumptions.

Range

First, the range keyword is a way to iterate through the various builtin data structures in Go. For instance,

a := map[string]int {
    "hello": 1,
    "world": 2,
}
// 2 element range gets key and value
for key, value := range a {
    fmt.Printf("key %s value %d\n", key, value)
}
// 1 element is just the key
for key := range a {
    fmt.Printf("key %s\n", key)
}

// Works for slices (think of them as vectors/lists) too
b := []string {"hello", "world"}
// 2 element range gets the index as well as the entry
for i, s := range b {
    fmt.Printf("entry %d: %s\n", i, s)
}
// 1 element gets just the index (notice the pattern?)
for i := range b {
    fmt.Printf("entry %d\n", i)
}

This outputs

key hello value 1
key world value 2
key hello
key world
entry 0: hello
entry 1: world
entry 0
entry 1

Try this code in the Go Playground

Solution search – pointers

Imagine the case where we have a struct as follows

type Solution struct {
    Name string
    Cost int
    Complete bool
}

Say that we’re doing some sort of optimization where we’re looking for the minimum cost solution that meets some criteria; for simplicity’s sake, I’ve put that as the ‘complete’ bool. It’s possible that no such Solution matches, in which case we return a nil solution.

A reasonable implementation would be as follows

func FindBestSolution(solutions []Solution) *Solution {
    var best *Solution
    for _, solution := range solutions {
        if solution.Complete {
            if best == nil || solution.Cost < best.Cost {
                best = &solution
            }
        }
    }
    return best
}

Do you see the bug? Don’t worry if you don’t – I’ve made this mistake a few times now.

Let’s add some tests to find the problem. This is an example of a table driven test, where the test cases are given as a slice of struct literals. This makes it very easy to add new test cases.

func TestFindBestSolution(t *testing.T) {
    tests := []struct {
        name      string
        solutions []Solution
        want      *Solution
    }{
        {
            name:      "Nil list",
            solutions: nil,
            want:      nil,
        },
        {
            name: "No complete solution",
            solutions: []Solution{
                {
                    Name:     "Foo",
                    Cost:     25,
                    Complete: false,
                },
            },
            want: nil,
        },
        {
            name: "Sole solution",
            solutions: []Solution{
                {
                    Name:     "Bar",
                    Cost:     12,
                    Complete: true,
                },
            },
            want: &Solution{
                Name:     "Bar",
                Cost:     12,
                Complete: true,
            },
        },
        {
            name: "Multiple complete solution",
            solutions: []Solution{
                {
                    Name:     "Foo",
                    Cost:     25,
                    Complete: false,
                },
                {
                    Name:     "Bar",
                    Cost:     12,
                    Complete: true,
                },
                {
                    Name:     "Baz",
                    Cost:     25,
                    Complete: true,
                },
            },
            want: &Solution{
                Name:     "Bar",
                Cost:     12,
                Complete: true,
            },
        },
    }
    for _, test := range tests {
        got := FindBestSolution(test.solutions)
        if got == nil && test.want != nil {
            t.Errorf("FindBestSolution(%q): got nil wanted %v", test.name, *test.want)
        } else if got != nil && test.want == nil {
            t.Errorf("FindBestSolution(%q): got %v wanted nil", test.name, *got)
        } else if got == nil && test.want == nil {
            // This is OK
        } else if *got != *test.want {
            t.Errorf("FindBestSolution(%q): got %v wanted %v", test.name, *got, *test.want)
        }
    }
}

If you run the tests you’ll find that the last test fails:

--- FAIL: TestFindBestSolution (0.00 seconds)
    prog.go:82: FindBestSolution("One complete solution"): got {Baz 25 true} wanted {Bar 12 true}
FAIL
 [process exited with non-zero status]

This is strange – it works fine in the single element case, but not with multiple values. Let’s try adding a case where the correct value is last in the list.

    {
        name: "Multiple - correct solution is last",
        solutions: []Solution{
            {
                Name:     "Baz",
                Cost:     25,
                Complete: true,
            },
            {
                Name:     "Bar",
                Cost:     12,
                Complete: true,
            },
        },
        want: &Solution{
            Name:     "Bar",
            Cost:     12,
            Complete: true,
        },
    },

Sure enough, this test passes. So somehow if the element is last the algorithm works. What’s going on?

From the go-wiki entry on Range:

When iterating over a slice or map of values, one might try this:

items := make([]map[int]int, 10)
for _, item := range items {
        item = make(map[int]int, 1) // Oops! item is only a copy of the slice element.
        item[1] = 2                 // This 'item' will be lost on the next iteration.
}

The make and assignment look like they might work, but the value property of range (stored here as item) is a copy of the value from items, not a pointer to the value in items.

This is exactly what’s happening in this case. The solution variable is getting a copy of each entry, not the entry itself. Thus when you take the address of the entry, you end up with a pointer pointing at the LAST element in the slice (since the iteration stops at that point). To illustrate:

package main

import "fmt"

func main() {
    strings := []string{"some","value"}
    for i, s := range strings {
        fmt.Printf("Element %d: %s Pointer %v\n", i, s, &s)
    }
}

Element 0: some Pointer 0x10500168
Element 1: value Pointer 0x10500168

Note that the same pointer is used in both cases. This explains why the Solution pointer ended up pointing at the last element of the slice.
Playground

So how do we work around this problem? The key is to introduce a new variable whose address it’s safe to take; its contents won’t change out from underneath you.

Broken:

if solution.Complete {
    if best == nil || solution.Cost < best.Cost {
        best = &solution
    }
}

Fixed:

if solution.Complete {
    if best == nil || solution.Cost < best.Cost {
        tmp := solution
        best = &tmp
    }
}

With this patch the tests pass:

PASS

Program exited.

Alternative design

A great feature of Go is that you can return multiple values from a single function. Here’s an alternative implementation that doesn’t suffer from the previous problem.

func FindBestSolution(solutions []Solution) (Solution, bool) {
    var best Solution
    found := false
    for _, solution := range solutions {
        if solution.Complete {
            if !found || solution.Cost < best.Cost {
                best = solution
                found = true
            }
        }
    }
    return best, found
}

Since best is copying the VALUE of the solution variable, this works correctly. You can play with this example and see how the tests change in the Playground.

This illustrates one other nice feature of Go – all types have a ‘zero’ value that is legal to use. For strings this is the empty string, for pointers it’s nil, for ints it’s 0, for structs all of types are set to zero values. The line var best Solution implicitly sets best to be the zero solution. If I wanted to I could get rid of the found bool altogether and just compare the returned solution with another zero valued Solution.

Conclusion

I introduced some basic features of Go, including maps, slices, range, structs, and functions. I provided links to the amazingly useful Go playground which lets you easily test out code, format it, and share it with others.

I showed two implementations of a function that searches through a slice of struct values, searching for a solution that meets some criteria.

The first example using pointers led to a subtle bug that’s hard to find and solve unless you know how range works. I showed how to write unit tests that exercise the function and helped flush out the bug. I also explained what the bug was and how to work around it.

Finally I showed a version of the same function that uses Go’s multiple return types to return a found boolean rather than using a nil pointer to signify that the value wasn’t found.

Advertisements

Javascript 101 Week 2: Functions, Encapsulation, Augmentation

February 5, 2011 4 comments

Here’s the second week of homework/ reflection for the JS 101 course offered through Peer 2 Peer University.  You can read my week one homework as well.

Why do languages provide the switch statement, when we can achieve the same thing with multiple if… elseif statements? Show one example of how you might use the switch statement.

Languages provide the switch statement for two main reasons. The first is that they are arguably more readable than multiple if/else if statements. Secondly, in some languages the compiler can optimize switch statements to be more efficient than the equivalent if/else if/else if construct. See the wikipedia article on Switch statements and Branch tables if you’re interested in the details.

Unfortunately, switch statements can be more error prone due to the ability to ‘fall through’ switch statements. Consider the following:

var BOY = 1;
var GIRL = 2;
var sentence = "Congratulations, it's a ";
var gender = GIRL;
switch (gender) {
    case BOY:
        sentence += "boy!";
    // ERROR: no break statement; will fall through to girl case, ending
    // with the sentence "Congratulations, it's a boy!girl!" if gender == BOY
    case GIRL:
        sentence += "girl!";
}

(Note: the above example is somewhat contrived, but forgetting to put a break at the end of a case is a very common error. Please be cognizant of it when using switch.)

Furthermore, programmers frequently forget to provide a default case to handle the case when none of the alternative options match. This frequently happens in Java with enumerations, since the compiler won’t mark this as an error:

enum Foo {
    FOO_1,
    FOO_2
}

Foo x = …;
switch (x) {
    case FOO_1:
        …
        break;
    case FOO_2:
        …
        break;
}

// Everything is fine.  Later, the Foo enum is modified, and a new option is added

enum Foo {
    FOO_1,
    FOO_2,
    FOO_3
}

With the addition of the new enumerated value, we can inadvertently fall through all the options without having executed any code. It is for this reason that I suggest you always provide a default case, even if all it does is throw an exception indicating that this should never happen. Trust me, you’ll find your bugs much earlier/easier this way.

What is encapsulation, and what do functions encapsulate?

Encapsulation is a programming technique to manage complexity and reduce duplication. If I am writing a geometric library I might have code to calculate the area of a circle in a few different places. If this happens once or twice, that might not be the end of the world. The problem is that it becomes very difficult to keep the code in sync in multiple places. A better solution is to break out the logic for calculating the area into a separate function, and then call that function from various places.

This has three main benefits.

  1. This encapsulation of the logic reduces code duplication. Code duplication is bad, as it means that any time you find a bug or want to change something, you need to remember to change it in multiple places. This is a maintenance headache and is extremely error prone.
  2. It makes the code easier to read by better expressing the intent of what you are trying to do and by hiding the details. For instance, instead of having a few lines of code to calculate some number, you can move these lines into a small helper function whose name expresses what the calculation does.
  3. By encapsulating the implementation details within a function, you can change the underlying algorithm without necessitating any client code to change. For instance, you might change a recursive function to an iterative approach for efficiency purposes, but as long as the method name and parameter list stay constant, dependent code is unaffected.

Encapsulation is extremely useful for all the above reasons. It allows the programmer to work at higher levels of abstraction, by hiding implementation details and allowing complex functionality to be built from multiple small, simple function calls.

What is a pure function? Is the function show() provided in Eloquent Javascript a pure function?

A pure function is one without side effects, i.e. a function in the mathematical sense of the word. (It doesn’t make use of variables other than those passed in to the function, so that you are always guaranteed to get the same output for the same input) show is not a pure function, as it has the side effect of writing to the screen.

What do we mean when we say a variable in a function is shadowing a top level variable?

If two variables have the same name, with one being in the global scope and one being in the function scope, the function scoped variable shadows that of the global, top level variable. This means that when we refer to the variable within the function, we access the function scoped variable rather than the top level variable. For instance,


// No var declaration; this is global
name = "Nick";

function greet() {
    // This name variable shadows the global name declaration.
    var name = "Kitty";
    alert("Hi " + name);
}
// Says "Hi Kitty"
greet();

A recursive function, must have some sort of an end condition. Why would we get a “out of stack space” error message if a recursive function does not have an end condition?

A running computer program has memory space allocated both for a stack and for a heap. The heap consists of global or static variables and those things that must remain throughout the life of the program. The stack is where local variables reside. When we call a function, all of the local variables necessary for that function reside are pushed to the top of the stack. When a function exits, the space on the stack is able to be reclaimed.

When we have unbounded recursion, more and more variables are added to the stack. Since none of the functions are exiting, nothing is freed from the stack, and we soon run out of stack space. See the following screenshot and the accompanying article on Stack vs Heap allocation for more information

Organization of the stack

By the way, the fact that function calls are added to this stack is the reason that you can always replace a recursive function with an iterative solution using an explicit stack data structure. This is often not worth the added complexity, but it’s something to be aware of. (You can often replace a recursive algorithm with an iterative solution without using a stack, but the code can be much more complicated. Iterative solutions are often more efficient than recursive solutions, so as always it’s a tradeoff. See the recursion article on Wikipedia for more information.)

Reflect about the difference between object inheritance and class inheritance

Class inheritance, in a language like Java, means that an instance of a subclass inherits all of the members and methods of its superclass; in a sense it has its own ‘copies’. In object inheritance, we get the members and methods via a hidden pointer to another object. Thus changes in that linked object implicitly change the values in the other object.

What is object augmentation, and how do we do it?

Object augmentation is the act of adding fields or functions to an existing object. For instance,

var nick = {name:"Nick",age:23};
// Augment the nick object with a new field
nick.profession = "programmer";
// Remove that field
nick.profession = undefined

Why is this useful? Well it’s useful in the case I just presented, since you do not need to know the full set of fields in your object at initialization, but instead can add and remove them after the fact.

It’s a more powerful concept when combined with the idea of object prototypes. Inheritance is achieved in Javascript through a hidden pointer to another object. All objects have their pointer set to the Object.prototype, so we can augment all objects with a new method, simply by augmenting the Object.prototype object:

nick = {name:"Nick",age:23};

// This function returns a string representation of whatever
// is invoking it
function selfStringRep() {
  var str = "";
  for (var v in this) {
    // This is not an inherited property
    if (this.hasOwnProperty(v)) {
      str += "'" + v + "': " + this[v] +"\n";
    }
  }
  return str;
}
Object.prototype.toString = selfStringRep;
print(nick.toString());
// prints
‘name': Nick
‘age': 23

This is an incredibly powerful feature, and one that I could see being extremely useful.

There is a way to add a method to String, such as any new String we create will have that augmented method (this is a bit different from object augmentation). How would you do this?

To augment all strings, we need to augment String.prototype, since all string instances inherit from this object. For instance, let’s add the ability to reverse any string.

function reverseString(str) {
  var result = "";
  for (var i = str.length - 1; i >= 0; i—) {
    result +=  str.charAt(i);
  }
  return result;
}

String.prototype.reversed = function() { return reverseString(this); }
print("Nick".reversed());
// prints "kciN"

What is garbage collection?

Garbage collection refers to the fact that objects that go out of scope and no longer are referenced are automatically found and their memory is restored. Garbage collection is a feature that makes programmers’ lives easier, as we do not have to manually keep track of freeing the memory of each and every object as it goes in and out of scope. Languages like C do not have garbage collection, introducing a whole potential of errors for programmers. The most common error is a memory leak which might not immediately crash a program, but leads to an increasing amount of memory usage over time.

What is the difference between an array and an object?

While arrays are very similar to objects, there are at least 3 differences. (See jsfiddle for an illustrative example)

  1. Literal construction syntax
    Arrays are formed by square brackets ([]), whereas general objects are formed with curly braces ({})
  2. “Secret link”
    Arrays are automatically linked with Array.prototype; Objects are automatically linked with Object.prototype. This affects the fields available to an array vs an object; for instance, an array will have a length field that automatically reflects the size of the array, whereas a general object will not. Arrays also have methods defined on them, such as concat(), join(), pop(), etc. (see w3schools for more)
  3. Objects’ entries can be accessed with dot syntax or using the bracket notation (e.g. o.name or o[“name”]); arrays can only be accessed with bracket notation (e.g. array[1])

Homework

3.1

Write a function called absolute, which returns the absolute value of the number it is given as its argument. The absolute value of a negative number is the positive version of that same number, and the absolute value of a positive number (or zero) is that number itself.

function abs(x) {
    if (x < 0) {
        return -x;
    }
    else {
        return x;
    }
}

3.2

Write a function greaterThan, which takes one argument, a number, and returns a function that represents a test. When this returned function is called with a single number as argument, it returns a boolean: true if the given number is greater than the number that was used to create the test function, and false otherwise.

function greaterThan(x) {
    return function(y) {
        return y>x;
    }
}

3

Shown below is some code which does something useful. The function ‘iterateAndOperate’ is the one which accomplishes something useful. The remaining code helps this function. Try to understand what the function accomplishes and solve the problems in part a, b, and c. The code can be done inside the console in Javascript, or in the web browser. Please see this comment, for hints on how you may do it inside a web page(remember, HTML has special codes for spaces and newlines).

var pictureArray = ["++++@++++", "+++@@@+++", "++@@@@@++", "+++@@@+++", "++++@++++"];
iterateAndOperate(pictureArray, print)
++++@++++
+++@@@+++
++@@@@@++
+++@@@+++
++++@++++

var triangleArray = ["*", "***", "*****", "***", "*"];
iterateAndOperate(triangleArray, print);
*
***
*****
***
*

try {
    iterateAndOperate();
}
catch (err) {
    alert("Error, you must provide an array and function argument to iterateAndOperate");
}

/*#p2pu-Jan2011-javascript101*/