Archive
Go gotcha #1: variable shadowing within inner scope due to use of := operator
Disclaimer: Go is open source and developed by many Google employees. I work for Google, but the opinions expressed here are my own and do not necessarily represent that of Google.
Last week I described how the range
keyword in conjunction with taking the address of the iterator variable will lead to the wrong result. This week I’ll discuss how it’s possible to accidentally shadow one variable with another, leading to hard to find bugs.
Let’s take the same basic setup as last week; we have a Solution
struct, and we’re searching for the best (lowest cost) one in a slice of potential candidates.
package main
import "fmt"
type Solution struct {
Name string
Cost int
Complete bool
}
func FindBestSolution(solutions []*Solution) *Solution {
var best *Solution
for _, solution := range solutions {
if solution.Complete {
if best == nil || solution.Cost < best.Cost {
best := solution
fmt.Printf("new best: %v\n", *best)
}
}
}
return best
}
func main() {
solutions := []*Solution{
&Solution{
Name: "foo",
Cost: 10,
Complete: true,
},
}
fmt.Printf("Best solution is: %v", FindBestSolution(solutions))
}
Output:
new best: {foo 10 true}
Best solution is: <nil>
Program exited.
What’s going on? We see that we have a good candidate solution from the debugging information. Why does the function return the wrong value?
The bug is in this line:
best := solution
The problem is that we’re declaring and initializing a new variable (with the :=
operator) rather than assigning to the existing best
variable in the outer scope. The corrected line is
best = solution
Use =
to change the value of an existing variable, use :=
to create a new variable.
If I had not referenced the new variable with the debug print statement, this code would not have compiled:
if best == nil || solution.Cost < best.Cost {
best := solution
}
prog.go:16: best declared and not used
[process exited with non-zero status]
Why is this shadowing of variables in other scopes allowed at all?
There is a long thread on the subject on Go-nuts, debating this subject.
Arguments For
type M struct{}
func (m M) Max() int {
return 5
}
func foo() {
math := M{}
fmt.Println(math.Max())
}
If shadowing didn’t work, importing math would suddenly break this program.
…
My point was about adding an import after writing a lot of code (when
adding features or whatever), and that without shadowing, merely importing
a package now has the potential to break existing code….The current shadowing rules insulate code inside functions from what
happens at the top level of the file, so that adding imports to the file
will never break existing code (now waiting for someone to prove me wrong
on this 😉
There is a simpler and better solution: use a short variable declaration
when you actually want to declare a variable, and use an assignment
operator when all you want to do is assign a value to a variable which
you’ve previously declared. This doesn’t require any change to either
the language or the compiler, particularly one which is that cryptic.
Arguments Against
See it this way. I can carry a gun in my hand aiming towards a target. I
pull the trigger and hit the target. Everything happens exactly the whay
it is expected to happen.Suddenly an inner block jumps in … the instructor. Me, a gun in my
hand, the instructor in between and on the other side the target. I pull
the trigger.Still … everything happens exactly the way it is told to behave. Which
still makes the end results not a desirable result. Adding an “inner
block”, which by itself is behaving in a fully specified way,
influences the whole.Somewhat odd I admit, but you may get what I mean?
Conclusion
I don’t think that the shadowing should be an error but I do think there should be a warning. The go vet tool already helps find common mistakes, such as forgetting to include arguments to printf. For instance:
example.go:
package main
import "fmt"
func main() {
fmt.Printf("%v %v", 5)
}
Run:
go vet example.go
example.go:6: missing argument for Printf verb %v: need 2, have 1
If the vet tool were modified to add this warning, it would occasionally yield false positives for those cases where the shadowing is done intentionally. I think this is a worthwhile tradeoff. Virtually all Go programmers I’ve talked with have made this mistake, and I would be willing to bet that these cases are far more frequent than intentional shadowing.
Go gotcha #0: Why taking the address of an iterated variable is wrong
Go mascot – by Renée French under Creative Commons Attribution-Share Alike 3.0 Unported from http://en.wikipedia.org/wiki/File:Golang.png
Disclaimer: Go is open source and developed by many Google employees. I work for Google, but the opinions expressed here are my own and do not necessarily represent that of Google.
Go is my new favorite programming language. It’s compact, garbage collected, terse, and very easy to read. There are some things that trip me up even now after I’ve been using it for awhile. Today I’m going to discuss the range
construct and how it has a surprising feature that might violate your assumptions.
Range
First, the range
keyword is a way to iterate through the various builtin data structures in Go. For instance,
a := map[string]int {
"hello": 1,
"world": 2,
}
// 2 element range gets key and value
for key, value := range a {
fmt.Printf("key %s value %d\n", key, value)
}
// 1 element is just the key
for key := range a {
fmt.Printf("key %s\n", key)
}
// Works for slices (think of them as vectors/lists) too
b := []string {"hello", "world"}
// 2 element range gets the index as well as the entry
for i, s := range b {
fmt.Printf("entry %d: %s\n", i, s)
}
// 1 element gets just the index (notice the pattern?)
for i := range b {
fmt.Printf("entry %d\n", i)
}
This outputs
key hello value 1
key world value 2
key hello
key world
entry 0: hello
entry 1: world
entry 0
entry 1
Try this code in the Go Playground
Solution search – pointers
Imagine the case where we have a struct as follows
type Solution struct {
Name string
Cost int
Complete bool
}
Say that we’re doing some sort of optimization where we’re looking for the minimum cost solution that meets some criteria; for simplicity’s sake, I’ve put that as the ‘complete’ bool. It’s possible that no such Solution matches, in which case we return a nil solution.
A reasonable implementation would be as follows
func FindBestSolution(solutions []Solution) *Solution {
var best *Solution
for _, solution := range solutions {
if solution.Complete {
if best == nil || solution.Cost < best.Cost {
best = &solution
}
}
}
return best
}
Do you see the bug? Don’t worry if you don’t – I’ve made this mistake a few times now.
Let’s add some tests to find the problem. This is an example of a table driven test, where the test cases are given as a slice of struct literals. This makes it very easy to add new test cases.
func TestFindBestSolution(t *testing.T) {
tests := []struct {
name string
solutions []Solution
want *Solution
}{
{
name: "Nil list",
solutions: nil,
want: nil,
},
{
name: "No complete solution",
solutions: []Solution{
{
Name: "Foo",
Cost: 25,
Complete: false,
},
},
want: nil,
},
{
name: "Sole solution",
solutions: []Solution{
{
Name: "Bar",
Cost: 12,
Complete: true,
},
},
want: &Solution{
Name: "Bar",
Cost: 12,
Complete: true,
},
},
{
name: "Multiple complete solution",
solutions: []Solution{
{
Name: "Foo",
Cost: 25,
Complete: false,
},
{
Name: "Bar",
Cost: 12,
Complete: true,
},
{
Name: "Baz",
Cost: 25,
Complete: true,
},
},
want: &Solution{
Name: "Bar",
Cost: 12,
Complete: true,
},
},
}
for _, test := range tests {
got := FindBestSolution(test.solutions)
if got == nil && test.want != nil {
t.Errorf("FindBestSolution(%q): got nil wanted %v", test.name, *test.want)
} else if got != nil && test.want == nil {
t.Errorf("FindBestSolution(%q): got %v wanted nil", test.name, *got)
} else if got == nil && test.want == nil {
// This is OK
} else if *got != *test.want {
t.Errorf("FindBestSolution(%q): got %v wanted %v", test.name, *got, *test.want)
}
}
}
If you run the tests you’ll find that the last test fails:
--- FAIL: TestFindBestSolution (0.00 seconds)
prog.go:82: FindBestSolution("One complete solution"): got {Baz 25 true} wanted {Bar 12 true}
FAIL
[process exited with non-zero status]
This is strange – it works fine in the single element case, but not with multiple values. Let’s try adding a case where the correct value is last in the list.
{
name: "Multiple - correct solution is last",
solutions: []Solution{
{
Name: "Baz",
Cost: 25,
Complete: true,
},
{
Name: "Bar",
Cost: 12,
Complete: true,
},
},
want: &Solution{
Name: "Bar",
Cost: 12,
Complete: true,
},
},
Sure enough, this test passes. So somehow if the element is last the algorithm works. What’s going on?
From the go-wiki entry on Range:
When iterating over a slice or map of values, one might try this:
items := make([]map[int]int, 10)
for _, item := range items {
item = make(map[int]int, 1) // Oops! item is only a copy of the slice element.
item[1] = 2 // This 'item' will be lost on the next iteration.
}
The make and assignment look like they might work, but the value property of range (stored here as item) is a copy of the value from items, not a pointer to the value in items.
This is exactly what’s happening in this case. The solution
variable is getting a copy of each entry, not the entry itself. Thus when you take the address of the entry, you end up with a pointer pointing at the LAST element in the slice (since the iteration stops at that point). To illustrate:
package main
import "fmt"
func main() {
strings := []string{"some","value"}
for i, s := range strings {
fmt.Printf("Element %d: %s Pointer %v\n", i, s, &s)
}
}
Element 0: some Pointer 0x10500168
Element 1: value Pointer 0x10500168
Note that the same pointer is used in both cases. This explains why the Solution pointer ended up pointing at the last element of the slice.
Playground
So how do we work around this problem? The key is to introduce a new variable whose address it’s safe to take; its contents won’t change out from underneath you.
Broken:
if solution.Complete {
if best == nil || solution.Cost < best.Cost {
best = &solution
}
}
Fixed:
if solution.Complete {
if best == nil || solution.Cost < best.Cost {
tmp := solution
best = &tmp
}
}
With this patch the tests pass:
PASS
Program exited.
Alternative design
A great feature of Go is that you can return multiple values from a single function. Here’s an alternative implementation that doesn’t suffer from the previous problem.
func FindBestSolution(solutions []Solution) (Solution, bool) {
var best Solution
found := false
for _, solution := range solutions {
if solution.Complete {
if !found || solution.Cost < best.Cost {
best = solution
found = true
}
}
}
return best, found
}
Since best
is copying the VALUE of the solution
variable, this works correctly. You can play with this example and see how the tests change in the Playground.
This illustrates one other nice feature of Go – all types have a ‘zero’ value that is legal to use. For strings this is the empty string, for pointers it’s nil, for ints it’s 0, for structs all of types are set to zero values. The line var best Solution
implicitly sets best to be the zero solution. If I wanted to I could get rid of the found
bool altogether and just compare the returned solution with another zero valued Solution
.
Conclusion
I introduced some basic features of Go, including maps, slices, range
, structs, and functions. I provided links to the amazingly useful Go playground which lets you easily test out code, format it, and share it with others.
I showed two implementations of a function that searches through a slice of struct values, searching for a solution that meets some criteria.
The first example using pointers led to a subtle bug that’s hard to find and solve unless you know how range
works. I showed how to write unit tests that exercise the function and helped flush out the bug. I also explained what the bug was and how to work around it.
Finally I showed a version of the same function that uses Go’s multiple return types to return a found boolean rather than using a nil pointer to signify that the value wasn’t found.
Python Gotcha: Word boundaries in regular expressions
TL;DR
Be careful trying to match word boundaries in Python using regular expressions. You have to be sure to either escape the match sequence or use raw strings.
Word boundaries
Word boundaries are a great way of performing regular expression searches for whole words while avoiding partial matches. For instance, a search for the regular expression “the” would match both the word “the” and the start of the word “thesaurus”.
>>> import re >>> re.match("the", "the") # matches >>> re.match("the", "thesaurus") # matches
The way to match a word boundary is with ‘\b’, as described in the Python documentation. I wasted a few minutes wrestling with trying to get this to work.
>>> re.match("\bthe\b", "the") # no match
It turns out that \b is also used as the backspace control sequence. Thus in order for the regular expression engine to interpret the word boundary correctly, you need to escape the sequence:
>>> re.match("\\bthe\\b", "the") # match
You can also use raw string literals and avoid the double backslashes:
>>> re.match(r"\bthe\b", "the") # match
In case you haven’t seen the raw string prefix before, here is the relevant documentation:
String literals may optionally be prefixed with a letter ‘r’ or ‘R’; such strings are called raw strings and use different rules for interpreting backslash escape sequences.
Conclusion
Make sure you are familiar with the escape sequences for strings in Python, especially if you are dealing with regular expressions whose special characters might conflict. The Java documentation for regular expressions makes this warning a bit more explicit than Python’s:
The string literal “\b”, for example, matches a single backspace character when interpreted as a regular expression, while “\\b” matches a word boundary.
Hopefully this blog post will help others running into this issue.
Mule 3 Deployment Gotchas / Workarounds
Mule is an open source enterprise service bus written in Java. I’ve worked with Mule 2.2 quite a bit but only recently have started to work with Mule 3. This post details some of the pains involved with the transition, none of which are well documented or hinted at in the Migration guide.
Gotchas/Workarounds
Mule IDE specific
The Mule IDE is really a misnomer – it’s not a standalone product, but instead an Eclipse plugin. See the installation guide for more information.
XML validation warnings
By default, Eclipse 3.5 will flag all sorts of spurious errors in your XML configuration file. See the blog post for more details, but here’s the short version on how to solve it:
General
These issues exist whether you use the IDE to deploy the app or deploy the app via the command line.
Failure to launch / Timeouts
Mule is configured via XML. You must declare the namespaces and schema locations in order to make use of the built-in Mule constructs. For instance, here’s a snippet of one of my Mule configurations:
<mule xmlns="http://www.mulesoft.org/schema/mule/core" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:spring="http://www.springframework.org/schema/beans" xmlns:vm="http://www.mulesoft.org/schema/mule/vm" xmlns:script="http://www.mulesoft.org/schema/mule/scripting" xmlns:http="http://www.mulesoft.org/schema/mule/http" xmlns:cxf="http://www.mulesoft.org/schema/mule/cxf" xmlns:xm="http://www.mulesoft.org/schema/mule/xml" xmlns:pattern="http://www.mulesoft.org/schema/mule/pattern" xmlns:servlet="http://www.mulesoft.org/schema/mule/servlet" xmlns:jetty="http://www.mulesoft.org/schema/mule/jetty" xmlns:test="http://www.mulesoft.org/schema/mule/test" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/3.1/mule.xsd http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/3.1/mule-http.xsd http://www.mulesoft.org/schema/mule/cxf http://www.mulesoft.org/schema/mule/cxf/3.1/mule-cxf.xsd http://www.mulesoft.org/schema/mule/scripting http://www.mulesoft.org/schema/mule/scripting/3.1/mule-scripting.xsd http://www.mulesoft.org/schema/mule/pattern http://www.mulesoft.org/schema/mule/pattern/3.1/mule-pattern.xsd http://www.mulesoft.org/schema/mule/xml http://www.mulesoft.org/schema/mule/xml/3.1/mule-xml.xsd http://www.mulesoft.org/schema/mule/vm http://www.mulesoft.org/schema/mule/vm/3.1/mule-vm.xsd http://www.mulesoft.org/schema/mule/servlet http://www.mulesoft.org/schema/mule/servlet/3.1/mule-servlet.xsd http://www.mulesoft.org/schema/mule/test http://www.mulesoft.org/schema/mule/test/3.1/mule-test.xsd http://www.mulesoft.org/schema/mule/jetty http://www.mulesoft.org/schema/mule/jetty/3.1/mule-jetty.xsd" >
Make absolutely sure that the version of the xsd that you include matches the major version of mule that you’re using! If you accidentally place a 3.0 instead of a 3.1 in any of those entries, your app will mysteriously fail to launch and you’ll get a stack trace like the following:
INFO 2011-06-09 17:21:20,015 [main] org.mule.MuleServer: Mule Server initializing...
INFO 2011-06-09 17:21:20,298 [main] org.mule.lifecycle.AbstractLifecycleManager: Initialising RegistryBroker
INFO 2011-06-09 17:21:20,355 [main] org.mule.config.spring.MuleApplicationContext: Refreshing org.mule.config.spring.MuleApplicationContext@19bb5c09: startup date [Thu Jun 09 17:21:20 EDT 2011]; root of context hierarchy
WARN 2011-06-09 17:22:36,265 [main] org.springframework.beans.factory.xml.XmlBeanDefinitionReader: Ignored XML validation warning
java.net.ConnectException: Operation timed out
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.warning(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.xs.traversers.XSDHandler.reportSchemaWarning(Unknown Source)
at org.apache.xerces.impl.xs.traversers.XSDHandler.getSchemaDocument1(Unknown Source)
at org.apache.xerces.impl.xs.traversers.XSDHandler.getSchemaDocument(Unknown Source)
at org.apache.xerces.impl.xs.traversers.XSDHandler.parseSchema(Unknown Source)
at org.apache.xerces.impl.xs.XMLSchemaLoader.loadSchema(Unknown Source)
Deploying via command line
While it’s nice to be able to use an IDE to develop Mule applications, I prefer to deploy from the command line. This allows me to script the launch of the applications. Furthermore, this approach works in a headless (screenless) remote server, whereas the IDE approach will not. The basic way to deploy an app has changed from Mule 2.2 to Mule 3. It used to be that you would call mule -config /path/to/your/config.xml
. Now you move your application to the $MULE_HOME/apps
folder and run mule
, which in turn will deploy all the apps in the apps
folder. This can be very handy, especially when coupled with the Hot Deployment features of Mule; you no longer need to have one terminal per mule app you’re running. From the article, “Mule 3: A New Deployment Model”, here are the ostensible steps you must take to deploy your application in this manner:
- Create a directory under: $MULE_HOME/apps/foo
- Jar custom classes (if any), and put them under: $MULE_HOME/apps/foo/lib
- Put the master Mule config file at: $MULE_HOME/apps/foo/mule-config.xml (note that it has to be named: mule-config.xml
- Start your app with: mule -app foo
While these instructions are correct, there are a lot of gotchas involved. Let me detail them below.
Relative paths
There is often a need to make reference to resources within your configuration file. For instance, you might need to configure an embedded Jetty webserver and tell Jetty where its configuration file is located. When you do this, it’s crucial that you prepend relative paths in the XML configuration file with ${app.home}
.
The reason for this is that the current working directory in which you launch the mule
process becomes the current working directory for all of your application configuration files. So if you have mule-config.xml
in the root of your folder, and conf/jetty.xml
in that same folder, then your reference to the jetty.xml
should be ${app.home}/conf/jetty.xml
. Otherwise, if you just use conf/jetty.xml
and launch mule from a folder that’s not the same as the root folder of your application, all of your paths will break.
Property files / Resources
As the step #2 above says, you must jar up all of your compiled classes and include them in the lib folder of your project. If you don’t do this, you’ll get an exception when your component / custom classes are attempted to be instantiated.
What should be emphasized is that all resources that you reference from within your code must end up in the jar as well. By default, that won’t happen. You can use something like the solution presented in Ant Build: copy properties file to jar file to get this to happen.
Unintentional Application Deletion
When you deploy an app by copying a zip or folder into the apps directory and then running mule, Mule will launch it and then create a text file called ‘$APP_NAME-anchor.text’. If you delete this file, Mule will “undeploy this app in a clean way”. What isn’t noted by this is that it will delete the corresponding zip/folder. So be careful not to accidentally delete your whole project. (Not that I did that or anything).
JDBC drivers problems
One nice feature of the hot deploy process is that Mule will automatically load all of the jars in the lib
folder and ensure that they’re on the classpath. Unfortunately there is an extremely annoying problem with JDBC drivers, in which they corresponding jar will be loaded correctly, but then will fail to be found at runtime.
At startup:
Loading the following jars:
=============================
file:/opt/local/Mule/mule-standalone-3.1.1/apps/XMLPlayer/lib/mysql-connector-java-5.1.13-bin.jar
=============================
<!-- snip -->
WARN 2011-06-09 15:56:12,130 [http://XMLPlayer].connector.http.mule.default.receiver.2 org.hibernate.cfg.SettingsFactory: Could not obtain connection to query metadata
java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/db
The exact same project works perfectly in the Mule IDE. The only solution I’ve found is to copy the mysql-connector-java-5.1.13-bin.jar
into $MULE_HOME/lib/endorsed
. There is a similar bug report but it was closed for some reason. It most certainly does not work the way you would intuitively expect.
Conclusion
Mule 3 has many improvements over Mule 2, particular with the introduction of Flows. Unfortunately, deployment is a much tricker problem than it was in Mule 2, and the resources online are woefully inadequate for the task at hand. I hope this blog post helps some poor soul going through the same frustration I went through to get a Mule 3 application deployed.
Java gotcha: Splitting a string
CSV, or comma separated value, files are the workhorse of flat file processing. They’re human readable, you can edit them in any spreadsheet editor worth its salt, they’re portable, and they’re easy to parse and read into your programs. Unfortunately, text you might want to store in the columns might also have commas in it, which complicates the parsing significantly – instead of splitting on all of the commas, you have to know whether the comma is in a quoted string, in which case it doesn’t signify the end of a field … it gets messy, and you’re probably going to get it wrong if you roll your own. For a quick and dirty script, sometimes it’s better to delimit the columns with a different character, one that comes up much less often in text. One good choice is the pipe character, |.
Foolishly I processed lines of text like the following in Scala:
val FIELD_DELIMITER = "|"; val FILE_NAME_INDEX = 0 val AVG_COLOR_INDEX = 1 val THUMBNAIL_IMAGE_INDEX = 2 def parseLine(indexRow:String) = { Console.println("Row: " + indexRow) val entries = indexRow.split(FIELD_DELIMITER) // extract the file name, average color, thumbnail }
Unfortunately, this code is subtly broken. The problem is that the String split method does not accept a plain string on which to split – rather, it treats the string as a regular expression. Since the pipe character has special meaning in regular expressions (namely “OR”), this has the effect of splitting the string on every character, rather than just at the pipe delimiters.
To get around this, you need to tell the regular expression engine that you want to treat the pipe as a literal, which you do by backslash escaping the pipe. Unfortunately “\|” is not a valid string in Java, because it will attempt to interpret it as a control character (like the newline \n). Instead, you need to backslash escape the backslash, leaving “\\|”
Conclusion:
Be careful when you’re using String.split in Java/Scala. You have to be aware that it’s treating your string as a regular expression. Read this StackOverflow discussion for a better understanding of the pros and cons of Java using regular expressions in many of its core String libraries.
Python Gotcha #1: Default arguments and mutable data structures
I ran into an issue in Python the other day that I thought others might find instructive. The problem involves default arguments and mutable data structures.
Named arguments
In Python, you can enumerate the arguments to a method explicitly, rather than simply by putting the arguments in the order expected by the method. For instance
>>> def printit(a,b,c): ... print a,b,c ... # Calling the method with the arguments in the expected order; 1 is assigned to a, 2 to b, 3 to c >>> printit(1,2,3) 1 2 3 # Explicitly assigning the values to the different argument variables. Now you can call the method however you want. >>> printint(c='c',b='b',a='a') a b c
Default Arguments
Another feature of Python not present in Java is that of default arguments
>>> def defaulted(a,b='b',c='c'): ... print a,b,c ... >>> defaulted(1,2,3) 1 2 3 >>> defaulted(1,2) 1 2 c >>> defaulted(1) 1 b c # This will fail because I do not explicitly define what a is >>> defaulted(b=6) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: defaulted() takes at least 1 non-keyword argument (0 given)
These default arguments are very succinct and a nice feature. In Java, the best way to emulate this feature is through method overloading. From the linked article:
public void printFavorites(Color favoriteColor, String favoritePhrase, int favoriteNumber) { System.out.println("Favorite color: " + favoriteColor); System.out.println("Favorite phrase: " + favoritePhrase); System.out.println("Favorite number: " + favoriteNumber); } public void printFavorites(Color favoriteColor, String favoritePhrase) { printFavorites(favoriteColor, favoritePhrase, 0); } public void printFavorites(Color favoriteColor) { printFavorites(favoriteColor, "The best phrase evar", 0); }
This increases the amount of boilerplate code, and is not quite as flexible as the python version – in Python you could define a default argument for each variable and then the user could choose which if any to override.
>>> def printFavorites(faveColor="Blue", favePhrase="This is awesome",faveNumber=5): ... print "Favorite color: ", faveColor ... print "Favorite phrase: ", favePhrase ... print "Favorite number: ", faveNumber >>> printFavorites() Favorite color: Blue Favorite phrase: This is awesome Favorite number: 5 >>> printFavorites(favePhrase="Just the phrase") Favorite color: Blue Favorite phrase: Just the phrase Favorite number: 5 >>> printFavorites(faveColor="orange") Favorite color: orange Favorite phrase: This is awesome Favorite number: 5
Regardless, it’s usually not too much of a hurdle in Java. But given that it’s such a nice feature in Python, I wanted to make use of it.
Real world example: Graph traversal
Let’s examine how to represent directed (potentially cyclic) graphs in Python, and how to do a depth first search of all paths between two
First let’s create a graph, and then show the code to find all the paths between two nodes.
Graph representation
In Python we can represent a graph as a nested dictionary:
g = {'A': ['B', 'C'], 'B': ['C', 'D'], 'C': ['A']}
Better visualized as
(Side note: this graph was created using the DOT language, something I plan to post more about in the future.)
The nested dictionary structure defined above maps nodes to their children (nodes in this case are simply represented as single letters, but they could be arbitrarily complex).
Search
I mentioned that the graphs might be cyclic; if we do not detect a cycle, the code would similarly loop until running out of memory. Thus we need to keep track of the path we’ve already explored to avoid cycling.
Here is code to find all the paths between two nodes (slightly modified from http://www.python.org/doc/essays/graphs.html; I will highlight exactly what I changed later in the article).
def find_all_paths(graph, start, end, path=[]): path.append([start]) if start == end: return [path] if not graph.has_key(start): return [] paths = [] for node in graph[start]: if node not in path: # recursive call newpaths = find_all_paths(graph, node, end, path) for newpath in newpaths: paths.append(newpath) return paths
Let’s test:
>>> graph.find_all_paths(g, 'A','B') [['A', 'B']] >>> graph.find_all_paths(g, 'A','C') [['A', 'B', 'C'], ['A', 'C']]
Both of these are correct; the path between A and B is just A and B but between A and C we have two potential paths – either the two hops from A to B to C, or directly from A to C. The code I presented contains a bug. What happens when I run the queries again?
>>> graph.find_all_paths(g, 'A','B') [] >>> graph.find_all_paths(g, 'A','C') []
Huh? What’s going on. It worked fine the first time, but subsequent method calls return the wrong result!
Solution
The problem, I discovered with a little bit of searching, is spelled out in the Python documents.
Important warning: The default value is evaluated only once. This makes a difference when the default is a mutable object such as a list, dictionary, or instances of most classes. For example, the following function accumulates the arguments passed to it on subsequent calls:
def f(a, L=[]): L.append(a) return L print f(1) print f(2) print f(3)
This will print
[1] [1, 2] [1, 2, 3]
If you don’t want the default to be shared between subsequent calls, you can write the function like this instead:
def f(a, L=None): if L is None: L = [] L.append(a) return L
I do not like the workaround presented in the docs, since it obfuscates what the argument is supposed to be. (compare visitedNodes=[] vs visitedNodes=None). There is another workaround for this, and it’s the approach taken originally on the site where the graph traversal code came from:
def find_all_paths(graph, start, end, path=[]): path = path + [start] # my buggy version had path.append(start) ...
Fortunately the Python docs are excellent and I was able to find the solution without too much trouble. Python programmers need to be aware that their default arguments are only evaluated once, especially when working with potentially mutable data structures like dictionaries and lists. Programmers must take care to realize that the following two calls are NOT equivalent, especially with respect to objects contained in default arguments.
exploredNodes.append("x") exploredNodes = exploredNodes + ["x"]