Archive

Posts Tagged ‘python’

Datetimes in Python – gotchas and workarounds

April 9, 2011 Leave a comment

I’ve written previously about working with dates in Java; as I mentioned there it’s very easy to get dates/times incorrect. I feel like I have a fairly good handle on how things work in Java, but today I was faced with learning how to deal with dates/times in Python. It wasn’t an altogether pleasant experience, but I’m going to show what I learned, so hopefully this is of use to you.

The task I was trying to accomplish was to convert between Unix timestamps (seconds/milliseconds since the Epoch) and more user friendly data objects. Creating a datetime is easy:

>>> from datetime import *
>>> datetime.now()
datetime.datetime(2011, 4, 5, 19, 36, 18, 894325)
>>> datetime(2004, 1, 24)
datetime.datetime(2004, 1, 24, 0, 0)

One gotcha to note is that both the month and day fields are 1 based (1 <= month <= 12, 1 <= day <= number of days in the given month and year), whereas in Java, the month field is 0 indexed.

By default, these datetime objects will use a naïve time zone understanding that ignores offsets from UTC/day light savings time. To fix this, you need to implement your own subclass of tzinfo. It’s kind of unfortunate that they don’t make this easier. Here is an example implementation representing the UTC timezone from the previously linked page:

from datetime import tzinfo, timedelta, datetime

ZERO = timedelta(0)
HOUR = timedelta(hours=1)

class UTC(tzinfo):

    def utcoffset(self, dt):
        return ZERO

    def tzname(self, dt):
        return "UTC"

    def dst(self, dt):
        return ZERO

OK that’s not the end of the world. Now assuming we’ve created out datetime object correctly, how do we retrieve its corresponding Unix style timestamp? Let’s look at the available methods.

>>> [x for x in dir(datetime) if not x.startswith("__")]
["astimezone", "combine", "ctime", "date", "day", "dst", "fromordinal", "fromtimestamp", "hour", "isocalendar", "isoformat", "isoweekday", "max", "microsecond", 
"min", "minute", "month", "now", "replace", "resolution", "second", "strftime", "strptime", "time", "timetuple", "timetz", "today", "toordinal", "tzinfo", "tznam
e", "utcfromtimestamp", "utcnow", "utcoffset", "utctimetuple", "weekday", "year"]

Well, there’s a bunch of methods that convert from a timestamp to a datetime object. But going back the other direction is a little harder. After digging, I found a way to do so:

>>> from datetime import datetime
>>> from time import mktime
>>> dt = datetime(2008, 5, 1, 13, 35, 41, 567777)
>>> seconds = mktime(dt.timetuple())
>>> seconds += (dt.microsecond / 1000000.0)
>>> seconds
1209663341.5677769
>>> 
>>> dt2 = datetime.fromtimestamp(seconds)
>>> dt == dt2
True

Well, there you have it. To convert from a unix timestamp to a datetime object, use datetime.fromtimestamp. To convert the other direction, use time.mktime(datetime_instance.timetuple()). I wish that the library authors had seen fit to maintain symmetry (i.e. datetime should implement a totimestamp method), but fortunately there is an easy workaround. The last thing to note if you’re used to Java is that the timestamps in Python measure seconds from the epoch, as opposed to Java which deals in milliseconds from the epoch.

Java annoyances

January 28, 2011 Leave a comment
Having had Java as the programming language of the vast majority of my undergraduate courses, as well as the language I program in every day, I am most comfortable and fluent in it.  When I return to Java after using different languages such as AWKPython, or Ruby, I’m always left with a bitter taste in my mouth.  There are some things Java just makes way too hard, verbose, and painful to accomplish.  It’s for that reason that I’m learning Scala, what could be (simplistically) described as a cleaned up, more succint version of Java.

Asymmetry in standard libraries

Symmetry is an important feature of a library; it basically means that methods come in pairs.  For instance, you’d expect that a class with a read method has a write method, or one with a set method has a get method.  (That’s not always the case, certainly, but API writers often strive for symmetry.  See Practical API Design: Confessions of a Java Framework Architect) As another example, there’s both a method to convert from an array to a list (Arrays.asList) and there is a method to go the other direction (List.toArray()).  Unfortunately, not all of the Java library APIs adhere to this convention.  The one that bothers me the most is in the String library.  There is a String split method that breaks a String up around a given regular expression, but no corresponding method to reconstitute a String from a collection of other String objects, with a specified separator between them.  This leads to code like the following to comma separate a collection of strings:
String[] strings = ...;
StringBuilder b = new StringBuilder();
for (int i = 0; i < strings.length; i++) {
 b.append(strings[i]);
 if (i != strings.length -1) {
 b.append(",");
 }
}
System.out.println(b.toString());
This whole mess could be replaced with one line of Python code
print(",".join(strings))
or in Scala:
println(strings.mkString(","))
It’s pretty sad that you have to either write that ugly mess, or turn to something like Apache StringUtils.

Different treatment of primitives and objects

It is a lot harder to deal with variable length collections of primitive types than it should be.  This is because you cannot create collections out of things that are not objects.  You can create them out of the boxed primitive type wrappers, but then you have to iterate through and convert back into the primitive types.

In other words, you can create a List<Double> but you cannot create a List<double>.  This leads to code like the following:
// Need a double[] but don't know how long it's going to be
List<Double> doubles = new LinkedList<Double>();
for (...) {
 doubles.append(theComputedValue);
}
// Option 1: Use for loop and iterate over Double list, converting to primitive values through
// auto unboxing
// Bad: leads to O(n^2) running time with LinkedList
double[] doubleArray = new double[doubles.size()];
for (int i = 0; i < doubleArray.length; i++) {
 doubleArray[i] = doubles.get(i);
}
// Option 2: Use enhanced for loop syntax (described below), along with
// additional index variable.
// Better performance but extraneous index value hanging around
int index = 0;
// Automatic unboxing
for (double d : doubles) {
 doubleArray[index++] = d;
}
...
b[index] // Oops
As I blogged about previously, there is a library called Apache Commons Primitives that can be used for variable sized lists for primitive types, but it is a shame one has to turn to third party libraries for such a common task.

Patchwork iteration support

Java 5 introduced the “Enhanced for loop” syntax which allows you to replace
Collection<String> strings = new ArrayList<String>();
Iterator<String> it = strings.iterator();
while (it.hasNext()) {
 String theString = it.next();
}
with the much simpler
for (String s : strings) {
 // deal with the String
}
Here’s the rub: this syntax is supported for arrays and Iterable objects.  But guess what?  Iterators are not Iterable.  Why is this a problem?  Well, you might want to return read-only iterators to your data.  If you do this, the client code cannot use the enhanced for loop syntax, and is stuck with the earlier hasNext() code.  If you want to use the enhanced for loop syntax to work for Iterators, you need to introduce a wrapper around the Iterator which implements the Iterable interface.  From the previously linked blog post:
class IterableIterator<T> implements  Iterable<T> {
    private Iterator<T> iter;
    public IterableIterator(Iterator<T> iter) {
        this.iter = iter;
    }
    // Fulfill the Iterable interface
    public Iterator<T> iterator() {
        return iter;
    }
}
I hope this strikes you as inelegant as well.
Furthermore, arrays are not iterable either, despite the fact that you can use the enhanced for loop syntax with them.
What this all boils down to is that there’s no great way to accept an Iterable collection of objects.  If you accept an Iterable<E>, you close yourself off to arrays and iterators.  You’d have to convert the arrays to a suitable collection type by using the Arrays.asList method.  It would be great if we could treat arrays, collections, etc., agnostically when all we want to do is iterate over their elements.

Lack of type inference for constructors with generics

Yes, we all know we should program to an interface rather than to a specific implementation; doing so will allow our code to be much more flexible and easily changed later.  Furthermore, we also know we should use generics for our collections rather than raw collections of objects; this allows us to catch typing errors before they occur.  So in other words

// BAD: Raw hashmap and programming to the implementation!
HashMap b = new HashMap();
// Good
Map<String, Integer> wordCounts = new HashMap<String, Integer>();
In fact, this lack of type inference is one reason why Joshua Bloch suggests that static factory methods can be better that constructors – it is possible to have a static factory method that can infer the correct types and instantiate the object, without making you explicitly repeat the type parameters.  For instance, Google Guava provides many static methods to instantiate maps:
Map<String, Integer> wordCounts = Maps.newHashMap();
Fortunately, the problem of having to repeat type parameters twice for constructors is being fixed in JDK 7 with something called the Diamond Operator.  It will allow you to replace
Map<String, Integer> wordCounts = new HashMap<String, Integer>();
with
Map<String, Integer> wordCounts = new HashMap<>();
This improvement to the language can’t come fast enough.

Conclusion

I use Java on a daily basis for about 90% of the work I need to do.  I’m comfortable with it, I understand its syntax, it’s fast, it’s powerful.  After being exposed to languages like python and scala, certain issues in Java stand out in stark contrast, and thus I’ve enumerated a few of the reasons that Java annoys me on a daily basis.  Fortunately excellent libraries exist to correct many of the annoyances, but it’s painful to have to use them to do such basic things as joining a list of Strings with a given separator character, or creating a variable sized list of primitive types. Fortunately Java continues to evolve, and at least some of my irritations will be fixed in JDK 7.
Post in the comments if you have better workarounds than those that I’ve suggested, you have other languages that make these tasks easy and would like to highlight them, or any other reason you can think of.

Car Talk Puzzler #4: Flipping Ages

October 26, 2010 2 comments

RAY: This was sent in many weeks ago by Wendy Gladstone, and as usual I tweaked it a little bit.

She writes: “Recently I had a visit with my mom and we realized that the two digits that make up my age when reversed resulted in her age. For example, if she’s 73, I’m 37. We wondered how often this has happened over the years but we got sidetracked with other topics and we never came up with an answer.

“When I got home I figured out that the digits of our ages have been reversible six times so far. I also figured out that if we’re lucky it would happen again in a few years, and if we’re really lucky it would happen one more time after that. In other words, it would have happened 8 times over all. So the question is, how old am I now?”

Question
Answer

Here’s the fourth in my Car Talk Puzzler series; today I’m going to be using Python because it’s my current favorite language, and because it’s well suited to filtering, mapping, etc.  I won’t put too much commentary here.

#  Find all the ages such that the second age is the reverse of the first  age.  Don't worry that there are a lot of impossibilities; we'll fix it  through filtering
# Note that [::-1] is the slice operator that says iterate backwards through the string; this effectively reverses the list.
 matching_ages = map(lambda x:(x, int(str(x)[::-1])), range(0,100))
 matching_ages
 # OUT: [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7),  (8, 8), (9, 9), (10, 1), (11, 11), (12, 21), (13, 31), (14, 41), (15,  51), (16, 61), (17, 71), (18, 81), (19, 91), (20, 2), (21, 12), (22,  22), (23, 32), (24, 42), (25, 52), (26, 62), (27, 72), (28, 82), (29,  92), (30, 3), (31, 13), (32, 23), (33, 33), (34, 43), (35, 53), (36,  63), (37, 73), (38, 83), (39, 93), (40, 4), (41, 14), (42, 24), (43,  34), (44, 44), (45, 54), (46, 64), (47, 74), (48, 84), (49, 94), (50,  5), (51, 15), (52, 25), (53, 35), (54, 45), (55, 55), (56, 65), (57,  75), (58, 85), (59, 95), (60, 6), (61, 16), (62, 26), (63, 36), (64,  46), (65, 56), (66, 66), (67, 76), (68, 86), (69, 96), (70, 7), (71,  17), (72, 27), (73, 37), (74, 47), (75, 57), (76, 67), (77, 77), (78,  87), (79, 97), (80, 8), (81, 18), (82, 28), (83, 38), (84, 48), (85,  58), (86, 68), (87, 78), (88, 88), (89, 98), (90, 9), (91, 19), (92,  29), (93, 39), (94, 49), (95, 59), (96, 69), (97, 79), (98, 89), (99,  99)]

# Here we filter by only allowing matches in which the  mother's age is greater than that of the child.  Note the use of a  lambda expression, basically an anonymous function.
filtered1 = filter(lambda (mother,child):mother > child, matching_ages)
 filtered1
 # OUT: [(10, 1), (20, 2), (21, 12), (30, 3), (31, 13), (32, 23), (40,  4), (41, 14), (42, 24), (43, 34), (50, 5), (51, 15), (52, 25), (53, 35),  (54, 45), (60, 6), (61, 16), (62, 26), (63, 36), (64, 46), (65, 56),  (70, 7), (71, 17), (72, 27), (73, 37), (74, 47), (75, 57), (76, 67),  (80, 8), (81, 18), (82, 28), (83, 38), (84, 48), (85, 58), (86, 68),  (87, 78), (90, 9), (91, 19), (92, 29), (93, 39), (94, 49), (95, 59),  (96, 69), (97, 79), (98, 89)]

# Assume that the mother was at least 15 when she had the kid, and no more than 60
filtered2 = filter(lambda(mother, child):mother-child >= 15 and mother-child < 60, filtered1)
 filtered2
 # OUT: [(20, 2), (30, 3), (31, 13), (40, 4), (41, 14), (42, 24), (50,  5), (51, 15), (52, 25), (53, 35), (60, 6), (61, 16), (62, 26), (63, 36),  (64, 46), (71, 17), (72, 27), (73, 37), (74, 47), (75, 57), (82, 28),  (83, 38), (84, 48), (85, 58), (86, 68), (93, 39), (94, 49), (95, 59),  (96, 69), (97, 79)]
 len(filtered2)
 # OUT: 30

# Create a new list comprised of the differences in age between mother and child
 age_diff = map(lambda(mother,child):mother-child, filtered2)
 age_diff
 # OUT: [18, 27, 18, 36, 27, 18, 45, 36, 27, 18, 54, 45, 36, 27, 18, 54, 45, 36, 27, 18, 54, 45, 36, 27, 18]
 sorted(age_diff)
 # OUT: [18, 18, 18, 18, 18, 18, 18, 27, 27, 27, 27, 27, 27, 36, 36, 36, 36, 36, 45, 45, 45, 45, 54, 54, 54]

# The puzzler states that it's will happen a total of 8 times; that matches the age difference of 18 years

 filter(lambda(mother,child):mother-child == 18, filtered3)
# OUT: [(20, 2), (31, 13), (42, 24), (53, 35), (64, 46), (75, 57), (86, 68), (97, 79)]

Thus the mother is currently 75 years old and the daughter is 57.  Tada

 

Ternary operator in bash

October 20, 2010 9 comments

Here’s a really quick tip for bash programmers.

In languages like C++, Java, Python, and the like, there’s the concept of a ternary operator.  Basically it allows you to assign one value if a condition is true, else another.

In C/Java:

int x = valid ? 0 : 1;

In Python:

x = 0 if valid else 1

In Scala:

val x = if (valid) 0 else 1

Well, there’s no ternary operator in Bash, but there is a way to fake it.

valid=1
[ $valid ] && x=1 || x=0

Where whatever conditional you want is within the brackets.

If it’s valid, then the branch after the AND is followed, otherwise that after the OR is followed.

This is equivalent though perhaps a bit less readable then

if [ $valid ]; then x=1; else x=0; fi

So you should be aware of the construct in case you ever run into it, but it’s arguably less readable than just listing out explicitly what you’re doing.

Thanks to experts-exchange for making me aware of this little tip.

Categories: Uncategorized Tags: , , , , , , ,

bpython – an excellent interpreter for python

October 12, 2010 1 comment

If you use Python, you know that its interactive shell is a great way to test out ideas and iterate quickly.  Unfortunately, the basic interactive shell is very barebones – there is no syntax highlighting, autocompletion, or any of the features we come to expect from working in IDEs.  Fortunately if you’re on a Unix system, there is a great program called bpython which adds all of those missing features.

 

As you are typing, suggestions appear at the bottom. Press tab to take the suggestion

 

If you have easy_install, it’s the simplest thing in the world to install:

sudo easy_install bpython

I can’t recommend this product enough.  It’s free, so what’re you waiting for?

Categories: Python, unix Tags: , , , ,

Python Gotcha #1: Default arguments and mutable data structures

August 23, 2010 2 comments

I ran into an issue in Python the other day that I thought others might find instructive. The problem involves default arguments and mutable data structures.

Named arguments

In Python, you can enumerate the arguments to a method explicitly, rather than simply by putting the arguments in the order expected by the method. For instance

>>> def printit(a,b,c):
... print a,b,c
...
# Calling the method with the arguments in the expected order; 1 is assigned to a, 2 to b, 3 to c
>>> printit(1,2,3)
1 2 3
# Explicitly assigning the values to the different argument variables. Now you can call the method however you want.
>>> printint(c='c',b='b',a='a')
a b c

Default Arguments

Another feature of Python not present in Java is that of default arguments

>>> def defaulted(a,b='b',c='c'):
... print a,b,c
...
>>> defaulted(1,2,3)
1 2 3
>>> defaulted(1,2)
1 2 c
>>> defaulted(1)
1 b c
# This will fail because I do not explicitly define what a is
>>> defaulted(b=6)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: defaulted() takes at least 1 non-keyword argument (0 given)

These default arguments are very succinct and a nice feature. In Java, the best way to emulate this feature is through method overloading. From the linked article:

public void printFavorites(Color favoriteColor, String favoritePhrase, int favoriteNumber) {
     System.out.println("Favorite color: " + favoriteColor);
     System.out.println("Favorite phrase: " + favoritePhrase);
     System.out.println("Favorite number: " + favoriteNumber);
}

public void printFavorites(Color favoriteColor, String favoritePhrase) {
    printFavorites(favoriteColor, favoritePhrase, 0);
}

public void printFavorites(Color favoriteColor) {
    printFavorites(favoriteColor, "The best phrase evar", 0);
}

This increases the amount of boilerplate code, and is not quite as flexible as the python version – in Python you could define a default argument for each variable and then the user could choose which if any to override.

>>> def printFavorites(faveColor="Blue", favePhrase="This is awesome",faveNumber=5):
... print "Favorite color: ", faveColor
... print "Favorite phrase: ", favePhrase
... print "Favorite number: ", faveNumber
>>> printFavorites()
Favorite color: Blue
Favorite phrase: This is awesome
Favorite number: 5
>>> printFavorites(favePhrase="Just the phrase")
Favorite color: Blue
Favorite phrase: Just the phrase
Favorite number: 5
>>> printFavorites(faveColor="orange")
Favorite color: orange
Favorite phrase: This is awesome
Favorite number: 5

Regardless, it’s usually not too much of a hurdle in Java. But given that it’s such a nice feature in Python, I wanted to make use of it.

Real world example: Graph traversal

Let’s examine how to represent directed (potentially cyclic) graphs in Python, and how to do a depth first search of all paths between two
First let’s create a graph, and then show the code to find all the paths between two nodes.

Graph representation

In Python we can represent a graph as a nested dictionary:

g = {'A': ['B', 'C'],
'B': ['C', 'D'],
'C': ['A']}

Better visualized as

(Side note: this graph was created using the DOT language, something I plan to post more about in the future.)

The nested dictionary structure defined above maps nodes to their children (nodes in this case are simply represented as single letters, but they could be arbitrarily complex).

Search

I mentioned that the graphs might be cyclic; if we do not detect a cycle, the code would similarly loop until running out of memory. Thus we need to keep track of the path we’ve already explored to avoid cycling.

Here is code to find all the paths between two nodes (slightly modified from http://www.python.org/doc/essays/graphs.html; I will highlight exactly what I changed later in the article).

def find_all_paths(graph, start, end, path=[]):
  path.append([start])
  if start == end:
    return [path]
  if not graph.has_key(start):
    return []
  paths = []
  for node in graph[start]:
    if node not in path:
      # recursive call
      newpaths = find_all_paths(graph, node, end, path)
      for newpath in newpaths:
        paths.append(newpath)
  return paths

Let’s test:

>>> graph.find_all_paths(g, 'A','B')
[['A', 'B']]
>>> graph.find_all_paths(g, 'A','C')
[['A', 'B', 'C'], ['A', 'C']]

Both of these are correct; the path between A and B is just A and B but between A and C we have two potential paths – either the two hops from A to B to C, or directly from A to C. The code I presented contains a bug. What happens when I run the queries again?

>>> graph.find_all_paths(g, 'A','B')
[]
>>> graph.find_all_paths(g, 'A','C')
[]

Huh? What’s going on. It worked fine the first time, but subsequent method calls return the wrong result!

Solution

The problem, I discovered with a little bit of searching, is spelled out in the Python documents.

Important warning: The default value is evaluated only once. This makes a difference when the default is a mutable object such as a list, dictionary, or instances of most classes. For example, the following function accumulates the arguments passed to it on subsequent calls:

def f(a, L=[]):
  L.append(a)
  return L

print f(1)
print f(2)
print f(3)

This will print

[1]
[1, 2]
[1, 2, 3]

If you don’t want the default to be shared between subsequent calls, you can write the function like this instead:

def f(a, L=None):
  if L is None:
    L = []
  L.append(a)
  return L

I do not like the workaround presented in the docs, since it obfuscates what the argument is supposed to be. (compare visitedNodes=[] vs visitedNodes=None). There is another workaround for this, and it’s the approach taken originally on the site where the graph traversal code came from:

def find_all_paths(graph, start, end, path=[]):
path = path + [start]
# my buggy version had path.append(start)
...

Fortunately the Python docs are excellent and I was able to find the solution without too much trouble. Python programmers need to be aware that their default arguments are only evaluated once, especially when working with potentially mutable data structures like dictionaries and lists. Programmers must take care to realize that the following two calls are NOT equivalent, especially with respect to objects contained in default arguments.

exploredNodes.append("x")
exploredNodes = exploredNodes + ["x"]
Categories: Java, Python Tags: , , , , ,

Car Talk Puzzler #3: The Palindromic Odometer

May 15, 2010 Leave a comment

The Palindromic Odometer

Driving along, Terry notices that the last four digits on the odometer are palindromic. A mile later, the last five digits are palindromic. [A mile later, the middle four digits are palindromic.] [One mile after that], all six are palindromic. What was the odometer reading when Terry first looked at it?

Full question
Answer

This is the third in my series of Car Talk puzzler solutions using whatever programming language strikes my fancy. Just like the last one I wrote, this one uses Scala. This is a much shorter post, as I am assuming some familiarity with Scala; go back and read the last post if you haven’t already.

object PalindromicPuzzlerSolver {
    // Pad out numbers to 6 places
    // Note that I can cal Java library functions seamlessly
    val df = new java.text.DecimalFormat("000000")

    // A string is a palindrome if it reversed equals itself
    def isPalindrome(word: String): Boolean = {
        // Reverse is a method of a RichString, which a String can be implicitly converted
        // into. Necessary to convert the RichString back into String after the reversal
        word == word.reverse.toString()
    }

    def fitsDescription(miles: Int): Boolean = {
        // Last 4 are palindome (dropping first 2)
        isPalindrome(df.format(miles).drop(2)) &&
        // Add one, last 5 are palindrome
        isPalindrome(df.format(miles + 1).drop(1)) &&
        // Add two more, all six are a palindrome
        isPalindrome(df.format(miles + 3))
    }

    def main(args: Array[String]) {
        val max = 1000000
        println( 0 until max filter fitsDescription )
        // Prints out
        // RangeF(198888, 199999)

        // This is the same as the java style:
        // println( (0.until(max)).filter(fitsDescription) )
        // This syntax only works when you are dealing with
        // methods that take one argument.
    }
}

Thus the solution is 198888, just as the answer proclaims. Another win for brute force.

There are three noteworthy things in this example, which didn’t come up in the last post (or weren’t explained fully)

1. Interoperability with Java
2. Use of drop
3. Convenience methods for specifying a range

Interoperability with Java

Any Java class can be imported and used with Scala; that is one of its strongest selling points. Scala runs in the same JVM as Java, so it makes sense that it should have access to Java classes. In fact many Scala classes are thin wrappers around Java objects, with syntactic sugar built in. For instance, Scala has a BigInt class which wraps a Java BigInteger, while providing overloaded methods to allow them to be used more like standard numbers. For instance,

scala> BigInt("103058235287305927350") + BigInt("288888203959230529352")
res16: BigInt = 391946439246536456702

scala> new java.math.BigInteger("103058235287305927350").add(new java.math.BigInteger("288888203959230529352"))
res21: java.math.BigInteger = 391946439246536456702

Note that the + is actually a method call; this is exactly the same thing as calling

scala> BigInt("103058235287305927350").+(BigInt("288888203959230529352"))
res22: BigInt = 391946439246536456702

In Scala, method names don’t have to be alphanumeric, which is how it gets away with what looks like operator overloading. This makes it a lot nicer to work with classes like BigDecimal or BigInteger, as you can use the more commonly used addition and subtraction symbols rather than explicitly calling “add” and “subtract” methods. Just realize that you are really calling methods like any other under the hood. When you understand this, you’ll also understand why Scala uses an underscore instead of * for wildcard – * can be used in method names, as it is used for multiplication!

Drop

Scala has full support for slicing iterable objects; see my previous discussion on slices if this term is unfamiliar.

scala> List(1,2,3,4).slice(1,3)
res27: List[Int] = List(2, 3)

scala> List(1,2,3,4).slice(1)
warning: there were deprecation warnings; re-run with -deprecation for details
res28: Seq[Int] = List(2, 3, 4)

scala> List(1,2,3,4).drop(1)
res29: List[Int] = List(2, 3, 4)

So drop(n) is exactly the same as slice(n) – it provides a new list with the first n elements removed. I use drop instead of slice due to the deprecation of the single argument form of slice.

Specifying a range
If you need to iterate over a range of numbers in Java, you’d do something like
for (int i = 0; i < max; i++) {
}

whereas in Python you can do something like
for x in range(max):

Scala similarly provides methods to create a range of numbers to iterate over.

scala> 0 until 10
res9: Range = Range(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> 0 to 10
res10: Range.Inclusive = Range(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

Again, I want to stress that until is a method like any other; you are just free to leave off the . and the parens when it is unambiguous what you mean. Note that these commands are actually creating a sequence in memory. If you only need to access each element one at a time, and you are dealing with a large range, you might instead create an Iterator rather than a sequence. The Iterator will create each subsequent number as necessary, rather than building up the whole list ahead of time. This is much more space efficient.

Note that Python makes this distinction as well:

>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> xrange(10)
xrange(10)

xrange returns a generator object, as opposed to range which returns the actual list.

Returning to Scala, if I wanted an iterator instead I would do

scala> Iterator.range(0,10)
res11: Iterator[Int] = non-empty iterator

scala> Iterator.range(0,10).foreach(println)
0
1
2
3
4
5
6
7
8
9

Conclusion
I have shown another solution using Scala, while illustrating a few more features. In particular one needs to understand that Scala’s syntactic sugar is actually masking standard function calls under the hood; x + y is the same as x.+(y), where “+” is a valid function name.

Car Talk Puzzler #2: What do these words have in common? An imperative and functional approach using Java and Scala

May 15, 2010 1 comment

Here’s the second in my series of Car Talk puzzler solutions. Today I’m going to be illustrating a solution first in Java, and then in Scala in order to show how functional programming features  can drastically reduce the number of lines of code, while still expressing the developer’s intent.

Puzzler:
What do these words have in common: pig, table, cab, real, yet, and ride?

Answer:

RAY: Here’s the answer. If you notice, the first letter in each of the words is from farther down in the alphabet than all the subsequent letters. And if we assigned a number to every letter in the alphabet, so that A is one, B is two, C is three and so on…

TOM: Yeah, yeah, yeah.

RAY: Then the numerical place in the alphabet of the first letter is equal to the sum of the numerical places of all the other letters. For example, Table: T is 20, A is 1, B is 2, L is 12, E is 5, so 1 plus 2 plus 12 plus 5 -

TOM: Is 20.

So, we have the answer.  But as the guys mention on the show, there are probably other words as well.  Let’s do a brute force search through the dictionary to find all such words.  The dictionary I”ll be using is available here, as a new line separated list of words, one per line.  This makes it very easy to deal with.

Here’s the rather straightforward solution in Java:

 import java.io.FileReader;
 import  java.io.BufferedReader;
 import java.io.FileNotFoundException;
 import java.io.IOException;

/**
* Determines all the words such that the first letter's ordinal position
* equals the sum of the remaining letters' ordinal positions.  For instance,
* "table"'s ordinal numbers are [20,1,2,12,5], and 20 = 1+2+12+5.  Thus
* "table" has this property.
*/
public class OrdinalPositionPuzzlerSolver {

    /**
    * Assumes the string consists just of A-Z, a-z.
    */
    public static boolean hasProperty(String s) {
        if (s.length() < 2) {
            return false;
        }
        char[] letters = s.toCharArray();
        int sum = 0;
        for (int i = 1; i < letters.length; i++) {
            int ordinal = ordinal(letters[i]) ;
            sum += ordinal;
        }
        return sum == ordinal(letters[0]);
    }

    /**
    * @return the 1 based ordinal position of the letter.  For instance,
    * a is 1, b is 2, etc.
    */
    public static int ordinal(char letter) {
        return Character.toLowerCase(letter) - 'a' + 1;
    }

    public static void main(String[] args) {
        String dictionaryPath = "dict.txt";
        /**

http://www.javapractices.com/topic/TopicAction.do?Id=42

        */
        try {
            BufferedReader input = new BufferedReader(new FileReader(dictionaryPath));
            try {
                String line = null;
                while (( line = input.readLine()) != null){
                    if (hasProperty(line)) {
                        System.out.println(line);
                    }
                }
            }
            catch (FileNotFoundException e) {
                System.err.println("Error, file " + dictionaryPath + " didn't exist.");
            }
            finally {
                input.close();
            }
        }
        catch (IOException e) {
            System.err.println("IO Error:" + e);
        }

    }
}
 

The only place that bears some explanation might be the ordinal function:

/**
 * @return the 1 based ordinal position of the letter.  For instance,
 * a is 1, b is 2, etc.
 */
public static int ordinal(char letter) {
    return Character.toLowerCase(letter) - 'a' + 1;
}

Basically, letters (characters) in Java are really just numbers under the hood which get mapped back into letters when printed to the screen.  So, we can do arithmetic using letters.  ‘a’ + 1 = ‘b’, ‘b’ + 2 = ‘d’, and so on.  (This only works because the letters a-z and A-Z are mapped to a continuous range of integers).  So if we want to convert a->1, first we figure out a’s position from the start of the alphabet (‘a’ -’a’ == 0, it’s the first), and add 1 to get the 1-based number.

The other thing to notice is how much junk code there is just to read the lines from our file:

try {
    BufferedReader input = new BufferedReader(new FileReader(dictionaryPath));
    try {
        String line = null;
        while (( line = input.readLine()) != null){
            if (hasProperty(line)) {
                System.out.println(line);
            }
        }
    }
    catch (FileNotFoundException e) {
        System.err.println("Error, file " + dictionaryPath + " didn't exist.");
    }
    finally {
        input.close();
    }
}
catch (IOException e) {
    System.err.println("IO Error:" + e);
}

Believe it or not, this is relatively idiomatic Java code to accomplish this task.  First we wrap a file reader in a buffered reader so that we do not continuously access the file; rather we read chunks into the file and buffer them for in-memory access.  That’s 20 lines of code just to iterate over the lines of a file, determine if they match a pattern, and print it to the terminal.

It really should not be this hard.

As I’ve said before, Java’s verbosity is one of my biggest complaints about it.  The program works, and it’s relatively straightforward, but it can’t really be described as elegant.  In that same post I mentioned my interest in Scala as an alternative to Java.  It’s a compiled language that runs in the JVM, but through its type inferencing and functional language aspects, it is much more concise; the creator Martin Odersky claims a typical Source Line of Code (SLOC) reduction of 2x between equivalent Scala and Java programs. Let’s reimplement the program in Scala, and see if this claim rings true.  (Note: I’m not an expert in scala so there’s probably even more concise, idiomatic ways to accomplish what I’m trying to do.  But let’s start somewhere).

The syntax of scala is very similar to that of Java.  One of the main differences is that type declarations occur after the variable rather than before.

 int x = 0; // Java
 val x: Int = 0 // Scala

Another syntactic difference is that generic type arguments occur within square brackets rather than angle brackets

 List<Integer> intList = ...; // java
 val intList: List[Int] =  ..// scala

Because of this, array indexing is done through use of parentheses rather than square brackets.

 // Java
 int[] x = {1,2,3,4};
 x[0] // 1
 // Scala
 val x: List[Int] = List(1,2,3,4)
 x(0) // 1

Due to type inferencing, you often don’t need to declare the type of your variables; the compiler can figure it out.  Thus the last declaration could be rewritten as

 val x = List(1,2,3,4)

I highly recommend you watch Martin Odersky’s introduction to Scala video; it’s about an hour long and goes over the design considerations and differences from Java.

There are no static functions in Scala; rather you create a singleton object through the keyword ‘object’ and access the method in the same way.

Finally, there is a read/eval/execute interpreter built-in, accessible by calling ‘scala‘ in the command line.  This allows you to experiment without having to explicitly compile any code; it allowed me to create the pieces iteratively before plunking them into a file for later compilation.

 scala
 Welcome to Scala version 2.7.3.final (Java HotSpot(TM) Client VM, Java  1.5.0_22).
 Type in expressions to have them evaluated.
 Type  :help for more information.

 scala> 1+1
 res0: Int = 2

 scala> print("hello world")
 hello world
 scala>
 

OK, enough beating around the bush.  Let’s solve this puzzler!

We’ll tackle this in 3 parts.
1) Summing a list of numbers
2) Completing the ordinal number function
3) Iterating over all the lines in the dictionary file

Summing a list of numbers

A first pass at a sum function might be as follows:

def sum(nums: Iterable[Int]): Int = {
    var sum = 0
    for (num <- nums) {
        sum += num
    }
    sum
}

Note the declaration of sum as a var as opposed to a val; a val is equivalent to a final value while a var can be reassigned.  Why did I declare the nums argument to be of type Iterable rather than List?  Well, a List is iterable.  So is an array.  So are any number of things.  We want to keep this function as generic as possible. Pasting this into our interpreter, we can test this out.

scala> def sum(nums: Iterable[Int]): Int = {
     |     var sum = 0
     |     for (num <- nums) {
     |         sum += num
     |     }
     |     sum
     | }
sum: (Iterable[Int])Int

scala> sum(List())
res4: Int = 0

scala> sum(List(1,2,3,4,5))
res5: Int = 15

scala> sum(Array(1,2,3,4))
res6: Int = 10

This works, but let’s look at another way to solve this problem – a functional rather than imperative solution.  Java is inherently imperative, meaning you’ll see a lot of explicit while and for loops.  Here’s just a quick excerpt of each the wikipedia articles to define each programming paradigm:

In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast to the imperative programming style, which emphasizes changes in state.

In computer science, imperative programming is a programming paradigm that describes computation in terms of statements that change a program state. In much the same way that imperative mood in natural languages expresses commands to take action, imperative programs define sequences of commands for the computer to perform.

So whereas the imperative manner is completely explicit in how the sum is created, using a mutable state variable (sum), a more functional approach will eliminate the use of mutable state altogether.

The reduce function is absolutely crucial to understand in order to program in a functional manner; let’s examine Python’s definition of it before applying it to the problem at hand.

Here is the Python definition of reduce function:

reduce(function, iterable[, initializer])

Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y:x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the iterable. If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. If initializer is not given and iterable contains only one item, the first item is returned.

And the description of the built-in sum function is as follows:

sum(iterable[, start])

Sums start and the items of an iterable from left to right and returns the total. start defaults to 0. The iterable‘s items are normally numbers, and are not allowed to be strings. The fast, correct way to concatenate a sequence of strings is by calling ”.join(sequence). Note that sum(range(n), m) is equivalent to reduce(operator.add,range(n), m) To add floating point values with extended precision, see math.fsum().

While Scala doesn’t have a built-in sum function, it does have support for reducing a list through use of a function.  It has both a reduce left (left to right reduction) and reduce right (right to left reduction).  Here is Scala’s definition of reduceLeft:

def reduceLeft[B >: A](op : (B, A) => B) : B
Combines the elements of this iterable object together using the binary operator op, from left to right
Notes Will not terminate for infinite-sized collections.
Parameters
op – The operator to apply
Returns op(… op(a0,a1), …, an) if the iterable object has elements a0, a1, …, an.
Throws
Predef.UnsupportedOperationException – if the iterable object is empty.

This might be a little confusing but bear with me.

First, we need to write a binary function for our sum operator – after all, we want to pairwise reduce the list by using an add operator.  A first pass is as follows:

scala> def add(x: Int, y: Int): Int = { x + y }
add: (Int,Int)Int

scala> List(1,2,3,4).reduceLeft(add)
res12: Int = 10
scala> List().reduceLeft(add)
java.lang.UnsupportedOperationException: Nil.reduceLeft
at scala.List.reduceLeft(List.scala:1093)
at .(:6)
at .()
at RequestResult$.(:3)
at RequestResult$.()
at RequestResult$result()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethod...
scala> List(-5,5).reduceLeft(add)
res14: Int = 0

Note that you cannot call sum on an empty list any more; doing so results in an UnsupportedOperationException.  This is arguably more correct than returning 0, but we will have to ensure that we don’t attempt to sum an empty iterable object.

So a working sum definition would be

def sum(nums: Iterable[Int]): Int = {
    def add(x: Int, y: Int): Int = { x + y }
    nums.reduceLeft(add)
}

Note that you can nest function definitions, as functions are objects in Scala, and thus can be defined wherever a variable could.  Note too that I never have to explicitly return a value; the last evaluated value is implicitly returned by the function.

Finally, if you want to be even more terse, you can replace the explicitly defined add function with an anonymous one:

def sum(nums: Iterable[Int]): Int = {
    nums.reduceLeft(_+_)
}

Read the Daniel Spiewak’s excellent blog post for a better explanation for what’s going on here.  Here’s a quick explanation:

(_+_) is actually a Scala shorthand for an anonymous function taking two parameters and returning the result obtained by invoking the + operator on the first passing the second.  Less precisely: it’s addition.  Likewise, (_*_) would define multiplication.

Completing the ordinal number functional

Recall the function that we had in Java for computing the ordinal position of each letter:

    public static boolean hasProperty(String s) {
        if (s.length() < 2) {
            return false;
        }
        char[] letters = s.toCharArray();
        int sum = 0;
        for (int i = 1; i < letters.length; i++) {
            int ordinal = ordinal(letters[i]) ;
            sum += ordinal;
        }
        return sum == ordinal(letters[0]);
    }

    /**
    * @return the 1 based ordinal position of the letter.  For instance,
    * a is 1, b is 2, etc.
    */
    public static int ordinal(char letter) {
        return Character.toLowerCase(letter) - 'a' + 1;
    }

Again, we could directly translate this code into Scala but doing so would not be making full use of Scala’s functional underpinnings.  We are converting from characters of the string to their ordinal positions in the alphabet.  While the Java code does this one letter at a time, keeping a sum, it’s just as correct to do the entire mapping process and then sum over the new list.  Pictorially:

Just as there is a reduce function to convert an iterable object into a single entry, so too is there a function to convert an Iterable of one type into an Interable of another; it is the map function.

// The underscore in an anonymous function like  this is a placeholder for the entry in the iterable collection currently  being processed
scala> List(1,3,4).map(_+5)
res19: List[Int] =  List(6, 8, 9)
scala> List("hi","there").map(_.toUpperCase())
res22:  List[java.lang.String] = List(HI, THERE)

The above examples map from Int-> Int and String->String but there’s nothing stopping us from converting types.

scala> List(100,300,400).map(_.toString())
res23:  List[java.lang.String] = List(100, 300, 400)

Finally, let’s apply the ordinal function to a string:

scala> "cat".map(Character.toLowerCase(_) - 'a' +  1)
res24: Seq[Int] = ArrayBufferRO(3, 1, 20)

Thus the entire function for determining if the word is a match is

[sourecode language="scala"]
/**
* Determines if the word has the desired property of the ordinal position
* of first letter equalling the sum of the ordinal positions of the other
* letters
* Expects letters in the range [a-zA-Z];
*/
def matches(word: String): Boolean = {
if (word.length() < 2) {
false
}
else {
val ordinals = word.map(Character.toLowerCase(_) – 'a' + 1)
val ordinalSum = sum(ordinals.drop(1))
ordinals(0) == ordinalSum
}
}
[/sourecode]

Iterating over the lines in a file

Remember how it took 20 lines to iterate over all the lines in a file?  Fortunately Scala makes it incredibly easy:

val lines =  scala.io.Source.fromFile("dict.txt").getLines

The variable is actually not an array or a list; it is an iterator over the lines in the file.  But iterators are still iterable, so all of our previously mentioned functions for dealing with iterable objects still work.  Again, rather than looping over each element, explicitly testing, and printing whether it matched, we will use a functional approach, namely the ‘filter’ function.

scala>  List(1,2,3,4).filter(_%2==0)
res25: List[Int] = List(2, 4)

The function passed into filter must return Boolean; since our matches() function does that, we can insert it right in:

val lines =  scala.io.Source.fromFile("dict.txt").getLines
val matching =  lines.filter(matches)
matching.foreach(print(_))

When we run the complete code, we get the following list of matches:

  • acknowledgements
    approximation
    bumptiously
    capitalizations
    circuitously
    conceptualizing
    consciousness
    consistently
    credulousness
    crystallizing
    disconcertingly
    discriminatory
    electrocutions
    excommunication
    existentialism
    experimentally
    exploitations
    expressively
    extraneously
    governmentally
    impersonations
    incomprehensibly
    indispensability
    initializations
    intercommunicated
    interpolations
    notwithstanding
    organizationally
    oversimplifying
    phenomenologically
    sportswriting

Wait a minute… what’s going on here?  These obviously don’t follow the pattern.  Let’s open a terminal and do some testing.

First we paste our two helper method definitions in

 Welcome to Scala version 2.7.3.final (Java  HotSpot(TM) Client VM, Java 1.5.0_22).
 Type in expressions to have  them evaluated.
 Type :help for more information.

 scala>  def sum(nums: Iterable[Int]): Int = {
 |          nums.reduceLeft[Int](_+_)
 |     }
 sum: (Iterable[Int])Int

 scala>

 scala>     /**
 |      * Determines  if the word has the desired property of the ordinal position
 |      * of first letter equalling the sum of the ordinal positions of  the other
 |      * letters
 |      * Expects letters in  the range [a-zA-Z];
 |      */

 scala>     def  matches(word: String): Boolean = {
 |         val ordinals =  word.map(Character.toLowerCase(_) - 'a' + 1)
 |         val  ordinalSum = sum(ordinals.drop(1))
 |         ordinals(0) ==  ordinalSum
 |     }
 matches: (String)Boolean

 scala> matches("table")
 res0: Boolean = true

 scala>  matches("chair")
 res1: Boolean = false

 scala>  matches("consistently")
 res2: Boolean = false
 

This is very strange.  “consistently” is one of the words on the list, but it doesn’t pass the test on its own.  In fact, none of them do.

What’s the problem?  After googling and debugging for a bit, I realized that the precondition I was assuming in my matches method was being violated; the letters were not in the range [A-Za-z].  How could that be?  Well, it turns out that the getLines() method returns the lines in the file – with new lines included!

scala> matches("consistently\r\n")
res5: Boolean = true

(The file had been created on Windows, and thus has a carriage feed as well as a new line marker).

Fortunately there is a method to eliminate whitespace (including control characters like these); it’s known as trim.

We can put the trim in one of two places, either in the matches() method itself,

def matches(word: String): Boolean = {
     val ordinals = word.trim().map(Character.toLowerCase(_) - 'a' + 1)
     val ordinalSum = sum(ordinals.drop(1))
     ordinals(0) == ordinalSum
}

scala> matches("consistently\r\n")
res6: Boolean = false

scala> matches("consistently")
res7: Boolean = false

Or we can introduce the trim while iterating over the lines in the collection.  I’ve opted for this second approach; I’m making it a precondition of calling the method matches() that the input is already sanitized.  We introduce an implicit mapping from the string to the trimmed version of the string:

// An iterator of lines in the file; note that they have newline control characters
val lines = scala.io.Source.fromFile("dict.txt").getLines
// Filter the lines, eliminating the new lines through trim mapping
val matching = lines.filter(i => matches(i.trim()))
matching.foreach(print(_))

After making the change, the result is

  • cab
  • hag
  • jade
  • leaf
  • leg
  • maced
  • mica
  • mid
  • need
  • pig
  • real
  • ride
  • same
  • sand
  • seam
  • sod
  • table
  • toad
  • toe
  • vial
  • ward
  • whack
  • who
  • wick
  • win
  • yeas
  • yet
  • zebra
  • zinc

Tada!

The final version of the program is

object OrdinalPositionSolver {
    /**
     * Determines if the word has the desired property of the ordinal position
     * of first letter equalling the sum of the ordinal positions of the other
     * letters
     * Expects letters in the range [a-zA-Z];
     */
     def matches(word: String): Boolean = {
         if (word.length() < 2) {
            false
         }
         else {
             val ordinals = word.map(Character.toLowerCase(_) - 'a' + 1)
             val ordinalSum = sum(ordinals.drop(1))
             ordinals(0) == ordinalSum
         }
     }

    def sum(nums: Iterable[Int]): Int = {
        nums.reduceLeft[Int](_+_)
    }

    def main(args: Array[String]) {
        // An iterator of lines in the file; note that they have newline control characters
        val lines = scala.io.Source.fromFile("dict.txt").getLines
        // Filter the lines, eliminating the new lines through trim mapping
        val matching = lines.filter(i => matches(i.trim()))
        matching.foreach(print(_))
    }
}

Conclusion

I have solved the puzzler in two different ways, once in an imperative Java style and once in a more functional Scala style.  The functional way is extremely concise and expressive, doing in 30 lines of code what it takes the Java code to do in over 60, confirming the claim that a typical Scala program is at least twice as compact as the corresponding Java program.  I have introduced some of the salient syntactic differences between Java and Scala. I have also introduced some of the mainstays of the functional programming paradigm, namely map, reduce, and filter.  While there are efforts to make Java more functional, the fact that you cannot pass functions as objects severely inhibits the use of functional programming paradigms.

Car Talk Puzzler #1: The Bank Temperature Sign

May 1, 2010 1 comment

One of my favorite podcasts is NPR’s Car Talk, and one of my favorite segments of the show is the weekly puzzler, in which listeners have a week to solve a puzzle that is usually automotive in nature.  It struck me that this would be an ideal way to illustrate some beginning programming techniques, while also illustrating some real world problem solving uses of programming.  Initially I will be posting solutions in Python, but I will probably show how to do them in other ways as well.

You can see the whole archive of 2010 puzzles, but the specific one I’m starting with can be found here.

In short:

PREVIOUS PUZZLER: Stevie and His Moto
Stevie’s riding his motorcycle to work when he sees a big sign displaying the temperature in Fahrenheit and in centigrade. The digits are exactly reversed. He notices the same thing on the way home. What were the temperatures?

This is an ideal choice for a programming solution because it can be brute forced; a computer can try all possible solutions near instantaneously, something they are very good at.  Not all of the puzzlers are so conducive to a programmatic solution.

For those who already know the basics of programming, I’ll post my program first.  Afterwards I post explanations more geared towards beginners, but there still is something that more advanced programmers might not know in Python.

Here’s my solution:

def fahrenheitToCelsius(temp):
 return  (temp - 32) / 1.8

def reverseString(s):
 # Step size is -1,  so start at the end and work backwards
 return s[::-1]

#  Presumably it was above freezing if it's a spring day
LOW_TEMP_FAHRENHEIT  = 32
# Presumably it was below 100 degrees
HIGH_TEMP_FAHRENHEIT =  100

def main():
 for temp in range(LOW_TEMP_FAHRENHEIT,  HIGH_TEMP_FAHRENHEIT):
 # Round the float to nearest whole number,  cast to integer
 celsius = int(round(fahrenheitToCelsius(temp)))

 # Convert the integers to strings so as to be able to reverse the  digits
 fahrenheitString = str(temp)
 celsiusString =  str(celsius)

 # See if the reversed digits match
 if  fahrenheitString == reverseString(celsiusString):
 print temp,  celsius

if __name__ == '__main__':
 main()

Spoiler:

python Temperature.py
61 16
82 28

Tada!  The program calculated the correct solution.

Let’s break the problem into a few logical steps:

1) Convert a Fahrenheit temperature to Celsius (we’re going to be able to do this in order to determine what the temperature on the signs must have been)
2) Round a number to the nearest whole number (converting Fahrenheit to Celsius will often leave a fractional part, but most signs don’t display temperature in increments smaller than 1 degree)
3) Reverse the digits of a number (in order to tell whether the temperatures are the reverse of each other)
4) Test a whole bunch of Fahrenheit temperatures to discover the matching Celsius ones, using steps 1 – 3.

Let’s explain each part in order.

Convert Fahrenheit to Celsius

The first part is the most straightforward:

def fahrenheitToCelsius(temp):
 return (temp  - 32) / 1.8

There’s not a lot to say about this part; this is the formula to convert from Fahrenheit to Celsius.  If you’re new to Python or programming as a whole, this whole block is known as a method declaration; basically methods take in data, do some calculation with the data, and more often than not, return a new piece of data.  In our case, we are passing in the temperature in Fahrenheit, and what comes out of it is a temperature in Celsius.  Just to show that this indeed works:

Python  2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on  win32
Type "help", "copyright", "credits" or "license" for more  information.
>>> def fahrenheitToCelsius(temp):
...      return (temp - 32) / 1.8
...
>>> fahrenheitToCelsius(212)
100.0
>>>  fahrenheitToCelsius(32)

Round a number to nearest whole number

OK, how about rounding a fractional number to the nearest whole number?

Python has a function called, easily enough, round which does exactly that.  You choose how many decimal places you want; by default it rounds to 0 (i.e. the nearest whole number)

>>> round (2.252,  2)
2.25
>>> round (2.252 )
2.0

Notice how even though we round to 0 decimal places, the answer is “2.0″.  This is because round returns a double rather than an integer.  Integers are what we would think of as “whole numbers” – -1023, 2035, 733, etc. etc.  Doubles are an (inexact) representation of a number with a fractional part.  It’s a little beyond the scope of this tutorial, but suffice to say there are numbers that a computer cannot represent exactly

>>> round (2.1, 1)
2.1000000000000001

The reason I make the distinction is that we want to treat our numbers as integers; after all we’re rounding them to 0 decimal places, so they are in fact integers.  We need the computer to treat them that way.  We do this with the int() method

>>>  round(2.23501)
2.0
>>> int(2.0)
2

Note too that we can combine these two expressions into one; you’ll often see computer programs written this way, as it allows us to express thoughts more concisely.  So the above could be replaced by

>>> int(round(2.23501))
2

This expression can be read as “after rounding the number to 0 decimal places, convert it to an integer”.

Reverse the digits of a number

How do we reverse the digits of a number?  Well, first, we’re going to treat the number as if it were an arbitrary string of letters (characters) rather than numbers.  We are going to cast the integer into a string.  Strings are a programming concept used to express textual information; they are usually enclosed in quotes.

>>> "hello"
'hello'

The string representation of a number is different than the number itself.  For instance, when two strings are “added” together, they become joined (concatenated).  When two numbers are added together, they are added in the mathematical sense.  For instance

>>>  "2" + "2"
'22'
>>> 2 + 2
4

We are going to treat the numbers as strings because it is fairly easy to reverse the characters of a string, but hard to do so for a number.  With a string, you can access each individual character; you cannot (easily) do that with a number.  To understand how to reverse the characters of a string, you need to understand a bit about iterating over collections in Python, as well as a concept known as “slice”.

Let’s start with the basic Python data structure known as a list.  You can put a heterogenous (mixed) set of objects into a list:

a = [2,3,5,6,"hello","world"]

In this case we’ve put a bunch of numbers, and then two strings into the list data structure.  We can get out each element in the list by ‘indexing into’ the array.  Unlike in the normal world, in Python you must count starting at 0.  So instead of counting 1,2,3,…, you count 0,1,2,… .  In other words, if we want to get out the first item we put into the list (the 2), we have to ask for the element at position 0.

>>> a[0]
2

The number within brackets is the index of the element you are accessing from the list.

Now, what if we want the first 3 elements of the list instead of just the first one?  To do that, we need to use what’s known as a “slice” in Python.  The syntax is [startIndex:endIndex+1].  In other words, if we want the first three elements, those at position 0, 1, and 2, we would ask for

>>>  a[0:3]
[2, 3, 5]

This can be read as, “start at the element with index 0 and stop before you get to the element with index 3″.  Yes, this is a little counter-intuitive – but just stick with me.

What if you want all of the items after a certain point?  For instance, I want all the elements including and after the 2nd element.  To do that, you omit the second number, but keep the colon.  This is saying “start and keep going till you reach the end of the collection”

>>>  a[1:]
[3, 5, 6, 'hello', 'world']

You can also omit the first number, in which case it is assumed that you want to start at the beginning of the collection

>>> a[:5]
[2, 3, 5, 6, 'hello']
>>>  a[0:5]
[2, 3, 5, 6, 'hello']

What happens if we omit both numbers?

>>>  a[:]
[2, 3, 5, 6, 'hello', 'world']
>>> a
[2, 3, 5,  6, 'hello', 'world']

We get back all the items of the original list!  This makes sense if you think about it; we assume we’re starting at the beginning, and we assume we’re ending at the end.  Why do I bring this up?  Because there’s one last element you can add to take a new slice out of array: the stride argument.

Whereas we’ve been taking some contiguous subset of the list, there’s nothing preventing us from skipping more than one number between every element, in effect taking every second, or every nth number.

>>> a[::2]
[2, 5, 'hello']

Interestingly, this number doesn’t have to be positive either; if it’s negative it means we start at the end of the collection and work our way towards the front of the list:

>>>  a[::-2]
['world', 6, 3]

Putting this all together then, one way to get the reverse of a list is to take a new slice out of it, taking the whole thing but backwards:

>>> a[::-1]
['world', 'hello', 6, 5,  3, 2]

(More advanced readers: an alternate way, and arguably easier, is to use a list comprehension and the reversed method:

>>> [i for i in reversed(a)]
['world',  'hello', 6, 5, 3, 2]

)

Hopefully you understand how to take slices out of a list now; you can read a whole lot more about slices at any number of sites, including this great introduction to Python.

But why did I start talking about lists?  Weren’t we trying to reverse a string, not a list?  Well, it turns out you can view a string as a list of individual letters (characters), and treat it exactly the same as we did for lists, including indexing and slicing.

>>> "hello"[0]
'h'
>>>  "hello"[:3]
'hel'
>>> "hello"[::-1]
'olleh'

So, the method for reversing a string (or really any subscriptable object, but that’s another issue) is

def  reverseString(s):
 # Step size is -1, so start at the end and work  backwards
 return s[::-1]

The only thing left to say in this regard is that if we want to convert a number to a string, this is accomplished by the str() method.

Iterate over a range of numbers

We want to test a whole range of temperatures, looking for those that yield the properties desired in the puzzle.  Fortunately there is an easy way to generate a list of numbers in a given range in Python: the range command.

</div>
>>>  range(5)
[0, 1, 2, 3, 4]
>>> range(5,10)
[5, 6, 7, 8,  9]
<div>
Once we have our list of numbers, we want to iterate over them, pulling out each temperature.  We do that with a simple for loop:
>>> for x in range(5):
...     print  x
...
0
1
2
3
4
Or, for our purposes
for  temp in range(LOW_TEMP_FAHRENHEIT, HIGH_TEMP_FAHRENHEIT):

Where LOW_TEMP_FAHRENHEIT and HIGH_TEMP_FARENHEIT are two constants defined to limit the range of numbers we are looking at.

Conclusion

I’ve illustrated a brute-force solution to a recent puzzler.  Despite being a fairly simple challenge, it illustrates some important features of Python, including range generation, list comprehension, slicing to reverse selections, and converting between different data types.

I’d love to see some other solutions to this problem, so post in the comments if you have ideas.

Human fallibility – static analysis tools

February 9, 2010 5 comments

The theme of this post is human fallibility: no one’s perfect and we’re bound to make mistakes while coding.  The following tools can help statically analyze source or markup and find mistakes early.

Python

I am a huge fan of Python as a scripting language, as its syntax and language features allow you to code quickly and with minimal fuss. (I will definitely use it as a springboard for future blog discussion, especially with respect to its differences from Java).  I am used to statically compiled languages like the aforementioned Java, where the compiler will catch typos and uses of uninitialized variables; not having this ability in Python always makes me feel a bit hesitant to use it for more than quick scripts and small programming tasks.  (Clearly Python is well-suited to large scale production environments; this is more my hangup and lack of expertise in the language than anything else.)

Enter PyChecker, an open-source project that aims to detect some of the most common coding mistakes, including references to variables before assignment, passing wrong numbers of arguments to functions, and calling methods that don’t exist.   It won’t find all your mistakes, but if you have any long-running Python scripts, you’d much prefer to catch a typo before you start running than 90% through the computation.

JSON

JSON is an alternative to XML as a “lightweight data-interchange format”.  Unlike XML with its opening and closing angle brackets, JSON has a very clean syntax with few extraneous marks.  Here’s an example JSON file:

{
    "number": 1,
    "array": [
        5,
        6.7,
        "string",
        [
            "nested list"
        ]
    ],
    "birthdayMap": {
        "Nick": "3/24",
        "Andrew": "12/1"
    }
}

I was working on a project using JSON as its means of representing data when I ran into problems with my hand-generated JSON – I had made mistakes, omitting brackets, not closing quotation marks, or other silly mistakes.  The Java library I was using to parse the JSON read in the whole file as a String, meaning all the contextual information was lost.  When an error ocurred, I was left with a cryptic error message like “syntax error, unexpected $end, expecting ‘}’ at character 2752″ with no idea where in the file the error lay.

Thanks to my coworker Dave, I found the excellent tool JSONLint which not only highlights the exact line and location of your syntax error, it also reformats your JSON to a consistent amount of indentation for each nested level.  JSONLint is indispensible if you’re fat-fingering your JSON code.

Java

PMD is a plugin for NetBeans, Eclipse, and a host of other Java IDEs and standard text editors that warns you of bad coding practices in your source code, as well as alerting you to potential errors.  There are a variety of rules specified in either XQuery notation or Java code that can be turned on and off at will; for instance if you are doing a lot of coding with interfaces to C and need to use shorts and bytes, you probably won’t want the “Don’t use shorts” warning popping up on every line you use a short in.

Some of the most useful rules I’ve found are the fall through in switch statements,
Not all of the rules are cut-and-dried, which the website acknowledges with the addition of a Controversial Rules section.  Some might be due to stylistic differences or just disagreements over whether or not certain constructs are bad practice.  For instance,

OnlyOneReturn

Since: PMD 1.0

A method should have only one exit point, and that should be the last statement in the method.

This rule is defined by the following Java class: net.sourceforge.pmd.rules.design.OnlyOneReturnRule

Example:

public class OneReturnOnly1 {
    public void foo(int x) {
        if (x > 0) {
            return "hey"; // oops, multiple exit points!
        }
        return "hi";
    }
}

Doing a search for “java multiple exit points” reveals that there is definitely a lot of discussion as to best practices in this regard, hence its inclusion in the Controversial Rules section.

Hopefully you’ve gained at least one new tool from this post; if not I’ll be posting more in days to come.