Archive

Archive for May, 2012

Solving Scramble With Friends – a tale of three data structures

May 30, 2012 11 comments

Solving Scramble with Friends – A tale of three data structures

This post aims to illustrate how to solve Scramble With Friends/Boggle and the techniques and data structures necessary. In particular, we will see how to prune an enormous search space, how to use recursion to simplify the task, and the relative performance of three different data structures: an unsorted list, a sorted list with binary search, and a trie.

Problem statement

Given an N x N board, (4×4 in case of Boggle and Scramble with Friends), find all of the paths through the board which create a valid word. These paths must use each tile at most once and be contiguous; tile N must be adjacent to tile N – 1 in any of 8 directions (North, Northeast, East, Southeast, South, Southwest, West, Northwest).

Scramble Screenshot

In this example, ANT is a valid solution in the bottom left corner, but NURSE is not because the Ns are not located next to the U.

Most naive approach

First, imagine that we have a method that, given a path through the board returns all valid words that starts with that path, called DoSolve. For instance, DoSolve(['Qu']) would return all the valid words (and their locations) that start with Qu. DoSolve(['N', 'E']) would return all the valid paths that start with N followed by an E. With such a method, it is trivial to solve the problem. In pseudocode:

Solve(board):
    solutions = empty list
    for each location on board:
        append DoSolve(location) to solutions
    return solutions

The tricky part is, how do we make DoSolve work? Remember that it must

  1. return only valid words
  2. return only words that are formed from contiguous tiles
  3. must not repeat any tiles.

The easiest way to solve this problem is through recursion. As you probably remember, recursion is simply when a method calls itself. In order to avoid an infinite loop (and eventual stack overflow), there must be some sort of stopping criteria. As it was taught to me, you have to be sure that the problem becomes smaller at each step, until it becomes so small as to be trivially solved. Here’s that basic algorithm:

DoSolve(board, tiles_used)
    solutions = empty list

    current word = empty string
    for letter in tiles_used:
        append letter to current word

    # 1 - ensure that the word is valid
    if the current word is a valid word:
        append current word to solutions

    most recent tile = last tile in tiles_used

    # 2 - ensure that the next tile we use is adjacent to the last one
    for each tile adjacent to most recent tile:

        # 3 - ensure that we do not repeat any tiles
        if tile is on the board and tile hasn't been used previously:
            new_path = append tile to copy of tiles_used list
            solutions_starting_from_tile = DoSolve(board, new_path)
            append solutions_starting_from_tile to solutions

    return solutions

This will work, but it suffers from an incredible flaw. Do you see it?

The problem is that this method will waste an inordinate amount of time exhaustively searching the board for solutions even when the current path is completely useless. It will continue to search for words starting with QuR, QuRE, etc., etc., even though no words in the English language start with these letters. The algorithm is still correct, but it can be optimized very simply.

Pruning the search space

Those with some algorithms background might recognize the code above as a modified form of a depth first search. As described previously, a depth first search will exhaustively explore every possible path through the board. We can vastly improve the efficiency of this algorithm by quitting the search early when we know it won’t be fruitful. If we know all of the valid words, we can quit when we know that no word starts with the current string. Thus in the previous example, if the current word built up so far were “QuR”, the algorithm could determine that no words start with QuR and thus fail fast and early. This optimization will not affect the correctness of the algorithm because none of the potential paths we eliminate could possibly lead to a valid word; by constraint #1 this means that none of those paths would have ended up in the list of solutions in the first place.

How much does this save? On random 3×3 boards, The fastest implementation I have is sped up by a factor of 75. Solving 4×4 boards without pruning is infeasible.

Basic Python implementation

Assume as a black box we have two methods IsWord(string) and HasPrefix(string). The pseudocode above can be expressed with Python (forgive the slightly modified parameter list; I found it easier to write it this way):

def DoSolve(self, board, previous_locations, location, word_prefix):
"""Returns iterable of FoundWord objects.

Args:
  previous_locations: a list of already visited locations
  location: the current Location from where to start searching
  word_prefix: the current word built up so far, as a string
"""
solutions = []

new_word = word_prefix + board[location.row][location.col]
previous_locations.append(location)

if PREFIX_PRUNE and not self.HasPrefix(new_word):
  if DEBUG:
    print 'No words found starting with "%s"' %(new_word)
  return solutions

# This is a valid, complete words.
if self.IsWord(new_word):
  new_solution = FoundWord(new_word, previous_locations)
  if DEBUG:
    print 'Found new solution: %s' %(str(new_solution))
  solutions.append(new_solution)

# Recursively search all adjacent tiles
for new_loc in location.Adjacent():
  if board.IsValidLocation(new_loc) and new_loc not in previous_locations:
    # make a copy of the previous locations list so our current list
    # is not affected by this recursive call.
    defensive_copy = list(previous_locations)
    solutions.extend(self.DoSolve(board, defensive_copy, new_loc, new_word))
  else:
    if DEBUG:
      print 'Ignoring %s as it is invalid or already used.' %(str(new_loc))

return solutions

Data structures

The data structure and algorithms used to implement the IsWord and HasPrefix methods are incredibly important.

I will examine three implementations and discuss the performance characteristics of each:

  • Unsorted list
  • Sorted list with binary search
  • Trie

Unsorted list

An unsorted list is a terrible data structure to use for this task. Why? Because it leads to running time proportional to the number of elements in the word list.

class UnsortedListBoardSolver(BoardSolver):
  def __init__(self, valid_words):
    self.valid_words = valid_words

  def HasPrefix(self, prefix):
    for word in self.valid_words:
      if word.startswith(prefix):
        return True
    return False

  def IsWord(self, word):
    return word in self.valid_words

While this is easy to understand and reason about, it is extremely slow, especially for a large dictionary (I used approximately 200k words for testing).

Took 207.697 seconds to solve 1 boards; avg 207.697 with <__main__.UnsortedListBoardSolver object at 0x135f170>

(This time is with the pruning turned on).

Sorted list

Since we know the valid words ahead of time, we can take advantage of this fact and sort the list. With a sorted list, we can perform a binary search and cut our running time from O(n) to O(log n).

Writing a binary search from scratch is very error prone; in his classic work Programming Pearls, Jon Bently claims that fewer than 10% of programmers can implement it correctly. (See blog post for more).

Fortunately, there’s no reason whatsoever to write our own binary search algorithm. Python’s standard library already has an implementation in its bisect module. Following the example given in the module documentation, we get the following implementation:

class SortedListBoardSolver(BoardSolver):
  def __init__(self, valid_words):
    self.valid_words = sorted(valid_words)

  # http://docs.python.org/library/bisect.html#searching-sorted-lists
  def index(self, a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect.bisect_left(a, x)
    if i != len(a) and a[i] == x:
      return i
    raise ValueError

  def find_ge(self, a, x):
    'Find leftmost item greater than or equal to x'
    i = bisect.bisect_left(a, x)
    if i != len(a):
      return a[i]
    raise ValueError  

  def HasPrefix(self, prefix):
    try:
      word = self.find_ge(self.valid_words, prefix)
      return word.startswith(prefix)
    except ValueError:
      return False

  def IsWord(self, word):
    try:
      self.index(self.valid_words, word)
    except ValueError:
      return False
    return True

Running the same gauntlet of tests, we see that this data structure performs far better than the naive, unsorted list approach.

Took 0.094 seconds to solve 1 boards; avg 0.094 with <__main__.SortedListBoardSolver object at 0x135f1d0>
Took 0.361 seconds to solve 10 boards; avg 0.036 with <__main__.SortedListBoardSolver object at 0x135f1d0>
Took 2.622 seconds to solve 100 boards; avg 0.026 with <__main__.SortedListBoardSolver object at 0x135f1d0>
Took 25.065 seconds to solve 1000 boards; avg 0.025 with <__main__.SortedListBoardSolver object at 0x135f1d0>

Trie

The final data structure I want to illustrate is that of the trie. The Wikipedia article has a lot of great information about it.

Trie image

From Wikipedia:

In computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that correspond to keys of interest.

Snip:

Looking up a key of length m takes worst case O(m) time. A BST performs O(log(n)) comparisons of keys, where n is the number of elements in the tree, because lookups depend on the depth of the tree, which is logarithmic in the number of keys if the tree is balanced. Hence in the worst case, a BST takes O(m log n) time. Moreover, in the worst case log(n) will approach m. Also, the simple operations tries use during lookup, such as array indexing using a character, are fast on real machines

Again, I don’t want to reinvent what’s already been done before, so I will be using Jeethu Rao’s implementation of a trie in Python rather than rolling my own.

Here is a demonstration of its API via the interactive prompt:

>>> import trie
>>> t = trie.Trie()
>>> t.add('hell', 1)
>>> t.add('hello', 2)
>>> t.find_full_match('h')
>>> t.find_full_match('hell')
1
>>> t.find_full_match('hello')
2
>>> t.find_prefix_matches('hello')
[2]
>>> t.find_prefix_matches('hell')
[2, 1]

Unfortunately, there’s a bug in his code:

>>> t = trie.Trie()
>>> t.add('hello', 0)
>>> 'fail' in t
False
>>> 'hello' in t
True
# Should be false; 'h' starts a string but
# it is not contained in data structure
>>> 'h' in t
True

His __contains__ method is as follows:

def __contains__( self, s ) :
  if self.find_full_match(s,_SENTINEL) is _SENTINEL :
      return False
  return True

The find_full_match is where the problem lies.

def find_full_match( self, key, fallback=None ) :
  "'
  Returns the value associated with the key if found else, returns fallback
  "'
  r = self._find_prefix_match( key )
  if not r[1] and r[0]:
    return r[0][0]
  return fallback

_find_prefix_match returns a tuple of node that the search terminated on, remainder of the string left to be matched. For instance,

>>> t._find_prefix_match('f')
[[None, {'h': [None, {'e': [None, {'l': [None, {'l': [None, {'o': [0, {}]}]}]}]}]}], 'f']
>>> t.root
[None, {'h': [None, {'e': [None, {'l': [None, {'l': [None, {'o': [0, {}]}]}]}]}]}]

This makes sense, ‘f’ doesn’t start any words in the trie containing just the word ‘hello’, so the root is returned with the ‘f’ string that doesn’t match. find_full_match correctly handles this case, since r[1] = ‘f’, not r[1] = False, and the fallback is returned. That fallback is used by contains to signify that the given string is not in the trie.

The problem is when the string in question starts a valid string but is itself not contained in the trie. As we saw previously, ‘h’ is considered to be in the trie.

>>> r = t._find_prefix_match('h')
>>> r
[[None, {'e': [None, {'l': [None, {'l': [None, {'o': [0, {}]}]}]}]}], "]
>>> r[0]
[None, {'e': [None, {'l': [None, {'l': [None, {'o': [0, {}]}]}]}]}]
>>> r[1]
"
>>> bool(not r[1] and r[0])
True
>>> r[0][0]
# None

The issue is that his code does not check that there is a value stored in the given node. Since no such value has been stored, the code returns None, which is not equal to the SENTINEL_ value that his __contains__ method expects. We can either change findfullmatch to handle this case correctly, or change the __contains__ method to handle the None result as a failure. Let’s modify the find_full_match method to obey it’s implied contract (and be easier to understand):

def find_full_match( self, key, fallback=None ) :
  "'
  Returns the value associated with the key if found else, returns fallback
  "'
  curr_node, remainder = self._find_prefix_match(key)
  stored_value = curr_node[0]
  has_stored_value = stored_value is not None
  if not remainder and has_stored_value:
    return stored_value
  return fallback

Let’s make sure this works:

>>> reload(trie)
<module 'trie' from 'trie.py'>
>>> t = trie.Trie()
>>> t.add('hello', 2)
>>> 'f' in t
False
>>> 'hell' in t
False
>>> 'hello' in t
True

OK, with that minor patch, here’s a first pass implementation of the solution using the trie:

class TrieBoardSolver(BoardSolver):
  def __init__(self, valid_words):
    self.trie = trie.Trie()
    for index, word in enumerate(valid_words):
      # 0 evaluates to False which screws up trie lookups; ensure value is 'truthy'.
      self.trie.add(word, index+1)

  def HasPrefix(self, prefix):
    return len(self.trie.find_prefix_matches(prefix)) > 0

  def IsWord(self, word):
    return word in self.trie

Unfortunately, this is slow. How slow?

Took 2.626 seconds to solve 1 boards; avg 2.626 with <__main__.TrieBoardSolver object at 0x135f070>
Took 22.681 seconds to solve 10 boards; avg 2.268 with <__main__.TrieBoardSolver object at 0x135f070>

While this isn’t as bad as the unsorted list, it’s still orders of magnitudes slower than the binary search implementation.

Why is this slow? Well, it’s doing a whole lot of unnecessary work. For instance, if we want to determine if ‘h’ is a valid prefix, this implementation will first construct the list of all words that start with h, only to have all that work thrown away when we see that the list is not empty.

A much more efficient approach is to cheat a little and use the previously discussed method _find_prefix_match which returns the node in the tree that the search stopped at and how much of the search string was unmatched.

By using this method directly, we can avoid creating the lists of words which we then throw away. We modify the HasPrefix method to the following:

def HasPrefix(self, prefix):
  curr_node, remainder = self.trie._find_prefix_match(prefix)
  return not remainder

With this optmization, the trie performance becomes competitive with the binary search:

Took 0.027 seconds to solve 1 boards; avg 0.027 with <__main__.TrieBoardSolver object at 0x135f230>
Took 0.019 seconds to solve 1 boards; avg 0.019 with <__main__.SortedListBoardSolver object at 0x135f330>
Took 0.199 seconds to solve 10 boards; avg 0.020 with <__main__.TrieBoardSolver object at 0x135f230>
Took 0.198 seconds to solve 10 boards; avg 0.020 with <__main__.SortedListBoardSolver object at 0x135f330>
Took 2.531 seconds to solve 100 boards; avg 0.025 with <__main__.TrieBoardSolver object at 0x135f230>
Took 2.453 seconds to solve 100 boards; avg 0.025 with <__main__.SortedListBoardSolver object at 0x135f330>

It is still slower, but not nearly as bad as before.

Solving the board in the screenshot

With all this machinery in place, we can run the original board in the screenshot through the algorithms:

words = ReadDictionary(sys.argv[1])
print 'Read %d words' %(len(words))

trie_solver = TrieBoardSolver(words)
sorted_list_solver = SortedListBoardSolver(words)
b = Board(4, 4)
b.board = [
    ['l', 'qu', 'r', 'e'],
    ['s', 'l', 'u', 's'],
    ['a', 't', 'i', 'c'],
    ['n', 'r', 'e', 'n']
]
print 'Solving with binary search'
sorted_solutions = sorted_list_solver.Solve(b)
print 'Solving with trie'
trie_solutions = trie_solver.Solve(b)

# Results should be exactly the same
assert sorted_solutions == trie_solutions

for solution in sorted(sorted_solutions):
    print solution
words = sorted(set([s.word for s in sorted_solutions]))
print words

The results appear after the conclusion.

Conclusion

I have presented both pseudocode and Python implementations of an algorithm for solving the classic Boggle/Scramble With Friends game. In the process, we saw such concepts as recursion, depth first search, optimizing code through pruning unfruitful search branches, and the importance of using the right data structure. This post does not aim to be exhaustive; I hope I have piqued your interest for learning more about tries and other lesser known data structures.

For results, the patched trie implementation, and the driver program, see below. To run the driver program, pass it a path to a list of valid words, one per line. e.g.

python2.6 solver.py /usr/share/dict/words

For mobile users, the link can be found at https://gist.github.com/2833942

Advertisements

Hello {planet_name}: Creating strings with dynamic content in Python

May 17, 2012 1 comment

2012-05-16

The ability to create strings with dynamic content is very important in creating applications. Python makes this easy, but it’s not always clear what the correct approach is. Let us go through four ways of creating strings:

  • Implicit concatenation
  • Explicit concatenation
  • String formatting via % operator
  • String formatting via format method

Implicit concatenation

If whitespace alone separates adjacent string literals, they will be concatenated together.

>>> 'hello' 'world'
'helloworld'

This can be useful if you have a long string that you want to break up

>>> a_long_string = ('hello this is a very very very very'
...                  'long string')
>>> print a_long_string
hello this is a very very very verylong string

Implicit concatenation does not work for inserting variables into strings:

>>> name = 'Nick'
>>> 'hello' name
  File "<stdin>", line 1
    'hello' name

In that case, we need to be more explicit.

Explicit concatenation

You can also use the + operator to concatenate strings.

>>> 'hello' + 'world'
'helloworld'

Unlike in Java, types are not automatically coerced; it is an error to attempt to concatenate non-strings to a string:

>>> num_examples = 4
>>> 'I have ' + num_examples + ' examples to discuss.'
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects

You must explicitly convert non-string types into strings before using this operator.

>>> 'I have ' + str(num_examples) + ' examples to discuss.'
'I have 4 examples to discuss.'

String formatting via % operator

Creating longer strings by concatenating together different substrings works, but it’s not the most elegant solution. It is a common mistake to forget to put a space before or after whatever is inserted in the middle of two other strings, wasting time on the programmer’s part. It is also somewhat hard to scan while reading the source code; the + signs can distract from what the string is trying to say.

If you have used C before, you are probably familiar with printf and sprintf, which allow you to embed special characters within your strings which are subsituted with later arguments. For instance,

printf("hello %s\n", "world");

would produce “hello world”. Python has this feature built in to strings with the % operator. You can read more in depth about the [String Formatting Operations][] and its syntax, but at the very least you should memorize the flags %d (for integer types), %f (for floating point), and %s (for strings). For instance,

>>> 'I have %d things to talk about' %num_topics
'I have 4 things to talk about'

If you have more than one substitution to make, you must surround the substituted values in parentheses:

>>> 'I have %d things to talk about.  Number 1: %s' %(num_topics, 'Implicit concatenation')
'I have 4 things to talk about.  Number 1: Implicit concatenation'

This is a very powerful technique, especially when you learn the formatting flags that this operator supports. For instance,

>>> 'It is %.2f degrees out' %(98.63483)  # display 2 decimal places
'It is 98.63 degrees out'

This technique does not work as well when there are many substitutions to make:

query = """SELECT %s
FROM %s
WHERE %s
GORUP BY %s
ORDER BY %s""" %(columns, table, where_clause, order_by_clause, group_by_clause)

Did you catch the mistake? (Hint – check the order of args in parentheses) It’s easy to make a mistake like this in production. If you have more than 2 or 3 substitutions, I recommend the following technique – string format.

String formatting via format function

Strings have a format function which can also play the part of variable substiution. The benefit of this approach is that each substitution is named, and it is impossible to provide the arguments in the wrong order.

query = """SELECT {columns}
FROM {table}
WHERE {where_clause}
GORUP BY {group_by_clause}
ORDER BY {order_by_clause}""".format(columns = 'sum(num_users) AS actives',
                                    table = '1_day_actives',
                                    where_clause = 'country = "US"',
                                    order_by_clause = 'actives',
                                    # Note the wrong order - it doesn't matter, just as for
                                    # providing keyword arguments
                                    group_by_clause = 'actives')

If you prefer, you can use a dictionary to power the variable replacement.

data_dict = {
    'columns': 'sum(num_users) AS actives',
    'table': '1_day_actives',
    'where_clause': 'country = "US"',
    'order_by_clause': 'actives',
    'group_by_clause': 'actives'
}
query = """SELECT {columns}
FROM {table}
WHERE {where_clause}
GORUP BY {group_by_clause}
ORDER BY {order_by_clause}""".format(**data_dict)

Note the format method is only available in Python 2.6 and newer. As the documentation states,

This method of string formatting is the new standard in Python 3, and should be preferred to the % formatting described in String Formatting Operations in new code.

Hopefully I have shown that this is easier to read and less error prone than the alternatives.

Conclusion

It’s easy to get stuck in a rut and do things exactly the same way, even when alternatives exist. I only just recently learned about the string.format function, and find it preferable to all of the alternatives I have laid out. I hope you also find it useful.

Categories: programming, Python Tags: , ,

Glazed Lists – an essential Java library for lists and tables

May 2, 2012 9 comments

Swing is the built in toolkit for creating user interfaces for Java programs. While these types of standalone desktop applications are becoming less prevalent, perhaps due to increasing functionality of webapps, there are still some industries which are highly reliant on them. If you find yourself creating a Java desktop application, you will probably have to learn Swing, and you will also probably have to learn to display information to the user in list or table form. In standard Java Swing applications, it is difficult, or at least annoying, to do the following three tasks:

  1. Displaying domain specific models
  2. Filtering
  3. Sorting

Glazed Lists is an open source project that makes all three of these tasks trivial. Its primary author, Jesse Wilson, is a current Google employee. Let’s examine each of these aspects in turn.

Provides a simplified API for representing objects within a JTable

Swing uses the Model View Controller paradigm throughout. Thus the table or list merely presents a view for an underlying model data structure. Part of your job in displaying data in a Swing table is to define the TableModel implementation which provides the data for the JTable to display.

Swing provides an AbstractTableModel that does most of the work for you, requiring you only to implement the following methods:

public int getRowCount();
public int getColumnCount();
public Object getValueAt(int row, int column);

Here’s a simple domain model object we might want to visualize in a table:

public class Person {
    int age;
    String name;

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public int getAge() { return age; }

    public String getName() { return name; }
}

The logical way of doing that would be two have two columns, one for the age, one for the name. Let’s make a table model for this case:

public class PersonTableModel extends AbstractTableModel {
    private static final String[] columns = {"Name", "Age"};
    private final List people;

    public PersonTableModel(List people) {
        // Make a defensive copy
        this.people = new ArrayList(people);
    }

    public int getRowCount() {
        return people.size();
    }
    public int getColumnCount() {
        return columns.length;
    }
    public Object getValueAt(int row, int column) {
        Person p = people.get(row);
        if (column == 0) {
            return p.getName();
        } else {
            return p.getAge();
        }
    }
}

This certainly works, but it requires a fair bit of boilerplate. Furthermore, the code above does not provide any way of modifying the list of people after it is copied by the TableModel.

Glazed Lists simplifies your life by treating the table not as an arbitrary two dimensional grid, but instead as a collection of rows, where the rows are kept in sync with changes to the domain models that they represent. All you have to do is define how a row is laid out, and Glazed Lists takes care of the rest.

The interface you need to use in order to define how the table looks and which aspects of your model objects are exposed is called [TableFormat][].

The interface is as follows:

  • int getColumnCount() – The number of columns to display.
  • String getColumnName(int column) – Gets the title of the specified column.
  • Object getColumnValue(E baseObject, int column) – Gets the value of the specified field for the specified object.

This should remind you of the TableModel interface presented previously, but note how the getColumnValue method is different – rather than getting a row and column, and forcing you to look up the object corresponding to that row, you are provided the object directly.

Here is a TableFormat which allows Person objects to be easily visible in a JTable:

public class PersonTableFormat implements TableFormat {

    String[] columnNames = {"Name", "Age"};
    private static final int NAME_INDEX = 0;
    private static final int AGE_INDEX = 1;

    public int getColumnCount() { return columnNames.length; }

    public String getColumnName(int column) { return columnNames[i]; }

    public Object getColumnValue(Person baseObject, int column) {
        switch (column) {
            case NAME_INDEX:
                return baseObject.getName();
            case AGE_INDEX:
                return baseObject.getAge();
            default:
                throw new IllegalArgumentException("Expected column 0 or 1, got " + column);
        }
    }
}

While this isn’t too hard to write, it’s still a lot of boilerplate (and not significantly different from the previous example). Glazed Lists makes it even easier than this. The entire class definition above can be replaced with three lines:

TableFormat personTableFormat = GlazedLists.tableFormat(Person.class,
    // Names of the properties to fetch
    new String[] {"name","age"},
    // Names for the columns
    new String[] {"Name", "Age"});

What’s this doing? And how can it do all that I had previously in one line of code? Well, it requires and takes advantage of JavaBeans naming convention. The static function uses reflection to find the methods mapping to properties named “name” and “age”. In this case, it looks for two methods, getName() and getAge(), both of which it finds. (If I didn’t name my methods appropriately, I would get a runtime exception). The second array defines the strings that should be used to identify the corresponding entry in the properties array. In other words, element 0 in the names column is used to identify the property name at index 0.

This TableFormat class alone is insufficient to display data in a table. To do that, you need a class which fulfills the TableModel interface I described previously. Fortunately, Glazed Lists makes this easy.

The fundamental building block of Glazed Lists is the EventList class. It is similar to the ArrayList class in Java, except that it has support for observers. If you’re not familiar with the Observer/Observable design pattern, it allows objects (observers) to register themselves and receive notifications whenever a different object (the observable) is changed. For instance, when a new item is added to the EventList, the UI element representing it on screen automatically refreshes itself.

The EventTableModel class fulfills the TableModel interface, making use of the EventList and TableFormat we described earlier. The EventList is the data provider, and the TableFormat determines how to extract the data from the EventList and display it in the table.

EventList people = new BasicEventList();
// Add all the elements
for (Person p : getPeople()) {
    personList.add(p);
}
TableFormat personTableFormat = GlazedLists.tableFormat(Person.class,
    // Names of the properties to fetch
    new String[] {"name","age"},
    // Names for the columns
    new String[] {"Name", "Age"});
EventTableModel tableModel = new EventTableModel(people, personTableFormat);
JTable table = new JTable(tableModel);
// Any modifications to the ‘people’ list is automatically reflected in the table

Provides a simplified means of filtering a table or list

Perhaps one of the most important features of any interactive table is the ability to filter out extraneous information. Glazed Lists makes this possible by chaining together EventList transformations; these transformations provide a different view of the underlying data. When the original model is modified, the filtered views automatically pick up the changes and update accordingly.

Say we want to provide the ability to filter the list based on people’s names. We will add a listener to a text field which listens for changes (new letters typed or deleted), and filters the list in real time. Once we have an EventList of some sort, it is easy to create a new “view” of that same list, filtering out entries you don’t want to see. You do this by wrapping the list in a FilterList, and then assigning some sort of filter criterion. Let’s start simple with a filtered list which only shows those users whose names start with the letter ‘A’.

EventList personList = new BasicEventList();
personList.add(new Person("Anthony Hopkins", 74));
personList.add(new Person("Barack Obama", 50));
personList.add(new Person("American McGee", 39));

Matcher personFilter = new Matcher() {
    public boolean matches(Person p) {
        return p.getName().startsWith("A");
    }
};
// Create a filtered list
FilterList filteredList = new FilterList(personList, personFilter);
// Displaying the people in a list as opposed to a table; could also create EventTableModel
// as in the last example.
EventListModel filteredListModel = new EventListModel(personList)
JList list = new JList(filteredListModel);
// At this point, shows Anthony Hopkins and American McGee

The filter I’ve defined above is static – once it’s instantiated, its filter condition never changes. Glazed Lists supports dynamic filters as well, through the MatcherEditor interface. We will see how to use a MatcherEditor instance for a text field, but first we need to tell Glazed Lists which strings to use when filtering for a given object. We do this with the TextFilterator interface.

Picture illustrating a FilterList which accepts only those people whose name starts with 'A'

 

public class PersonTextFilterator imlements TextFilterator {
    // Slightly strange interface, but done for efficiency reasons
    public getFilterStrings(List baseList, Person element) {
        baseList.add(element.getName());
        // Allow users to filter by age as well
        baseList.add(String.valueOf(element.getAge()));
    }
}

The MatcherEditor class to use in our case is TextComponentMatcherEditor. We provide it with the text field that it will use as the filter source, as well as an instance of the PersonTextFilterator class we just defined.

EventList personList = new BasicEventList();
personList.add(new Person("Anthony Hopkins", 74));
personList.add(new Person("Barack Obama", 50));
personList.add(new Person("American McGee", 39));

JTextField filterTextField = new JTextField();
// Add the text field to the UI - add to a JPanel

// Hook the text field up to a filter list
MatcherEditor filter = new TextComponentMatcherEditor(filterTextField, new PersonTextFilterator());

// Create a filtered list
FilterList filteredList = new FilterList(personList, filter);
EventListModel filteredListModel = new EventListModel(filteredList)
JList list = new JList(filteredListModel);
// List automatically updates in response to typing in the text field

Each transformed EventList is itself an EventList, meaning it can also be used as the basis of an EventListModel or EventTableModel. This chaining capability is extremely powerful.

Provides sorting capabilities

Finally, Glazed Lists makes it extremely easy to implement rich sorting capabilities in your tables or lists.

As we saw in the last example, it is possible to wrap a given EventList to provide a different view. In this case, we will have a sorted view of the data, which automatically updates whenever the underlying data changes.

To create the SortedList, you need to make your domain object implement Comparable, or create a Comparator. For instance,

public class PersonNameComparator implements Comparator {
    public int compare(Person p1, Person p2) {
        return p1.getName().compareTo(p2.getName());
    }
}
EventList personList = new BasicEventList();
personList.add(new Person("Anthony Hopkins", 74));
personList.add(new Person("Barack Obama", 50));
personList.add(new Person("American McGee", 39));

Comparator nameComparator = new PersonNameComparator();
// Create a sorted list decorator
SortedList sortedList = new SortedList(personList, nameComparator);
EventListModel sortedListModel = new EventListModel(sortedList)
JList list = new JList(filteredListModel);

A SortedList, wrapping a standard EventList

While the above example works for JLists, it’s nice to be able to sort a JTable as well. This is not too hard, either, as long as you have set up a TableFormat instance as described in the first section of this post. In essence, the TableFormat defines the type of each column, which is then used to sort the table whenever the corresponding column header is clicked. This behavior is defined in the TableComparatorChooser class, which exposes a static method to perform the installation on the target JTable. Here’s an example:

Comparator nameComparator = new PersonNameComparator();
// Create a sorted list decorator
SortedList sortedList = new SortedList(personList, nameComparator);
EventTableModel peopleTableModel = new EventTableModel(sortedList, new PersonTableFormat());
JTable peopleTable = new JTable(peopleTableModel);

// Use MULTIPLE_COLUMN_MOUSE to allow sorting by multiple columns, or SINGLE_COLUMN
// to sort by just a single column
TableComparatorChooser tableSorter = TableComparatorChooser.install(
    peopleTable, sortedList, TableComparatorChooser.MULTIPLE_COLUMN_MOUSE);

// At this point, clicking on the table headers will sort by this column

As the more detailed Glazed Lists tutorial warns,

By default, TableComparatorChooser sorts by casting column values to Comparable. If your column’s values are not Comparable, you’ll have to manually remove the default Comparator using TableComparatorChooser.getComparatorsForColumn(column).clear().

As long as your columns are represented by Comparable classes such as Number or String, you shouldn’t have to worry about this caveat.

Conclusion

Glazed Lists is one of the best Java Swing libraries I’ve ever used. It simplifies life for the programmer as well as the end user of the software project, since tables that allow sorting and filtering are far more useful than those which do not. If you do any sort of Swing programming, you owe it to yourself to try this library out. You can find much more information, including the aforementioned tutorial, on the Glazed List website.

 

Chaining together multiple list transformations makes it easy to create powerful programs