Three ways of creating dictionaries in Python
Dictionaries are a fundamental data structure in Python, and a key tool in any Python programmer’s arsenal. They offer average-case O(1) lookups and have been heavily optimized for both memory overhead and lookup speed.
Today I’m going to show you three ways of constructing a Python dictionary, as well as some additional tips and tricks.
Dictionary literals
Perhaps the most commonly used way of constructing a Python dictionary is with curly bracket syntax:
d = {"age":25}
As dictionaries are mutable, you need not know all the entries in advance:
# Empty dict
d = {}
# Fill in the entries one by one
d["age"] = 25
From a list of tuples
You can also construct a dictionary from a list (or any iterable) of key, value pairs. For instance:
d = dict([("age", 25)])
This is perhaps most useful in the context of a list comprehension:
class Person(object):
    def __init__(self, name, profession):
        self.name = name
        self.profession = profession

people = [Person("Nick", "Programmer"), Person("Alice", "Engineer")]
# Build (key, value) tuples with a list comprehension and pass them to dict()
professions = dict([(p.name, p.profession) for p in people])
print(professions)
# {'Nick': 'Programmer', 'Alice': 'Engineer'}
This is equivalent to, though a bit shorter than, the following:
people = [Person("Nick", "Programmer"), Person("Alice", "Engineer")]
professions = {}
for p in people:
    professions[p.name] = p.profession
This form of creating a dictionary is good for when you have a dynamic rather than static list of elements.
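As a side note that goes slightly beyond the three approaches in this post: the same mapping can be written without building the intermediate list, either by passing a generator expression to dict() or by using a dict comprehension (available from Python 2.7 onward). The sketch below is only an illustration; it uses a namedtuple as a stand-in for the Person class above, which is an assumption made to keep the example short.

```python
from collections import namedtuple

# Stand-in for the Person class used above (assumption for brevity)
Person = namedtuple("Person", ["name", "profession"])

people = [Person("Nick", "Programmer"), Person("Alice", "Engineer")]

# The form shown in the post: a list of (key, value) tuples passed to dict()
via_pairs = dict([(p.name, p.profession) for p in people])

# Same result with a generator expression (no intermediate list is built)
via_generator = dict((p.name, p.profession) for p in people)

# Same result with a dict comprehension
via_comprehension = {p.name: p.profession for p in people}

print(via_pairs == via_generator == via_comprehension)
```

All three produce equal dictionaries; which one reads best is largely a matter of taste.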
From two parallel lists
This method of constructing a dictionary is intimately related to the prior example. Say you have two lists of elements, perhaps pulled from a database table:
# Static lists for purpose of illustration
names = ["Nick", "Alice", "Kitty"]
professions = ["Programmer", "Engineer", "Art Therapist"]
If you wished to create a dictionary from name to profession, you could do the following:
professions_dict = {}
for i in range(len(names)):
    professions_dict[names[i]] = professions[i]
This is not ideal, however, as it involves explicit index bookkeeping, and is starting to look like Java. The more Pythonic way to handle this case would be to use the zip function, which combines two iterables into pairs:
print(list(zip(range(5), ["a", "b", "c", "d", "e"])))
# [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
# list() keeps the pairs reusable; in Python 3, zip returns a one-shot iterator
names_and_professions = list(zip(names, professions))
print(names_and_professions)
# [('Nick', 'Programmer'), ('Alice', 'Engineer'), ('Kitty', 'Art Therapist')]
professions_dict = {}
for name, profession in names_and_professions:
    professions_dict[name] = profession
As you can see, this is extremely similar to the previous section. You can dispense with the iteration, and instead pass the pairs straight to the dict constructor:
professions_dict = dict(names_and_professions)
# You can dispense with the extra variable and pass the zipped pairs directly:
professions_dict = dict(zip(names, professions))
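One thing worth keeping in mind with this approach is that zip stops at the end of the shorter input, so mismatched lists silently drop entries rather than raising an error. The following small sketch illustrates this; the deliberately shortened professions list and the `missing` variable are made up for the example.

```python
names = ["Nick", "Alice", "Kitty"]
professions = ["Programmer", "Engineer"]  # one entry short, on purpose

# zip pairs items positionally and stops when the shorter list runs out
professions_dict = dict(zip(names, professions))

# Names that did not receive a profession because of the length mismatch
missing = [name for name in names if name not in professions_dict]

print(professions_dict)
print(missing)
```

If the two lists are expected to line up exactly, comparing len(names) with len(professions) before zipping is a cheap sanity check.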
__slots__ in Python: Save some space and prevent member variable additions
Today I’m going to be writing about a feature of Python I’d never read about before, namely __slots__. In a nutshell, using __slots__ allows you to decrease the memory needed by instances of your classes, as well as prevent unintended assignment to new member variables.
By default, each instance of a class has a dictionary (its __dict__) that maps attribute names to their values. Dictionaries are extremely well designed in Python, yet by their very nature they are somewhat wasteful of space. Why is this? Hash tables strive to minimize collisions by ensuring that the load factor (number of elements / size of internal array) does not get too high. In general, hash tables use O(n) space, but with a constant factor nearer to 2 than 1 (again, in order to minimize collisions). For classes with very few member variables, the relative overhead might be even greater.
class DictExample:
    def __init__(self):
        self.int_var = 5
        self.list_var = [0,1,2,3,4]
        self.nested_dict = {'a':{'b':2}}

# Note that this extends from 'object'; the __slots__ only has an effect
# on these types of 'new' classes
class SlotsExample(object):
    __slots__ = ('int_var','list_var','nested_dict')
    def __init__(self):
        self.int_var = 5
        self.list_var = [0,1,2,3,4]
        self.nested_dict = {'a':{'b':2}}

# jump to the repl
>>> a = DictExample()
# Here is the dictionary I was talking about.
>>> a.__dict__
{'int_var': 5, 'list_var': [0, 1, 2, 3, 4], 'nested_dict': {'a': {'b': 2}}}
>>> a.x = 5
# We were able to assign a new member variable
>>> a.__dict__
{'x': 5, 'int_var': 5, 'list_var': [0, 1, 2, 3, 4], 'nested_dict': {'a': {'b': 2}}}
>>> b = SlotsExample()
# There is no longer a __dict__ object
>>> b.__dict__
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'SlotsExample' object has no attribute '__dict__'
>>> b.__slots__
('int_var', 'list_var', 'nested_dict')
>>> getattr(b, 'int_var')
5
>>> getattr(a, 'int_var')
5
>>> a.x = 5
# We cannot assign a new member variable; we have declared that there will only
# be member variables whose names appear in the __slots__ iterable
>>> b.x = 5
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'SlotsExample' object has no attribute 'x'
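To get a rough feel for the space difference, one option is to compare sizes with sys.getsizeof. The sketch below is only an approximation: getsizeof is shallow (it does not follow references), the numbers are specific to the CPython version in use, and the class names DictPoint and SlotsPoint are made up for this illustration.

```python
import sys

class DictPoint(object):
    """Regular class: attributes live in a per-instance __dict__."""
    def __init__(self):
        self.x = 1
        self.y = 2
        self.z = 3

class SlotsPoint(object):
    """Slotted class: attributes live in fixed slots, with no __dict__."""
    __slots__ = ("x", "y", "z")
    def __init__(self):
        self.x = 1
        self.y = 2
        self.z = 3

d = DictPoint()
s = SlotsPoint()

# For the regular instance, count the object itself plus its attribute dict
dict_total = sys.getsizeof(d) + sys.getsizeof(d.__dict__)
# The slotted instance has no separate dictionary to account for
slots_total = sys.getsizeof(s)

print("with __dict__:  %d bytes" % dict_total)
print("with __slots__: %d bytes" % slots_total)
```

The absolute figures matter less than the gap between them, which adds up when a program holds very many small instances.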
Note that in Python 2, the __slots__ declaration only has an effect if you inherit from object (i.e. the class is a ‘new-style class’); in Python 3 every class is new-style, so this happens automatically. Furthermore, if you extend a class with __slots__ defined, you must also declare __slots__ in that child class, or else its instances will have a __dict__ allocated, obviating the space savings. See this StackOverflow question for more.
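The subclassing caveat can be seen in a small sketch. The class names Base, LeakyChild and TightChild are hypothetical and exist only for this illustration.

```python
class Base(object):
    __slots__ = ("x",)

class LeakyChild(Base):
    # No __slots__ here, so instances get a __dict__ again
    pass

class TightChild(Base):
    # Declare only the additional names; "x" is inherited from Base
    __slots__ = ("y",)

leaky = LeakyChild()
leaky.anything = 1  # allowed, because LeakyChild instances have a __dict__

tight = TightChild()
tight.x = 1
tight.y = 2

try:
    tight.anything = 1  # rejected: not listed in any __slots__
    tight_rejects_new_attrs = False
except AttributeError:
    tight_rejects_new_attrs = True

print(hasattr(leaky, "__dict__"))
print(hasattr(tight, "__dict__"))
print(tight_rejects_new_attrs)
```

Note that the child only lists the names it adds; repeating a parent's slot names in the child is unnecessary.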
This feature was useful to me when using Python to implement a packed binary message format. The specification spells out in exquisite detail how each and every byte over the wire must be sent. By using the __slots__ mechanism, I was able to ensure that the client could not accidentally modify the message classes and add new member variables, which would not be serialized anyway.
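As a rough sketch of what such a message class might look like (this is not the actual format I implemented; the Heartbeat class, its three fields and its byte layout are all invented for illustration), the standard struct module can be combined with __slots__ like so:

```python
import struct

class Heartbeat(object):
    """Hypothetical message: 1-byte type, 2-byte sequence, 4-byte timestamp."""
    __slots__ = ("msg_type", "sequence", "timestamp")

    # Big-endian, no padding: unsigned char, unsigned short, unsigned int
    _FORMAT = ">BHI"

    def __init__(self, msg_type, sequence, timestamp):
        self.msg_type = msg_type
        self.sequence = sequence
        self.timestamp = timestamp

    def pack(self):
        # Only the declared slots are ever serialized
        return struct.pack(self._FORMAT, self.msg_type,
                           self.sequence, self.timestamp)

msg = Heartbeat(1, 42, 1000)
payload = msg.pack()

try:
    msg.checksum = 0  # not a declared slot, so this fails loudly
    stray_field_allowed = True
except AttributeError:
    stray_field_allowed = False

print(len(payload))
print(stray_field_allowed)
```

Because a stray attribute raises immediately, a field that would never make it onto the wire is caught at assignment time rather than being silently dropped during serialization.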