Archive
New lines in XML attributes
If you have an attribute in XML that spans multiple lines, e.g.
q2="2
B"
you might expect the newline literal to be preserved in the resulting string when the attribute is parsed. Instead, the above example will be parsed as “2 B”, at least with Java’s SAX parser implementation. To have the newline included, you should insert the character reference &#10; instead. This StackOverflow answer by Tomalak gives some more insight:
Bottom line is, the value string is saved verbatim. You get out what you put in, no need to interfere.
However… some implementations are not compliant. For example, they will encode & characters in attribute values, but forget about newline characters or tabs. This puts you in a losing position since you can’t simply replace newlines with &#10; beforehand.
…
Upon parsing such a document, literal newlines in attributes are normalized into a single space (again, in accordance to the spec) – and thus they are lost.
Saving (and retaining!) newlines in attributes is impossible in these implementations.
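The same normalization can be observed outside Java. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree (an assumption on my part; the post itself only tested Java's SAX parser). The element name "item" is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A literal newline inside the attribute value.
literal = ET.fromstring('<item q2="2\nB"/>')

# The same value written with a character reference instead.
escaped = ET.fromstring('<item q2="2&#10;B"/>')

# repr() makes the whitespace visible.
print(repr(literal.get('q2')))
print(repr(escaped.get('q2')))

# Serialize the element that holds a real newline and parse it again,
# to check whether this particular writer preserves the newline.
round_tripped = ET.fromstring(ET.tostring(escaped))
print(repr(round_tripped.get('q2')))
```

Under the attribute-value normalization rules, the literal newline should come back as a space, while the character reference should survive as a real newline. Whether the round trip keeps the newline depends on the serializer escaping it on output, which is exactly the compliance issue the quoted answer describes.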
__slots__ in Python: Save some space and prevent member variable additions
Today I’m going to be writing about a feature of Python I’d never read about before, namely __slots__. In a nutshell, using __slots__ allows you to decrease the memory needed by instances of your classes, as well as prevent unintended assignment to new member variables.
By default, each instance of a class has a dictionary (its __dict__) which maps attribute names to the member variables themselves. Dictionaries are extremely well designed in Python, yet by their very nature they are somewhat wasteful of space. Why is this? Hash tables strive to minimize collisions by ensuring that the load factor (number of elements divided by the size of the internal array) does not get too high. In general hash tables use O(n) space, but with a constant factor nearer to 2 than 1 (again, in order to minimize collisions). For classes with very small numbers of member variables, the relative overhead might be even greater.
class DictExample:
    def __init__(self):
        self.int_var = 5
        self.list_var = [0,1,2,3,4]
        self.nested_dict = {'a':{'b':2}}

# Note that this extends from 'object'; the __slots__ only has an effect
# on these types of 'new' classes
class SlotsExample(object):
    __slots__ = ('int_var','list_var','nested_dict')

    def __init__(self):
        self.int_var = 5
        self.list_var = [0,1,2,3,4]
        self.nested_dict = {'a':{'b':2}}

# jump to the repl
>>> a = DictExample()
# Here is the dictionary I was talking about.
>>> a.__dict__
{'int_var': 5, 'list_var': [0, 1, 2, 3, 4], 'nested_dict': {'a': {'b': 2}}}
>>> a.x = 5
# We were able to assign a new member variable
>>> a.__dict__
{'x': 5, 'int_var': 5, 'list_var': [0, 1, 2, 3, 4], 'nested_dict': {'a': {'b': 2}}}
>>> b = SlotsExample()
# There is no longer a __dict__ object
>>> b.__dict__
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'SlotsExample' object has no attribute '__dict__'
>>> b.__slots__
('int_var', 'list_var', 'nested_dict')
>>> getattr(b, 'int_var')
5
>>> getattr(a, 'int_var')
5
# We cannot assign a new member variable; we have declared that there will only
# be member variables whose names appear in the __slots__ iterable
>>> b.x = 5
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'SlotsExample' object has no attribute 'x'
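To get a rough feel for the space difference, here is a small sketch using sys.getsizeof. The class names WithDict and WithSlots and the helper shallow_size are hypothetical, invented for this illustration. Keep in mind that sys.getsizeof is shallow (it does not follow references) and that the exact byte counts vary between Python versions and platforms, so treat the numbers as indicative only.

```python
import sys


class WithDict(object):
    def __init__(self):
        self.int_var = 5
        self.list_var = [0, 1, 2, 3, 4]
        self.nested_dict = {'a': {'b': 2}}


class WithSlots(object):
    __slots__ = ('int_var', 'list_var', 'nested_dict')

    def __init__(self):
        self.int_var = 5
        self.list_var = [0, 1, 2, 3, 4]
        self.nested_dict = {'a': {'b': 2}}


def shallow_size(obj):
    """Size of the instance itself plus its __dict__, if it has one.

    The attribute values are shared costs in both cases and are not counted.
    """
    size = sys.getsizeof(obj)
    if hasattr(obj, '__dict__'):
        size += sys.getsizeof(obj.__dict__)
    return size


dict_size = shallow_size(WithDict())
slots_size = shallow_size(WithSlots())
print('with __dict__ :', dict_size, 'bytes')
print('with __slots__:', slots_size, 'bytes')
```

The per-instance difference is small, so it mostly matters when a program creates a very large number of instances.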
Note that in Python 2, for the __slots__ declaration to have any effect, you must inherit from object (i.e. be a ‘new-style class’); in Python 3 every class is new-style, so this happens automatically. Furthermore, if you extend a class with __slots__ defined, you must also declare __slots__ in that child class, or else its instances will have a dict allocated, negating the space savings. See this StackOverflow question for more.
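The inheritance caveat is easy to check for yourself. In this sketch the class names Base, LeakyChild and SlottedChild are made up for illustration:

```python
class Base(object):
    __slots__ = ('x',)


class LeakyChild(Base):
    # No __slots__ here, so instances get a __dict__ again.
    pass


class SlottedChild(Base):
    # Only the *additional* attribute names need to be listed.
    __slots__ = ('y',)


leaky = LeakyChild()
leaky.anything = 1  # allowed: stored in the instance __dict__

slotted = SlottedChild()
slotted.x = 1  # slot inherited from Base
slotted.y = 2  # slot declared on SlottedChild

try:
    slotted.anything = 3
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True

print(hasattr(leaky, '__dict__'))
print(hasattr(slotted, '__dict__'))
print(assignment_blocked)
```

If a child class needs no new attributes, declaring an empty tuple (__slots__ = ()) is enough to keep its instances dict-free.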
This feature was useful to me when using Python to implement a packed binary message format. The specification spells out in exquisite detail how each and every byte must be sent over the wire. By using the __slots__ mechanism, I was able to ensure that client code could not accidentally add new member variables to the message classes, which would not have been serialized anyway.
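As an illustration of that idea (not the actual format I implemented), here is a sketch of a slotted message class serialized with the standard struct module. The PingMessage class, its three fields, and the big-endian layout are all hypothetical.

```python
import struct


class PingMessage(object):
    """Hypothetical message: 1-byte type, 2-byte sequence, 4-byte timestamp."""

    __slots__ = ('msg_type', 'sequence', 'timestamp')

    # Class-level attribute, shared by all instances; '>' means big-endian
    # with no padding, so the packed size is 1 + 2 + 4 = 7 bytes.
    _LAYOUT = struct.Struct('>BHI')

    def __init__(self, msg_type, sequence, timestamp):
        self.msg_type = msg_type
        self.sequence = sequence
        self.timestamp = timestamp

    def pack(self):
        """Serialize exactly the declared fields, in wire order."""
        return self._LAYOUT.pack(self.msg_type, self.sequence, self.timestamp)

    @classmethod
    def unpack(cls, data):
        """Rebuild a message from its 7-byte wire representation."""
        return cls(*cls._LAYOUT.unpack(data))


msg = PingMessage(msg_type=1, sequence=42, timestamp=1000)
wire = msg.pack()
decoded = PingMessage.unpack(wire)

try:
    # A typo or an ad-hoc field fails loudly instead of being
    # silently dropped at serialization time.
    msg.payload = b'oops'
    typo_rejected = False
except AttributeError:
    typo_rejected = True

print(len(wire), typo_rejected)
```

The point of the design is that the set of fields in __slots__ and the set of fields written by pack() stay in lockstep, so nothing can be attached to a message that the serializer does not know about.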
TextMate Grammar Editing Tip – “Edit in TextMate”
I wrote previously about creating language grammars in TextMate, and I’ve been doing a bit more of this lately. One thing that makes this process a lot less painful is following the advice from the official TextMate book and installing the “Edit in TextMate” bundle. Do this by going to Bundles->TextMate->Install “Edit in TextMate”, and follow the instructions. After restarting TextMate, you can press ⌃⌘E while in the Edit Grammar pane to open a live copy of the document in a syntax-highlighted TextMate window. Every time you hit Save, the changes are pushed back to the unstyled document pane. This drastically speeds up development: you no longer have to copy and paste text between the windows, and can simply hit Save any time you want to try your changes out.
WordPress Stats April Fool’s
While not as flashy as some other April Fool’s day pranks, WordPress definitely got me for a second.