Archive

Archive for March, 2018

rstripping Simon Pegg: Don’t use rstrip for file extension removal

March 12, 2018 Leave a comment

In Python, what does '/path/to/my/simon_pegg.jpeg'.rstrip('.jpeg') yield?

If you guessed '/path/to/my/simon_pegg' – good try, but not quite right. The real answer is '/path/to/my/simon_'.

Despite what you might intuitively think, rstrip does NOT strip off a substring from the end of a string. Instead, it eliminates all of the characters from the end of the string that are in the argument. (Note that this is not mutating the string; it returns a copy of the string.)

Since pegg contains the characters that are in .jpeg, it is eliminated as well.

While this behavior is documented, it may be surprising.

Why does it matter? There are many instances of people attempting to use this rstrip approach to strip off a file extension. For instance, you might be converting an image from one filetype to another, and need to construct the final path. Or you might want to rename a bunch of files to have consistent extensions (jpegjpg).

This buggy implementation of stripping a file extension is tricky because most of the time it works – but it works by coincidence (the file name happens not to end with the characters in the file extension).

Github is rife with examples of people making this same mistake. For instance,

main.py: name = str(name).rstrip(".tif")

fixname.py:os.rename(i.path,
i.path.rstrip('.gzip').rstrip('.gz') + '.gzip')

selector.py:l = [i.rstrip(".jpg") for i in k]

The Go language has the same semantics for its TrimRight function. This leads to the same sort of mistakes when people use it to trim file names. For instance,

filehelper.go: filename := strings.TrimRight(f.Name( ".pdf")

latest_images.go: idStr := strings.TrimRight(f.Name(), ".jpg")

The lessons to be learned from this are,

  1. Read the documentation of the library functions you use.
  2. Test your code, and not just of the happy paths. Good tests should try to break your implementation and exercise edge cases.

 

(Hat tip to my colleague Fredrik Lundh who alerted me to this problem and inspired this post)

Advertisements