Posts Tagged ‘github’

rstripping Simon Pegg: Don’t use rstrip for file extension removal

March 12, 2018

In Python, what does '/path/to/my/simon_pegg.jpeg'.rstrip('.jpeg') yield?

If you guessed '/path/to/my/simon_pegg' – good try, but not quite right. The real answer is '/path/to/my/simon_'.

Despite what you might intuitively think, rstrip does NOT strip off a substring from the end of a string. Instead, it eliminates all of the characters from the end of the string that are in the argument. (Note that this is not mutating the string; it returns a copy of the string.)

Since every character in pegg also appears in .jpeg, rstrip removes pegg as well and stops only at the underscore, the first character it meets that is not in the argument.
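To make the character-set behavior concrete, here is a minimal sketch that re-implements what rstrip does by hand and compares it with the built-in. The variable names (chars, end, manual, builtin) are just for illustration.

```python
path = '/path/to/my/simon_pegg.jpeg'

# rstrip treats its argument as a set of characters: {'.', 'j', 'p', 'e', 'g'}.
chars = set('.jpeg')

# Walk backwards from the end, dropping characters while they are in the set.
end = len(path)
while end > 0 and path[end - 1] in chars:
    end -= 1

manual = path[:end]
builtin = path.rstrip('.jpeg')

print(manual)   # /path/to/my/simon_
print(builtin)  # /path/to/my/simon_
```

The loop never looks at the order of the characters, which is why the g, g, e, p of pegg go the same way as the extension.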

While this behavior is documented, it may be surprising.

Why does it matter? There are many instances of people attempting to use this rstrip approach to strip off a file extension. For instance, you might be converting an image from one file type to another, and need to construct the final path. Or you might want to rename a bunch of files to have consistent extensions (jpeg to jpg).

This buggy implementation of stripping a file extension is tricky because most of the time it works – but it works by coincidence (the file name happens not to end with the characters in the file extension).
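For comparison, here is a sketch of a few ways to remove an extension that do not depend on coincidence. os.path.splitext and pathlib's with_suffix are standard library calls; strip_suffix is a hypothetical helper written for this example. (On Python 3.9 and later, str.removesuffix does the same job as the helper.)

```python
import os.path
from pathlib import PurePosixPath

path = '/path/to/my/simon_pegg.jpeg'

# os.path.splitext splits off the last extension, whatever it happens to be.
root, ext = os.path.splitext(path)

# pathlib can swap the extension directly, e.g. for a jpeg to jpg rename.
as_jpg = str(PurePosixPath(path).with_suffix('.jpg'))


def strip_suffix(s, suffix):
    """Hypothetical helper: remove suffix only if s really ends with it."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s


print(root)                         # /path/to/my/simon_pegg
print(ext)                          # .jpeg
print(as_jpg)                       # /path/to/my/simon_pegg.jpg
print(strip_suffix(path, '.jpeg'))  # /path/to/my/simon_pegg
```

All three treat the extension as a substring at the end of the name rather than as a bag of characters.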

GitHub is rife with examples of people making this same mistake. For instance,

name = str(name).rstrip(".tif")

i.path.rstrip('.gzip').rstrip('.gz') + '.gzip')

... = [i.rstrip(".jpg") for i in k]

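The Go language has the same semantics for its TrimRight function: the second argument is a set of characters to cut, not a suffix. (The function that removes a trailing substring is strings.TrimSuffix.) This leads to the same sort of mistakes when people use TrimRight to trim file names. For instance,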

filehelper.go: filename := strings.TrimRight(f.Name(), ".pdf")

latest_images.go: idStr := strings.TrimRight(f.Name(), ".jpg")

The lessons to be learned from this are:

  1. Read the documentation of the library functions you use.
  2. Test your code, and not just the happy paths. Good tests should try to break your implementation and exercise edge cases.
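As a rough illustration of the second lesson, the sketch below runs the same small table of cases against the buggy rstrip version and a suffix-based version. Both function names and the test cases are made up for this example; the point is only that one happy-path case would not have caught the bug.

```python
def strip_jpeg_naive(name):
    """The buggy approach: strips a set of characters, not a suffix."""
    return name.rstrip('.jpeg')


def strip_jpeg_fixed(name):
    """Hypothetical fix: remove '.jpeg' only when it is really the suffix."""
    suffix = '.jpeg'
    return name[:-len(suffix)] if name.endswith(suffix) else name


# (input, expected) pairs: one happy path, then cases meant to break things.
cases = [
    ('photo.jpeg', 'photo'),            # happy path, works by coincidence
    ('simon_pegg.jpeg', 'simon_pegg'),  # name ends in extension characters
    ('egg.jpeg', 'egg'),                # whole name made of those characters
    ('README', 'README'),               # no extension at all
]

naive_failures = [name for name, want in cases if strip_jpeg_naive(name) != want]
fixed_failures = [name for name, want in cases if strip_jpeg_fixed(name) != want]

print(naive_failures)  # ['simon_pegg.jpeg', 'egg.jpeg']
print(fixed_failures)  # []
```

With only the first case in the table, both implementations would pass.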


(Hat tip to my colleague Fredrik Lundh who alerted me to this problem and inspired this post)


Pet peeve – useless error messages

November 3, 2014

Here’s one from Github:

[Screenshot of the error message, 2014-11-02 14.28.51]

Clearly I have entered a URL in the text field. I assume what GitHub is trying to say is that this doesn’t adhere to whatever client-side validation logic they’re using. Since I’m relatively tech-savvy I can guess that maybe they want the http:// prefix. Sure enough, that fixes the problem, but why could the error message not tell me that? Or even better, why not assume that if I don’t specify the scheme, it should use http or https?

In the words of Steve Krug, “Don’t make me think.”
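The "just assume a scheme" idea from the complaint above is cheap to implement. Here is a minimal sketch of one way a form handler could do it; this is not GitHub's actual validation logic, and with_default_scheme, its default of https, and the regular expression are all assumptions made for illustration.

```python
import re

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME_RE = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://')


def with_default_scheme(url, default='https'):
    """Hypothetical helper: prepend a scheme when the user left it off."""
    url = url.strip()
    if _SCHEME_RE.match(url):
        return url  # the user already specified a scheme; leave it alone
    return f'{default}://{url}'


print(with_default_scheme('example.com/blog'))    # https://example.com/blog
print(with_default_scheme('http://example.com'))  # http://example.com
```

A real form would still need to validate the result, but at least the user would not have to guess what the validator wants.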