Data Visualization – Size of NFL Football Players Over Time

June 5, 2014 Leave a comment
1920 height/weight of NFL players

Screenshot from http://noahveltman.com/nflplayers/ – 1920

2014 height/weight of NFL players

Screenshot from http://noahveltman.com/nflplayers/ – 2014

I love Noah Veltman’s visualization of the changing height and weight distribution of professional football players. It uses animation to convey the incredible increase in size of the typical football player, and it does so with a minimal amount of chart junk. Let’s look at two aspects that make this effective.

It uses the appropriate visualization

There are 4 variables plotted on the graph – height, weight, density, and time. Two of the variables are encoded in the axes of the chart. The time dimension is controlled by the slider (or by hitting the play button). The density is represented by the color on the chart.

You could present this data as a table of data but it would be much harder to understand the pattern that the animation conveys in a very simple manner – not only are players getting bigger in both terms of height and weight, but the variance is increasing as well.

It makes good use of color

It uses color appropriately, by varying the saturation rather than the hue. I’ve blogged about this topic before when discussing the Wind Map. To repeat my favorite quote about this, Stephen Few states in his PDF “Practical Rules for Using Color in Charts”:

When using color to encode a sequential range of quantitative values, stick with a single hue (or a small set of closely related hues) and vary intensity from pale colors for low values to increasingly darker and brighter colors for high values

Extensions

I could imagine extending this visualization in a few ways:

  • Allow users to view the players that match a given height/weight combination (who exactly are the outliers?)
  • Allow restricting the data to a given position (see how quarterbacks’ height/weight are distributed vs those of the offensive line)
  • Compare against some other normalized metrics, such as rate of injury. Is there a correlation?

This is a great data visualization because it tells a story and it spurs the imagination towards additional areas of analysis and research.

Reblog: “Top 10 Mistakes that Python Programmers Make”

May 11, 2014 Leave a comment

Martin Chikilian from Toptal rounds up some common mistakes that Python programmers make.

I have made mistake #1 on multiple occasions:

Common Mistake #1: Misusing expressions as defaults for function arguments
Python allows you to specify that a function argument is optional by providing a default value for it. While this is a great feature of the language, it can lead to some confusion when the default value is mutable. For example, consider this Python function definition:

>>> def foo(bar=[]):        # bar is optional and defaults to [] if not specified
...    bar.append("baz")    # but this line could be problematic, as we'll see...
...    return bar

A common mistake is to think that the optional argument will be set to the specified default expression each time the function is called without supplying a value for the optional argument. In the above code, for example, one might expect that calling foo() repeatedly (i.e., without specifying a bar argument) would always return ‘baz’, since the assumption would be that each time foo() is called (without a bar argument specified) bar is set to [] (i.e., a new empty list).

I don’t remember for sure, but I’ve probably done something like #5, modifying a list while iterating through it.

If you write Python code, the rest of the article is worth a read

Categories: programming, Python Tags: ,

Google’s self-driving car – a trip through Mountain View

April 30, 2014 1 comment

Disclaimer – I work for Google but have nothing to do with the self-driving car team. The views expressed here are my own and do not necessarily represent that of my employer.

I found Eric Jaffe’s article on Google’s self-driving car and its handling of city streets immensely fascinating. He describes being a passenger in the self-driving car as it drives around Mountain View, CA (where I’m currently living) with minimal intervention from the Google engineer sitting in the driver’s seat.

It details some of the history of the self-driving car project, from its roots in DARPA’s Grand Challenge, a race through the desert with autonomous vehicles, through driving on the highway, to the vastly more complicated challenge of driving in an urban environment.

The article points out that Mountain View isn’t as crowded or chaotic as a city like New York or San Francisco, but it still has many interesting technological and algorithmic challenges, such as trying to intuit whether people at a crosswalk are really going to cross, or whether a bicyclist is going to get into the lane in front of the car.

I recommend reading the article to see how autonomous vehicles combine software engineering, robotics, machine learning, and computer vision in one sophisticated package.

Categories: Uncategorized

Spot the real Java class name

April 22, 2014 Leave a comment

This is hilarious (if you’re a programmer). Some folks trained a Markov chain on the class names in Spring and then made a game out of it – three Java class names are presented, only one is real. I didn’t do so hot – 4/11.

Screen Shot of the web app

Via Jeff Dean on Google+.

YAGNI and the scourge of speculative design

April 14, 2014 Leave a comment

Woman with crystal ball
Robert Anning Bell [Public domain], via Wikimedia Commons

I’ve been programming professionally for five years. One of the things that I’ve learned is YAGNI, or “You aren’t gonna need it”.

It’s taken me a long time to learn the importance of this principle. When I was a senior in college, I had a course that involved programming the artificial intelligence (AI) of a real-time strategy game. For our final project, our team’s AI would be plugged in to fight against another team’s. I got hung up on implementing a complicated binary protocol for the robots on our team to communicate efficiently and effectively, and our team ended up doing terribly. I was mortified. No other team spent much time or effort on their communication protocol, and only after getting everything else up and running.

In this essay I’ll primarily be talking about producing code that’s not necessary now, but might be in the future. I call this “speculative design” and it’s what the YAGNI philosphy prevents.

First, let’s discuss how and why this speculative design happens. Then we’ll discuss the problems with giving into the temptation.

Why does it happen

I can only speak to my own experience. The times I’ve fallen into this trap can be classified into a few categories:

  • It’s fun to build new features
  • It feels proactive to anticipate needs
  • Bad prioritization

Building features is fun

Programming is a creative outlet. It’s incredibly satisfying to have an idea, build it in code, and then see it in use. It’s more fun than other parts of development, like testing, refactoring, fixing bugs, and cleaning up dead code. These other tasks are incredibly important, but they’re ‘grungy’ and often go unrewarded. Implementing features is not only more fun, it get you more visibility and recognition.

Proactive to anticipate needs

A second reason one might engage in speculative design is to be proactive and anticipate the needs of the customer. If our requirements say that we must support XML export, it’s likely that we’ll end up having to support JSON in the future. We might as well get a head start on that feature so when it’s asked for we can delight the customer by delivering it in less time.

Bad prioritization

This is the case with the story I started this piece with. I overestimated the importance of inter-robot communications and overengineered it to a point where it hurt every other feature.

In this case, the feature was arguably necessary and should have been worked on, but not to the extent and not in the order that I did. In this case I should have used a strategy of satisficing and implemented the bare minimum after all of the more important things were done.

Why is it problematic

I’ve described a few reasons speculative code exists. You’ve already seen one example of why it’s problematic. I’ll detail some other reasons.

More time

Let’s start simple. Time spent building out functionality that may be necessary in the future is time not spent on making things better today. As I mentioned at the start of this post, I ended up wasting hours and hours on something that ended up being completely irrelevant to the performance of teams in the competition, at the expense of things that mattered a lot more, like pathfinding.

Less focus

Since there is more being developed, it’s likely that the overall software product is less focused. Your time and attention are being divided among more modules, including the speculatively designed ones.

More code

Software complexity is often measured in lines of code; it’s not uncommon for large software projects to number in the millions. Windows XP, for instance, had about 45 million lines.

Edsger Dijkstra, one of the most influential computer scientists, has a particularly good quote about lines of code:

My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

I once equated lines of code produced to productivity, but nothing could be further from the truth. I now consider it a very good week if I decrease the lines of code in the system, by deleting chunks of code or rewriting them to be simpler and shorter.

The extra code and complexity associated with speculative coding is very expensive.

  • It slows down readers of the code
  • It slows down building the software (especially if it pulls in more dependencies)
  • It adds additional tests that slow down the test suite
  • It is likely to add more bugs (more code generally equals more bugs)

Sets unrealistic expectations

Say that you design a feature because you think that the customer is going to want. Imagine that you actually got it right – what they end up asking for is essentially identical to what you’ve implemented. You deliver it to the customer a full week before you promised it.

You might look like a hero, but this sets a very bad precedent. It sets unrealistic expectations as to how much work the feature took to implement, and might lead to the customer setting impossible deadlines for features of similar scope. If you were able to finish that feature early, they might reason, there’s no reason you shouldn’t produce the next feature just as quickly.

You’re probably a bad judge of what will be needed in the future

It’s hard enough to build software from detailed specifications and requirements. Guessing about what the specifications and requirements of a feature that isn’t needed yet is likely to end up with a product that doesn’t make anyone happy. It will likely match the designers’ mental model but not the users, since there was inadequate input from them.

It’s hard to remove features once they exist

Say that you’re designing the export feature of your software. You imagine there will be a whole lot of formats you want to support, but at the moment the only hard and fast requirement is CSV (comma separated value) format. As you’re writing the CSV export code, you see how it would be trivial to implement JSON encoding. And while you’re at it, you throw in XML. You were required to produce CSV but now you have JSON and XML support too. Great!

Well, maybe. Maybe not. A year down the line you notice that only a small percentage of your users export to XML, but the feature has led to a disproportionate number of support tickets. Now you’re in a tough place – if you kill the feature, you’ll irritate these power users. Furthermore, you will have effectively wasted all of the time in implementing the feature in the first place, and all the subsequent patches.

I have seen little-used features remain in production because they’re too much trouble to delete and alienate the few users of said feature. Which leads to…

Increased risk of dead code

Imagine that you’ve implemented a new feature but it’s not ready for prime time yet. Or maybe you used it once or twice but it’s not worth turning on for your normal service. You don’t want to kill the feature entirely, as it might have some utility down the line. (Warning bells should be going off about now) You decide to hide the feature behind a configuration flag that defaults to off. Great! The feature can easily be reenabled should you ever need it again.

There’s just one problem – it gets turned on accidentally interacts catastrophically with the rest of the system. Your software deals with financial transactions and it ends up costing your company 460 million dollars.

This sounds unlikely – except it’s true. This is essentially what happened to Knight Capital in 2012.

From the Security and Exchange Commission report of the incident:

Knight also violated the requirements of Rule 15c3-5(b) because Knight did
not have technology governance controls and supervisory procedures
sufficient to ensure the orderly deployment of new code or to prevent the
activation of code no longer intended for use in Knight’s current operations
but left on its servers that were accessing the market; and Knight did not
have controls and supervisory procedures reasonably designed to guide
employees’ responses to significant technological and compliance
incidents;

This is one of the most visible failures caused by dead or oxbow code. I am not suggesting that Knight Capital speculatively developed the feature that malfunctioned. What I am saying is that

  • It’s dangerous to leave dead code around in a system
  • Speculative development is likely to lead to features that are not used often and are more likely to be dead code than if they were completely spec’ed out as in normal development
  • Therefore speculative development puts you at a greater risk of dead code problems

Don’t allow dead code stay in the codebase. If you should ever need it again, you should be able to retrieve it from the version control system. You almost certainly won’t.

Conclusion

As an engineer, it’s easy to fall into the trap of implementing features before they’re actually needed. You’ll look productive and proactive. In the end, it’s best to avoid this temptation, for all of the problems I’ve mentioned. These include

  • the extra code takes time to write, test, debug, and code review
  • it contributes to a lack of conceptual focus in the system
  • if done to please a customer, it sets unrealistic expectations for the development of other features
  • it imparts an extra maintenance cost for the rest of the lifetime of said feature
  • it will be difficult to remove the feature if and when its lack of use becomes apparent
  • it puts you at increased risk of leaving dead code in the system, code which may later be accessed with bad consequences

I love Dijkstra’s notion of ‘lines spent’. Do you want to spend your time and lines of code on a speculative feature? Just remember – you aren’t gonna need it.

Music hack roundup

March 24, 2014 Leave a comment

Disclaimer: the opinions expressed are my own and do not represent that of my employer, Google

I love music. And programming. I really like pieces of technology that either create new or modify existing pieces of music. Here I detail some of my favorite projects I’ve found in the past few years.

Songsmith

Songsmith is a project from Microsoft Research. From the project description page:

Songsmith generates musical accompaniment to match a singer’s voice. Just choose a musical style, sing into your PC’s microphone, and Songsmith will create backing music for you. Then share your songs with your friends and family, post your songs online, or create your own music videos.

There was a commercial that detailed how the project worked, but what I found really great was what the community did with it. They fed the vocals of famous songs through the software to see what sort of music came out. The results are, ahem, mixed.

First, one that I think sounds pretty interesting – a swing version of Katy Perry’s I Kissed a Girl

Then going into the realm of hilarious:

We Will Rock You – Queen Vs Songsmith

Mortorhead’s Ace of Spades

Nirvana’s In Bloom

Enter Sandman

Roxanne

The Beatle’s A Day in the Life

The Swinger

According to musicmachinery.com’s writeup,

The Swinger is a bit of python code that takes any song and makes it swing.

If you’re not into music theory (or old music) you might not know what constitutes swing. The following video (only need to watch the first 30 seconds or so) is a great example of the difference of straight vs swing styles:

As you can hear, it sounds very different. The first half of each beat is stretched out, and the second half is shrunk down. It sounds even more different when you start listening to familiar songs converted from straight to swing, or vice versa. While most of the links have died, Daft Punk’s Around The World still plays, as does Jefferson Airplane’s White Rabbit.

The source code is available at https://github.com/echonest/remix/blob/master/examples/swinger/swinger.py.

Autocanonizer

From musicmachinery.com’s writeup of The Autocanonizer:

It takes any song and tries to make a canon out of it. A canon is a song that can be played against a copy of itself.

Wikipedia has a bit more information on what exactly a Canon is:

In music, a canon is a contrapuntal compositional technique that employs a melody with one or more imitations of the melody played after a given duration (e.g., quarter rest, one measure, etc.). The initial melody is called the leader (or dux), while the imitative melody, which is played in a different voice, is called the follower (or comes). The follower must imitate the leader, either as an exact replication of its rhythms and intervals or some transformation thereof (see “Types of canon”, below). Repeating canons in which all voices are musically identical are called rounds – “Row, Row, Row Your Boat” and “Frère Jacques” being widely known examples. An example of a classical strict canon is the Minuet of Haydn’s String Quartet in D Minor, Op. 76, No. 2 (White 1976, 66).

With that in mind, here are some example.

My favorite is Adele’s Someone Like You. This one sounds close to a round.

Screenshot from the Autocanonizer

Screenshot from the Autocanonizer

  • Over The Rainbow – starts rough but 30 seconds in it sounds good
  • The Fox – I like it. Lots of self harmonizing. The doubled up chorus actually works. It gets out of sync with itself at some points
  • Take Five – demonstrates that the technique works with odd meter too. Not perfectly lined up at some points

See all the examples available at http://static.echonest.com/autocanonizer/loader.html
Source code available at: https://github.com/plamere/autocanonizer

Hat tip to Greg Linden whose Google+ post alerted me to this, and reminded me of these other projects I’d seen before.

MajorVsMinor

MajorVsMinor is a slight departure from the others I’ve listed because there is a human in the loop – it’s not all algorithmic. From Oleg Berg’s description from olegberg.com

Hello! I am Oleg Berg, a musician from Donetsk, Ukraine. I digitally re-edit famous compositions altering harmonic scale, and I call this experimental music project «Major versus Minor». It may sound surprising and unusual, but it is always interesting. Listen to the music videos below. And please donate to keep the project going
[...]
I by no means intend to enhance the famous music hits as I rework them; they are perfect already. I simply imagine what would it sound like, had the author written it in another mood. And it appears, I succeed in my imaginations.

Again, if you’re not a music nerd you might not know what the difference between major key and minor is. In general picture minor = sad, major = happy. You’ll instantly hear the difference in these versions.

First, a must if you’re an Arrested Development fan.

“Final Countdown in Major key”

My favorite comment from MYxxLxxCHIBI1:

I was literally coming down here to say that myself. GOB finally got accepted to the Alliance of Magicians

Maybe my favorite one -
“Be Worry, Don’t Happy”: Minor Key

I like this one too.
Jingle Bells

“Hey Jude” in minor key

See the whole channel at https://www.youtube.com/user/MajorVsMinor

Since this isn’t a software project per se, there is no link to the source code. According to Asshat8182’s comment on Smells Like Teen Spirit in Major key (with a name like that, he must be a reliable source of information), the way it’s accomplished is

The ‘somehow’ is called Celemony Melodyne. Specifically the DNA function

According to Wikipedia:

Celemony Software GmbH is a German musical software company that specializes in digital audio pitch correction software. It produces Melodyne, an industry standard audio pitch modification tool similar to Auto-Tune

Conclusion

I hope you’ve found this short roundup of music hacks interesting. There are some very creative people out there. If you find what they’re doing interesting, please let them know and/or donate so they’ll keep making great stuff.