Archive
Data Visualization – Size of NFL Football Players Over Time

Screenshot from http://noahveltman.com/nflplayers/ – 1920

Screenshot from http://noahveltman.com/nflplayers/ – 2014
I love Noah Veltman’s visualization of the changing height and weight distribution of professional football players. It uses animation to convey the incredible increase in size of the typical football player, and it does so with a minimal amount of chart junk. Let’s look at two aspects that make this effective.
It uses the appropriate visualization
There are 4 variables plotted on the graph – height, weight, density, and time. Two of the variables are encoded in the axes of the chart. The time dimension is controlled by the slider (or by hitting the play button). The density is represented by the color on the chart.
You could present this data as a table, but it would be much harder to see the pattern that the animation conveys so simply: not only are players getting bigger in terms of both height and weight, but the variance is increasing as well.
It makes good use of color
It uses color appropriately, by varying the saturation rather than the hue. I’ve blogged about this topic before when discussing the Wind Map. To repeat my favorite quote about this, Stephen Few states in his PDF “Practical Rules for Using Color in Charts”:
When using color to encode a sequential range of quantitative values, stick with a single hue (or a small set of closely related hues) and vary intensity from pale colors for low values to increasingly darker and brighter colors for high values
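As a rough sketch of what Few's rule can look like in code, here is one way to build a single-hue ramp in Java with java.awt.Color. The class name SequentialRamp, the hue value, and the exact saturation and brightness ranges are my own assumptions for illustration; they are not taken from Noah Veltman's chart.

```java
import java.awt.Color;

class SequentialRamp {
    // Builds `steps` colors that share one hue and run from pale (low values)
    // to darker, more saturated colors (high values).
    static Color[] ramp(float hue, int steps) {
        Color[] colors = new Color[steps];
        for (int i = 0; i < steps; i++) {
            // t runs from 0.0 (lowest value) to 1.0 (highest value).
            float t = steps == 1 ? 1f : (float) i / (steps - 1);
            float saturation = 0.15f + 0.85f * t; // assumed range: 0.15 to 1.0
            float brightness = 1.0f - 0.4f * t;   // assumed range: 1.0 down to 0.6
            colors[i] = Color.getHSBColor(hue, saturation, brightness);
        }
        return colors;
    }

    public static void main(String[] args) {
        // 0.6f is a blue hue on the 0..1 hue wheel used by getHSBColor.
        for (Color c : ramp(0.6f, 5)) {
            System.out.printf("#%06x%n", c.getRGB() & 0xFFFFFF);
        }
    }
}
```

Because only saturation and brightness change, the ordering of the colors is obvious at a glance, which is the whole point of the rule.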
Extensions
I could imagine extending this visualization in a few ways:
- Allow users to view the players that match a given height/weight combination (who exactly are the outliers?)
- Allow restricting the data to a given position (see how quarterbacks’ height/weight are distributed vs those of the offensive line)
- Compare against some other normalized metrics, such as rate of injury. Is there a correlation?
This is a great data visualization because it tells a story and it spurs the imagination towards additional areas of analysis and research.
Wind Map – a visualization to make Tufte proud
Edward Tufte is a noted proponent of designing data-rich visualizations. His books, including the seminal The Visual Display of Quantitative Information, have influenced countless designers and engineers. When I first saw Fernanda Viégas and Martin Wattenberg’s Wind Map project via Michael Kleber’s Google+ post, I was immediately entranced. After studying it for some time, I feel that the designers must have been intimately familiar with Tufte’s work. Let us examine how this triumph of data visualization succeeds.
Minimalist and data dense
Tufte describes the data density of a chart as the amount of information conveyed per unit of area. There are two ways of increasing data density: increasing the amount of information conveyed, and decreasing the number of non-essential pixels in the image.
No chart junk
You’ll immediately notice what’s not in the image – there’s no compass rose, no latitude or longitude lines, or any other grid lines separating the map from the rest of the page. There aren’t even dividing lines between the states. It isn’t a map at all about political boundaries, so this extra information would only detract from the data being conveyed.
More info
This map conveys two variables, wind speed and wind direction, for thousands of points across the United States. A chart conveying the same information would take far more space and the viewer would have no way of seeing the patterns that exist.
Does not abuse color
In the hands of less restrained designers, this map would be awash in color. You see this often in weather maps and elevation maps, as illustrated below:
The problem is that it is difficult to place colors in a meaningful order quickly. Yes, there is the standard ROYGBIV color ordering of the rainbow, but it’s difficult to apply quickly. Quick: which is ‘bigger’, orange or mauve? How about pink or green? Yellow or purple? It is much easier to compare colors based on their saturation or intensity than on their hue. Color is great for categorical differences, but not so great for conveying quantitative information. Stephen Few sums it up nicely in his great PDF “Practical Rules for Using Color in Charts”:
When using color to encode a sequential range of quantitative values, stick with a single hue (or a small set of closely related hues) and vary intensity from pale colors for low values to increasingly darker and brighter colors for high values
The designers use five shades of gray, each of which is distinguishable from the others, rather than a rainbow of colors. Five options is a nice tradeoff between granularity and ease of telling the shades apart.
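To illustrate the idea of a handful of clearly separated gray levels, here is a small Java sketch that sorts a value into one of five bins and picks a gray for each bin. Everything specific in it is an assumption of mine: the class name GrayBins, the equal-width bins, the 0 to 30 range used in main, and the light-to-dark direction. I don't know how the Wind Map itself chooses its shades.

```java
import java.awt.Color;

class GrayBins {
    static final int LEVELS = 5;

    // Returns the bin index (0..4) for a value, clamping it to [min, max] first.
    static int bin(double value, double min, double max) {
        double clamped = Math.max(min, Math.min(max, value));
        int index = (int) ((clamped - min) / (max - min) * LEVELS);
        return Math.min(index, LEVELS - 1); // value == max lands in the top bin
    }

    // Evenly spaced grays: 220, 175, 130, 85, 40 (light for low, dark for high).
    static Color shade(int bin) {
        int level = 220 - bin * 45;
        return new Color(level, level, level);
    }

    public static void main(String[] args) {
        double[] speeds = {0, 7, 15, 22, 30}; // hypothetical wind speeds
        for (double s : speeds) {
            int b = bin(s, 0, 30);
            System.out.println(s + " -> bin " + b + ", gray " + shade(b).getRed());
        }
    }
}
```

The 45-step gap between levels is what keeps the shades easy to tell apart; with ten or twenty bins the neighbors would start to blur together.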
Excellent use of the medium
In a print medium, the shades of gray would have had to suffice to illustrate how fast the wind was moving. In this medium, the designers used animation to illustrate the speed and direction of the wind in a truly mesmerizing way.
Conclusion
This visualization does a lot of things right. In particular, it uses a great deal of restraint in conveying the information. Unlike some of the other examples I showed, it does not have extra chart junk wasting space, it does not abuse color to try to convey quantitative information, and it is absolutely aesthetically pleasing.
Scala – type inferencing gotchas
int x = 100;
as we would in Java, you can instead write
val x = 100
The setup
def calculateAverageColor(image:BufferedImage):Color = {
  var redSum = 0
  var greenSum = 0
  var blueSum = 0
  // calculate the sum of each channel here
  val red = (redSum / numPixels)
  val green = (greenSum / numPixels)
  val blue = (blueSum / numPixels)
  new Color(red, green, blue)
}
Problem #1
scala> java.lang.Integer.MAX_VALUE
res0: Int = 2147483647
scala> java.lang.Integer.MAX_VALUE+1
res2: Int = -2147483648
If this happens, then we end up trying to construct a color with negative red, green, or blue values; this will result in an IllegalArgumentException.
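To make the overflow concrete, here is a small sketch written in Java (a Scala Int is a 32-bit JVM int, so the arithmetic wraps the same way). The class name ChannelSumOverflow and the 3000 x 3000 all-white image are assumptions invented for the illustration.

```java
class ChannelSumOverflow {
    // Adds the same channel value once per pixel into an int accumulator,
    // mirroring `var redSum = 0` in the Scala version.
    static int sumWithInt(int channelValue, int numPixels) {
        int sum = 0;
        for (int i = 0; i < numPixels; i++) {
            sum += channelValue; // silently wraps once it passes Integer.MAX_VALUE
        }
        return sum;
    }

    // Same loop with a long accumulator, which has plenty of headroom here.
    static long sumWithLong(int channelValue, int numPixels) {
        long sum = 0;
        for (int i = 0; i < numPixels; i++) {
            sum += channelValue;
        }
        return sum;
    }

    public static void main(String[] args) {
        int numPixels = 3000 * 3000; // hypothetical image size
        int intSum = sumWithInt(255, numPixels);
        long longSum = sumWithLong(255, numPixels);
        System.out.println("int sum:  " + intSum + ", average " + (intSum / numPixels));
        System.out.println("long sum: " + longSum + ", average " + (longSum / numPixels));
    }
}
```

With these numbers the true sum is a little over 2.29 billion, which does not fit in an int, so the int version reports a negative sum and a negative average while the long version reports an average of 255.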
def calculateAverageColor(image:BufferedImage):Color = {
  // Declare the sums as longs so we don't have to worry about overflow
  var redSum:Long = 0
  var greenSum:Long = 0
  var blueSum:Long = 0
  // calculate the sum of each channel
  val red = (redSum / numPixels)
  val green = (greenSum / numPixels)
  val blue = (blueSum / numPixels)
  new Color(red, green, blue)
}
java.lang.IllegalArgumentException: Color parameter outside of expected range: Red, Green, Blue
Problem #2
At this point, you might start debugging the process by printing out the values of red, green, and blue. Sure enough they’ll be in the range [0, 255], just as you need for the Color constructor. What is going on?
There are two related problems. The first is that red, green, and blue are not Ints, because of the Long values in the computation. The compiler sees the Long and (correctly) infers that the type of red, green, and blue must be Long.
“
The following 19 specific conversions on primitive types are called the widening primitive conversions:
- byte to short, int, long, float, or double
- short to int, long, float, or double
- char to int, long, float, or double
- int to long, float, or double
- long to float or double
- float to double
”
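The "long to float" entry in that list is the one that matters here. java.awt.Color has both a Color(int, int, int) and a Color(float, float, float) constructor, and a long argument can be widened to float but not to int. The following Java sketch shows the consequence; the class and method names are hypothetical, chosen just for this example.

```java
import java.awt.Color;

class ColorOverload {
    // Passes long values straight to the Color constructor. The only applicable
    // overload is Color(float, float, float), which expects values in 0.0..1.0.
    static boolean longArgsThrow(long r, long g, long b) {
        try {
            new Color(r, g, b); // each long is widened to float
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    // Narrowing to int explicitly selects Color(int, int, int), range 0..255.
    static Color fromLongs(long r, long g, long b) {
        return new Color((int) r, (int) g, (int) b);
    }

    public static void main(String[] args) {
        System.out.println(longArgsThrow(200L, 100L, 50L)); // out of range for floats
        System.out.println(longArgsThrow(1L, 1L, 1L));      // 1.0f is a legal float channel
        System.out.println(fromLongs(200L, 100L, 50L));
    }
}
```

Note that values of 0 and 1 happen to be legal float channels, so a nearly black image could slip through without an exception and simply produce the wrong color.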
long redSum = ...;
int averageRed = (int) (redSum/numPixels);

val redSum:Long = ...
val averageRed:Int = (redSum/numPixels).asInstanceOf[Int]

var redSum:Long = 0
var greenSum:Long = 0
var blueSum:Long = 0
// calculate the sum of each channel
val red:Int = (redSum / numPixels).asInstanceOf[Int]
val green:Int = (greenSum / numPixels).asInstanceOf[Int]
val blue:Int = (blueSum / numPixels).asInstanceOf[Int]
new Color(red, green, blue)
Conclusion
One of the nice things about Scala is that you do not need to explicitly declare the types of your variables. In one sequence of unfortunate events, the variables that looked like ints were in fact longs, leading to an implicit conversion to the float primitive type, which in turn caused the incorrect constructor to be invoked, and an IllegalArgumentException. Hopefully you can avoid doing something so foolish as a result of reading this post.
0to255.com – find lighter/darker shades of colors
Color choosers are a dime a dozen online, but 0to255.com is a very nice one. Its stated purpose is to allow you to specify a color and then find shades that are darker and lighter than that color. It’s very well designed, aesthetically pleasing, and has the good sense to allow you to copy the hex value of the color with a single click.
I use it on a semi-regular basis to design Java Swing UIs; just a quick tip for the Java folks out there: when you have the hex code copied, you need to drop the leading # and prefix the hex digits with 0x for the Color constructor to work correctly. In other words, if you have the hex string #facade, you would create a Java color object with the expression new Color(0xfacade). The 0x tells the Java compiler to treat the digits that follow as a hexadecimal integer literal.
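Here is a short runnable version of that tip, plus one alternative I find handy: Color.decode, which accepts the string with its leading # so you can paste the value as copied. The class name HexColors and its helper methods are made up for this sketch.

```java
import java.awt.Color;

class HexColors {
    // Hex integer literal: the compiler reads 0xfacade as the number 16435934.
    static Color fromLiteral() {
        return new Color(0xfacade);
    }

    // Color.decode parses strings such as "#facade" or "0xfacade".
    static Color fromString(String hex) {
        return Color.decode(hex);
    }

    // Formats a color back into the "#rrggbb" form shown on the site.
    static String toHex(Color c) {
        return String.format("#%06x", c.getRGB() & 0xFFFFFF);
    }

    public static void main(String[] args) {
        Color c = fromLiteral();
        System.out.println(c.getRed() + ", " + c.getGreen() + ", " + c.getBlue());
        System.out.println(fromString("#facade").equals(c));
        System.out.println(toHex(c));
    }
}
```

The literal form is checked at compile time, while the string form is parsed at run time and throws a NumberFormatException if the text is not a valid number.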
ColorBrewer
Just a quick post about a great online tool I was shown (thanks Eric) called ColorBrewer.
There are numerous books and articles online about color palette design, usually from a web-design / aesthetic standpoint. But there is more to the use of color than mere aesthetics; color can be used as an effective tool in scientific data visualization. One of the few books on the topic describes its contents as a guide to “how scientists and engineers can use color to help gain insight into their data sets through true color, false color, and pseudocolor imaging.” While the content of the book is a bit beyond the scope of this post, it’s clear that color gets a lot of use in charts and graphs, and being able to better pick colors is beneficial.
ColorBrewer is designed to help users pick a set of colors that is best suited to showing data on a map. Unlike most color scheme choosers, where you pick whether you want muted colors, bright colors, pastel colors, etc., ColorBrewer starts by asking whether the data you are visualizing is sequential, diverging, or qualitative. Sequential and diverging schemes both apply to quantitative data, e.g. average salaries. Sequential is the more familiar of the two; darker colors usually indicate a higher value of whatever metric is measured. Diverging, on the other hand, assigns a neutral color to the midpoint (such as the average) and then uses colors with sharply contrasting hues for the high and low ends. Qualitative could also be labeled ‘categorical’; it means about the same thing.
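ColorBrewer hands you ready-made palettes, so you would not normally compute one yourself, but a sketch may make the idea of a diverging scheme clearer. The Java below blends from one hue through a neutral midpoint to a contrasting hue. The class name DivergingScale and the three endpoint colors are arbitrary choices of mine, not values exported from ColorBrewer, and the code assumes min < mid < max.

```java
import java.awt.Color;

class DivergingScale {
    // Arbitrary illustrative endpoints: a blue, a near-white neutral, and a red.
    static final Color LOW = new Color(40, 90, 200);
    static final Color MID = new Color(245, 245, 245);
    static final Color HIGH = new Color(200, 40, 40);

    // Maps a value to a color; values are clamped to [min, max].
    static Color colorFor(double value, double min, double mid, double max) {
        double v = Math.max(min, Math.min(max, value));
        if (v <= mid) {
            return blend(LOW, MID, (v - min) / (mid - min));
        }
        return blend(MID, HIGH, (v - mid) / (max - mid));
    }

    // Linear interpolation of each RGB channel, with t in 0.0..1.0.
    static Color blend(Color a, Color b, double t) {
        int r = (int) Math.round(a.getRed() + t * (b.getRed() - a.getRed()));
        int g = (int) Math.round(a.getGreen() + t * (b.getGreen() - a.getGreen()));
        int bl = (int) Math.round(a.getBlue() + t * (b.getBlue() - a.getBlue()));
        return new Color(r, g, bl);
    }

    public static void main(String[] args) {
        // Hypothetical salaries, with 50000 treated as the neutral midpoint.
        double[] salaries = {20000, 35000, 50000, 80000, 110000};
        for (double s : salaries) {
            Color c = colorFor(s, 20000, 50000, 110000);
            System.out.printf("%.0f -> #%06x%n", s, c.getRGB() & 0xFFFFFF);
        }
    }
}
```

A sequential scheme would use only one half of this scale, while a qualitative scheme would skip interpolation altogether and assign unrelated hues to categories.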
Among its other features, ColorBrewer can exclude colors that would not come out well when photocopied, as well as those that would be confused by people with color blindness. It also has mechanisms for exporting the RGB/Hex color codes of the generated color palettes for use in other applications.