My two new blogs – Logic Fault and Mobile Last

September 10, 2014

I’ve started two new blogs to highlight two particular issues that irk me.

The first is Logic Fault, which deals with the sometimes amusing failures of computer algorithms, with an emphasis on recommendation algorithms (think Netflix movie suggestions). You can find it at logicfault.tumblr.com.

The second is Mobile Last, which highlights problems I encounter while living a primarily mobile computing lifestyle (in my personal life, I rarely use a laptop or desktop, preferring my phone or tablet). That one is at mobilelast.tumblr.com.

http://mobilelast.tumblr.com/post/96619628006/this-is-the-impetus-for-starting-this-blog-in-the

 

I want to keep this blog focused on longer-form content and analysis, which is why I opted for these two Tumblr sites for the more image-heavy content. If a particular point is relevant to this blog, especially one that relates to user interfaces, I may well post it to both, with the long text appearing here.

Lying with bar charts

September 8, 2014

I found this image in the newspaper a few months ago.

The main problem with this visualization is that doubling the value doubles both the height AND the width of the bone, so its area quadruples and a two-fold increase looks like a four-fold one.

Moral of the story? Use fixed-width bars and vary only the height. Sorry, USA Today et al., who create these garbage charts.
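The arithmetic behind the distortion can be sketched in a few lines of Python. The function names and the 1 x 3 base size are made up for illustration; none of this comes from the original chart.

```python
def scaled_icon_area(value, base_value, base_width=1.0, base_height=3.0):
    """Area of a pictogram whose width AND height both scale with the value."""
    scale = value / base_value
    return (base_width * scale) * (base_height * scale)


def fixed_width_bar_area(value, base_value, base_width=1.0, base_height=3.0):
    """Area of a bar whose width stays fixed; only the height scales."""
    scale = value / base_value
    return base_width * (base_height * scale)


# Double the value (10 -> 20) and compare how much the drawn area grows.
icon_ratio = scaled_icon_area(20, 10) / scaled_icon_area(10, 10)
bar_ratio = fixed_width_bar_area(20, 10) / fixed_width_bar_area(10, 10)
print(icon_ratio, bar_ratio)  # 4.0 2.0
```

With a fixed width, the area ratio matches the value ratio, which is what the reader's eye expects.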

Categories: data

Link: IKEA’s use of 3D rendering in its catalogs

September 2, 2014

This blew my mind – almost all of the imagery in IKEA catalogs is computer generated.

The main rationale for switching from traditional photography to 3D rendering:

The IKEA team didn’t feel there was anything wrong with traditional photography, quality-wise. Like any company, they just wanted to make things easier for the team to work on – to make the process simpler, cheaper and faster. With traditional photography, you need to have prototype furniture being built in different parts of the world shipped over so it can be photographed. Everything needs to be there on time and it can be logistically difficult, expensive and not that environmental. Then if there are changes everything needs to be re-shot. With CG re-creations of pieces, it removes a lot of this difficulty. However to start with, Martin says, “There was no vision initially to create entire rooms in CG, like we do now. We just wanted to create the individual pieces – the ones you see on white backgrounds on the web.”

There are some great images in the article showing how the same kitchen is rendered for different countries. You’ll notice that the faucet switches sides, the oven handle changes, and in one of the renders the refrigerator is removed completely.

The article also describes the technology stack that they use to render all of the images.
Thanks to Hacker News for the link.

Categories: link

Juking the stats – WordPress and social proof

August 28, 2014
Sign that shows addition of established date to elevation to population

“Unnecessary Math” – via slaya771 on reddit http://www.reddit.com/r/funny/comments/1d3zs9/unnecessary_math/

Everyone with a basic science education knows that you cannot add quantities whose units do not match; you cannot add population to elevation, for instance, as the picture shows.

This does not stop companies from doing something that’s arguably worse, as it’s harder to detect and call them on their BS.

Take WordPress.com. I use them as my blogging platform and I’m overall happy with them. WordPress allows you to customize your blog by inserting widgets. I have the “Follow Blog: Email Subscription” widget installed. Here is what it looks like to readers:

Email Subscription. Enter your email address to subscribe to this blog and receive notifications of new posts by email. Join 220 other followers

Here is what readers who are not email subscribers saw

This number is a lie. In my stats page I can see the truth – there are really only 41 email subscribers. The rest are following me on Twitter. When I post on WordPress, it automatically sends a tweet with a link to the post.

Only 41 email subscribers, NOT 220.

WordPress adds my Twitter follower count to my email subscriber count, and then implies that all of them are following my blog via email. Read the wording again. “Join 220 other followers”, right above a text box for email address entry.

First, why would WordPress do this?

I see two main possibilities.

One, it’s an honest mistake. The backend system has some field for ‘followers’ which is always computed by summing up all the different follower types, and this field was inadvertently used rather than the email follower count. I tried to contact WordPress about this on Monday, August 25, 2014 but have not yet received a response.
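To make the "honest mistake" scenario concrete, here is a purely hypothetical sketch in Python. It is not WordPress's actual code. The dictionary keys and function names are invented, and the 179 Twitter figure is an assumption obtained by subtracting the 41 email subscribers from the 220 shown in the widget.

```python
# Hypothetical follower breakdown; the keys are invented for illustration.
followers_by_type = {
    "email": 41,     # from the stats page
    "twitter": 179,  # assumed: 220 displayed minus 41 email subscribers
}


def total_followers(counts):
    """Sum every follower type into one aggregate number."""
    return sum(counts.values())


def email_widget_text(count):
    """Text shown above the email entry box."""
    return f"Join {count} other followers"


# What the widget appears to do: display the aggregate.
shown = email_widget_text(total_followers(followers_by_type))
# What the wording implies: count email subscribers only.
implied = email_widget_text(followers_by_type["email"])
print(shown)    # Join 220 other followers
print(implied)  # Join 41 other followers
```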

The second possibility is that it’s deliberate. The subscriber count is a form of social proof, which lets readers gauge the quality of the site. My hypothesis is that WordPress has empirical evidence that displaying a higher follower count in this widget leads to a higher follow rate. You could imagine an A/B experiment in which some visitors see the true count and others see the value doubled, with the follow rate measured for each group. Or, conversely, remove the follower count from that text and see if the follow rate drops.
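An experiment like that could be sketched roughly as follows. This is only an illustration of the idea: the variant names, the hashing scheme and the event format are all assumptions, and no real traffic data is involved.

```python
import hashlib

VARIANTS = ("true_count", "inflated_count")  # hypothetical variant names


def assign_variant(visitor_id, variants=VARIANTS):
    """Deterministically bucket a visitor so they always see the same widget."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def follow_rate(events):
    """Turn (variant, followed) pairs into a follow rate per variant."""
    shown = {}
    followed = {}
    for variant, did_follow in events:
        shown[variant] = shown.get(variant, 0) + 1
        followed[variant] = followed.get(variant, 0) + int(did_follow)
    return {variant: followed[variant] / shown[variant] for variant in shown}


# Invented events, only to show the shape of the calculation.
rates = follow_rate([
    ("true_count", True),
    ("true_count", False),
    ("inflated_count", False),
])
print(rates)
```

A real test would also need enough visitors per variant for the difference in rates to be statistically meaningful.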

The second question is, why does it matter?

While it’s not as wrong as adding elevation to population, as the image that started this post shows, it’s still wrong. The units are right in the sense that you are adding counts of people to counts of people. But not all followers are created equal. People could follow me on Twitter for any number of reasons while not caring at all about my blog. Conversely, people who explicitly sign up for email notifications of new posts are showing a drastically different level of intent. To call them both followers and to insert them in a widget that purports to show email subscribers is disingenuous.

Fortunately the widget has an option to disable the follower count altogether, and from now on I am going to do just that.

Categories: data

Data Visualization – Size of NFL Football Players Over Time

June 5, 2014
1920 height/weight of NFL players

Screenshot from http://noahveltman.com/nflplayers/ – 1920

2014 height/weight of NFL players

Screenshot from http://noahveltman.com/nflplayers/ – 2014

I love Noah Veltman’s visualization of the changing height and weight distribution of professional football players. It uses animation to convey the incredible increase in size of the typical football player, and it does so with a minimal amount of chart junk. Let’s look at two aspects that make this effective.

It uses the appropriate visualization

There are four variables plotted on the graph: height, weight, density, and time. Two of the variables are encoded in the axes of the chart. The time dimension is controlled by the slider (or by hitting the play button). The density is represented by the color on the chart.
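As a rough sketch of the data structure behind such a chart, each year can be treated as one frame holding a count of players per height/weight cell. The sample rows below are invented for illustration; they are not real player data, and this is not the code behind the original visualization.

```python
from collections import Counter

# Made-up sample rows: (year, height in inches, weight in pounds).
players = [
    (1920, 70, 185),
    (1920, 70, 185),
    (1920, 72, 200),
    (2014, 74, 245),
    (2014, 77, 310),
]


def density_by_year(rows):
    """Count players per (height, weight) cell, grouped by year."""
    grids = {}
    for year, height, weight in rows:
        grids.setdefault(year, Counter())[(height, weight)] += 1
    return grids


grids = density_by_year(players)
# One frame of the animation: the two axes are height and weight,
# and the count in each cell drives the color.
print(grids[1920])
```

The slider then only has to pick which year's grid to draw.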

You could present this data as a table, but it would be much harder to see the pattern that the animation conveys so simply: not only are players getting bigger in terms of both height and weight, but the variance is increasing as well.

It makes good use of color

It uses color appropriately, by varying the saturation rather than the hue. I’ve blogged about this topic before when discussing the Wind Map. To repeat my favorite quote about this, Stephen Few states in his PDF “Practical Rules for Using Color in Charts”:

When using color to encode a sequential range of quantitative values, stick with a single hue (or a small set of closely related hues) and vary intensity from pale colors for low values to increasingly darker and brighter colors for high values
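One way to follow that rule in code is to fix the hue and let only the lightness change with the value. The sketch below uses Python's standard colorsys module. The function name, the blue hue of 0.6 and the lightness range are my own arbitrary choices, not something taken from Few's paper or from the visualization.

```python
import colorsys


def sequential_color(value, vmin, vmax, hue=0.6):
    """Map a value to a hex color with a fixed hue and varying lightness."""
    if vmax == vmin:
        t = 0.0
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp out-of-range values
    lightness = 0.95 - 0.65 * t  # pale for low values, darker for high values
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 0.8)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


low = sequential_color(0, 0, 100)
high = sequential_color(100, 0, 100)
print(low, high)
```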

Extensions

I could imagine extending this visualization in a few ways:

  • Allow users to view the players that match a given height/weight combination (who exactly are the outliers?)
  • Allow restricting the data to a given position (see how quarterbacks’ height/weight are distributed vs those of the offensive line)
  • Compare against some other normalized metrics, such as rate of injury. Is there a correlation?

This is a great data visualization because it tells a story and it spurs the imagination towards additional areas of analysis and research.

Reblog: “Top 10 Mistakes that Python Programmers Make”

May 11, 2014

Martin Chikilian from Toptal rounds up some common mistakes that Python programmers make.

I have made mistake #1 on multiple occasions:

Common Mistake #1: Misusing expressions as defaults for function arguments
Python allows you to specify that a function argument is optional by providing a default value for it. While this is a great feature of the language, it can lead to some confusion when the default value is mutable. For example, consider this Python function definition:

>>> def foo(bar=[]):        # bar is optional and defaults to [] if not specified
...    bar.append("baz")    # but this line could be problematic, as we'll see...
...    return bar

A common mistake is to think that the optional argument will be set to the specified default expression each time the function is called without supplying a value for the optional argument. In the above code, for example, one might expect that calling foo() repeatedly (i.e., without specifying a bar argument) would always return ‘baz’, since the assumption would be that each time foo() is called (without a bar argument specified) bar is set to [] (i.e., a new empty list).
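The excerpt stops before showing what actually happens, so here is a quick sketch. The name foo_fixed is my own label for the usual None-sentinel workaround, not something quoted from the excerpt above.

```python
def foo(bar=[]):
    # The default list is created once, when the def statement runs,
    # and the same object is reused on every call that omits bar.
    bar.append("baz")
    return bar


first = foo()
second = foo()
print(second)           # ['baz', 'baz']
print(first is second)  # True


def foo_fixed(bar=None):
    # Common workaround: use None as a sentinel and build a fresh list per call.
    if bar is None:
        bar = []
    bar.append("baz")
    return bar


print(foo_fixed())  # ['baz']
print(foo_fixed())  # ['baz']
```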

I don’t remember for sure, but I’ve probably done something like #5, modifying a list while iterating through it.
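For anyone who hasn't been bitten by that one, a small sketch of the problem (my own example, not the one from the article):

```python
items = [1, 2, 2, 3]

# Risky: removing from the list being iterated shifts later elements left,
# so the loop skips the element right after each removal.
risky = list(items)
for x in risky:
    if x == 2:
        risky.remove(x)
print(risky)  # [1, 2, 3] (one of the 2s survived)

# Safer: build a new list instead of mutating the one being iterated.
safe = [x for x in items if x != 2]
print(safe)  # [1, 3]
```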

If you write Python code, the rest of the article is worth a read.

Categories: programming, Python