Author Archive

Spot the real Java class name

April 22, 2014

This is hilarious (if you’re a programmer). Some folks trained a Markov chain on the class names in Spring and then made a game out of it – three Java class names are presented, and only one is real. I didn’t do so hot – 4/11.

Screenshot of the web app

Via Jeff Dean on Google+.
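
If you’re curious how such a generator might work, here’s a toy sketch (my own – not the actual game’s code) that trains a character-level Markov chain on a few real Spring class names and samples a fake one:

package main

import (
    "fmt"
    "math/rand"
    "strings"
)

func main() {
    // A tiny training corpus of real Spring class names.
    corpus := []string{
        "AbstractBeanFactory",
        "SimpleJdbcTemplate",
        "TransactionAwareDataSourceProxy",
    }

    // Map each two-character prefix to the characters that follow it.
    const order = 2
    next := map[string][]rune{}
    for _, name := range corpus {
        runes := []rune(strings.Repeat("^", order) + name + "$")
        for i := order; i < len(runes); i++ {
            key := string(runes[i-order : i])
            next[key] = append(next[key], runes[i])
        }
    }

    // Sample a new "class name" by walking the chain to the end marker.
    key := strings.Repeat("^", order)
    var out []rune
    for {
        followers := next[key]
        c := followers[rand.Intn(len(followers))]
        if c == '$' {
            break
        }
        out = append(out, c)
        key = key[1:] + string(c)
    }
    fmt.Println(string(out))
}

With a corpus of thousands of names instead of three, the chain produces plausible-looking monsters like the ones in the game.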

YAGNI and the scourge of speculative design

April 14, 2014

Woman with crystal ball
Robert Anning Bell [Public domain], via Wikimedia Commons

I’ve been programming professionally for five years. One of the things that I’ve learned is YAGNI, or “You aren’t gonna need it”.

It’s taken me a long time to learn the importance of this principle. When I was a senior in college, I had a course that involved programming the artificial intelligence (AI) of a real-time strategy game. For our final project, our team’s AI would be plugged in to fight against another team’s. I got hung up on implementing a complicated binary protocol for the robots on our team to communicate efficiently and effectively, and our team ended up doing terribly. I was mortified. No other team spent much time or effort on their communication protocol, and those that did only did so after getting everything else up and running.

In this essay I’ll primarily be talking about producing code that’s not necessary now, but might be in the future. I call this “speculative design” and it’s what the YAGNI philosophy prevents.

First, let’s discuss how and why this speculative design happens. Then we’ll discuss the problems with giving in to the temptation.

Why does it happen

I can only speak to my own experience. The times I’ve fallen into this trap can be classified into a few categories:

  • It’s fun to build new features
  • It feels proactive to anticipate needs
  • Bad prioritization

Building features is fun

Programming is a creative outlet. It’s incredibly satisfying to have an idea, build it in code, and then see it in use. It’s more fun than other parts of development, like testing, refactoring, fixing bugs, and cleaning up dead code. These other tasks are incredibly important, but they’re ‘grungy’ and often go unrewarded. Implementing features is not only more fun, it gets you more visibility and recognition.

Proactive to anticipate needs

A second reason one might engage in speculative design is to be proactive and anticipate the needs of the customer. If our requirements say that we must support XML export, it’s likely that we’ll end up having to support JSON in the future. We might as well get a head start on that feature so when it’s asked for we can delight the customer by delivering it in less time.

Bad prioritization

This is the case with the story I started this piece with. I overestimated the importance of inter-robot communications and overengineered it to a point where it hurt every other feature.

In this case, the feature was arguably necessary and should have been worked on, but not to the extent, and not in the order, that I did. I should have used a strategy of satisficing and implemented the bare minimum after all of the more important things were done.

Why is it problematic

I’ve described a few reasons speculative code exists. You’ve already seen one example of why it’s problematic. I’ll detail some other reasons.

More time

Let’s start simple. Time spent building out functionality that may be necessary in the future is time not spent on making things better today. As I mentioned at the start of this post, I ended up wasting hours and hours on something that ended up being completely irrelevant to the performance of teams in the competition, at the expense of things that mattered a lot more, like pathfinding.

Less focus

Since there is more being developed, it’s likely that the overall software product is less focused. Your time and attention are being divided among more modules, including the speculatively designed ones.

More code

Software complexity is often measured in lines of code; it’s not uncommon for large software projects to number in the millions. Windows XP, for instance, had about 45 million lines.

Edsger Dijkstra, one of the most influential computer scientists, has a particularly good quote about lines of code:

My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent”: the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

I once equated lines of code produced to productivity, but nothing could be further from the truth. I now consider it a very good week if I decrease the lines of code in the system, by deleting chunks of code or rewriting them to be simpler and shorter.

The extra code and complexity associated with speculative coding is very expensive.

  • It slows down readers of the code
  • It slows down building the software (especially if it pulls in more dependencies)
  • It adds additional tests that slow down the test suite
  • It is likely to add more bugs (more code generally equals more bugs)

Sets unrealistic expectations

Say that you design a feature because you think that the customer is going to want it. Imagine that you actually got it right – what they end up asking for is essentially identical to what you’ve implemented. You deliver it to the customer a full week before you promised it.

You might look like a hero, but this sets a very bad precedent. It sets unrealistic expectations as to how much work the feature took to implement, and might lead to the customer setting impossible deadlines for features of similar scope. If you were able to finish that feature early, they might reason, there’s no reason you shouldn’t produce the next feature just as quickly.

You’re probably a bad judge of what will be needed in the future

It’s hard enough to build software from detailed specifications and requirements. Guessing at the specifications and requirements of a feature that isn’t needed yet is likely to result in a product that doesn’t make anyone happy. It will likely match the designers’ mental model but not the users’, since there was inadequate input from them.

It’s hard to remove features once they exist

Say that you’re designing the export feature of your software. You imagine there will be a whole lot of formats you want to support, but at the moment the only hard and fast requirement is CSV (comma separated value) format. As you’re writing the CSV export code, you see how it would be trivial to implement JSON encoding. And while you’re at it, you throw in XML. You were required to produce CSV but now you have JSON and XML support too. Great!

Well, maybe. Maybe not. A year down the line you notice that only a small percentage of your users export to XML, but the feature has led to a disproportionate number of support tickets. Now you’re in a tough place – if you kill the feature, you’ll irritate these power users. Furthermore, you will have effectively wasted all of the time spent implementing the feature in the first place, and on all the subsequent patches.
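
For contrast, here’s a hypothetical sketch (my own, with a made-up Record type) of the YAGNI version of that export feature: implement only the CSV format the requirements actually call for, and leave JSON and XML for the day someone asks.

package main

import (
    "encoding/csv"
    "fmt"
    "os"
)

// Record is a stand-in for whatever the application exports.
type Record struct {
    Name  string
    Value string
}

// ExportCSV implements the one format the requirements demand.
// JSON and XML can be added if and when a customer asks for them.
func ExportCSV(records []Record) error {
    w := csv.NewWriter(os.Stdout)
    for _, r := range records {
        if err := w.Write([]string{r.Name, r.Value}); err != nil {
            return err
        }
    }
    w.Flush()
    return w.Error()
}

func main() {
    records := []Record{{"foo", "1"}, {"bar", "2"}}
    if err := ExportCSV(records); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
}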

I have seen little-used features remain in production because deleting them would be too much trouble and would alienate the few users of said feature. Which leads to…

Increased risk of dead code

Imagine that you’ve implemented a new feature but it’s not ready for prime time yet. Or maybe you used it once or twice but it’s not worth turning on for your normal service. You don’t want to kill the feature entirely, as it might have some utility down the line. (Warning bells should be going off about now.) You decide to hide the feature behind a configuration flag that defaults to off. Great! The feature can easily be re-enabled should you ever need it again.
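
In code, the pattern looks innocuous enough – something like this hypothetical sketch:

package main

import "fmt"

type Config struct {
    EnableLegacyRouting bool // defaults to false; the "safe" off switch
}

func route(cfg Config, order string) {
    if cfg.EnableLegacyRouting {
        // Dead code in practice -- until a bad deploy or a stray
        // config change flips the flag and this runs against
        // today's system.
        fmt.Println("legacy routing:", order)
        return
    }
    fmt.Println("current routing:", order)
}

func main() {
    route(Config{}, "buy 100 XYZ") // zero value: flag off, as intended
}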

There’s just one problem – the feature gets turned on accidentally and interacts catastrophically with the rest of the system. Your software deals with financial transactions, and the mistake ends up costing your company $460 million.

This sounds unlikely – except it’s true. This is essentially what happened to Knight Capital in 2012.

From the Securities and Exchange Commission report of the incident:

Knight also violated the requirements of Rule 15c3-5(b) because Knight did not have technology governance controls and supervisory procedures sufficient to ensure the orderly deployment of new code or to prevent the activation of code no longer intended for use in Knight’s current operations but left on its servers that were accessing the market; and Knight did not have controls and supervisory procedures reasonably designed to guide employees’ responses to significant technological and compliance incidents;

This is one of the most visible failures caused by dead or oxbow code. I am not suggesting that Knight Capital speculatively developed the feature that malfunctioned. What I am saying is that

  • It’s dangerous to leave dead code around in a system
  • Speculative development is likely to lead to features that are not used often and are more likely to be dead code than if they were completely spec’ed out as in normal development
  • Therefore speculative development puts you at a greater risk of dead code problems

Don’t allow dead code to stay in the codebase. If you should ever need it again, you should be able to retrieve it from the version control system. You almost certainly won’t.

Conclusion

As an engineer, it’s easy to fall into the trap of implementing features before they’re actually needed. You’ll look productive and proactive. In the end, it’s best to avoid this temptation, for all of the problems I’ve mentioned. These include

  • the extra code takes time to write, test, debug, and code review
  • it contributes to a lack of conceptual focus in the system
  • if done to please a customer, it sets unrealistic expectations for the development of other features
  • it imparts an extra maintenance cost for the rest of the lifetime of said feature
  • it will be difficult to remove the feature if and when its lack of use becomes apparent
  • it puts you at increased risk of leaving dead code in the system, code which may later be accessed with bad consequences

I love Dijkstra’s notion of ‘lines spent’. Do you want to spend your time and lines of code on a speculative feature? Just remember – you aren’t gonna need it.

Music hack roundup

March 24, 2014

Disclaimer: the opinions expressed are my own and do not represent those of my employer, Google.

I love music. And programming. I really like pieces of technology that either create new music or modify existing pieces. Here I detail some of my favorite projects I’ve found in the past few years.

Songsmith

Songsmith is a project from Microsoft Research. From the project description page:

Songsmith generates musical accompaniment to match a singer’s voice. Just choose a musical style, sing into your PC’s microphone, and Songsmith will create backing music for you. Then share your songs with your friends and family, post your songs online, or create your own music videos.

There was a commercial that detailed how the project worked, but what I found really great was what the community did with it. They fed the vocals of famous songs through the software to see what sort of music came out. The results are, ahem, mixed.

First, one that I think sounds pretty interesting – a swing version of Katy Perry’s I Kissed a Girl

Then going into the realm of hilarious:

We Will Rock You – Queen Vs Songsmith

Motörhead’s Ace of Spades

Nirvana’s In Bloom

Enter Sandman

Roxanne

The Beatles’ A Day in the Life

The Swinger

According to musicmachinery.com’s writeup,

The Swinger is a bit of python code that takes any song and makes it swing.

If you’re not into music theory (or old music) you might not know what constitutes swing. The following video (you only need to watch the first 30 seconds or so) is a great example of the difference between straight and swing styles:

As you can hear, it sounds very different. The first half of each beat is stretched out, and the second half is shrunk down. It sounds even more different when you start listening to familiar songs converted from straight to swing, or vice versa. While most of the links have died, Daft Punk’s Around The World still plays, as does Jefferson Airplane’s White Rabbit.

The source code is available at https://github.com/echonest/remix/blob/master/examples/swinger/swinger.py.
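
The core timing transformation is easy to sketch. Here’s a minimal illustration of the beat-warping described above (mine, in Go – the real Swinger is Python and does the actual audio time-stretching):

package main

import "fmt"

// swing maps a position p in [0, 1) within a beat from straight to
// swung time: the first half of the beat stretches to two thirds,
// and the second half squeezes into the remaining third.
func swing(p float64) float64 {
    if p < 0.5 {
        return p * (2.0 / 3.0) / 0.5 // [0, 0.5) stretches to [0, 2/3)
    }
    return 2.0/3.0 + (p-0.5)*(1.0/3.0)/0.5 // [0.5, 1) squeezes into [2/3, 1)
}

func main() {
    // A straight eighth note halfway through the beat lands at the
    // two-thirds point -- the classic swing feel.
    for _, p := range []float64{0.0, 0.25, 0.5, 0.75} {
        fmt.Printf("straight %.2f -> swung %.2f\n", p, swing(p))
    }
}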

Autocanonizer

From musicmachinery.com’s writeup of The Autocanonizer:

It takes any song and tries to make a canon out of it. A canon is a song that can be played against a copy of itself.

Wikipedia has a bit more information on what exactly a canon is:

In music, a canon is a contrapuntal compositional technique that employs a melody with one or more imitations of the melody played after a given duration (e.g., quarter rest, one measure, etc.). The initial melody is called the leader (or dux), while the imitative melody, which is played in a different voice, is called the follower (or comes). The follower must imitate the leader, either as an exact replication of its rhythms and intervals or some transformation thereof (see “Types of canon”, below). Repeating canons in which all voices are musically identical are called rounds – “Row, Row, Row Your Boat” and “Frère Jacques” being widely known examples. An example of a classical strict canon is the Minuet of Haydn’s String Quartet in D Minor, Op. 76, No. 2 (White 1976, 66).

With that in mind, here are some examples.

My favorite is Adele’s Someone Like You. This one sounds close to a round.

Screenshot from the Autocanonizer

  • Over The Rainbow – starts rough but 30 seconds in it sounds good
  • The Fox – I like it. Lots of self harmonizing. The doubled up chorus actually works. It gets out of sync with itself at some points
  • Take Five – demonstrates that the technique works with odd meter too. Not perfectly lined up at some points

See all the examples available at http://static.echonest.com/autocanonizer/loader.html
Source code available at: https://github.com/plamere/autocanonizer

Hat tip to Greg Linden whose Google+ post alerted me to this, and reminded me of these other projects I’d seen before.

MajorVsMinor

MajorVsMinor is a slight departure from the others I’ve listed because there is a human in the loop – it’s not all algorithmic. From Oleg Berg’s description from olegberg.com

Hello! I am Oleg Berg, a musician from Donetsk, Ukraine. I digitally re-edit famous compositions altering harmonic scale, and I call this experimental music project «Major versus Minor». It may sound surprising and unusual, but it is always interesting. Listen to the music videos below. And please donate to keep the project going
[...]
I by no means intend to enhance the famous music hits as I rework them; they are perfect already. I simply imagine what would it sound like, had the author written it in another mood. And it appears, I succeed in my imaginations.

Again, if you’re not a music nerd you might not know the difference between a major key and a minor one. As a general picture, minor = sad, major = happy. You’ll instantly hear the difference in these versions.

First, a must if you’re an Arrested Development fan.

“Final Countdown in Major key”

My favorite comment from MYxxLxxCHIBI1:

I was literally coming down here to say that myself. GOB finally got accepted to the Alliance of Magicians

Maybe my favorite one -
“Be Worry, Don’t Happy”: Minor Key

I like this one too.
Jingle Bells

“Hey Jude” in minor key

See the whole channel at https://www.youtube.com/user/MajorVsMinor

Since this isn’t a software project per se, there is no link to the source code. According to Asshat8182’s comment on Smells Like Teen Spirit in Major key (with a name like that, he must be a reliable source of information), the way it’s accomplished is

The ‘somehow’ is called Celemony Melodyne. Specifically the DNA function

According to Wikipedia:

Celemony Software GmbH is a German musical software company that specializes in digital audio pitch correction software. It produces Melodyne, an industry standard audio pitch modification tool similar to Auto-Tune

Conclusion

I hope you’ve found this short roundup of music hacks interesting. There are some very creative people out there. If you find what they’re doing interesting, please let them know and/or donate so they’ll keep making great stuff.

Go gotcha #1: variable shadowing within inner scope due to use of := operator

March 3, 2014

Disclaimer: Go is open source and developed by many Google employees. I work for Google, but the opinions expressed here are my own and do not necessarily represent those of Google.

Last week I described how using the range keyword in conjunction with taking the address of the iterator variable leads to the wrong result. This week I’ll discuss how it’s possible to accidentally shadow one variable with another, leading to hard-to-find bugs.

Let’s take the same basic setup as last week; we have a Solution struct, and we’re searching for the best (lowest cost) one in a slice of potential candidates.

package main

import "fmt"

type Solution struct {
    Name     string
    Cost     int
    Complete bool
}

func FindBestSolution(solutions []*Solution) *Solution {
    var best *Solution
    for _, solution := range solutions {
        if solution.Complete {
            if best == nil || solution.Cost < best.Cost {
                best := solution
                fmt.Printf("new best: %v\n", *best)
            }
        }
    }
    return best
}

func main() {
    solutions := []*Solution{
        &Solution{
            Name:     "foo",
            Cost:     10,
            Complete: true,
        },
    }
    fmt.Printf("Best solution is: %v", FindBestSolution(solutions))
}

Output:

new best: {foo 10 true}
Best solution is: <nil>
Program exited.

Go playground

What’s going on? We see that we have a good candidate solution from the debugging information. Why does the function return the wrong value?

The bug is in this line:

best := solution

The problem is that we’re declaring and initializing a new variable (with the := operator) rather than assigning to the existing best variable in the outer scope. The corrected line is

best = solution

Use = to change the value of an existing variable, use := to create a new variable.

If I had not referenced the new variable with the debug print statement, this code would not have compiled:

if best == nil || solution.Cost < best.Cost {
    best := solution
}
prog.go:16: best declared and not used
 [process exited with non-zero status]

Go playground

Why is this shadowing of variables in other scopes allowed at all?

There is a long thread on Go-nuts debating this subject.

Arguments For

Nate Finch:

type M struct{}

func (m M) Max() int {
    return 5
}

func foo() {
    math := M{}
    fmt.Println(math.Max())
}

If shadowing didn’t work, importing math would suddenly break this program.


My point was about adding an import after writing a lot of code (when adding features or whatever), and that without shadowing, merely importing a package now has the potential to break existing code….

The current shadowing rules insulate code inside functions from what happens at the top level of the file, so that adding imports to the file will never break existing code (now waiting for someone to prove me wrong on this ;)

Rui Maciel:

There is a simpler and better solution: use a short variable declaration when you actually want to declare a variable, and use an assignment operator when all you want to do is assign a value to a variable which you’ve previously declared. This doesn’t require any change to either the language or the compiler, particularly one which is that cryptic.

Arguments Against

Johann Höchtl:

See it this way. I can carry a gun in my hand aiming towards a target. I pull the trigger and hit the target. Everything happens exactly the way it is expected to happen.

Suddenly an inner block jumps in … the instructor. Me, a gun in my hand, the instructor in between and on the other side the target. I pull the trigger.

Still … everything happens exactly the way it is told to behave. Which still makes the end results not a desirable result. Adding an “inner block”, which by itself is behaving in a fully specified way, influences the whole.

Somewhat odd I admit, but you may get what I mean?

Conclusion

I don’t think that the shadowing should be an error but I do think there should be a warning. The go vet tool already helps find common mistakes, such as forgetting to include arguments to printf. For instance:

example.go:

package main

import "fmt"

func main() {
    fmt.Printf("%v %v", 5)
}

Run:

go vet example.go

example.go:6: missing argument for Printf verb %v: need 2, have 1

If the vet tool were modified to add this warning, it would occasionally yield false positives for those cases where the shadowing is done intentionally. I think this is a worthwhile tradeoff. Virtually all Go programmers I’ve talked with have made this mistake, and I would be willing to bet that these cases are far more frequent than intentional shadowing.

Go gotcha #0: Why taking the address of an iterated variable is wrong

February 25, 2014

Golang mascot
Go mascot – by Renée French under Creative Commons Attribution-Share Alike 3.0 Unported from http://en.wikipedia.org/wiki/File:Golang.png

Disclaimer: Go is open source and developed by many Google employees. I work for Google, but the opinions expressed here are my own and do not necessarily represent those of Google.

Go is my new favorite programming language. It’s compact, garbage collected, terse, and very easy to read. There are some things that trip me up even now, after I’ve been using it for a while. Today I’m going to discuss the range construct and how it has a surprising feature that might violate your assumptions.

Range

First, the range keyword is a way to iterate through the various built-in data structures in Go. For instance,

a := map[string]int{
    "hello": 1,
    "world": 2,
}
// 2 element range gets key and value
for key, value := range a {
    fmt.Printf("key %s value %d\n", key, value)
}
// 1 element is just the key
for key := range a {
    fmt.Printf("key %s\n", key)
}

// Works for slices (think of them as vectors/lists) too
b := []string{"hello", "world"}
// 2 element range gets the index as well as the entry
for i, s := range b {
    fmt.Printf("entry %d: %s\n", i, s)
}
// 1 element gets just the index (notice the pattern?)
for i := range b {
    fmt.Printf("entry %d\n", i)
}

This outputs

key hello value 1
key world value 2
key hello
key world
entry 0: hello
entry 1: world
entry 0
entry 1

Try this code in the Go Playground

Solution search – pointers

Imagine the case where we have a struct as follows

type Solution struct {
    Name     string
    Cost     int
    Complete bool
}

Say that we’re doing some sort of optimization where we’re looking for the minimum cost solution that meets some criteria; for simplicity’s sake, I’ve put that as the ‘complete’ bool. It’s possible that no such Solution matches, in which case we return a nil solution.

A reasonable implementation would be as follows

func FindBestSolution(solutions []Solution) *Solution {
    var best *Solution
    for _, solution := range solutions {
        if solution.Complete {
            if best == nil || solution.Cost < best.Cost {
                best = &solution
            }
        }
    }
    return best
}

Do you see the bug? Don’t worry if you don’t – I’ve made this mistake a few times now.

Let’s add some tests to find the problem. This is an example of a table-driven test, where the test cases are given as a slice of struct literals. This makes it very easy to add new test cases.

func TestFindBestSolution(t *testing.T) {
    tests := []struct {
        name      string
        solutions []Solution
        want      *Solution
    }{
        {
            name:      "Nil list",
            solutions: nil,
            want:      nil,
        },
        {
            name: "No complete solution",
            solutions: []Solution{
                {
                    Name:     "Foo",
                    Cost:     25,
                    Complete: false,
                },
            },
            want: nil,
        },
        {
            name: "Sole solution",
            solutions: []Solution{
                {
                    Name:     "Bar",
                    Cost:     12,
                    Complete: true,
                },
            },
            want: &Solution{
                Name:     "Bar",
                Cost:     12,
                Complete: true,
            },
        },
        {
            name: "Multiple complete solution",
            solutions: []Solution{
                {
                    Name:     "Foo",
                    Cost:     25,
                    Complete: false,
                },
                {
                    Name:     "Bar",
                    Cost:     12,
                    Complete: true,
                },
                {
                    Name:     "Baz",
                    Cost:     25,
                    Complete: true,
                },
            },
            want: &Solution{
                Name:     "Bar",
                Cost:     12,
                Complete: true,
            },
        },
    }
    for _, test := range tests {
        got := FindBestSolution(test.solutions)
        if got == nil && test.want != nil {
            t.Errorf("FindBestSolution(%q): got nil wanted %v", test.name, *test.want)
        } else if got != nil && test.want == nil {
            t.Errorf("FindBestSolution(%q): got %v wanted nil", test.name, *got)
        } else if got == nil && test.want == nil {
            // This is OK
        } else if *got != *test.want {
            t.Errorf("FindBestSolution(%q): got %v wanted %v", test.name, *got, *test.want)
        }
    }
}

If you run the tests you’ll find that the last test fails:

--- FAIL: TestFindBestSolution (0.00 seconds)
    prog.go:82: FindBestSolution("Multiple complete solution"): got {Baz 25 true} wanted {Bar 12 true}
FAIL
 [process exited with non-zero status]

This is strange – it works fine in the single element case, but not with multiple values. Let’s try adding a case where the correct value is last in the list.

    {
        name: "Multiple - correct solution is last",
        solutions: []Solution{
            {
                Name:     "Baz",
                Cost:     25,
                Complete: true,
            },
            {
                Name:     "Bar",
                Cost:     12,
                Complete: true,
            },
        },
        want: &Solution{
            Name:     "Bar",
            Cost:     12,
            Complete: true,
        },
    },

Sure enough, this test passes. So somehow if the element is last the algorithm works. What’s going on?

From the go-wiki entry on Range:

When iterating over a slice or map of values, one might try this:

items := make([]map[int]int, 10)
for _, item := range items {
        item = make(map[int]int, 1) // Oops! item is only a copy of the slice element.
        item[1] = 2                 // This 'item' will be lost on the next iteration.
}

The make and assignment look like they might work, but the value property of range (stored here as item) is a copy of the value from items, not a pointer to the value in items.

This is exactly what’s happening in this case. The solution variable is getting a copy of each entry, not the entry itself. There is only one loop variable, so when you take its address you end up with a pointer to that single variable – which, once the iteration stops, holds a copy of the LAST element in the slice. To illustrate:

package main

import "fmt"

func main() {
    strings := []string{"some", "value"}
    for i, s := range strings {
        fmt.Printf("Element %d: %s Pointer %v\n", i, s, &s)
    }
}

Element 0: some Pointer 0x10500168
Element 1: value Pointer 0x10500168

Note that the same pointer is used in both cases. This explains why the Solution pointer ended up pointing at the last element of the slice.
Playground

So how do we work around this problem? The key is to introduce a new variable whose address it’s safe to take; its contents won’t change out from underneath you.

Broken:

if solution.Complete {
    if best == nil || solution.Cost < best.Cost {
        best = &solution
    }
}

Fixed:

if solution.Complete {
    if best == nil || solution.Cost < best.Cost {
        tmp := solution
        best = &tmp
    }
}

With this patch the tests pass:

PASS

Program exited.

Alternative design

A great feature of Go is that you can return multiple values from a single function. Here’s an alternative implementation that doesn’t suffer from the previous problem.

func FindBestSolution(solutions []Solution) (Solution, bool) {
    var best Solution
    found := false
    for _, solution := range solutions {
        if solution.Complete {
            if !found || solution.Cost < best.Cost {
                best = solution
                found = true
            }
        }
    }
    return best, found
}

Since best is copying the VALUE of the solution variable, this works correctly. You can play with this example and see how the tests change in the Playground.

This illustrates one other nice feature of Go – all types have a ‘zero’ value that is legal to use. For strings this is the empty string, for pointers it’s nil, for ints it’s 0, and for structs all fields are set to their zero values. The line var best Solution implicitly sets best to be the zero Solution. If I wanted to I could get rid of the found bool altogether and just compare the returned solution with another zero-valued Solution.
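
That alternative would look something like this sketch (mine, not from the post – the name FindBestSolutionOrZero is made up). It works because any stored best has Complete set to true, so it can never be confused with the zero Solution:

func FindBestSolutionOrZero(solutions []Solution) Solution {
    var best Solution // the zero Solution doubles as "not found"
    for _, solution := range solutions {
        if solution.Complete {
            if best == (Solution{}) || solution.Cost < best.Cost {
                best = solution
            }
        }
    }
    return best
}

// Usage:
// if best := FindBestSolutionOrZero(solutions); best == (Solution{}) {
//     // no complete solution was found
// }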

Conclusion

I introduced some basic features of Go, including maps, slices, range, structs, and functions. I provided links to the amazingly useful Go playground which lets you easily test out code, format it, and share it with others.

I showed two implementations of a function that searches through a slice of struct values, searching for a solution that meets some criteria.

The first example using pointers led to a subtle bug that’s hard to find and solve unless you know how range works. I showed how to write unit tests that exercise the function and help flush out the bug. I also explained what the bug was and how to work around it.

Finally I showed a version of the same function that uses Go’s multiple return types to return a found boolean rather than using a nil pointer to signify that the value wasn’t found.

The Pomodoro technique

February 17, 2014

Pomodoro (tomato) kitchen timer

Erato, via Wikipedia licensed under Creative Commons Attribution-Share Alike 2.5 Generic

The Pomodoro technique is a productivity tool based on two premises:

  1. Multitasking is inherently inefficient
  2. One cannot maintain consistent performance at tasks for prolonged periods of time

You work for 25 minutes straight on one task; this is one pomodoro. If you are interrupted, you deal with the interruption and start over – there are no partial pomodoros, so you have to discard the one in progress.

After a 25 minute work period, you take a 5 minute break. Really – you are not allowed to keep working. Check email, look over your task list, whatever. I find it’s a good time to get up and walk around and shift my visual attention so I’m not constantly staring at a screen.

After 4 pomodoros, you are permitted a longer (15-30 minute) break.

For an added layer of complexity you can estimate how many pomodoros each task will take, and then compare it to how many it actually takes. This is a good way to train yourself to more accurately estimate task complexity.

I like this technique for a few reasons.

  • It forces me to take breaks, walk around, stretch and otherwise avoid melting into my chair for 8 hours at a time.
  • It forces me to break down complex tasks into small, manageable chunks. If I can’t complete a task in a few pomodoros, it’s probably too big.
  • It lets me track my productivity over time. It’s easy to say that I was constantly interrupted on Monday, but it’s easier to quantify if I can show that I only got 6 pomodoros done instead of my normal 10-12.

My problem with this technique is that it takes too long to get back into the coding mindset after a break. Some estimate it takes between 10 and 15 minutes to resume coding after an interruption; if a break interrupts you every 25 minutes, you’re not going to get much accomplished on a complicated piece of code. I sometimes find the break comes at an inopportune time, when I’m just on the verge of finishing something. I usually have to quickly dump some thoughts into the file I’m editing as to what I was doing and what my next step was.

Do you use the pomodoro technique while programming? If so, do you find the recommended 25/5 breakdown sufficient for getting work done? Do you increase it, decrease it?

Chrome’s sound indicator – a welcome innovation for lessening embarrassment

February 13, 2014

Which tab is playing music? It's easy to tell now

I sat at my desk early in the morning while the office was mostly empty. A coworker arrived, said hello and sat down to work. She put on her headphones and started blasting music. The only problem? Her headphones were unplugged and the music was playing for the whole office to hear.

Accidental noise like this is very embarrassing. I welcome all innovations that make this less likely to occur. I was blown away when I used an iPod for the first time and unplugged the headphones. Unlike the CD players I had used before, the music paused. With the iPod it didn’t make sense to continue to play music without the headphones plugged in, since there was no internal speaker.

I am happy that both iPhone and Android adopted this convention as well. You’re much less likely to embarrass yourself if the sound stops as soon as headphones are removed rather than automatically switching over to the external speakers. I wish that laptops did the same, but alas.

I live most of my work day in a browser. I often end up with a whole slew of tabs. Suddenly I hear some sort of noise – one of the dozen pages I have open is blaring an ad. I usually have to search each tab one by one to try to find the culprit. Chrome recently launched a new feature to visually distinguish which tab is making noise. This is a brilliant innovation, and one I welcome in my quest not to make a fool out of myself in front of others.

Disclaimer: I work at Google. The opinions expressed are my own and do not represent those of my employer.


“Everyone should be able to pull and analyze data”

February 11, 2014 Leave a comment

“Data overload” by Islam Elsedoudi, via Flickr under CC BY-SA 2.0 (http://creativecommons.org/licenses/by-sa/2.0/)

Everyone should be able to write spaghetti code, and everyone should be able to pull and analyze data. And I’m not just talking about business-folk here.


Look at what’s going on in the digital humanities. Now, even literature, history, and religious scholars can use data to shed new insight on old texts. How awesome is that? But you have to be able to actually analyze the data. That means being able to query and scrub; that means knowing a bit of probability and statistics. The difference between a median and mean would be a start.

So yes, it’s no longer acceptable to say, “I suck at math!” and then ignore that part of the world.

I suck at physical exercise, but that doesn’t mean it’s OK for me to melt into a chair all day. We all need to work at the important stuff in life, and understanding data has become terribly important.

I agree with the overall sentiment of the quote, that more people should be able to do basic data scraping and analysis. Unfortunately, I don’t see it happening anytime soon, for two reasons – the tools to analyze data are complicated for non-engineers, and most people do not receive training in programming (to script and pull the data in the first place) or statistics (to crunch the data and draw valid insights).

Even if everyone had the skills and tools necessary to pull and analyze the data, there would still be a need for skilled analysts / data scientists. Executives and product managers often don’t have the time to do analysis themselves; it’s not efficient for them to do so. Analysts fulfill an important role by distilling raw data into products and insights.

How to download your WordPress.com stats in CSV, JSON, or XML format

January 30, 2014

I wanted raw data about the popularity of my various posts on this blog to better determine what sort of topics I should post about. WordPress.com provides some nice aggregate stats, but I wanted more. After stumbling around the Internet for a while, I cobbled together a way to download my blog data in either CSV, XML, or JSON format.

There are three steps:

  1. Get an API key
  2. Get your blog URL
  3. Construct the URL to download the data

Get an API key

Akismet is WordPress.com’s anti-spam solution. Register for an Akismet API key at http://akismet.com/wordpress/ by clicking on “Get an Akismet API key”.

Sign up for an account. If you choose the personal blog option, you can drag the slider all the way to the left and register for free. If you value the service that Akismet provides, you can pay more. When you complete the signup flow, you will be provided with a 12-digit ID. Copy this down.

Get your blog URL

Copy the full URL of your blog, minus the leading https://. For me this is developmentality.wordpress.com.

Construct the URL

There is a limited API for downloading your data at the following URL:

http://stats.wordpress.com/csv.php

View this in a browser to see what the API parameters are.

API as of 2014/01/28

Now construct the full URL:

http://stats.wordpress.com/csv.php?api_key=<api_key>&blog_uri=<blog_uri>

View this URL in the browser (or fetch it via wget / curl) and you should see your views data.

CSV rows returned
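
If you’d rather script the download than use a browser, here’s a minimal sketch in Go (my own – substitute your real API key and blog URI; it uses only the parameters documented by the API):

package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
    "net/url"
)

func main() {
    params := url.Values{}
    params.Set("api_key", "123456789abc") // your Akismet API key (hypothetical value)
    params.Set("blog_uri", "developmentality.wordpress.com")
    params.Set("table", "views") // or postviews, referrers, ...
    params.Set("format", "csv")  // or json, xml

    resp, err := http.Get("http://stats.wordpress.com/csv.php?" + params.Encode())
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(body))
}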

There are multiple data sources. From the documentation:

table (String) – one of: views, postviews, referrers, referrers_grouped, searchterms, clicks, videoplays

Here is some sample data from each table. Change the format param from csv to json or xml to get the data in different formats.

views

CSV

"date","views"
"2013-12-31",118

JSON

[{"date":"2010-02-05","views": 46}]

XML

<views>
    <day date="2014-01-01">112</day>
</views>

postviews

CSV

"date","post_id","post_title","post_permalink","views"
"2014-01-28",369876479,"Three ways of creating dictionaries in Python","http://developmentality.wordpress.com/2012/03/30/three-ways-of-creating-dictionaries-in-python/",46

JSON

[{"date":"2014-01-29","postviews":[{"post_id":369876479,"post_title":"Three ways of creating dictionaries in Python","permalink":"http:\/\/developmentality.wordpress.com\/2012\/03\/30\/three-ways-of-creating-dictionaries-in-python\/","views":22},{"post_id":369875635,"post_title":"R - Sorting a data frame by the contents of a column","permalink":"http:\/\/developmentality.wordpress.com\/2010\/02\/12\/r-sorting-a-data-frame-by-the-contents-of-a-column\/","views":16}]}]

XML

<postviews>
    <day date="2014-01-30"></day>
    <day date="2014-01-29">
        <post id="369876479" title="Three ways of creating dictionaries in Python" url="http://developmentality.wordpress.com/2012/03/30/three-ways-of-creating-dictionaries-in-python/">54</post>
    </day>
</postviews>

referrers

CSV

"date","referrer","views"
"2014-01-28","http://www.google.com/",63

JSON

[{"date":"2014-01-30","referrers":[]},{"date":"2014-01-29","referrers":[{"referrer":"http:\/\/www.google.com\/","views":66},{"referrer":"www.google.com\/search","views":27},{"referrer":"www.google.co.uk","views":10}]}]

XML

<referrers>
    <day date="2014-01-30"></day>
    <day date="2014-01-29">
        <referrer value="http://www.google.com/" count="" limit="100">66</referrer>
    </day>
</referrers>

referrers_grouped

CSV

"date","group","group_name","referrer","views"
"-","Search Engines","Search Engines","http://www.google.com/",1256

JSON

[{"date":"-","referrers_grouped":[{"referrers_grouped":"Search Engines","views":{"http:\/\/www.google.com\/":1305}}]}]

XML

<referrers_grouped>
    <day date="-">
        <group domain="Search Engines" name="Search Engines">
            <referrer value="http://www.google.com/">1305</referrer>
        </group>
    </day>
</referrers_grouped>

Dates aren’t included, so the numbers are sums over the past N days, defaulting to 30. To change this, set the days URL parameter:

http://stats.wordpress.com/csv.php?api_key=<api_key>&blog_uri=<blog_uri>&table=referrers_grouped&days=<num_days>

searchterms

CSV

"date","searchterm","views"
"2014-01-28","encrypted_search_terms",190

JSON

[{"date":"2014-01-30","searchterms":[]},{"date":"2014-01-29","searchterms":[{"searchterm":"encrypted_search_terms","views":159},{"searchterm":"dynamically load property file in mule","views":2}]}]

XML

<searchterms>
    <day date="2014-01-30"></day>
    <day date="2014-01-29">
        <searchterm value="encrypted_search_terms" count="" limit="100">159</searchterm>
        <searchterm value="dynamically load property file in mule" count="" limit="100">2</searchterm>
    </day>
</searchterms>

clicks

CSV

"date","click","views"
"2014-01-28","http://grab.by/grabs/b608b9c315119ca07a1f7083aabbb9c7.png",3

JSON

[{"date":"2014-01-30","clicks":[]},{"date":"2014-01-29","clicks":[{"click":"http:\/\/www.anddev.org\/extended_checkbox_list__extension_of_checkbox_text_list_tu-t5734.html","views":2},{"click":"http:\/\/android.amberfog.com\/?p=296","views":2}]}]

XML

<clicks>
    <day date="2014-01-30"></day>
    <day date="2014-01-29">
        <click value="http://www.anddev.org/extended_checkbox_list__extension_of_checkbox_text_list_tu-t5734.html" count="" limit="100">2</click>
    </day>
</clicks>

videoplays

I am not sure what this format is as I have no video plays on my blog.

Conclusion

I hope you find this useful. I’ll make another post later showing how to crunch some of this data and extract meaningful information from the raw data.

Why You’ll (Rarely) Catch Me With a Printed Book

January 25, 2014

E-reader on top of books

kodumut via Flickr under Creative Commons license

I love books. I love book stores. I worked in a library for four years. But when I read Ali Wunderman’s post entitled Why You’ll Never Catch Me With An E-Reader, I was not convinced. In almost all cases I prefer to buy and read digitally. I’ll discuss her argument, my reasons for preferring digital in most cases, and cases where e-readers and digital books are worse than their paper alternatives.

The two main points of Ms. Wunderman’s argument are:

  • She likes the sensations of reading a physical book (touch, smell, sight)
  • She values the serendipity of meeting new friends who love the book that she’s reading. Had she used an e-reader, those people would not have been able to see what she was reading, and thus she would have missed out on such encounters

I can’t argue against the first point – it’s a matter of opinion whether or not holding a book feels good. Since e-readers are such a recent invention, I think this argument is rooted in nostalgia more than anything else. It would be interesting to see whether children who grow up with a choice between e-readers and physical books end up with such a physical attachment to books. I do like the touch and smell of books, but it’s not enough to make me buy paperbacks exclusively.

The second point is also subjective. I’ve never had strangers comment on what I’m reading, but I can imagine it would be a fun experience. I am willing to bet that it’s rare. On the other hand, I have heard that some women are more comfortable reading romance novels on their e-readers than physical copies. I don’t really care one way or another; I don’t read books with the intention of showing others what I’m reading.

I have had the experience of bonding with new friends over the contents of our respective bookshelves. If there were no books for us to look at, we would have missed out on some level of connection. While this makes more sense to me than the serendipity argument, I still don’t think this is a reason to stay slavishly attached to dead trees. Once you have established a relationship, it’s easy to converse about books you’ve read, no matter the medium.

Digital preference

The reasons I prefer digital books are price, convenience, ergonomics and lack of physical clutter.

Price

E-books are often cheaper than their physical alternatives. This makes sense – the cost of publishing and distribution is virtually zero.

Convenience

E-books are incredibly convenient. Let me count the ways:

  • Instant gratification – purchase, download, and start reading a book in less than a minute. No need to wait for a book to be shipped to you, or to go to a store
  • Instant definitions – no need to break the flow of reading to learn the meaning of a word. Tap and hold on the word to get a quick pop-up definition
  • Read free samples of a book before committing to buying it
  • Free lending library – check out one free book a month to read

Press on a word for a second and get a definition. No need to leave the book. Kindle App on Android

There’s a saying that the best camera is the one you have with you. It’s the same with books. I rarely bring physical books with me, and I only sometimes have my Kindle Paperwhite, but I always have my phone with me, and that phone has all of my Kindle purchases on it. When I used to use an iPhone it was uncomfortable to read on such a small screen, but I recently switched to a Nexus 5 and have happily consumed entire books on it. The progress I make on one device is instantly synced with all of my other Kindle-compatible devices.

Ergonomics

I find it more pleasant to read digitally. I can control how big the font is, I can turn pages one-handed, I can read in the dark with no external illumination, and my devices are light to hold. For example, I bought Cryptonomicon on Kindle to replace a hardcover version partly because I was tired of reading such a bulky book.

Clutter-free

Most importantly, buying digitally frees me from physical clutter. If you’ve ever moved, you know how heavy and unwieldy books are. If you’re traveling, they add weight and bulk to your luggage.

“Rarely”

Unlike Ms. Wunderman, I am not absolute in my preference. I often prefer digital, but I acknowledge that there are some real problems with digital books. These include ownership, longevity, batteries, screens, and the distractions of reading digitally.

Ownership

When you buy digital content, what do you actually own? In 2009, Amazon deleted unauthorized versions of Animal Farm and 1984 straight off of owners’ Kindles. As the article says,

Digital books bought for the Kindle are sent to it over a wireless network. Amazon can also use that network to synchronize electronic books between devices — and apparently to make them vanish.

When you deal with DRM (digital rights management) content, there are very strong restrictions placed on what you can and can’t do with the content. It’s more like a limited license to view the content rather than outright ownership.

For instance, let’s look at lending. With physical books, you can lend your book to whomever you want for as long as you want. With Amazon’s titles, not all publishers allow digital lending in the first place. Of those that do, there are Draconian limitations. From the Amazon Kindle help page:

You can lend a Kindle book to another reader for up to 14 days… A book can only be loaned one time.

Until you can freely loan or give away your digital copies of books, paper wins hands down.

Longevity

Even if you buy DRM-free content, you take the chance that you won’t be able to read that content in a few years or decades. There is a strong precedent of technologies dying and data being trapped on obsolete devices; see Lost Formats for examples. If Amazon goes out of business, what happens to all of the Kindle content you’ve amassed? Hopefully if that were to happen, Amazon would offer a service like Google Takeout to transfer the books to you. (Full disclosure: I work for Google.)

A floppy disk - one of many dead storage formats

Trav1085 via Wikipedia

The other aspect of longevity is the e-readers themselves. I haven’t had the best of luck with my Kindles so far – I am on my fourth Kindle in about as many years. Three of them died quickly, though I’ve had no problems with the Paperwhite in the past 2.5 years. It makes me wonder how long these devices last. If you have to buy a new $100 device every 3-5 years, this changes the calculus of whether e-books are more affordable.

Batteries

Digital readers run out of batteries; books don’t. Standalone e-readers typically don’t need charging very often, but phones do. If I were traveling and didn’t have ready access to electricity, this would be a concern. In practice this isn’t a big problem for me.

Screens

Some people prefer the look of a book to an e-reader screen. The e-ink display on the Kindle has improved with each generation in resolution, sharpness, and refresh speed. I don’t think there’s an objective winner here. The display on my Nexus 5 is incredibly sharp and I can read it in the dark, just as with the Paperwhite. My only objection to the screen is that it contributes to my spending 90% of my time staring at a glowing rectangle. I mitigate this somewhat by switching to white text on a black background on my phone, and keeping the light low on the Paperwhite.

Distractions

It’s easy to get distracted if you are reading digital books on a multi-purpose device. Reading takes time and concentration. Sometimes it’s hard to stick with that when there’s the allure of games and an infinite expanse of Internet content that’s a few button presses away.

Standalone e-readers offer a more focused reading experience that’s closer to that of reading a book. Since there are fewer things you can do on them, there’s less temptation to do something other than read. Depending on your level of willpower, this point could be completely moot.

Conclusion

I vividly remember seeing the first clunky version of the Kindle just a few years ago and wondering how its owner could enjoy reading on it. The technology has improved so much since then that I’m a happy convert. While I acknowledge the superiority of physical books in some ways, it often makes sense to buy digital. Doing so avoids physical clutter and is extremely convenient. Most of the technical problems with digital books and e-readers have been solved; the remaining hurdles of consumer-unfriendliness are sociological problems that we can combat. For example, Microsoft changed its restrictive DRM in the Xbox One due to overwhelming negative response. If consumers showed as much passion for their rights to the book publishers and Amazon, perhaps we’d see a loosening of the reins as well.