Archive

Posts Tagged ‘testing’

“How to write a great research paper”

December 31, 2013 Leave a comment

“How to write a great research paper”

This is a great overview of writing a research paper. I haven’t written many papers, but having read some, the following jumped out as being particularly good advice:

  • Move the related work section to the end rather than right after the introduction. I like this idea; when the related work comes early in the paper, it distracts from what the problem and solution are in the first place. I think it makes more sense to see how others have tackled the same problem after I have a good sense of the problem. (p. 23-24)
  • Write the paper before doing all of the research. (p.5). I see a parallel here with test driven development – just as writing tests first helps steer the API design of a library, writing the paper first:

* Forces us to be clear, focused
* Crystallises what we don’t understand
* Opens the way to dialogue with others: reality check,
critique, and collaboration

 

Advertisements

You don’t get big by writing tests

October 21, 2013 Leave a comment

You don’t get big by writing a lot of tests (or checks). You get big by getting stuff done with competent people that pay attention to the changes they apply. YMMV – bradfordw

Source: http://news.ycombinator.com/item?id=3541317

Running unit tests

An example of running unit tests – jetbrains.com Resharper

I vehemently disagree with this statement for three reasons.

First, it is extremely naive.  It implies that one can avoid the need for automated tests just by being an assiduous programmer.  If you are the sole developer on a project, this may be true (I doubt it).  Once you involve other programmers and the code is composed of different loosely coupled systems (as befits a good design), there is absolutely no way that a programmer can manually ensure that his changes are not breaking other code, even if his own code is perfect.

Second, it draws a false distinction between the behaviors of writing tests on the one hand and being competent and getting things done on the other.  Yes, writing tests slow down development in the short term, and in that sense are a hindrance to ‘getting things done’. In the long run they are absolutely crucial to the health of the code base.  Why?  Let me count just a few of the ways.

  1. Well designed tests help you catch many common errors (fencepost errors, typos, mixed up conditionals, overflows, underflows, etc.)

  2. Tests provide good documentation in the form of usages of your code to clients.

  3. By coding tests which exercise the contract (external API) of your class, you have a safety net for refactoring and improving it.  You can swap algorithms, data structures, etc., while having some assurance that your code still performs correctly.

  4. Tests allow you to prevent regressions.  If you fix a bug once, you can add the code that exercises that failure condition to the test suite and ensure that it does not creep back in with future maintenance.

While some of this may be possible to verify manually each time, it is incredibly wasteful of engineering time and talent.  Something as important as software testing and verification should not be left up to manual tests.

Third, and perhaps most importantly, it is patently false.  Large companies have many thousands (millions?) of tests.  The bigger the reach of software, the more potential cost a software error can have, and thus the more engineering effort is spent towards alleviating that risk.  The larger the software becomes, the less possible it is for a single person to understand the entire the system and to know all the possible repercussions of a change to his code.

I don’t know if the original poster was trolling or not, but it gave me a chance to collect some thoughts I’ve had about testing. When I was in college, I barely wrote tests of any kind unless they were explicitly required. The coding I did was mostly for projects that were completed in a week, submitted, and never seen again. After I joined the Northern Bites robotics team, I started working with a real, 100K+ SLOC code base that had evolved over years. There it was immediately drilled into my head the importance of testing. The fact that multiple people touched the same module, and that the same module might outlive your time on the team by years, made it absolutely crucial to test thoroughly.

It was also crucial to save time. We worked with Sony AIBO robots, and to put the new code on them involved cross compiling, putting the code on a memory stick, turning off the robot, inserting the stick, rebooting the robot, then waiting for the code to turn on. This easily took a minute or two each time. The more of the testing that could be performed automatically in software via unit tests and integration tests, the less time I had to spend in the painful compile/execute/debug cycle using actual robots.

Northern Bites in competition

Northern Bites in competition

Once I got to my first job, I mostly did software prototyping, meaning the quality of the code did not have to be incredibly high – they were proofs of concept, and were not intended for production. Nevertheless, I took with me the lessons I learned from the robotics team, and found that writing unit tests up front really did save a lot of time in the long run. Just as with the robots, it’s a lot faster to exercise the system via repeatable, automated tests rather than manually launching an app and verifying behavior.

I’ve since moved on and spent the last two years at a very large tech company, I am absolutely convinced of the efficacy and importance of automated tests. The smallest change can have unintended consequences, even when that code change is reviewed and signed off by other engineers. For instance, we recently had a case where someone changed a single flag to the empty string and it ended up breaking an entire pipeline in production. It was only configuration that was being changed, not code proper, yet it took down the system. Had there been a test that exercised the handling of the empty string, we would have prevented many hours of wasted effort. This sort of thing happens even with extremely smart, talented, conscientious people. I shudder to think how code bases devolve with only manual tests.

I’ve heard the expression ‘you play how you practice’, and it applies equally well to sports, music, and coding. The sooner you learn the importance of testing, the better. Even on my hobby projects, I will rarely write a line of code without starting with the tests. I encourage anyone reading this to do the same. The original poster claims that you don’t get big by writing tests. In my mind, you don’t get anywhere without writing tests.

Coursera’s Human-Computer Interaction Class: A triumph of education

July 26, 2012 5 comments

I recently took a course on Human-Computer Interaction on Coursera, taught by Scott Klemmer from Stanford University. According to its About page, Coursera is a

social entrepreneurship company that partners with the top universities in the world to offer courses online for anyone to take, for free. We envision a future where the top universities are educating not only thousands of students, but millions. Our technology enables the best professors to teach tens or hundreds of thousands of students.

After having completed this course, I feel that Coursera provides an amazing service. It’s not perfect, but it is far superior to any online courses I’ve taken so far.

Motivation

I have been fascinated with design and making things ‘user friendly’ ever since reading Donald Norman’s The Design of Everyday Things in college. This book details why certain designs fail while others are intuitive and obvious. One of the things that stuck with me is the concept of affordances – buttons ‘afford’ being pressed, dials ‘afford’ being turned, handles ‘afford’ being pulled. To this day, it is one of my biggest pet peeves to find doors that open the wrong way. During a recent trip to Paris, I took pictures of some of the design failures I saw, including this particularly nasty door in the hotel.

Push
Pull

It’s the exact same handle on both sides of the door, but it is designed to swing in only one direction. We walked through that door at least 10 times but each and every time we had to think about how to open it; we had to push an interface that was clearly designed to be pulled.

This is a somewhat trivial example, but design can have incredible safety implications as well. A recent article claims that poor design contributed to the 2009 Air France Flight 447 crash:

In the next 40 seconds AF447 fell 3,000 feet, losing more and more speed as the angle of attack increased to 40 degrees. The wings were now like bulldozer blades against the sky. Bonin failed to grasp this fact, and though angle of attack readings are sent to onboard computers, there are no displays in modern jets to convey this critical information to the crews. One of the provisional recommendations of the BEA inquiry has been to challenge this absence.

(Emphasis mine)

When I heard from a coworker that Coursera was offering a course on Human Computer Interaction, I knew I had to take it.

Structure

The course was slated to last 5 weeks though it actually took 6 due to an extension in one of the assignments. Some of the topics included needfinding (determining what actual people need in an interface and how they make do with the status quo), paper prototyping techniques, storyboarding techniques, heuristic evaluation, lab usability studies.

The course had four main components each week:

  • lectures
  • quizzes
  • projects
  • peer assessment

Lectures

The lectures were presented as a series of videos broken into approximately ten minute chunks. Each video had the same slides that the professor presented as downloadable attachments. Most of the videos also had subtitles for a few different languages; I heard complaints on the forums that some of the later videos were without subtitles but as I am a native English speaker, it did not affect me.

There were two nice touches I liked in the lectures: embedded quizzes and video playback speed.

In almost every video, there would be a break in the video where an interactive quiz was presented based off of what Professor Klemmer had just presented. It’s a nice pedagogical trick to make sure you’re paying attention and understanding the material.

I found the default pace of the lectures a little slow, but the embedded video player allowed me to speed up the videos. I found the lectures were comfortable to watch at 1.75x speed.

Quizzes

In addition to the mini quizzes embedded in the videos, there were multiple choice quizzes each week based off of the lectures. These quizzes gave instant feedback after submission, which I appreciated. Students could retake the quizzes a few times until they pass; in general I found that the quizzes were easy if you paid attention to the video lectures.

Projects

The course hammered home the point that designers and implementers are the worst people to judge their own work. They have too much knowledge, and their mental model is nothing like that of the ‘average’ user. Furthermore, they are too close to the system to provide an objective evaluation – if they labored weeks on a particular feature, they’re going to inflate its importance, even if the design as a whole would improve without it. The meat of the course, namely the assignments and peer evaluations, were imbued with this idea. We were tasked with building a software prototype and enlisting the help of users to test it and improve its design.

At the start of the course, we were presented with three options for themes our design projects could take on. From the design briefs page:

  • Change – “Use the power of new technology to create an application or service that facilitates personal or social behavior change”
  • Glance – “Find people and design a personal dashboard tailored to their needs”
  • Time – “Redesign the way we experience or interact with time”

I chose the Glance design brief. My inspiration was a hideously complicated board game named Twilight Imperium. It’s a fun turn based conquer-the-galaxy board game, but it has some serious usability issues.

index

It takes an incredibly long time to learn the rules, and an even longer time to play. The first game I played took literally 16 hours over the course of a few days, and the fastest game I’ve ever completed took 5 hours. Two aspects of the game struck me as particularly frustrating:

  • The technology tree (there are 24 technologies grouped into four categories, spread across two massive pages in the instruction manual)
  • Combat (each of the ~6 ship types has a different base attack rate, which can be modified by your race, action cards, and political cards in effect)

In every game I’ve played, choosing technologies to buy and the combat bring the game to a screeching halt. I decided that my project would be to build an application of some sort to help make one of these aspects more intuitive and fun.

I designed two storyboards, one for the combat app and one for the tech tree app idea, in order to solidify who the app was for (board gamers who play TI), where it would be used (wherever they play), and the problems it solved (takes too long to pick technologies and/or fight, too much has to be kept in players’ heads)

storyboard

Next I decided to focus on the technology tree idea and came up with two mockups of divergent designs of this application using the wireframing software Balsamiq. (Longtime readers of this blog might remember that I used Balsamiq to make the illustrations for my post explaining how ListView works in Android). I decided that I wanted to drastically simplify the tree structure as laid out in the instruction manual and instead only display the prerequisites when necessary. (Technology X cannot be purchased until you purchase A AND B or C…).

One of my designs was inspired by the slick UI of Diablo 3 for crafting items:

Diablo 3 crafting.

Here’s the Balsamiq wireframe for that design:

Linear prototype
Interactive PDF

The second design I created was a grid layout:

Grid prototype
Interactive PDF

After receiving (and performing) a ‘heuristic evaluation’ of the prototype, I decided to actually implement the grid layout after making a few modifications. In the final few weeks, I implemented an interactive version using d3.js and HTML tables. I am no web designer, and I’m embarrassed by some of the hacks and nonfunctioning pieces of the prototype, but overall I am pleased with how it came out. The final assignment was to perform an honest to goodness usability test with at least three participants. The feedback I received from them will be invaluable for improving the prototype in the coming weeks. You can play with the same version of the prototype that my testers did if you’d like.

Final prototye

Conclusion

Coursera aims to allow tens or hundreds of thousands of people to be taught in one class, and this course proves that it can be done. According to Professor Klemmer, almost 30,000 students from watched the lecture material, and about 800 completed all of the coursework.
Usage stats

I alluded to it earlier, but the only way that this many students can be graded in a timely manner is through the use of peer evaluations. Before you complete each assignment, you are given exactly the same rubric as your peer assessors will have to grade your work. After the deadline for submission is up, you must go through a training exercise, grading five sample assignments in order to calibrate your scores with that of the professor. After this, you must grade at least five other students’ assignments; failure to do so results in a penalty to your grade. After you have seen these ten examples of other students’ assignments, you grade yourself using that same rubric. While this peer evaluation process was time consuming, it was an invaluable feedback tool, as it allowed me to measure myself relative to my classmates. In a traditional class, it’s rare to ever see others’ completed work, and in some cases it’s even against the honor code. In this online form, it is an absolutely crucial aspect of the course, as it allows the class to scale to a size unimaginable in physical classrooms.

I was extremely satisfied with the course, especially considering this was the first time it was offered through Coursera. Compared to my prior experience of using P2PU’s School of Webcraft to learn JavaScript, the quality of this course was much higher. It was a large time commitment, but I learned a lot and it forced me to actually implement an idea that I’d been kicking around in my head. There were a few hiccups and bugs in the Coursera system, including somewhat vague assignments, but Professor Klemmer noted in his farewell video that they are actively working to resolve these issues. If you get the chance to take this course, I highly recommend it.

EventBus – how to switch EventService implementations for unit testing

February 23, 2011 2 comments

I’ve written previously about EventBus, a great open source Java library for pub-sub (publish subscribe). It’s a truly excellent way to write loosely coupled systems, and much preferable to having to make your domain models extends Observable and your listeners implement Observer. I’m writing today to describe some difficulties in incorporating EventBus into unit tests, and how to overcome that problem.

Test setup

I was attempting to test that certain messages were being published by a domain model object when they were supposed to. In order to test this, I wrote a simple class that did nothing more than listen to the topics I knew that my model object was supposed to publish to, and then increment a counter when these methods were called. It looked something like this:

class EventBusListener {
    private int numTimesTopicOneCalled = 0;
    private int numTimesTopicTwoCalled = 0;

    public EventBusListener() {
        AnnotationProcessor.process(this);
    }

    @EventTopicSubscriber(topic="topic_one")
    public void topicOneCalled(String topic, Object arg) {
        this.numTimesTopicOneCalled++;
    }

    @EventTopicSubscriber(topic="topic_two")
    public void topicTwoCalled(String topic, Object arg) {
        this.numTimesTopicTwoCalled++;
    }

    public int getNumTimesTopicOneCalled() {
        return this.numTimesTopicOneCalled;
    }

    public int getNumTimesTopicOneCalled() {
        return this.numTimesTopicTwoCalled;
    }
}

The basic test routine looked something like this:

@Test
public void testTopicsFired() {


    // Uses EventBus internally
    DomainObject obj = new DomainObject();

    int count = 10;
    EventBusListener listener = new EventBusListener();
    for (int i = 0; i < count; i++) {
        obj.doSomethingThatShouldFireEventBusPublishing();
    }

    assertEquals(count, listener.getNumTimesTopicOneCalled());
    assertEquals(count, listener.getNumTimesTopicTwoCalled());
}

This code kept failing, but in nondeterministic ways – sometimes the listener would report having its topic one called 4 times instead of 10, sometimes 7, but never the same issue twice. Stepping through the code in debug mode I saw that the calls to EventBus.publish were in place, and sometimes they worked. Nondeterminism like this made me think of a threading issue, so I began to investigate.

Problem

After reading through the EventBus javadoc, I came upon the root of the problem:

The EventBus is really just a convenience class that provides a static wrapper around a global EventService instance. This class exists solely for simplicity. Calling EventBus.subscribeXXX/publishXXX is equivalent to EventServiceLocator.getEventBusService().subscribeXXX/publishXXX, it is just shorter to type. See EventServiceLocator for details on how to customize the global EventService in place of the default SwingEventService.

And from the SwingEventService javadoc (emphasis mine):

This class is Swing thread-safe. All publish() calls NOT on the Swing EventDispatchThread thread are queued onto the EDT. If the calling thread is the EDT, then this is a simple pass-through (i.e the subscribers are notified on the same stack frame, just like they would be had they added themselves via Swing addXXListener methods).

Here’s the crux of the issue: the EventBus.publish calls are not occurring on the EventDispatchThread, since the Unit testing environment is headless and this domain object is similarly not graphical. Thus these calls are being queued up using SwingUtilities.invokeLater, and they have no executed by the time the unit test has completed. This leads to the non-deterministic behavior, as a certain number of the queued up messages are able to be processed before the end of execution of the unit test, but not all of them.

Solutions

Sleep Hack

One solution, albeit a terrible one, would be to put a hack in:

@Test
public void testTopicsFired() {
    // same as before

    // Let the messages get dequeued
    try {
        Thread.sleep(3000);
    }
    catch (InterruptedException e) {}

    assertEquals(count, listener.getNumTimesTopicOneCalled());
    assertEquals(count, listener.getNumTimesTopicTwoCalled());
}

This is an awful solution because it involves an absolute hack. Furthermore, it makes that unit test always take at least 3 seconds, which is going to slow the whole test suite down.

ThreadSafeEventService

The real key is to ensure that whatever we call for EventBus within our unit testing code is using a ThreadSafeEventService. This EventService implementation does not use the invokeLater method, so you can be assured that the messages will be delivered in a deterministic manner. As I previously described, the EventBus static methods are convenience wrappers around a certain implementation of the EventService interface. We are able to modify what the default implementations will be by the EventServiceLocator class. From the docs:

By default will lazily hold a SwingEventService, which is mapped to SERVICE_NAME_SWING_EVENT_SERVICE and returned by getSwingEventService(). Also by default this same instance is returned by getEventBusService(), is mapped to SERVICE_NAME_EVENT_BUS and wrapped by the EventBus.

To change the default implementation class for the EventBus’ EventService, use the API:

EventServiceLocator.setEventService(EventServiceLocator.SERVICE_NAME_EVENT_BUS, new SomeEventServiceImpl());

Or use system properties by:

System.setProperty(EventServiceLocator.SERVICE_NAME_EVENT_BUS,
 YourEventServiceImpl.class.getName());

In other words, you can replace the SwingEventService implementation with the ThreadSafeEventService by calling

EventServiceLocator.setEventService(EventServiceLocator.SERVICE_NAME_EVENT_BUS, 
new ThreadSafeEventService());

An alternative solution is use an EventService instance to publish to rather than the EventBus singleton, and expose getters/setters to that EventService. It can start initialized to the same value that the EventBus would be wrapping, and then the ThreadSafeEventService can be injected for testing. For instance:


public class ClassToTest{
    // Use the default EventBus implementation
    private EventService eventService = EventServiceLocator.getEventBusService();

    public void setEventService(EventService service) {
        this.eventService = service;
    }
    public EventService getEventService() {
        return this.eventService;
    }

    public void doSomethingThatNotifiesOthers() {
        // as opposed to EventBus.publish, use an instance of EventService explicitly
        eventService.publish(...);
    }
}

Conclusion

I have explained how EventBus static method calls map directly to a singleton implementation of the EventService interface. The default interface works well for Swing applications, due to its queuing of messages via the SwingUtilities.invokeLater method. Unfortunately, it does not work for unit tests that listen for these EventBus publish events, since the behavior is nondeterministic and the listener might not be notified by the end of the unit test. I presented a solution for replacing the default SwingEventService implementation with a ThreadSafeEventService, which will work perfectly for unit tests.