After entering the “real world” I’ve had to confront something I’ve never had to deal with before: the dreaded automated phone tree system. The routine is generally the same: Welcome. Choose a language. Listen to the top-level menu. Choose one option, drill further down. Rinse and repeat.
These systems annoy people because they feel impersonal and they take a lot of time. Due to the strictly linear nature of spoken word, you can’t speed up the robotic voice, and you’re trapped there waiting for the choices. People have gone so far as to list all the ways to speak to a human as quickly as possible, bypassing all the garbage (http://gethuman.com/us/ seems to be down right now).
I understand why these phone trees exist: it’s more cost-effective for the company to have you spend your own time fumbling through a phone tree, trying to determine which entry pertains to you, than to have one of its call-center employees forward your question to the right department. And I sympathize in some cases: there certainly are things that an automated voice is just as good at providing as a human, such as listing a store’s hours of operation. The problem comes when what you need requires a human and getting through the tree wastes too much of your time.
Companies must hate the aforementioned site that tells you how to bypass their electronic system completely, as much of the information that you might be calling about is in their tree somewhere, if you were patient enough to find it.
I propose two solutions to the problem.
1. Visual phone tree
I’d hazard a guess that the majority of phones out there have a display of one kind or another. I know that in my parents’ home, not exactly the Mecca of high-tech, only two wired handsets lack screens, while the cordless phones all have them. The vast majority of cell phones have screens.
Given the preponderance of screens on phones, it seems wasteful that we’re stuck listening to the linear choices rather than seeing them presented graphically on the screen. This idea is similar to what Apple did with the Visual Voicemail system it introduced (http://support.apple.com/kb/HT1486).
By displaying the tree in its entirety, or at least the subset you are interested in, you can easily filter out all the information that doesn’t pertain to you and get to your destination much more quickly, as you do not have to listen to the plodding voice go on and on. Furthermore, it is easier to keep the choices in your head if you can see them within a single screen. I find it much harder to keep multiple things in my head when they are spoken, rather than when they are seen. There is a constancy in the visual display of information that cannot be matched by the inherently fleeting transmission of sound.
If the phone you are using does not have a screen, the system could fall back to the standard spoken description of the tree.
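To make the idea a little more concrete, here is a minimal sketch of how one tree could drive both presentations. Everything in it is an assumption made for illustration: the `MenuNode` class, the sample menu entries, and the `has_screen` flag (how a real phone system would detect a display is not something this sketch tries to answer).

```python
from dataclasses import dataclass, field


@dataclass
class MenuNode:
    """One entry in a hypothetical phone tree."""
    label: str
    children: list["MenuNode"] = field(default_factory=list)


def render_visual(node: MenuNode, depth: int = 0) -> list[str]:
    """Return the whole subtree as indented lines, suitable for a screen."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render_visual(child, depth + 1))
    return lines


def render_spoken(node: MenuNode) -> list[str]:
    """Return one level of prompts, the way a voice-only system reads them."""
    return [
        f"For {child.label}, press {number}."
        for number, child in enumerate(node.children, start=1)
    ]


def present(node: MenuNode, has_screen: bool) -> list[str]:
    """Show the full tree when a display is available, else fall back to speech."""
    return render_visual(node) if has_screen else render_spoken(node)


# Made-up sample tree; the entries are placeholders, not any real company's menu.
TREE = MenuNode("Main menu", [
    MenuNode("Store hours"),
    MenuNode("Billing", [MenuNode("Pay a bill"), MenuNode("Dispute a charge")]),
    MenuNode("Account help", [MenuNode("Reset password")]),
])

if __name__ == "__main__":
    print("\n".join(present(TREE, has_screen=True)))
    print("\n".join(present(TREE, has_screen=False)))
```

The point of the sketch is that the visual version can show every level at once, while the spoken fallback only ever reads one level at a time, which is exactly where the waiting comes from.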
2. Voice recognition search
A few phone systems I’ve used had you speak your choice rather than press a number. This seems more like a gimmick than anything else; it’s not any faster than keying in the number. In some cases it has been impressive, though: one system had me speak my email address and then parroted it back to me, having recognized the address perfectly. Keying in that information would have been a huge pain.
Speech recognition is not a new technology, and some phone systems already use it to an extent, as the examples above show. Search is not a new field either. The combination of the two could be very powerful in a phone system: rather than drilling through the phone tree, you could simply say what you were calling about, and the system could recognize the words and perform a search on those keywords. It could then present you with the most likely hit, or hits. In conjunction with a visual display of the results, this could really speed up being directed to the correct department or information. Furthermore, the system could determine whether you were speaking Spanish or English, and do away with the inane “for English press 1, para español, número dos.”
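The search half of this is the easy part to sketch. The example below assumes the caller's speech has already been turned into text by some recognizer, which is not shown, and it ranks destinations by simple keyword overlap. The `DESTINATIONS` table, its keywords, and the `search` function are all invented for illustration; a real system would presumably use something more robust than counting shared words.

```python
import re

# Hypothetical destinations in the tree, each with hand-picked keywords.
DESTINATIONS = {
    "Store hours": "hours open close opening closing time holiday",
    "Billing > Pay a bill": "pay bill payment balance due",
    "Billing > Dispute a charge": "dispute charge wrong overcharged refund",
    "Account help > Reset password": "reset password locked login account",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(transcript: str, limit: int = 3) -> list[str]:
    """Rank destinations by how many keywords they share with the transcript."""
    query = tokenize(transcript)
    scored = []
    for path, keywords in DESTINATIONS.items():
        overlap = len(query & tokenize(keywords))
        if overlap:
            scored.append((overlap, path))
    # Best match first; ties are broken alphabetically so results are stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:limit]]


if __name__ == "__main__":
    print(search("I am locked out and need to reset my password"))
    print(search("what time do you close"))
```

Returning a short ranked list rather than a single answer fits the idea of showing the likely hits on a screen and letting the caller pick.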
The speech recognition would be the difficult part of the system: the current systems seem trained to handle certain specific phrases, rather than arbitrary phrases and words. I have used Dragon NaturallySpeaking, a commercial speech recognition and dictation product, and in my experience it needs hours and hours of training with your specific voice before reaching any acceptable level of accuracy. I can imagine how frustrating it would be if the system kept returning irrelevant results because it misunderstood your query.
I would love to avoid using the phone to deal with companies in the first place, but sometimes the thing I am trying to do cannot be accomplished online (e.g. resetting a password after having too many failed login attempts) and I must resort to the painful phone tree. The above solutions would help alleviate some of the pain.
What do you think? Do you find there are things you must do over the phone that you’d rather handle a different way?