Despite the hype, speech technology has never lived up to its promise. To become truly useful, it needs to get much better at interpreting natural language. Once this happens, speech will become an integrated component of many applications and will significantly alter how we access information.
A few days ago, I was flying back from the West Coast and wanted to check my flight status, so I called my airline's "speech enabled" call center. I quickly realized that this "speech enabled" response system operated no differently from one driven by DTMF (dual-tone multi-frequency) signaling, the tones a phone keypad generates. First I had to "speak" my frequent flier number. Next I had to say "flight information" or press 1. I was then prompted to say "departure" instead of pressing 2. This process dragged on until I finally had the information I wanted, and it took far longer than I would have liked.
The system walked me through exactly the same menu tree whether I used keystrokes or speech, so the session took about the same amount of time either way. Ideally, I would have called a number, said "Departure information for Flight 123 from City X" and had the system parse the sentence and work out the request. I believe that for speech access to become truly useful, it has to evolve to the point where making a request feels like talking with a live person.
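To make the contrast concrete, here is a minimal sketch of the single-utterance approach, assuming the speech recognizer has already turned the caller's words into a text transcript. The function name `parse_flight_request`, the pattern, and the field names are hypothetical illustrations, not any airline's or vendor's actual interface, and a real system would need a far richer natural-language grammar than one regular expression.

```python
import re

# Hypothetical pattern for requests such as
# "Departure information for Flight 123 from City X".
REQUEST_PATTERN = re.compile(
    r"^(?P<kind>departure|arrival)\s+information\s+for\s+flight\s+"
    r"(?P<flight>\d+)(?:\s+from\s+(?P<city>.+))?$",
    re.IGNORECASE,
)


def parse_flight_request(transcript):
    """Return the request as a dict, or None if it is not understood."""
    match = REQUEST_PATTERN.match(transcript.strip())
    if match is None:
        return None  # a caller could fall back to the step-by-step menu
    return {
        "kind": match.group("kind").lower(),
        "flight": int(match.group("flight")),
        "city": match.group("city"),  # None when no city was spoken
    }


print(parse_flight_request("Departure information for Flight 123 from City X"))
```

One spoken sentence fills in everything the menu tree asked for across several prompts, which is where the time saving would come from.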
There are many speech-enabled devices on the market today. Many voicemail systems, almost all cell phones, and many call centers are voice enabled. Despite this availability, actual use of speech accounts for less than 10% of the market. Many people have asked me why this is, and I believe the answer is that speech hasn't really changed the way we work or simplified the way we access things. It has simply been another way to do things we could already do.
I do think that speech holds a tremendous amount of promise, though, especially when it comes to mobile computing. Currently, mobile computing is limited to email, calendar access and voice on rather pricey "smartphones." While smartphones are on the rise, they still amount to only about 10% of the overall phone market. So if that's the case, how does the mobile computing market scale? The answer is speech.
How does this work? Imagine a traveler in an airport who finds that his flight has been cancelled. He needs to let the people expecting him know about the delay and his late arrival. The most time-effective solution would be simply to call them, but that assumes he has their number at hand. The more likely alternative today is for him to open his laptop, connect to the airport WiFi (which usually involves some sort of authentication process), launch a VPN client, run the appropriate application, retrieve the phone number, and then make the call. This process can take a good 10-15 minutes -- and that is if everything works as it should. If any part of it fails, he is in for some frustrating work. What if the CRM system were speech enabled, and he could call in, simply say "Phone number for Cosmo Kramer and Vandalay Industries" and have it automatically dial the number? A task that took minutes (or hours, depending on your luck) would be reduced to seconds, simply by removing the human latency built into the process.
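As a rough illustration of the CRM scenario, the sketch below looks up a number from a transcribed request. Everything in it is an assumption made for the example: the `CRM` dictionary stands in for a real CRM system, the phone numbers are placeholders, and `lookup_phone` is a hypothetical name. It uses approximate string matching from Python's standard `difflib` module so that a slightly misrecognized name can still find its record; actually placing the call is left out.

```python
import difflib
import re

# Hypothetical stand-in for CRM data: (contact, company) -> phone number.
# The numbers are placeholders, not real data.
CRM = {
    ("cosmo kramer", "vandalay industries"): "555-0100",
    ("jane doe", "example corp"): "555-0101",
}

# Matches requests of the form "Phone number for <contact> and <company>".
REQUEST = re.compile(
    r"^phone number for (?P<name>.+?) and (?P<company>.+)$",
    re.IGNORECASE,
)


def lookup_phone(transcript):
    """Return the best-matching phone number, or None if nothing is close."""
    match = REQUEST.match(transcript.strip())
    if match is None:
        return None
    spoken = f"{match.group('name')} {match.group('company')}".lower()
    # Map "contact company" strings back to their CRM keys.
    keys = {" ".join(key): key for key in CRM}
    # Tolerate small recognition errors; the 0.8 cutoff is an arbitrary choice.
    best = difflib.get_close_matches(spoken, list(keys), n=1, cutoff=0.8)
    if not best:
        return None
    return CRM[keys[best[0]]]


print(lookup_phone("Phone number for Cosmo Kramer and Vandalay Industries"))
```

The fuzzy match matters because recognizers make small errors: a transcript reading "Cosmo Cramer" would still resolve to the same record, while an unknown contact returns nothing rather than dialing a wrong number.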
The benefit of this type of access is that it is available to every user on every phone, since all phones support voice, which lowers the barrier to entry from a $500 smartphone to the free phone that comes with a basic mobile plan. I've talked with many vendors about this, and they all seem to agree that speech will become not only a way to access corporate information but also a way to trigger processes remotely. For example, if a system is having a problem, it could automatically dial an administrator, who could invoke a restorative process by "telling" the system what to run instead of having to go to a remote console. A few companies I've contacted are considering or trialing this as a way to speed up processes with a high level of manual latency.
In summary, speech is a superb idea that has not realized its full potential in its current form. Today's systems are often cumbersome, inaccurate and a waste of time. I think that speech will enable many businesses to streamline processes and will allow us to access more information in more places. For this to happen, though, speech technology will need a significant upgrade over the next few years, one that also addresses its security and authentication problems.
What should you be doing now if you are considering implementing a speech system? I'd recommend looking for business processes that involve a lot of manual steps and therefore take much longer than necessary. Once you have identified the processes that need fine-tuning, you will be in a position to leverage the combination of speech and mobility when the technology is ready.