The evolution of speech technology

Will the pervasiveness of speech cause a revolution in the IT industry, or will it drive the evolution of IT and contact centers in terms of improving their ability to retain customers, bring down costs and create new streams of revenue? Learn how speech technology has evolved and where it may be headed in this guest commentary.

People have a word error rate of about three to five percent, even with all of the auditory and visual cues available when looking at someone. Suffice it to say, for an unseeing, unhearing, unthinking computer, the task of understanding speech and its communication patterns is enormous. Will the pervasiveness of speech cause a revolution in the IT industry, or will it drive the evolution of IT and contact centers in terms of improving their ability to retain customers, bring down costs and create new streams of revenue? Will the way we interact with computers be revolutionized by speech?

Speech recognition began its evolutionary course as its own unique pillar. People first used it to dictate with deliberate pauses between words, and then speech technology evolved to allow natural speech (no more pauses). Most people can speak faster than they type, but most companies were not willing to put the time and effort into training and maintaining a speech recognition system. Once the industry started placing speech technology in applications outside the desktop, people's lives began to benefit from speech technology's evolution. The more advantageous places we put speech, the more the speech silo breaks down and the easier speech becomes to design, deploy and use.

Many companies demand directed dialog applications in industries such as financial services and banking, where customers want to obtain their bank balance, locate a nearby ATM or transfer funds. The financial sector has always been the leader in the use of speech technologies. Other industries with strong demand for these kinds of inquiry/transaction interactions include insurance, telecommunications, government, travel, utilities and consumer packaged goods.

Another popular application in the market today is intelligent call routing using natural language understanding. Rather than listening to a repetitive cycle of possible menu routes, the caller hears, "How may I help you?" The caller's response is evaluated against the set of allowable actions, and the call is routed appropriately. This kind of application cuts down on the number of different phone numbers required for different functions, and leads to greater customer retention and satisfaction because callers are directed immediately to where they need to go. A call routing application may, for example, connect you to a loan specialist or send you to a directed dialog system for getting your loan balance.
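As a rough illustration of the routing step described above, here is a minimal Python sketch. It is not how any particular product works: production systems typically rely on trained natural language understanding models, and this sketch swaps in simple keyword overlap to stand in for that evaluation. The destination names, keyword sets and the `route_call` function are all hypothetical, chosen only to mirror the loan example.

```python
import re

# Hypothetical set of allowable actions, each with keywords that suggest it.
# A real deployment would use a trained language-understanding model instead.
ROUTES = {
    "loan_specialist": {"apply", "mortgage", "refinance", "specialist"},
    "loan_balance_dialog": {"balance", "owe"},
    "atm_locator": {"atm"},
    "funds_transfer": {"transfer"},
}


def route_call(utterance, routes=ROUTES, fallback="live_agent"):
    """Pick the destination whose keywords best overlap the caller's words."""
    # Lowercase the recognized text and split it into simple word tokens.
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    # Score each allowable action by how many of its keywords were spoken.
    scores = {action: len(words & keywords) for action, keywords in routes.items()}
    best = max(scores, key=scores.get)
    # If nothing matched, hand the caller to a person rather than guess.
    return best if scores[best] > 0 else fallback


if __name__ == "__main__":
    print(route_call("I'd like to speak to a loan specialist"))
    print(route_call("What's my loan balance?"))
    print(route_call("Um, hello?"))
```

The fallback to a live agent is a design choice for the sketch: when the caller's words match none of the allowable actions, it is usually safer to transfer than to guess a route.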

We are just now beginning to see how natural language self-service can be beneficial across many industries. These applications are user-friendly, and they enhance the customer experience while creating an on-demand environment for customer service. Their value will expand as we develop new tools to reduce the time to market for advanced natural language applications.

Speech verification has huge potential to reduce fraud and identity theft; however, little government regulation exists that dictates when speech verification counts as proof of action, in the way the government declared that a facsimile is a legal copy. There are many flavors of speech verification out there, and the applications and user interfaces are still being refined.

We have witnessed great advances in the past five years in the quality of text-to-speech (TTS), and we will continue to see that technology become more natural and easier on the ear. More applications will be created that use TTS rather than recorded prompts. Finally, one area that is virtually untapped is speaker-independent transcription over the phone, which would allow you to dictate an e-mail or add notes to your customer database using your phone.

About the author:
Brian Garr is program director and segment manager for Contact Center Solutions in the Software Group of IBM. He has been with IBM for six years, and is an evangelist and speaker worldwide on machine translation, text to speech and speech recognition. Prior to joining IBM, Garr was a CTO and VP of two startup technology companies. He has a BA degree from Washington and Lee University. Garr received the Smithsonian Institute's "Heroes of Technology" designation in 1998 for his work in machine translation.
