Ready for artificial intelligence in speech recognition?

Artificial intelligence in speech recognition is transforming the technology, but are enterprises ready to employ these new tools within their operations?

Katherine Finnell, Senior Site Editor

Published: 14 Mar 2018

ORLANDO, Fla. -- Speech technology has become more than speech-to-text dictation for note taking and documentation. Thanks to artificial intelligence (AI), machine learning and natural language processing, speech technology today is enabling the development of virtual assistants in user-facing applications and working in the background to automate workflows.

Virtual assistants powered by speech technology are on the rise in the enterprise, according to J Arnold and Associates analyst Jon Arnold, who spoke about enterprise speech technology at Enterprise Connect 2018.

AI-driven speech recognition has become accurate enough to be taken seriously. Google's speech recognition technology, for example, has reached 95% accuracy, according to venture capital firm Kleiner Perkins Caufield & Byers. That level of accuracy indicates that speech technology is ready for the enterprise market, Arnold said.
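
The 95% figure is not defined further here. Speech recognition accuracy is commonly reported as 1 minus the word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. Assuming that definition, a minimal Python sketch of the calculation follows. The function name and the sample sentence are illustrative only and do not come from Google or Kleiner Perkins.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words (Levenshtein distance over words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical 20-word reference; the recognizer mishears one word.
    reference = ("schedule a meeting with the finance team on tuesday at ten "
                 "to review the quarterly budget and the audit plan")
    hypothesis = reference.replace("audit", "edit")
    wer = word_error_rate(reference, hypothesis)
    print(f"WER = {wer:.0%}, accuracy = {1 - wer:.0%}")
```

On this made-up 20-word utterance, one misrecognized word yields a 5% WER, or 95% accuracy under this definition. Real benchmarks average WER over many hours of test audio.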

Yet enterprises may not be ready for speech technology.

"My sense right now is it's all fun technology to demonstrate, but I'm not seeing any overwhelming demand," said Nemertes Research analyst Irwin Lazar. According to a Nemertes contact center study, fewer than 20% of organizations had plans for, or were evaluating, virtual assistant technology.

Artificial intelligence in speech recognition, and beyond

For enterprises considering speech technology, there are four flavors to choose from, according to Arnold.

Speech-to-text. These applications range from dictating emails to transcribing meetings. Arnold said AI-driven speech-to-text can quickly learn languages, which can be especially useful in verticals such as finance and healthcare that use specialized terms and acronyms.
Text-to-speech. Arnold said text-to-speech apps can help employees stay productive in mobile settings by letting them listen to emails and messages. Text-to-speech can also fine-tune audio by smoothing out accents, volume, speaking rate and long pauses.
Speech recognition. AI-driven speech recognition can handle requests and commands for tasks such as calendaring, managing meetings, keyword search, or customized phrases and shortcuts that automate tasks. Applications can also be more complex, such as real-time language translation or voice biometrics for identification and authentication, Arnold said.
Speech analytics. Arnold said speech analytics turns unstructured data into structured data. It can support call-recording compliance, quality monitoring and sentiment analysis, and it can improve the speed and accuracy of workflows.
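
Arnold doesn't describe how speech analytics products structure their data. As a rough illustration, assuming a speech-to-text engine has already transcribed the call, the Python sketch below turns one free-form transcript into a structured record with a call-recording compliance flag and keyword counts. The disclosure phrase, keyword list and field names are all hypothetical; commercial products go much further, for example with sentiment scoring.

```python
import re
from dataclasses import dataclass

# Hypothetical compliance phrase and watch list; a real deployment defines its own.
DISCLOSURE = "this call may be recorded"
KEYWORDS = ("cancel", "refund", "complaint")


@dataclass
class CallRecord:
    """Structured fields extracted from one unstructured call transcript."""
    call_id: str
    word_count: int
    recording_disclosure_given: bool  # did the agent read the disclosure?
    keyword_hits: dict[str, int]      # keyword -> number of occurrences


def analyze_transcript(call_id: str, transcript: str) -> CallRecord:
    text = transcript.lower()
    words = re.findall(r"[a-z']+", text)  # simple tokenizer: letters and apostrophes
    hits = {kw: words.count(kw) for kw in KEYWORDS if kw in words}
    return CallRecord(
        call_id=call_id,
        word_count=len(words),
        recording_disclosure_given=DISCLOSURE in text,
        keyword_hits=hits,
    )


if __name__ == "__main__":
    record = analyze_transcript(
        "call-001",
        "Hello, this call may be recorded for quality. "
        "I want a refund, or I will cancel. Refund please.",
    )
    print(record)
```

Once every call is reduced to a record like this, the records can be filtered, counted and audited like any other table, which is the practical payoff of moving from unstructured audio to structured data.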

Lazar said the most practical use for speech technology is to enable employees to walk into a conference room and employ voice biometrics to kick off a meeting or start a video call. The technology can also be used to let staffers dial into an audio bridge or upload relevant documents.

Speech technology vendor landscape

Speech technology vendors range from large companies such as Google and Amazon to smaller, niche technology vendors such as Nuance and LumenVox. Many of the vendors in the market come from the consumer and contact-center world, and as a result, few focus solely on the enterprise. These vendor offerings come in three different forms, Arnold said.

Vendors such as LumenVox and Speechmatics offer point products for text-to-speech, speech-to-text, speech recognition and analytics. These offerings are purpose-built for a specific app.

The second type of vendor offering consists of cloud-based conversational platforms that use machine learning and natural language processing. These include Amazon Echo and Apple's Siri, which are often geared toward consumers but have practical applications in the enterprise.

The third offering comes as part of enterprise unified communications and collaboration platforms such as Cisco Spark Assistant, Microsoft Teams Cognitive Services and Alexa for Business, Arnold said. These offerings differ from general-purpose conversational platforms in that they use AI specifically to help workers become more productive.

Nuance, meantime, was recognized at the Enterprise Connect Innovation Showcase for its voice biometrics offering, VocalPassword, which allows users to authenticate their identity using their voice instead of using a PIN or other means of identification.

Jamie Flores, senior manager of core technologies at Nuance, said voice biometrics can be used proactively as a means of authentication, or passively to match a caller's voice print during a call with a contact center agent. If the voice print doesn't match, the system can flag the call so the agent can follow up with security questions.
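
Nuance doesn't say how VocalPassword compares voice prints. One common approach, assumed here purely for illustration, represents each voice print as a fixed-length vector (a speaker embedding) and declares a match when the cosine similarity between the enrolled print and the live caller's print clears a tuned threshold. The function names, the 0.75 threshold and the toy three-element vectors below are hypothetical; real embeddings come from a trained speaker-recognition model and have far more dimensions.

```python
import math
from typing import Sequence

# Hypothetical decision threshold; a real system would tune it on labeled
# genuine and impostor calls to balance false accepts against false rejects.
MATCH_THRESHOLD = 0.75


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two voice-print vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("voice prints must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("voice prints must be non-zero vectors")
    return dot / (norm_a * norm_b)


def check_caller(enrolled: Sequence[float], live: Sequence[float],
                 threshold: float = MATCH_THRESHOLD) -> str:
    """Passive check: return 'verified' on a match, otherwise flag for the agent."""
    score = cosine_similarity(enrolled, live)
    if score >= threshold:
        return "verified"
    # No match: prompt the agent to follow up with security questions.
    return "flag: ask security questions"


if __name__ == "__main__":
    enrolled_print = [0.9, 0.1, 0.4]    # toy vector stored at enrollment
    same_speaker = [0.85, 0.15, 0.38]   # toy vector from the live call
    other_speaker = [0.1, 0.9, -0.2]    # toy vector from a different voice
    print(check_caller(enrolled_print, same_speaker))
    print(check_caller(enrolled_print, other_speaker))
```

Because the check runs on audio the caller is already producing, it adds no steps for a genuine customer; only a low similarity score interrupts the call with security questions.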

Nuance, a speech technology provider, also announced updates to its omnichannel customer engagement platform, which includes Zoe, an AI-based text-to-speech engine that uses deep neural networks to create customizable, natural-sounding voices.
