Speech technology — a broad field that has existed for decades — is evolving quickly, thanks largely to the advent of AI.
No longer is the field primarily about speech recognition and the accuracy of speech-to-text transcription. Underpinned by AI, speech-to-text has been automated to the point where real-time transcription is good enough for most business use cases. It will never be 100% accurate, but it is now broadly on par with human transcription, and it can be done much faster and at a fraction of the cost.
For some, that might be the only AI-based speech technology use case of interest, but within the workplace communication and collaboration field, it’s really just the beginning. For the past six years, I have been presenting an annual update on this topic at Enterprise Connect. Let’s explore three main speech technology trends discussed at this year’s conference that IT leaders should consider.
1. AI builds on speech technology
AI has now gone well beyond basic transcription. Many AI-driven applications have become standard features of all the leading unified communications as a service (UCaaS) offerings, among them real-time transcription, real-time translation, meeting summaries and post-meeting action items. Some of these use cases apply solely to speech, while others are voice-based activities that tie into other applications, such as calendaring.
More recent applications rely on generative AI, which can automatically create coherent email responses, memos and blog posts from either voice or text prompts (most workers will likely prefer using their voices).
The current state of play builds on conventional forms of speech technology. But with AI, the use cases are broader and are integrated across workflows, rather than being limited to speech recognition.
IT leaders should expect these capabilities to be table stakes as they evaluate potential UCaaS offerings or as they consider how to stay current within their existing deployments. All of these AI-based applications are still works in progress and should keep improving — both in terms of speech accuracy and how well they integrate with other workplace and productivity tools.
2. Emerging applications
Even as IT leaders assess these new capabilities, they mustn't lose sight of the bigger picture. These applications mainly reflect the way people work today, and they tend to be viewed as point products, each doing a specific set of tasks very well. However, AI is advancing faster than any technology wave before it. While many of these tasks are now largely mastered, the next wave of innovation based on AI speech technology operates at a higher, organization-wide scale.
A case in point is conversational AI, which enables chatbots to interact in a more natural, human-like way, making them much more palatable options for self-service in the contact center. Today's chatbots are far from perfect, but they are gaining much wider adoption, including in the enterprise, where workers use them as digital assistants.
Large language models (LLMs) are the next big phase for AI. The main idea here is that enterprises see value in capturing all forms of digital communication to make AI applications more effective. Although text and video have long been digitized, many forms of speech have not. With the majority of everyday communication being voice-based, there is growing interest in capturing these conversations, a form of what is known as dark data, because they represent a valuable set of data inputs for AI.
LLM development and management is evolving quickly, not just due to the nature of AI, but also because C-suite executives now see the potential of LLMs as a competitive differentiator. (There are, in fact, many types of language models for AI, so the reference here to LLMs is an oversimplification. Most IT leaders are not data scientists, so this is an area where outside expertise would be of value.) With speech being so central to this trend, IT leaders need to take a more strategic view of speech technology.
3. Strategic implications for IT
Clearly, IT needs to move past the legacy model of speech technology, especially as AI drives much of the innovation around voice and other communications. As a result, speech technology trends can no longer be viewed in isolation, with transcription accuracy as the sole measure of success.
More important is recognizing how AI now ties speech applications to everything else, integrating them with workflows, project management, personal productivity and team-based outcomes. Everyday conversations, wherever they take place, still have inherent value, but with AI, their value as digital streams that can be combined with other digital streams is poised to grow even greater.
This is what makes speech technology in the enterprise so strategic. These applications will continue playing a key role in helping workers communicate and collaborate more effectively — mainly with UCaaS — but the bigger picture is pinpointing where AI’s business value really lies.
Data is the oxygen that gives AI life, and generally, the more relevant data your model has, the greater the benefit. Most organizations capture only a small portion of their dark data, and this is where speech technology really comes into play when considering your plans for AI.
Jon Arnold is principal of J Arnold & Associates, an independent analyst providing thought leadership and go-to-market counsel with a focus on the business-level effect of communications technology on digital transformation.