Are humans better than AI at transcription services?

© Arnon Mungyodklang

Although AI-based transcription services have improved in recent years, their limited vocabularies and lack of socialisation mean that they are unable to match human accuracy rates of 99 to 100%.

Over the last decade, the transcription industry has become virtually unrecognisable. Whilst the concept of transcription, the production of a written record of something spoken, has remained the same, advancements in technology have led to the emergence of AI-driven transcription apps such as Trint, Otter, and Wreally.

Their appeal lies in their ease of use: the user simply downloads the app onto their mobile device, presses record, and the process begins. Furthermore, a 2017 study found that AI now has a 5.1% error rate in recognising speech, which is almost the same as a human's.

However, transcription experts believe that, despite the advancements made in bot-driven transcription, providers who use human transcribers still have a significant advantage thanks to the better accuracy rates afforded by the human ear, which stand between 99 and 100%.

Furthermore, whilst AI-driven services are theoretically faster given that they produce transcripts in real time, the error rate of bots still stands at 12%, meaning that the client has to read the entire transcript and fix the errors themselves. With a human-driven transcription service such as GoTranscript, the process works differently.

The client uploads their audio or video file via the GoTranscript mobile app or website. A team of human transcribers then divides the upload into smaller parts and works on them. Depending on its size, the finished transcript is returned to the client, error-free, within six to 24 hours.

Beyond the rapid turnaround offered by human-driven transcription services, there is also the matter of accuracy, where the human ear outperforms a bot. Whilst most Automatic Speech Recognition (ASR) software works well for day-to-day use on mobile devices, it still has a number of drawbacks.

To begin with, the human brain is highly adept at filtering out background noise, meaning that it is less likely to skip or misunderstand words. Furthermore, a human's social upbringing means that they are able to understand external factors such as slang, intonation, and even jokes, depending on their cultural context. A robot, however, is pre-programmed with a limited vocabulary, which is arguably its largest problem.

A robot’s limited vocabulary means that it is also unable to distinguish between multiple speakers, or ‘interlocked speech’, which is the biggest reason why such systems currently have an accuracy rate of no higher than 95%.
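For readers who want to see how error and accuracy figures of this kind can be calculated, the short Python sketch below computes a word error rate: the number of substituted, missing, or extra words in a transcript, divided by the number of words actually spoken. It is an illustration only. That the percentages quoted in this article are word error rates is an assumption, and the function name and sample sentences are invented for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper, not taken from any transcription product.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # matching or substituted word
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented sample: one wrong word ("there") and one dropped word ("or").
spoken = "the client will upload their audio or video file"
transcript = "the client will upload there audio video file"

wer = word_error_rate(spoken, transcript)  # 2 errors over 9 spoken words
print(f"error rate: {wer:.1%}, accuracy: {1 - wer:.1%}")
```

On this measure, an accuracy rate of 95% corresponds to roughly one word in twenty needing correction, and a 12% error rate to about one word in eight.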

GoTranscript CEO, Peter Trebek, commented:

“It is great to see the advancements that AI has made in terms of transcription in recent years.

“With these ever continuing improvements, they have also driven us to continue raising the quality of our product which we hope will continue to keep us at the very top of the transcription market in terms of accuracy rates between 99 and 100%.”

As the margins between AI and human transcribers become ever narrower, it remains to be seen what measures GoTranscript and other human-driven transcription services will take in order to maintain their accuracy advantage over automated transcription services.

