
Accuracy is especially important when transcriptions affect the accuracy of quotes in news stories, the outcomes of expensive and important trials, or even the lives of patients. Voice transcription might seem like a straightforward project for AI, since it can be seen as simply converting one kind of data (sound) into another (text). In fact, various factors make it a significant computing problem.

Although it’s hard to find statistics for transcription specifically, Grand View Research projects that the global voice recognition market overall will hit $127.58 billion by 2024. According to the Bureau of Labor Statistics, there were 57,400 medical scribes and 19,600 court reporters (which includes closed captioners for television and other media) in the United States in 2016.

Much of the data in the world, however, isn’t in text form. Instead, it’s in the form of spoken words on video and audio recordings or even live events. This makes reliable voice transcription an important goal for artificial intelligence. In addition, voice-to-text transcription has long been an important business on its own merits in the medical, legal, and media fields, to name a few, and has traditionally been done by teams of human transcribers who charge rates of $3 or $4 per minute.

Artificial intelligence, especially machine learning, is at its best when it is working with a large, analyzable data set, like text.
