Speech to Text Recognition: The Last Frontier? Holy Grail? Sleight of Hand?

Posted by Ken Cook, 08 October 2012

Ok, my second post on the blog. So far my numbers are impressive. I’ve made it into double digits. True, not far into double digits, but nonetheless... Oh, and just as a reminder, if you have any issues with my topic this time, please refer to my first blog post for further explanation of how this blog works.

Speech to text recognition (STR) is, to many people, the Holy Grail of the digital recording world. Well, the medical profession has had STR for a long time. It’s much easier if one doctor is speaking into an isolated microphone. They can train the system to recognize their voice and as long as they speak in a consistent manner, then it works very well. Even with long complicated medical terms. In most cases, someone reviews the final document before it’s sent out.

This is a closed microphone environment.

In our world, this environment rarely exists. You can have a dozen or more microphones with the people changing all day long. And then you add in the accents, dialects, scared witnesses, angry attorneys, multiple simultaneous speakers and so on.

This is an open microphone environment.

STR simply does not work to the level necessary to meet the high standards of the judicial and law enforcement venues. Now that is not to say that some companies don’t claim to have STR, but if you look under the hood, you find that a 70% accuracy rate is considered sufficient. I don’t know about you, but I don’t want a computer to be 70% or even 90% accurate when creating a transcript. Not with my freedom or money on the line.
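To put those percentages in perspective, here is a minimal sketch of the arithmetic. It assumes that "accuracy" means the share of words transcribed correctly and that a transcript page holds roughly 250 words; both are my assumptions for illustration, not figures from any vendor, and the function name is made up.

```python
def expected_errors(word_count: int, accuracy_percent: int) -> int:
    """Return how many words would be wrong at a given word-level accuracy.

    accuracy_percent is a whole number from 0 to 100. Integer arithmetic is
    used so the result is exact (fractions of a word are rounded down).
    """
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy_percent must be between 0 and 100")
    return word_count * (100 - accuracy_percent) // 100


# Assumed page length, for illustration only.
WORDS_PER_PAGE = 250

for accuracy_percent in (70, 90):
    wrong = expected_errors(WORDS_PER_PAGE, accuracy_percent)
    print(f"{accuracy_percent}% accurate: about {wrong} wrong words per page")
```

Under those assumptions, 70% works out to 75 wrong words on a single page and 90% still leaves 25.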

Now some of you are asking, “What about voice writers (used to be called steno masks)?” These are highly trained people who can repeat what they hear into a handheld mask with a voice silencer (so they don’t disturb the court). Their speech is recorded by specialized software and translated into text. Good voice writers have a fairly accurate (90%) translation rate. Of course, like the doctors, this is a closed microphone environment.

The challenge here is a matter of numbers. The combined number of stenographers and voice writers is grossly insufficient to cover all the courts, hearings and depositions. Producing new stenos and voice writers can take up to 2 years of difficult training, and only voice writers can utilize STR technology. Stenos can use Computer Assisted Transcription (CAT) software, but it still requires that they ‘type’ on their steno machine.

Thus we come to digital recording.

Digital recording can capture it all in any language, any dialect, whispering, screaming, whatever. But how do you get that into a useful document that ranges from the official court record to summaries to quick briefs?

The old-fashioned way. You type it. Whether that is in a word processor on a desktop or as part of a CAT system, it all boils down to this: eventually someone has to apply mechanical energy to create an accurate transcript.

To be fair, people can use STR to create drafts of transcripts. However, they still have to listen to the audio to edit and ‘clean up’ their document before submission, which just seems like more work to me.

One approach to an effective use of STR is selective conversion, which could be considered a subset of a closed microphone environment. In a number of agencies, only the judge’s rulings or comments are actually put on paper, so to speak. By limiting the STR technology to one recording channel, it could be possible to speed up production of the official document.
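As a rough illustration of what isolating one channel means, here is a minimal sketch. It assumes the recorder stores its audio as interleaved samples (one sample per microphone, frame after frame) and that the judge’s microphone happens to be channel 0. Both are assumptions for illustration, and `isolate_channel` is a hypothetical helper, not part of any particular recording product.

```python
def isolate_channel(interleaved, channel_count, channel_index):
    """Pull one microphone's samples out of an interleaved multichannel list.

    Interleaved audio stores each frame as channel 0, channel 1, ... in turn,
    so every channel_count-th sample belongs to the same microphone.
    """
    if channel_count < 1:
        raise ValueError("channel_count must be at least 1")
    if not 0 <= channel_index < channel_count:
        raise ValueError("channel_index is out of range")
    if len(interleaved) % channel_count != 0:
        raise ValueError("sample count is not a whole number of frames")
    return interleaved[channel_index::channel_count]


# Hypothetical 3-microphone recording, two frames long.
recording = [10, 20, 30, 11, 21, 31]
JUDGE_CHANNEL = 0  # assumed channel assignment

judge_audio = isolate_channel(recording, 3, JUDGE_CHANNEL)
print(judge_audio)  # [10, 11]
```

Only the isolated samples would then be handed to the STR engine, which keeps that one microphone close to the closed microphone environment described earlier.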

An interesting question we are seeing out in the field from judges is “How accurate does the transcript have to be?” Not in terms of words and context, but in terms of formatting, grammar, punctuation, etc. Many requests for transcripts are from judges looking to review the previous day’s testimony. They only want to refresh their memory or add to their notes. STR could be very useful with this.

Other ideas exist for using STR technology in new ways within the court environment, but corporate policy forbids me from discussing them, even in the context of free speech. Uncle Kenny likes his paychecks appearing on a regular basis. Yes, that was a deliberate teaser and a sop to my corporate handlers. Stay tuned.

Look, I’m not saying STR doesn’t have a role to play in the court, hearing or law enforcement digital recording world. I’m just saying if someone promises you STR, be sure to look carefully behind the curtain before you embrace it.
