Not so long ago we began a review of a speech recognition program by repeating the gibberish that it generated.
Since then, the technology has improved beyond all recognition. In fact, Dragon Naturally Speaking 10 is being used to write this review.
Version 10 is faster than its predecessor, but Nuance’s claim that it is 15 per cent more accurate is misleading, as the figure refers to a reduction in an already small number of mistakes.
Version 9 was well over 90 per cent accurate, and even inexperienced users can match that in version 10 if they speak with care, with little or no training.
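To see why the headline figure flatters, it helps to run the numbers. The sketch below assumes the 15 per cent applies to the error rate rather than to accuracy itself, and the baseline accuracies are illustrative round numbers, not measurements from Nuance or from this review; the function name is our own.

```python
def accuracy_after_error_reduction(accuracy, error_reduction):
    """Return the new accuracy when the error rate is cut by a given fraction.

    accuracy: baseline accuracy as a fraction (0.90 means 90 per cent)
    error_reduction: fraction of errors removed (0.15 means 15 per cent fewer)
    """
    error_rate = 1.0 - accuracy
    return 1.0 - error_rate * (1.0 - error_reduction)


# Illustrative baselines only (assumed, not measured).
for baseline in (0.90, 0.95, 0.99):
    improved = accuracy_after_error_reduction(baseline, 0.15)
    print(f"{baseline:.1%} accurate -> {improved:.2%} accurate")
```

On these assumptions, a program that started at 90 per cent ends up at about 91.5 per cent, and the gain shrinks further the better the starting point.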
So why isn’t it used more often? The software has become more tolerant of noise but you still need to be in a quiet room to use it, if only to avoid driving colleagues crazy with your dictation. This also rules out using it in open-plan offices. Another reason, perhaps, is that people aren’t used to dictating.
The flip side is that some people talk better than they write; if you like pacing up and down, holding forth, you could go for the wireless edition, which comes with a Plantronics headset. Dragon will also transcribe from a digital recording.
The biggest problem is correction. If the software has failed to recognise a word once, it is likely to get it wrong again. Dragon failed in four attempts to recognise ‘gibberish’ in the first sentence of this review and got it only when the word was spelled out. But the second time the word was used it got it right first time. Recognition improves as the software gets accustomed to you and your vocabulary and, to an extent, as you become accustomed to using the software.
Correction is easier with simplified commands. To delete the word ‘publisher’ from a previous sentence you need say only ‘delete publisher’, rather than first selecting the text and then operating on it.
The Preferred and Professional editions take this further by allowing commands such as: ‘Search the web for green bananas’, or ‘Search Wikipedia for plantains’. Potentially more useful, but only in the Professional edition, is the support for controlling Microsoft Office applications and automating tasks, including those involving several Windows applications.
The hidden benefit, particularly for offices, is that users do not need to know what they are doing and so require less training. You can, for instance, create a 3x4 table by saying ‘Create three by four table’, without needing to know about menus and dialogue boxes.
For more complex tasks you can create macros and call them by name, either by using Office’s own facilities or, in non-Office applications, by using the program’s own scripting language. Visual Basic for Applications developers can automate tasks involving several applications.
Dragon also helps people who cannot spell, so long as they are literate enough to recognise its mistakes (writing ‘their’ instead of ‘there’, for instance). The dictation recording can be saved with the transcription so that other people can check and correct it.
The interface has been smartened up, as has the help system. If you can’t think of a command, ask ‘What can I say?’ and a box pops up with a list of commands relevant to what you are doing.
Voice recognition is unlikely to improve until a step change in computing power allows transcription to be based on ‘understanding’ as well as statistics. It also needs more help at the system level: a multimodal interface that allows dictation to be used better in tandem with handwriting and gesture recognition. But as it stands, it is still seriously useful.