Voice Mail Preview FAQ (Part 2)

01/21/2010

This is the second part of the frequently-asked questions list that I’ve put together for Exchange 2010’s Voice Mail Preview feature. You may want to read Part 1 before reading this.

Why ‘Preview’? Why not ‘Transcription’ or ‘Speech to Text’?

Names are important, and we thought hard about this one. It’s important to set users’ expectations appropriately. Voice Mail Preview does not necessarily produce text that’s the same as what the caller said in their voice message. In fact, it is usually inaccurate in some way. To call it Transcription would be to suggest a more perfect result than is generally achievable. Preview suggests that the reader should be able to gain an idea of the voice content, which is closer to the real capability of the feature.

How accurate is a Voice Mail Preview?

That depends on the content of the voice recording.

Speech recognition is very difficult for software, even under ideal conditions. Voice mail is especially difficult for automatic speech recognition because:

The sound is often transmitted by devices and networks that discard information at low and high frequencies, and
The person leaving the message is unknown to the speech recognition system, so it has no idea what their particular voice sounds like. The system must use an averaged-out approximation of speech patterns for a given culture.

The average accuracy of Exchange 2010 Voice Mail Preview for callers speaking English, with US English accents, is between 70% and 80%. This means that about 1 word in every 4 in a typical Voice Mail Preview will be wrong. Accuracies can be higher than this. They can also be much lower.
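The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the function name, the 40-word message length and the simplifying assumption that every word is equally likely to be misrecognized are hypothetical, and none of this is part of Exchange.

```python
def expected_wrong_words(word_count: int, accuracy: float) -> float:
    """Return the expected number of misrecognized words.

    Assumes (as a simplification) that each word is recognized
    correctly with the same probability, `accuracy`.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return word_count * (1.0 - accuracy)


if __name__ == "__main__":
    # A hypothetical 40-word voice message at the quoted accuracy range.
    for accuracy in (0.70, 0.75, 0.80):
        wrong = expected_wrong_words(40, accuracy)
        print(f"{accuracy:.0%} accurate: about {wrong:.0f} of 40 words wrong")
```

At the midpoint of the quoted range (75%), a quarter of the words are expected to be wrong, which is where the "1 word in every 4" figure comes from.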

What makes a Voice Mail Preview more or less accurate?

Voice Mail Preview is likely to be more accurate when the following conditions are met:

The speaker speaks with an accent, pacing and intonation that are well matched by the tuning of the speech recognition. The speaker must, of course, use a language to which the system is attuned. A message left by a person speaking English normally with a “middle American” accent should produce a more accurate preview than one left by a person speaking very quickly and/or softly with a pronounced regional (or non-American) accent.
The speaker leaves a “typical” voice mail, and does not use unusual names or words (e.g. technical jargon, archaic forms of speech).
The telephone call is free of background noise, echo and audio drop-out.

Clearly, some of these factors cannot be controlled. Therefore, Voice Mail Previews will sometimes be inaccurate.

In which languages is Voice Mail Preview available?

In Exchange 2010, Voice Mail Preview will be offered in the following UM language packs:

English (US)
English (Canada)*
French (France)
Italian
Portuguese (Portugal)*
Polish*

At the time of writing, language packs marked with a star (*) were not yet available. The Understanding Unified Messaging Languages topic on Microsoft TechNet has an up-to-date listing. The article also contains links from which the language packs can be downloaded.

The goal is to offer Voice Mail Preview in as many UM language packs as possible. Later releases may extend the supported set.
