Create A Simple Speech Recognition Application

Traditionally, IVR (interactive voice response) systems have relied on DTMF, the digits a caller enters on their phone keypad, to route callers. As Speech Recognition technology has improved, more companies are using it to improve the caller experience.

Voice Elements Makes Speech Recognition Easy

It’s very easy to build Speech Recognition applications using Voice Elements. Voice Elements supports both LumenVox and the Microsoft Speech Platform, which supports 18 languages.

Grammar Files

Grammar files contain the lists of words and phrases that you want the engine to recognize. For example, if you wanted to let the user identify which state they are calling from, you could create a grammar file with all 50 states.

It’s very easy to create grammar files. For more information, please see our article “Create Microsoft Speech Compatible Grammar Files.”

Here is a simple grammar file, YesNo.gram, that lets the user say “Yes”, “No”, “Correct”, “Incorrect”, “Negative”, or whatever other items you define. Each group of synonyms carries a tag of either "YES" or "NO".

<?xml version="1.0" encoding="utf-8"?>
<grammar xml:lang="en-US" root="root" tag-format="properties-ms/1.0" version="1.0" xmlns="http://www.w3.org/2001/06/grammar" xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">
  <rule id="Lookup" scope="private">
    <one-of>
      <item>
        <one-of>
          <item>YES</item>
          <item>AGREE</item>
          <item>CORRECT</item>
          <item>OK</item>
          <item>RIGHT</item>
        </one-of>
        <tag>"YES"</tag>
      </item>
      <item>
        <one-of>
          <item>NO</item>
          <item>FALSE</item>
          <item>WRONG</item>
          <item>INCORRECT</item>
          <item>NEGATIVE</item>
        </one-of>
        <tag>"NO"</tag>
      </item>
    </one-of>
  </rule>
  <rule id="root" scope="private">
    <ruleref uri="#Lookup" />
  </rule>
</grammar>

Speech Recognition Code Sample

        public void RunScript()
        {
            try
            {
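                // The directory where we look for grammar files comes from a setting in the config file.
                // Path.Combine works whether or not the setting ends with a directory separator.
                string gramFile = System.IO.Path.Combine(VoiceApp.Properties.Settings.Default.Directory, "YesNo.gram");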
 
                // We will loop up to 3 times to get an accepted response.
                for (int i = 0; i < 3; i++)
                {
                    VoiceResource.SpeechRecognitionEnabled = true;
                    VoiceResource.SpeechRecognitionGrammarFile = gramFile;
                    VoiceResource.SpeechRecognitionMode = VoiceElements.Interface.SpeechRecognitionMode.MultiplePlays;
 
                    // Set how long we wait until we timeout
                    VoiceResource.MaximumTime = 10;
 
                    // This determines if we allow the user to "Interrupt" the play, by saying something early.
                    VoiceResource.SpeechRecognitionPermitBargeIn = true;
 
                    // Determine how many digits we should accept
                    VoiceResource.MaximumDigits = 1;
 
                    VoiceResource.PlayTTS("Please say Yes or No, or press 0 to be transferred to an operator");
 
                    TerminationCode tc = VoiceResource.GetResponse();
 
                    // We should disable Speech Recognition now, otherwise it could be inadvertently used on subsequent plays
                    VoiceResource.SpeechRecognitionEnabled = false;
 
                    Log.WriteWithId(DeviceName, "TC: {0} DigitBuffer: {1} Score: {2} Word: {3}", tc, VoiceResource.DigitBuffer, VoiceResource.SpeechRecognitionScore, VoiceResource.SpeechRecognitionReturnedWord);
                    if (tc == TerminationCode.Speech)
                    {
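                        // Scores range between 0 and 1000. Generally, anything above 700 could be considered a strong positive detection.
                        // Accept the response only when the score is above that threshold; otherwise fall through and re-prompt.
                        if (VoiceResource.SpeechRecognitionScore > 700)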
                        {
                            string word = VoiceResource.SpeechRecognitionReturnedWord;
                            VoiceResource.PlayTTS("You said: " + word + ". Goodbye");
                            break;
                        }
                    }
                    else if (VoiceResource.TerminationCodeFlag(TerminationCode.Digit) || VoiceResource.TerminationCodeFlag(TerminationCode.MaximumDTMF))
                    {
                        if (VoiceResource.DigitBuffer == "0")
                        {
                            // Transfer to operator
                            break;
                        }
                    }
 
                    VoiceResource.PlayTTS("I'm sorry, I did not understand your response");
                }
            }
            catch (HangupException)
            {
                Log.WriteWithId(DeviceName, "Caller hung up!");
            }
            catch (Exception ex)
            {
                Log.WriteException(ex, "Unexpected exception: {0}", DeviceName);
            }
            finally
            {
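                // Cleanup is best effort: the channel may already be disconnected or disposed.
                try
                {
                    ChannelResource.Disconnect();
                }
                catch { /* ignore cleanup failures */ }
 
                try
                {
                    ChannelResource.Dispose();
                }
                catch { /* ignore cleanup failures */ }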
            }
        }

Note a few key items from the code sample above.

SpeechRecognitionPermitBargeIn

You can let a caller “barge in” while a prompt is playing, so they don’t have to wait until the audio is finished. However, barge-in can sometimes hurt Speech Recognition performance. For example, if the caller is using a speakerphone, the engine may pick up trailing audio from the prompt you are playing back. For details, see the VoiceResource.SpeechRecognitionPermitBargeIn Property reference.

Determining What Score to Use

The score returned by the Speech Recognition engine ranges from 0 to 1000. Generally, anything above 700 could be considered a good positive score. However, you will want to do some testing of your own to determine an appropriate threshold, because the score is influenced by the number of words in the grammar file and by whether the spoken word sounds similar to other words or phrases in that file.
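One way to keep the threshold easy to tune is to put the comparison in a small helper. The following is only a sketch: the RecognitionScore class, its DefaultThreshold constant, and the IsAccepted method are hypothetical names that are not part of Voice Elements, and it assumes the score arrives as an integer.

```csharp
using System;

public static class RecognitionScore
{
    // Hypothetical default taken from the rule of thumb above; tune it for your own grammar.
    public const int DefaultThreshold = 700;

    // Returns true only when the score lies in the 0-1000 range and is above the threshold.
    public static bool IsAccepted(int score, int threshold = DefaultThreshold)
    {
        if (score < 0 || score > 1000)
        {
            return false; // outside the expected range, so treat it as unusable
        }

        return score > threshold;
    }
}
```

In the script above, the check could then read RecognitionScore.IsAccepted(VoiceResource.SpeechRecognitionScore), and a grammar with many similar-sounding entries could pass a stricter threshold as the second argument.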

Detecting Digits and Speech

Often, you will want to let callers either enter digits or use speech. The code example above shows how to do so. When accepting both, get the termination code after the prompt and handle the response according to its value. In the example above, if the caller presses a digit, the termination code is “Digit” or “MaximumDTMF”. Alternatively, if the caller speaks, the termination code is “Speech”. For details on limiting how many digits are accepted, see the VoiceResource.MaximumDigits Property reference.
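The branching described here can also be pulled out of the call script so that it can be checked without a live call. The sketch below uses stand-in types: the Termination and CallerAction enums and the ResponseRouter class are hypothetical and only mirror the termination codes named in this article. They are not the Voice Elements TerminationCode type.

```csharp
using System;

// Stand-in for the termination codes named in the article (hypothetical, not the library's enum).
public enum Termination { Speech, Digit, MaximumDTMF, Other }

// What the script should do next (hypothetical).
public enum CallerAction { AcceptSpeech, TransferToOperator, Retry }

public static class ResponseRouter
{
    // Decides the next step from the termination code, the digit buffer, and the recognition score.
    public static CallerAction Decide(Termination code, string digits, int score, int threshold = 700)
    {
        switch (code)
        {
            case Termination.Speech:
                // Accept speech only when the engine is confident enough.
                return score > threshold ? CallerAction.AcceptSpeech : CallerAction.Retry;

            case Termination.Digit:
            case Termination.MaximumDTMF:
                // In this example, 0 is the only digit that means something.
                return digits == "0" ? CallerAction.TransferToOperator : CallerAction.Retry;

            default:
                // Timeouts and anything else lead to a re-prompt.
                return CallerAction.Retry;
        }
    }
}
```

Keeping the decision in a pure function like this makes it straightforward to add more digits or more spoken options later without touching the prompt loop.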

Next Steps

This is a very simple example of using Speech Recognition technology. If there is something you would like to develop but aren’t sure how to start, or if you have questions, feel free to contact us at support@inventivelabs.com.

For a deeper dive into the Voice Elements Classes, visit VoiceResource Properties on our Developer Help site.