Create A Simple Speech Recognition Application

Traditionally, IVRs relied on DTMF (digits that a user enters on their phone keypad) to route callers. As Speech Recognition technology has improved, more companies are using it to improve the user experience.

Building Speech Recognition applications with Voice Elements is straightforward. Voice Elements supports both LumenVox and the Microsoft Speech Platform, which supports 18 languages. First, you will need to create a grammar file. The example in this article uses a simple grammar file that allows the user to say “Yes”, “No”, “Yeah”, “Yup”, and “Nah”.

Grammar files list the words and phrases that you would like to recognize. For example, to let users identify which state they are calling from, you could create a grammar file containing all 50 states. Grammar files are easy to create.
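As a rough illustration of the state example, the sketch below generates grammar text from a list of phrases. It assumes the engine accepts the W3C SRGS ABNF format, which this article does not confirm, so check the documentation for your engine before relying on it. `AbnfGrammarWriter`, `BuildAbnf`, and the rule name are hypothetical names chosen for this example and are not part of Voice Elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Voice Elements).
// ASSUMPTION: the speech engine accepts grammars in W3C SRGS ABNF form.
public static class AbnfGrammarWriter
{
    // Builds grammar text with a single public root rule that lists every
    // phrase as an alternative. Phrases are expected to be plain words;
    // characters that are special in ABNF are not escaped here.
    public static string BuildAbnf(string ruleName, IEnumerable<string> phrases)
    {
        string alternatives = string.Join(
            " | ",
            phrases.Select(p => p.Trim().ToLowerInvariant()));

        return "#ABNF 1.0 UTF-8;\n" +
               "language en-US;\n" +
               "mode voice;\n" +
               "root $" + ruleName + ";\n\n" +
               "public $" + ruleName + " = " + alternatives + ";\n";
    }
}
```

For example, `AbnfGrammarWriter.BuildAbnf("yesno", new[] { "Yes", "No", "Yeah", "Yup", "Nah" })` returns text that could be saved with `System.IO.File.WriteAllText`. Generating the file from a list keeps a long grammar, such as all 50 states, easy to maintain.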

Here is an example of how to use Speech Recognition:

        
public void RunScript()
{
    try
    {
        // The directory where we search for files is stored in a config-file setting.
        // Because the path is built by concatenation, the setting must end with a directory separator.
        string gramFile = VoiceApp.Properties.Settings.Default.Directory + "YesNo.Gram";
 
        // We will loop up to 3 times to get an accepted response.
        for (int i = 0; i < 3; i++) 
        { 
            VoiceResource.SpeechRecognitionEnabled = true; 
            VoiceResource.SpeechRecognitionGrammarFile = gramFile; 
            VoiceResource.SpeechRecognitionMode = VoiceElements.Interface.SpeechRecognitionMode.MultiplePlays; 
            
            // Set how long we wait until we timeout 
            VoiceResource.MaximumTime = 10; 
            
            // This determines if we allow the user to "Interrupt" the play, by saying something early. 
            VoiceResource.SpeechRecognitionPermitBargeIn = true; 
            
            // Determine how many digits we should accept 
            VoiceResource.MaximumDigits = 1; 
            VoiceResource.PlayTTS("Please say Yes or No, or press 0 to be transferred to an operator"); 
            TerminationCode tc = VoiceResource.GetResponse(); 
            
            // Disable Speech Recognition now; otherwise it could be inadvertently used on subsequent plays
            VoiceResource.SpeechRecognitionEnabled = false; 
            
            Log.WriteWithId(DeviceName, "TC: {0} DigitBuffer: {1} Score: {2} Word: {3}", tc, VoiceResource.DigitBuffer, VoiceResource.SpeechRecognitionScore, VoiceResource.SpeechRecognitionReturnedWord); 
            if (tc == TerminationCode.Speech)
            {
                // Scores range between 0 and 1000. Generally, anything above 700 could be considered a strong positive detection
                if (VoiceResource.SpeechRecognitionScore >= 700)
                {
                    string word = VoiceResource.SpeechRecognitionReturnedWord;
                    VoiceResource.PlayTTS("You said: " + word + ". Goodbye");
                    break;
                }
            }
            else if (VoiceResource.TerminationCodeFlag(TerminationCode.Digit) || VoiceResource.TerminationCodeFlag(TerminationCode.MaximumDTMF))
            {
                if (VoiceResource.DigitBuffer == "0")
                {
                    // Transfer to operator
                    break;
                }
            }
 
            VoiceResource.PlayTTS("I'm sorry, I did not understand your response");
        }
    }
    catch (HangupException)
    {
        Log.WriteWithId(DeviceName, "Caller hung up!");
    }
    catch (Exception ex)
    {
        Log.WriteException(ex, "Unexpected exception: {0}", DeviceName);
    }
    finally
    {
        try
        {
            ChannelResource.Disconnect();
        }
        catch{}
 
        try
        {
            ChannelResource.Dispose();
        }
        catch{}
    }
}

 

There are a few key items:

SpeechRecognitionPermitBargeIn

You can allow a user to “barge in” while a prompt is playing, so that the user doesn’t have to wait until the audio finishes. However, barge-in can sometimes hurt Speech Recognition performance. For example, if the user is on speakerphone, the engine may pick up trailing audio from the prompt you are playing.

Determining What Score to Use

The score returned by the Speech Recognition engine ranges from 0 to 1000. Generally, anything above 700 can be considered a strong positive match. However, you should run your own tests to determine an appropriate threshold, because the score is influenced by the number of words in the grammar file and by whether they sound similar to other words or phrases in it.
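One way to approach that testing is to log the score of each test call along with whether the recognized word was actually correct, and then count the mistakes a candidate threshold would produce. The sketch below is a minimal illustration of that idea. `ThresholdTuning` and `Evaluate` are hypothetical names, and the sample data in the usage note is made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Voice Elements) for comparing thresholds offline.
public static class ThresholdTuning
{
    // For a candidate threshold, counts:
    //   FalseAccepts: wrong recognitions that would have been accepted
    //   FalseRejects: correct recognitions that would have been rejected
    public static (int FalseAccepts, int FalseRejects) Evaluate(
        IEnumerable<(int Score, bool WasCorrect)> samples, int threshold)
    {
        int falseAccepts = 0;
        int falseRejects = 0;

        foreach (var (score, wasCorrect) in samples)
        {
            // Same comparison as the example script: accept at or above the threshold.
            bool accepted = score >= threshold;
            if (accepted && !wasCorrect) falseAccepts++;
            if (!accepted && wasCorrect) falseRejects++;
        }

        return (falseAccepts, falseRejects);
    }
}
```

With the made-up samples `(850, true)`, `(720, false)`, `(650, true)`, and `(300, false)`, a threshold of 700 gives one false accept and one false reject, while 600 gives one false accept and no false rejects. Comparing a few thresholds this way shows the trade-off for your own grammar.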

Detecting Digits and Speech

Often, you will want to allow users to either enter digits or speak. The code example above shows how to do so. When accepting both Speech and digits, get the termination code and handle the response according to its value. In the example above, if the user presses a digit, the termination code will be “Digit” or “MaximumDTMF”. Alternatively, if the user speaks, the termination code will be “Speech”.
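The branching described above can also be pulled out into a small function, which makes it easier to test without placing a call. The sketch below is one possible arrangement and is not the library's API: `Termination` is a hypothetical stand-in for the library's `TerminationCode` so that the example compiles on its own, its `Timeout` value is an assumption, and `CallerAction` and `ResponseRouter` are invented for illustration.

```csharp
using System;

// Hypothetical stand-in for the library's TerminationCode enum.
// Timeout is an assumed value, included only to exercise the default branch.
public enum Termination { Speech, Digit, MaximumDTMF, Timeout }

// Hypothetical set of outcomes for the yes/no prompt in the example script.
public enum CallerAction { UseSpokenWord, TransferToOperator, Reprompt }

public static class ResponseRouter
{
    // Decides what to do with one response, mirroring the example script:
    // speech must meet the score threshold, and the digit "0" means operator.
    public static CallerAction Route(Termination tc, int score, string digits, int minScore = 700)
    {
        switch (tc)
        {
            case Termination.Speech:
                return score >= minScore
                    ? CallerAction.UseSpokenWord
                    : CallerAction.Reprompt;

            case Termination.Digit:
            case Termination.MaximumDTMF:
                return digits == "0"
                    ? CallerAction.TransferToOperator
                    : CallerAction.Reprompt;

            default:
                // Anything else (for example, no input) falls back to a reprompt.
                return CallerAction.Reprompt;
        }
    }
}
```

In a real script, the arguments would come from the termination code, the recognition score, and the digit buffer read after the response is collected.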

Next Steps

This is a very simple example of using Speech Recognition technology. If there is something you would like to develop but aren’t sure how to start, or if you have questions, feel free to contact us at support@inventivelabs.com.
