Speech Recognition


This tutorial walks through the Speech Recognition sample solution, which you can download from your demo dashboard. If you haven't already, sign up for a demo account to get 100 minutes of free call time for 30 days on the Voice Elements servers. This tutorial covers how to write simple programmable voice applications with speech recognition functionality using Voice Elements.

This project contains two main classes that you should pay attention to: IvrApplication and InboundCall. Note that the project is set up to run either as a Windows service or as a Windows Forms application. While you are debugging, simply start the application using the default Windows form. Once you are ready to deploy to production, we recommend installing it as a Windows service. See the documentation for .NET Windows services here, and instructions on how to install as a Windows service here.

Once you have downloaded the project, unzip it and open SpeechRecognition.sln in Visual Studio. The application is already complete and ready to go, so you can run it to see what it does before reading through the code to see how it works. When you first build the solution, it automatically restores the Voice Elements Client package from NuGet.

Voice Elements MainCode

The core class of this project is IvrApplication. Much of its code sets up the application as a Windows service, so you can ignore most of it for now. The most important method here is MainCode(). When the application runs, it starts a new thread that executes MainCode(), which connects to the Voice Elements servers in the cloud and then loops indefinitely, checking for new tasks to run and for inbound call events.

Note that Log.Write() is used frequently to log call progress and help with debugging. We recommend doing the same as you write your own Voice Elements applications.

The first thing MainCode() does is connect to the Voice Elements servers. It does this by constructing a new TelephonyServer object, passing the server IP, username, and password as parameters. These values have already been generated for your account, but you can change them in your Settings.settings file.

MainCode() also sets the CacheMode on the TelephonyServer object. ClientSession mode means that the server streams and caches files to and from your client machine; these files are flushed after you disconnect. Server mode means that the files reside on the server, and the full path name is used to find them there. Note that Server mode can only be used on your own dedicated Voice Elements server.
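If you are on a dedicated server, the same property can select Server mode instead. As a sketch (the enumeration value name CacheMode.Server is assumed from the mode described above; the file path is a hypothetical example):

```csharp
// Server mode: files are resolved by full path on the Voice Elements server itself.
// Only valid on your own dedicated server; CacheMode.Server is assumed to be the
// enumeration value corresponding to the "Server" mode described above.
m_telephonyServer.CacheMode = VoiceElements.Interface.CacheMode.Server;

// Audio files would then be referenced by their full path on the server,
// e.g. @"C:\VoiceElements\AudioFiles\OneToFour.xml" (hypothetical path).
```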

After connecting to the server and setting its cache mode, subscribe to the new-call event. This designates a method to be called whenever an incoming call is received; in this example, TelephonyServer_NewCall() is that method.

RegisterDNIS() is then called on the TelephonyServer to tell the server which phone numbers the application will handle. Called with no parameters, it instructs Voice Elements to handle calls to all phone numbers on your account; otherwise, you can pass the specific numbers to handle as parameters.

try
{
    Log.Write("Connecting to: {0}", Properties.Settings.Default.PhoneServer);

    m_telephonyServer = new TelephonyServer("gtcp://" + Properties.Settings.Default.PhoneServer, Properties.Settings.Default.UserName, Properties.Settings.Default.Password);

    // CHANGE YOUR CACHE MODE HERE
    m_telephonyServer.CacheMode = VoiceElements.Interface.CacheMode.ClientSession;

    // SUBSCRIBE to the new call event.
    m_telephonyServer.NewCall += new VoiceElements.Client.NewCall(TelephonyServer_NewCall);
    m_telephonyServer.RegisterDNIS();

    // Subscribe to the connection events to allow you to reconnect if something happens to the internet connection.
    // If you are running your own VE server, this is less likely to happen except when you restart your VE server.
    m_telephonyServer.ConnectionLost += new ConnectionLost(TelephonyServer_ConnectionLost);
    m_telephonyServer.ConnectionRestored += new ConnectionRestored(TelephonyServer_ConnectionRestored);
}
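If you want this application to answer only certain numbers, the same call can list them explicitly. The parameter style below (one string per DNIS) is an assumption based on the description above, and the numbers are hypothetical; check the client reference for the exact overloads:

```csharp
// Hypothetical example: register only specific numbers from your account.
// The one-string-per-number parameter form is assumed, not confirmed here.
m_telephonyServer.RegisterDNIS("8015551234", "8015555678");
```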

Inbound Call

The InboundCall class handles most of the logic for an inbound call. When an inbound call event is generated, the TelephonyServer_NewCall() method in IvrApplication is called. It constructs an InboundCall object, passing the TelephonyServer and ChannelResource objects as parameters, and then calls RunScript() on the new InboundCall object.

static void TelephonyServer_NewCall(object sender, VoiceElements.Client.NewCallEventArgs e)
{
    try
    {
        Log.Write("NewCall Arrival! DNIS: {0}  ANI: {1}  Caller ID Name: {2}", e.ChannelResource.Dnis,
        e.ChannelResource.Ani, e.ChannelResource.CallerIdName);

        // Handle The New Call Here
        InboundCall inboundCall = new InboundCall(m_telephonyServer, e.ChannelResource);
        inboundCall.RunScript();
    }
    catch (Exception ex)
    {
        Log.WriteException(ex, "IvrApplication::NewCall");
        e.ChannelResource.Disconnect();
        e.ChannelResource.Dispose();
    }
}

The RunScript() method contains the logic for handling this inbound call and, in this project, all of the speech recognition logic. Voice Elements uses Microsoft-compatible grammar files for its speech recognition functionality. The first step in programming speech recognition is to set the SpeechRecognitionEnabled property on the VoiceResource to true. The SpeechRecognitionGrammarFile property must also be set; this is a string containing the file path to the XML grammar file. The SpeechRecognitionMode property must be set to MultiplePlays, which causes all subsequent play commands to be bypassed once speech is detected, until speech recognition is stopped. The SpeechRecognitionPermitBargeIn property can be set to true so that a play in progress stops when the user begins to speak. The MaximumTime property can be set to wait only a limited amount of time for a response. The GetResponse() method is then called to capture the caller's response. Finally, the SpeechRecognitionEnabled property is set back to false.

If speech was received, the program checks whether the SpeechRecognitionScore property is high enough. The score ranges from 0 to 1000; generally, anything above 700 can be considered a strong positive detection. If the speech was recognized, the program plays back what it detected and then runs speech recognition again for confirmation.

try
{
    // Answer the call
    Log.WriteWithId(m_channelResource.DeviceName, "Answering...");
    m_channelResource.Answer();

    while (true)
    {

        // If you want to try a complex response (ie a phone number), test this method
        // string phoneNumber = GetPhoneNumber();


        // Enable the speech recognition functionality
        m_voiceResource.SpeechRecognitionEnabled = true;

        // Select the grammar file to use for this recognition
        // For further information on creating your own grammar file, go to http://support.voiceelements.com/index.php?title=How_do_I_create_Microsoft_Compatible_Grammar_Files%3F
        m_voiceResource.SpeechRecognitionGrammarFile = @"..\..\AudioFiles\OneToFour.xml";

        // Then set to multiple plays, when speech is detected it will bypass all 
        // subsequent play commands until speech recognition is stopped
        m_voiceResource.SpeechRecognitionMode = VoiceElements.Interface.SpeechRecognitionMode.MultiplePlays;

        // Enable Barge-In - This allows the talker to stop the play by speaking
        m_voiceResource.SpeechRecognitionPermitBargeIn = true;

        // Wait up to 5 seconds for a response
        m_voiceResource.MaximumTime = 5;
        // Only allow 1 digit to be entered (if they use the keypad instead of speech)
        m_voiceResource.MaximumDigits = 1;

        Log.Write("Playing menu options...");
        // Play a menu option, allowing users to press or say their response
        m_voiceResource.PlayTTS("Press or say 1,2,3 or 4");

        // If you want, you can specify a different voice to use instead of the default voice
        // m_voiceResource.PlayTTS("Press or say 1,2,3 or 4", "Microsoft Server Speech Text to Speech Voice (en-US, ZiraPro)");

        // If the user did not speak during the message or enter a digit, we will
        // wait for a response
        m_voiceResource.GetResponse();

        // Once complete waiting for a response, turn off voice recognition 
        m_voiceResource.SpeechRecognitionEnabled = false;


        // If we received speech, process it now
        if (m_voiceResource.TerminationCodeFlag(TerminationCode.Speech))
        {
            // Log what happened
            Log.Write("Captured Speech: {0} Score: {1}", m_voiceResource.SpeechRecognitionReturnedWord,
            m_voiceResource.SpeechRecognitionScore);

            // Scores range between 0 and 1000. Generally, anything above 700 could be 
            // considered a strong positive detection
            if (m_voiceResource.SpeechRecognitionScore >= 700) 
            {
                // Save this off because it will be overridden by the confirmation
                string response = m_voiceResource.SpeechRecognitionReturnedWord;

                // Turn on voice recognition
                m_voiceResource.SpeechRecognitionEnabled = true;

                // Use yes or no grammar file
                m_voiceResource.SpeechRecognitionGrammarFile = @"..\..\AudioFiles\YesNoComplex.xml";

                Log.Write("Playing confirmation...");
                // Check to see if the response was correct
                m_voiceResource.PlayTTS("You said: " + response + ". Is this correct? Say yes or no.");

                // If the user did not speak during the message, we will wait for a response
                m_voiceResource.GetResponse();

                // Turn off voice recognition
                m_voiceResource.SpeechRecognitionEnabled = false;

                // Log what happened
                Log.Write("Captured Speech: {0} Score: {1}", m_voiceResource.SpeechRecognitionReturnedWord, 
                m_voiceResource.SpeechRecognitionScore);

                if (m_voiceResource.SpeechRecognitionReturnedWord == "yes")
                {
                    // Handle the response

                    break;
                }
                else
                    continue; // Replay the main menu if the user didn't say yes
            }
            else
                m_voiceResource.PlayTTS("I'm sorry, I did not understand your response");
        }
        // If a digit was entered, process that now
        else if (m_voiceResource.TerminationCodeFlag(TerminationCode.Digit) || m_voiceResource.TerminationCodeFlag(TerminationCode.MaximumDTMF))
        { 
            m_voiceResource.PlayTTS("You pressed " + m_voiceResource.DigitBuffer);

            // TODO - Handle the response

            switch (m_voiceResource.DigitBuffer)
            {
                default:
                    break;
            }

            break;
        }

    }

    // Log often
    Log.Write("Playing 'Goodbye'");

    // Play the goodbye prompt.
    m_voiceResource.PlayTTS("Goodbye");
}
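The code above points SpeechRecognitionGrammarFile at OneToFour.xml, but the tutorial does not show the file's contents. A Microsoft-compatible (SRGS) grammar for a one-through-four menu might look something like the sketch below; the rule names and structure of the actual file shipped with the sample may differ:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical sketch of a one-through-four SRGS grammar; the real
     OneToFour.xml in the sample may be structured differently. -->
<grammar version="1.0" xml:lang="en-US" root="menuChoice"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="menuChoice" scope="public">
    <one-of>
      <item>one</item>
      <item>two</item>
      <item>three</item>
      <item>four</item>
    </one-of>
  </rule>
</grammar>
```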

GetPhoneNumber() is an example of using speech recognition with continuous speech, such as a 10-digit phone number. The main difference for continuous speech is that MaximumDigits and MaximumTime are not set.

private string GetPhoneNumber()
{
    // This shows how to get a continuous speech, like a 10 digit phone number
    // Turn on voice recognition
    m_voiceResource.SpeechRecognitionEnabled = true;

    // Use Phone Number grammar file (NOTE - there are many other types of speech in the file that you can switch to)
    m_voiceResource.SpeechRecognitionGrammarFile = @"..\..\AudioFiles\Phone_Number.gram";

    // Then set to multiple plays, when speech is detected it will bypass all 
    // subsequent play commands until speech recognition is stopped
    m_voiceResource.SpeechRecognitionMode = VoiceElements.Interface.SpeechRecognitionMode.MultiplePlays;

    // Enable Barge-In - This allows the talker to stop the play by speaking
    m_voiceResource.SpeechRecognitionPermitBargeIn = true;

    Log.Write("Asking for phone number...");
    m_voiceResource.PlayTTS("Say your 10 digit phone number.");

    Log.Write("Getting response...");
    // If the user did not speak during the message, we will wait for a response
    m_voiceResource.GetResponse();

    // Turn off voice recognition
    m_voiceResource.SpeechRecognitionEnabled = false;

    // Log what happened
    Log.Write("Captured Speech: {0}  Score: {1}", m_voiceResource.SpeechRecognitionReturnedWord,
                                                          m_voiceResource.SpeechRecognitionScore);

    if (m_voiceResource.SpeechRecognitionScore > 700)
    {
        // Handle the response

        m_voiceResource.PlayTTS("You said " + m_voiceResource.SpeechRecognitionReturnedWord);

        return m_voiceResource.SpeechRecognitionReturnedWord;
    }
    else
    {
        m_voiceResource.PlayTTS("I'm sorry, I did not understand your response");
        return string.Empty;
    }
}