Specifying embedded XML tags


You can embed XML tags in prompt text to change how the text-to-speech (TTS) engine renders it. Depending on your speech engine, the markup may be SSML (Speech Synthesis Markup Language) or an engine-specific markup, such as Apple's embedded speech commands or Microsoft's SAPI TTS markup.

Examples of Microsoft Server Speech using SSML Markup


Play a pre-recorded audio file as part of a prompt: “<audio src="c:\the_name_of_your_file.wav"> This is what I will say if the file is not found. </audio>”

Add a pause between words: “Five hundred milliseconds of silence <break time="500ms" /> just occurred.”

Add emphasis to a word in a sentence: “The word <emphasis level="strong"> boo </emphasis> is emphasized!”
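When these prompts are written as C# string literals, the attribute quotes have to be escaped. The following is a minimal sketch, not part of the product API: the class name SsmlPromptExamples and the helper IsWellFormed are hypothetical, and the check only confirms that the fragment parses as XML, not that a given speech engine accepts it.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class for illustration only.
public static class SsmlPromptExamples
{
    // Returns true when the fragment parses as XML once wrapped in a root element.
    public static bool IsWellFormed(string fragment)
    {
        try
        {
            XElement.Parse("<root>" + fragment + "</root>");
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // In a regular string literal, inner double quotes are escaped with a backslash.
        string pause = "Five hundred milliseconds of silence <break time=\"500ms\" /> just occurred.";
        string emphasis = "The word <emphasis level=\"strong\"> boo </emphasis> is emphasized!";

        // A verbatim literal (@"...") doubles the quotes instead and keeps the
        // backslash in the path; in a regular literal "\t" would become a tab.
        string audio = @"<audio src=""c:\the_name_of_your_file.wav""> This is what I will say if the file is not found. </audio>";

        foreach (string prompt in new[] { pause, emphasis, audio })
        {
            Console.WriteLine(IsWellFormed(prompt) + ": " + prompt);
        }
    }
}
```

Running a check like this before playing a prompt catches unclosed tags and unbalanced quotes early, which are otherwise easy to miss in long literals.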

Say-As Element

The say-as element tells the engine how to pronounce the enclosed text: date formats, cardinal and ordinal numbers, characters, times, and telephone numbers. See Say-As Element for more details. Here are a few examples:

To have each letter spelled out individually: “<say-as interpret-as="characters"> TEST </say-as>”
To have text spoken as an ordinal number (e.g. 3rd, 4th): “Select the <say-as interpret-as="ordinal">3rd</say-as> option.”
To specify a date format: “Today is <say-as interpret-as="date" format="mdy"> 4-24-2017 </say-as>”
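If the spoken value comes from application data, the say-as wrapper can be generated rather than typed by hand. This is a sketch under assumptions: SayAsExamples and SayAs are hypothetical names, and escaping the inserted text with SecurityElement.Escape is one reasonable way to keep characters such as & or < from breaking the markup.

```csharp
using System;
using System.Security;

// Hypothetical helper class for illustration only.
public static class SayAsExamples
{
    // Wraps text in a say-as element. The format attribute is optional
    // and is only emitted when a value is supplied (for example "mdy").
    public static string SayAs(string text, string interpretAs, string format = null)
    {
        string formatAttribute = format == null
            ? ""
            : " format=\"" + SecurityElement.Escape(format) + "\"";

        return "<say-as interpret-as=\"" + SecurityElement.Escape(interpretAs) + "\""
            + formatAttribute + ">"
            + SecurityElement.Escape(text)
            + "</say-as>";
    }

    public static void Main()
    {
        Console.WriteLine(SayAs("TEST", "characters"));
        Console.WriteLine("Select the " + SayAs("3rd", "ordinal") + " option.");
        Console.WriteLine("Today is " + SayAs("4-24-2017", "date", "mdy"));
    }
}
```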

Phoneme Element

Using the <phoneme> element, you can specify a phonetic pronunciation for a word or phrase: “His name is Mike <phoneme alphabet="x-microsoft-ups" ph="JH AU"> Zhou </phoneme>”


Prosody Element

The prosody element specifies the pitch, contour, range, rate, duration, and volume used to speak the contained text. See Prosody Element for more details. Here are a few examples:

“This is normal volume. <prosody volume="40"> This is a whisper. </prosody>”
“This is normal rate and volume. <prosody rate="-20%" volume="40"> This is slow and quiet. </prosody>”


To select the voice, either wrap the text in voice markup or pass the voice name to the PlayTTS overload that accepts one:

Voice Markup: “<voice name="Microsoft Server Speech Text to Speech Voice (en-US, Helen)"> This is the text that the application will speak. </voice>”
PlayTTS Command: m_VoiceResource.PlayTTS("This is the text that the application will speak.", "Microsoft Server Speech Text to Speech Voice (en-US, Helen)");
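The two approaches can be compared side by side in a small sketch. Everything here except the two-argument PlayTTS call shape is an assumption: ITtsPlayer and ConsoleTtsPlayer are hypothetical stand-ins for the voice resource so that the example runs without a telephony server, and the single-argument PlayTTS is assumed from the mention of an overload.

```csharp
using System;
using System.Security;

// Hypothetical stand-in for the voice resource; not the product's real type.
public interface ITtsPlayer
{
    void PlayTTS(string text);                    // assumed single-argument form
    void PlayTTS(string text, string voiceName);  // form shown in the article
}

// Writes to the console instead of speaking, so the sketch is runnable anywhere.
public class ConsoleTtsPlayer : ITtsPlayer
{
    public void PlayTTS(string text)
    {
        Console.WriteLine("[default voice] " + text);
    }

    public void PlayTTS(string text, string voiceName)
    {
        Console.WriteLine("[" + voiceName + "] " + text);
    }
}

public static class VoiceSelectionExample
{
    public const string Helen = "Microsoft Server Speech Text to Speech Voice (en-US, Helen)";

    // Wraps already-prepared prompt text in a voice element.
    public static string WithVoice(string voiceName, string text)
    {
        return "<voice name=\"" + SecurityElement.Escape(voiceName) + "\">" + text + "</voice>";
    }

    public static void Main()
    {
        ITtsPlayer player = new ConsoleTtsPlayer();
        string text = "This is the text that the application will speak.";

        player.PlayTTS(WithVoice(Helen, text)); // voice chosen in the markup
        player.PlayTTS(text, Helen);            // voice chosen by the overload
    }
}
```

Markup is the more flexible choice when a single prompt switches voices partway through; the overload keeps the prompt text free of tags when one voice speaks the whole prompt.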


Examples of SAPI 5.3 Markup

Volume Control: <volume level="nn"/> where nn is a value from 0 to 100.

Absolute Rate Of Speech: <rate absspeed="nn"/> where nn is a value from -10 to 10.

Relative Rate Of Speech: <rate speed="nn"/> where nn is a value from -10 to 10.

Absolute Pitch: <pitch absmiddle="nn"/> where nn is a value from -10 to 10.

Relative Pitch: <pitch middle="nn"/> where nn is a value from -10 to 10.

Emphasis: “The word <emph> boo </emph> is emphasized!”

Spelling Control: “The word <spell> Spell </spell> is spelled out.”

Silence Control: “Five hundred milliseconds of silence <silence msec="500"/> just occurred.”

Pronunciation: “<pron sym="h eh 1 l ow & w er 1 l d"/>”

Bookmarks: “<bookmark mark="bookmark_one"/>Simple Text”

Parts Of Speech: “<partofsp part="noun"> A </partofsp> is the first letter of the alphabet.”

Context: “<context id="date_mdy"> 03/04/01 </context> should be March fourth, two thousand one.”

Voice: “<voice required="Age=Teen">A teen should speak this sentence – if a female, non-child teen is present, she will be selected over a male teen, for example.</voice>”

Language: “<voice required="Language=409">A U.S. English voice should speak this.</voice>”
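Because the rate and pitch tags accept only values from -10 to 10, it can help to generate them through small helpers that keep values in range. The sketch below is illustrative only: SapiTagExamples and its methods are hypothetical names, and clamping out-of-range values (rather than rejecting them) is a design choice made for this example.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class for illustration only.
public static class SapiTagExamples
{
    // Forces value into the inclusive range [min, max].
    private static int Clamp(int value, int min, int max)
    {
        return Math.Max(min, Math.Min(max, value));
    }

    private static string Format(int value)
    {
        // Invariant culture keeps the minus sign and digits stable across locales.
        return value.ToString(CultureInfo.InvariantCulture);
    }

    public static string AbsoluteRate(int speed)
    {
        return "<rate absspeed=\"" + Format(Clamp(speed, -10, 10)) + "\"/>";
    }

    public static string AbsolutePitch(int middle)
    {
        return "<pitch absmiddle=\"" + Format(Clamp(middle, -10, 10)) + "\"/>";
    }

    public static string Silence(int milliseconds)
    {
        // Negative durations are treated as zero.
        return "<silence msec=\"" + Format(Math.Max(0, milliseconds)) + "\"/>";
    }

    public static void Main()
    {
        string prompt = AbsoluteRate(-3) + "This sentence is spoken slowly."
            + Silence(500)
            + AbsolutePitch(25) + "This one follows a pause at a higher pitch.";
        Console.WriteLine(prompt);
    }
}
```

In this example the call AbsolutePitch(25) produces a tag with the value 10, since 25 is outside the accepted range.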
