VXMLRef 007-02542-0025 R4.21 v01
VOICEXML
INTERFACE REFERENCE GUIDE
RELEASE 4.21
Date
August 2010
Proprietary Information
Copyright 2001-2010 RadiSys Corporation. All rights reserved. RadiSys and Convedia are registered trademarks of RadiSys Corporation. CMS-3000, CMS-6000, CMS-9000, eXMP, and eXtended Media Processing are trademarks of RadiSys Corporation. Red Hat and Red Hat Linux are registered trademarks of Red Hat, Inc. Linux is a registered trademark of Linus Torvalds. All other trademarks, registered trademarks, service marks, and trade names are the property of their respective owners. No part of this publication may be reproduced, modified, transmitted, transcribed, stored in any retrieval system, or translated into any language in any form, in whole or in part, by any means without the express prior written permission of RadiSys Corporation. RadiSys Corporation reserves the right to make changes to software, hardware, and documentation without notice. For the most recent version of documentation, visit the RadiSys web site at: www.radisys.com/service_support/convedia_support.cfm. This product may include the third-party software detailed in the installation manual for your media server.
Contact Information
RadiSys Corporation
4190 Still Creek Drive, Suite 300
Vancouver, BC V5C 6C6
Canada

RadiSys Technical Assistance Center (TAC)
Phone: +1-800-622-2235 (North America only, toll free)
Phone: +1-604-918-6415
E-mail: [email protected]

To access support for Convedia Media Servers from the RadiSys web site, go to: www.radisys.com/service_support/convedia_support.cfm.
TABLE OF CONTENTS
List of Tables
List of Examples
List of Shadow Variables
Preface
Intended Audience
Guide Organization
Document Conventions
RadiSys Publications
Technical Support
What's New in Release 4.21
New Features in R4.21.0
New Features for SIP
Behavior Changes
Documentation Changes
Release Limitations
SSML Elements
General XML Handling
SIP Transport of VoiceXML
Request-URIs for the dialog Service Context
Passing Variables to the VoiceXML Interpreter
Standard Session Variables
Application Session Variables
Terminating VoiceXML Dialogs
Sample VoiceXML Call Flow
VoiceXML Interaction with HTTP Servers
HTTP Server-Side Logic
HTTP Cookies
Set-Cookie Response Header
Cookie Request Header
ASR and TTS
User Input
DTMF
Speech
System Output
Control Flow
Session Termination
Shadow Variables
Events
Errors
ECMAScript Support
Escape Characters
Working with Media Files and TTS Strings
Media Clip Support
Clip Delineation in Prompts
Referring to Media Files
HTTP Queries
Relative URIs
Sets and Variables
Radisys Confidential
<example>
<exit>
<field>
<filled>
<form>
<goto>
<grammar>
<help>
<if>
<initial>
<item>
<link>
<log>
<mark>
<menu>
<meta>
<metadata>
<noinput>
<nomatch>
<one-of>
<option>
<p>
<param>
<phoneme>
<prompt>
Prompt Controls
Barging and Prompts
<promptcontrol>
<property>
<prosody>
<record>
Storage of Recorded Files
Size of Streamed Files
Encoding of Recordings
Stopping Recordings with DTMF
Setting a Pre-Speech Timer
Trimming Post-Speech Silence
Appending to a Recording
<reprompt>
<return>
<rule>
<ruleref>
<s>
<say-as>
<script>
<speak>
<sub>
<subdialog>
<submit>
<throw>
<value>
<var>
<voice>
<vxml>
Appendix A: Best Practices for VoiceXML Development
References
Glossary of Acronyms
LIST OF TABLES
Table 1-1 VoiceXML 2.0 Supported Elements
Table 1-2 VoiceXML 2.1 Supported Elements
Table 1-3 SRGS Supported Elements
Table 1-4 SSML Supported Elements
Table 1-5 Event Support
Table 1-6 Error Support
Table 1-7 VoiceXML: Supported Media Clips
Table 1-8 VoiceXML: Supported Media Clips
Table 1-9 Referencing Named Media Files in VoiceXML
Table 1-10 Referencing Indexed Audio Files in VoiceXML
Table 2-1 Property Support Summary
Table 2-2 MRCP Speech Recognizer Properties Support
Table 2-3 General Speech Property Elements
Table 2-4 Generic DTMF Recognizer Property Support
Table 2-5 Prompt Property Support
Table 2-6 fetchhint Property Support
Table 2-7 maxage Property Support
Table 2-8 maxstale Property Support
Table 2-9 Support for Other Fetch Properties
Table 2-10 Support for Object Fetch Properties
Table 2-11 Fax Detection Property Support
Table 2-12 Interaction of bargein and Fax Tone Detection
Table 2-13 Interaction of dtmfterm and Fax Tone Detection
Table 3-1 Default Input Modes for VoiceXML Grammars
Table 3-2 Mechanisms for Setting Input Mode Scope
Table 3-3 Interaction of Input Mode and Grammar Mode
Table 3-4 Conversion of Built-In Speech Grammars to XML-SRGS Grammars
Table 4-1 Conversion of <field> type Attribute to <grammar>
Table 4-2 Prompt Completion Shadow Variables
Table 4-3 DTMF Collection Variables
Table 4-4 Effect of Barging Announcements on the Digit Buffer
Table 4-5 Recording Shadow Variables
Table 4-6 Supported Encoding Formats for Recordings
Table 4-7 Summary of append Behavior
LIST OF EXAMPLES
Example 1-1 Request-URI for dialog
Example 1-2 Request-URI for dialog with Query
Example 1-3 SIP INVITE with Query in URI
Example 1-4 SIP Dialog with Query in URI
Example 1-5 Example Server-Side Perl Script
Example 1-6 Relative URI
Example 1-7 Absolute URI
Example 3-1 Inline SRGS DTMF Grammar
Example 3-2 Inline SRGS Voice Grammar
Example 3-3 Boolean Built-In Grammar
Example 3-4 Currency Built-In DTMF Grammar
Example 3-5 Date Built-In DTMF Grammar
Example 3-6 Digits Built-In Grammar
Example 3-7 Number Built-In Grammar
Example 3-8 Phone Built-In Grammar
Example 3-9 Time Built-In Grammar
Example 3-10 Variable Maximum Digit Length in a DTMF Grammar
Example 4-1 Alternate Audio Examples
Example 4-2 <option> Grammar Example
Example 4-3 XML-SRGS Grammar
LIST OF SHADOW VARIABLES
application.cvd_lastprompt$.bargein
application.cvd_lastprompt$.duration
application.cvd_lastprompt$.lasturl
application.cvd_lastprompt$.lasturl_offset
application.cvd_lastresult$.faxtyp
application.cvd_lastresult$.termcond
application.lastresult$.confidence
application.lastresult$.inputmode
application.lastresult$.interpretation
application.lastresult$.utterance
name$.duration
name$.maxtime
name$.size
name$.termchar
PREFACE
This guide describes the Voice Extensible Markup Language (VoiceXML) interface to the Convedia Media Server. It provides a brief overview of VoiceXML, highlighting core concepts. It also documents Convedia Media Server compliance with the VoiceXML specifications [13] and [14], describing extensions, deviations, and omissions from those specifications.

The VoiceXML 2.0 language is defined by the W3C Recommendation [13]; VoiceXML 2.1 is defined by [14]. For a full description of VoiceXML, the reader is referred to those Recommendations, which remain the normative implementation reference. Any features of VoiceXML specified in the Recommendations but not in this guide are not supported by the Convedia Media Server in this release. Any features of VoiceXML specified in this guide but not in the Recommendations are extensions to the specification.

This preface describes this guide, laying out its organization, the assumptions made about the reader, and the conventions used in the guide. It also explains how to get technical support, and describes the features that are new in this release. The following information is presented:

Intended Audience
Guide Organization
Document Conventions
RadiSys Publications
Technical Support
What's New in Release 4.21
Intended Audience
This guide is intended for application developers and other technical personnel who want to communicate with a Convedia Media Server from a control agent (that is, from a softswitch or an application server) using SIP and VoiceXML. Readers should be thoroughly conversant with application programming using Session Initiation Protocol (SIP).
Guide Organization
This guide is organized as follows:
Chapter 1: VoiceXML Overview
    This chapter provides an overview of the core concepts of the Voice Extensible Markup Language (VoiceXML).

Chapter 2: VoiceXML Properties
    This chapter describes the media server's support for VoiceXML properties.

Chapter 3: DTMF and Voice Grammars
    This chapter describes the media server's support for DTMF and voice grammars in VoiceXML.

Chapter 4: VoiceXML 2.0 Elements
    This chapter describes the VoiceXML 2.0 elements currently supported by the Convedia Media Server, including SRGS and SSML elements.

Chapter 5: VoiceXML 2.1 Elements
    This chapter describes the VoiceXML 2.1 elements currently supported by the Convedia Media Server.

Chapter 6: ECMAScript Language Binding for the DOM
    This chapter describes the ECMAScript binding for the subset of Level 2 of the DOM.

Appendix A: Best Practices for VoiceXML Development
    This appendix describes some development practices that can help you maximize performance and capacity of your VoiceXML applications.

References

Glossary of Acronyms
Document Conventions
This guide uses the following advisory paragraphs:
Warning: Warnings alert you to situations that may pose a threat to personal safety.
Caution: Cautions alert you to situations that might cause harm to your system or damage to equipment, or that may affect service.
Note: Notes provide information you might need to avoid problems or configuration errors.
In addition to advisory paragraphs, the following typographic conventions are used in RadiSys guides:
Monospace font is used in special example paragraphs to indicate code samples and console output.
Angle brackets surrounding Monospace font are used to indicate elements in a markup language, such as VoiceXML and MSML.
Boldface Monospace font is used in examples where you must interact with the system. The text in boldface Monospace represents information you must enter.
boldface: Boldface font is used to indicate file names, commands, and any term in a formal language, for example, a signal or parameter in MGCP, an attribute in MSML, a property in VoiceXML, and other methods, classes, and headers.
italics: Italic font is used in command or element syntax, and inline, to indicate arguments and variables, that is, values that you must supply.
CAPS: Upper case is used to indicate protocol requests and messages, for example, a PUT request in HTTP, a SYN packet in TCP, or an INVITE or BYE message in SIP.
<key>: Angle brackets are used to indicate a key on your keyboard. Combinations of keys are joined by plus signs (+), for example <Ctrl>+<Alt>+<Del>.
[]: Square brackets enclose elements that are optional in a syntax.
{}: Curly brackets enclose a set of syntax elements where exactly one element must be chosen.
arg | arg: Vertical bars are used to separate elements that are strict alternatives (exclusive OR). When vertical bars are used, only one alternative can be chosen.
arg [arg...]: This convention indicates a value that can optionally represent a space-separated list of the same kind of element (for example, a space-separated list of IP addresses).
arg[, arg...]: This convention indicates a value that can optionally represent a comma-separated list of the same kind of element (for example, a comma-separated list of IP addresses).
arg[-arg...]: This convention indicates a value that can optionally represent a hyphen-separated range of values (for example, a range of IP addresses).
Radisys Confidential
RadiSys Publications
The following product documentation is available for RadiSys products. Download the correct version of the documents you need from the RadiSys web site at www.radisys.com.
Convedia Media Server System Description: Provides a high-level overview of RadiSys Convedia Media Servers.
IMMS 3G-324M-Integrated Media Server Solutions Guide: Provides an overview of the IMMS 3G-324M-Integrated Media Server and its place in the network.
CMS-9000 Media Server User Guide: Describes the CMS-9000 Media Server, and explains how to perform operations, administration, and management on the CMS-9000 Media Server using the web GUI.
CMS-9000 Media Server Hardware Installation Manual: Provides hardware installation and maintenance procedures for the CMS-9000, up to and including RS-232 console configuration.
CMS-6000 Media Server User Guide: Describes the CMS-6000 Media Server, and explains how to perform operations, administration, and management on the CMS-6000 Media Server using the web GUI.
CMS-6000 Media Server Hardware Installation Manual: Provides hardware installation and maintenance procedures for the CMS-6000, up to and including RS-232 console configuration.
CMS-3000 Media Server User Guide: Describes the CMS-3000 Media Server, and explains how to perform operations, administration, and management on the CMS-3000 Media Server using the web GUI.
CMS-3000 Media Server Hardware Installation Manual: Provides hardware installation and maintenance procedures for the CMS-3000, up to and including RS-232 console configuration.
Convedia Software Media Server User Guide: Describes the Convedia Software Media Server, and explains how to perform operations, administration, and management on the Convedia Software Media Server using the web GUI. Describes the Convedia Software Media Server, and explains how to perform operations, administration, and management using the web GUI on the Convedia Software Media Server when the operational mode is configured to co-resident mode.
Convedia Software Media Server Installation Manual: Provides software installation and maintenance procedures for the Convedia Software Media Server, up to and including initial network configuration.
Convedia Media Server SIP Interface Reference Guide: Describes the media server's support for SIP, and how to use the SIP interface.
Convedia Media Server VoiceXML Interface Reference Guide: Describes the media server's support for VoiceXML 2.0 and 2.1, and how to use the VoiceXML interface.
Convedia Media Server MSML 1.1 Interface Reference Guide: Describes the media server's support for MSML 1.1, and how to use the MSML interface.
Convedia Media Server MGCP Interface Reference Guide: Describes the media server's support for MGCP, and how to use the MGCP interface.
Convedia Media Server H.248 Interface Reference Guide: Describes the media server's support for H.248/MEGACO, and how to use the H.248 interface.
Convedia Media Server SNMP Interface Reference Guide: Describes the media server's support for SNMP, and how to use the SNMP interface.
Convedia Media Server Sets and Variables Interface Reference Guide: Describes the media server's sets and variables feature, and support for each language.
Convedia Media Server Special Interfaces Reference Guide: Explains how to configure and use the media server to interoperate with external devices such as NFS servers, HTTP servers, speech servers, and video terminals.
Convedia Media Server Capacity and Performance Reference Guide: Provides general guidelines for expected performance and capacity for RadiSys Convedia media servers.
Technical Support
Technical support is available from the RadiSys Technical Assistance Center (TAC). Support is governed by the terms of your agreement with RadiSys Corporation.
TAC can be reached using the following contact information:
RadiSys Corporation
4190 Still Creek Drive, Suite 300
Vancouver, BC V5C 6C6 Canada
RadiSys Technical Assistance Center (TAC)
Phone: +1-800-622-2235 (North America only, toll free)
Phone: +1-604-918-6415
E-mail: [email protected]
To access support for Convedia Media Servers from the RadiSys web site, go to: www.radisys.com/service_support/convedia_support.cfm.
When enabled, the media server's MSML interface reports significant state changes in the 3G-324M session, such as the establishment of logical channels, to the control agent as events. For an overview of the RadiSys 3G-324M-Integrated Media Server, please see the Integrated Mobile Media Server (IMMS) 3G-324M-Integrated Media Server Solutions Guide. Complete details of the media server's support for the 3G-324M protocol are given in the Convedia Media Server Special Interfaces Reference Guide. The User Guide for your media server describes how to configure the integrated 3G-324M gateway. Additional usage information is provided in the protocol guides.
3G-324M session statistics
This release introduces new statistics for 3G-324M sessions. For every statistics interval the media server reports the number of sessions created, the maximum number of concurrent sessions, and the number of successful and failed session setups. The media server's existing per-port statistics are supported for 3G-324M sessions (with the exception of those related to the jitter buffer). For more information about the new statistics, please see the Convedia Media Server SNMP Interface Reference Guide and the User Guide for your platform.
Behavior Changes
There are no behavior changes in this release.
Documentation Changes
New Integrated Mobile Media Server (IMMS) Solutions Guide
This release introduces a new book, the Integrated Mobile Media Server (IMMS) 3G-324M-Integrated Media Server Solutions Guide, which provides an overview of the first IMMS product, a media server with integrated 3G-324M video gateway functionality.
New 3G-324M Gateway chapter in Convedia Media Server Special Interfaces Reference Guide
This release adds a new chapter, 3G-324M Gateway, to describe the media server's support for 3G-324M sessions.
Changes to the Convedia Media Server MSML 1.1 Interface Reference Guide
The Convedia Media Server MSML 1.1 Interface Reference Guide has been restructured to better reflect the organization in the MSML specification, RFC 5707.
Release Limitations
This release does NOT support the following media server features, available in the previous release (R4.20) of the CMS-9000 and CMS-3000 media servers:
RFC 4117 transcoding. Audio transcoding services as an RFC 4117 Transcoding Server (T), providing transcoding services between two SIP User Agents (UAs) through the use of Third Party Call Control (3pcc).
New hardware: TPC-I. A new Transcoding Processor Card (TPC-I) dedicated to providing RFC 4117 audio transcoding services on the CMS-9000.
EVRC codec. 3G2 C.S0014-0 Enhanced Variable Rate Codec (EVRC-A) codec for EVRC0 media type specified in RFC 3558.
Automatic noise reduction. Automatically activating and inactivating noise reduction based on a configured threshold.
MSML support for CRBT random ring. MSML <play> element's start and end attributes, used to select part of an announcement.
NLD reports change to noise type. Events for changes in the type of noise (background, impulsive, continuous-signal noise, or a low SNR) exceeding configured limits.
MSML configuration of per-port statistics. Configuring the per-port statistics through the MSML interface. Per-port statistics can be configured through the SIP interface.
T.38 fax data is replicated on G.711 ports. Replicating T.38 fax data when the call is negotiated as G.711 in the SIP group context.
Enhancements to SIP Custom Profile 2 for facsimile services. Enhancements to SDP for fax support and changes to case sensitivity.
3G2 file format. Multimedia, audio-only, and video-only announcements in the 3G2 file format as defined in 3GPP2 C.S0050.
SIP message serialization. SIP message serialization prevents out-of-order delivery of SIP messages.
R4.20 also introduced a number of new VQE (voice quality enhancement) statistics and improvements to the echo cancellation algorithms that are not implemented in this release.
Additionally, the following CMS-9000 behavior changes of R4.20 are not implemented in this release:
Default network topology change from Internal Control Subnet to External Control Subnet.
Binding the Apache HTTP daemon service to the management interface on the SCC.
For detailed descriptions of these features, please see the documentation for R4.20.
Chapter 1:
VOICEXML OVERVIEW
This chapter provides an overview of the core concepts of the Voice Extensible Markup Language (VoiceXML). This chapter presents the following information:
Introduction
VoiceXML Structure
Protocol Support
SIP Transport of VoiceXML
VoiceXML Interaction with HTTP Servers
ASR and TTS
User Input
System Output
Control Flow
Session Termination
Shadow Variables
Events
Errors
ECMAScript Support
Escape Characters
Working with Media Files and TTS Strings
Introduction
VoiceXML is an XML-based markup language for creating user dialogs or Interactive Voice Response (IVR) interactions. VoiceXML provides an extensive mechanism for developing simple or complex IVR applications. The ability to create modular applications from many reusable subcomponents enables VoiceXML developers to create complex IVR applications in a short period of time. The widespread adoption of VoiceXML, together with its inherent similarities to data-centric user dialogs, makes it a powerful language for IVR application development.
The media server supports a rich set of VoiceXML mechanisms for creating simple or elaborate IVR applications:
Playing of streamed audio files, stored inside the media server or on external NFS and HTTP servers
Inband and RFC 2833 DTMF detection, collection, and interpretation
Detection of user speech input
Support for built-in, SRGS, Menu-Choice, and Option grammars for both DTMF and speech
Support for playing Text to Speech media clips
Recording of audio and video to internal memory or external NFS and HTTP servers
Playback of user-recorded audio and video from internal memory or external NFS and HTTP servers
Support for VCR-like controls (skip forward, skip back, pause, resume, append)
CNG and CED fax detection and notification capabilities
Embedding of complex functions (ECMAScript/JavaScript)
Dialog control flow
The ability to transfer the caller to another destination, such as another telephone line or voice application
The basis of every VoiceXML dialog is sending audio prompts to the user and collecting user input in the form of DTMF digits. An example application is a user dialing up a service center and being prompted to select from several spoken options by pressing the corresponding telephone key. Upon receiving the DTMF information, the VoiceXML application determines what action to take.
VoiceXML Structure
This section presents the following topics:
VoiceXML Documents
Dialogs
Forms
Mixed-Initiatives
VoiceXML Documents
A VoiceXML application consists of one or more VoiceXML documents, or scripts. The IVR session with a user begins at the invocation of the first VoiceXML document associated with the application. This document is called the root document of the application. During the IVR session any number of additional documents (leaf documents) may be fetched and loaded, and then unloaded, until the user ends the IVR session according to the application dialog flow. During the IVR session the root document may reference, or call, other supporting VoiceXML documents, as in the illustration below. While a subdocument is loaded, information from higher-level documents remains available to the session. Although applications always begin by loading the root document, they can terminate from any document, including subdocuments. This ends the user's IVR session. Alternatively, an external control agent (for example, a SIP User Agent) can forcibly terminate the IVR session at any point.
[Figure: a root document referencing supporting documents D1 and D2]
Dialogs
The foundation of the VoiceXML application is the dialog, which takes place between the application and the user. VoiceXML dialogs define interactions between a user and the network through an IVR session. Once the application has launched, the user interacts with it through VoiceXML dialogs and subdialogs. Dialogs are composed of VoiceXML elements. The dialog is a series of audio or video prompts to the user, streamed over RTP, and subsequent collection of user input in the form of DTMF key presses or speech inputs, which are detected and reported to the VoiceXML session. The control logic defined in the VoiceXML application (that is, the document or script) defines when media is played to the user and when user input is collected, to create a dynamic media-based user dialog or IVR session, similar in nature to a web or HTML-based data-centric user dialog session. The following types of dialogs can be created using VoiceXML:
System-directed. In system-directed dialogs, the system leads the user by asking questions and waiting for user input.
User-directed. In user-directed dialogs, the user input controls the dialog flow.
Mixed-initiative. In mixed-initiative dialogs, either the system or the user can direct dialog flow. These types of dialogs are more complex and are difficult to implement in DTMF-based systems. With speech-based grammars, this type of dialog is more practical to implement.
DTMF input is obtained from users through either forms or menus.
Forms
The form item is the primary mechanism for prompting the user. A form-based dialog plays an audio or multimedia prompt to the user. In response, the user presses some sequence of DTMF digits or responds with speech input, which is expected to match the field format or grammar. If the collected choices match the expected grammar, they are said to fill the field. Collected input matching the expected grammar is assigned to the field variable. The field variable can then be used as a standard variable (as in a standard programming language) within further logic and control flow. Additionally, the collected input contained in the variable can be submitted to an external application using the HTTP protocol.
Each form item is one of two kinds: a user input item or a form control item. The media server supports any of the following elements as user input items:
<field>: This element allows the user to enter DTMF according to a pre-determined format or grammar.
<record>: This element records audio spoken by the user.
<subdialog>: This element moves the user to another location (a subdialog) in the application. When the subdialog is complete, control returns to the calling dialog.
The media server supports any of the following elements as form control items:
<block>: This element does not collect input, but rather defines a set of executable statements for prompting.
<initial>: This element defines the initial control for the form when using mixed-initiative dialogs (where either the system or the user can direct the dialog flow).
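The form structure described above can be sketched as a short VoiceXML 2.0 document. This is a minimal illustration rather than a script verified against this release: the audio and script URLs under host.company.com are hypothetical, and the sketch assumes that the built-in digits grammar and the <submit> element behave as defined in the W3C Recommendation [13].

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="get_pin">
    <!-- Form control item: plays a greeting once and collects no input. -->
    <block>
      <prompt>
        <!-- Hypothetical audio URL -->
        <audio src="http://host.company.com/audio/welcome.wav"/>
      </prompt>
    </block>

    <!-- User input item: the built-in digits grammar, limited to four digits. -->
    <field name="pin" type="digits?length=4">
      <prompt>
        <!-- Hypothetical audio URL -->
        <audio src="http://host.company.com/audio/enter_pin.wav"/>
      </prompt>
      <!-- Replay the field prompt when nothing is entered or the input does not match. -->
      <noinput><reprompt/></noinput>
      <nomatch><reprompt/></nomatch>
      <filled>
        <!-- Input that matched the grammar has filled the field variable "pin";
             send it to a hypothetical external application over HTTP. -->
        <submit next="http://host.company.com/scripts/check_pin"
                namelist="pin" method="post"/>
      </filled>
    </field>
  </form>
</vxml>
```

The <block> runs first because form items are visited in document order; the <field> is then visited until its variable is filled.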
Menus
A menu-based dialog presents the user with a number of choices. The menu item is a simplified version of a form item, designed to present the user with a fixed set of choices. The choices are presented as a series of audio or multimedia prompts played to the user. In response, the user presses some sequence of DTMF digits or speaks, and the inputs are collected and interpreted by the application. For example, a simple menu item may ask the user to press DTMF digit 1 to hear a weather report, to press DTMF digit 2 to hear a sports report, and to press DTMF digit 3 to hear a traffic report. If the user input matches one of the choices, then the application transitions control to another location within the document, or to another document in the application, as specified for the given choice.
Menu
  Play prompt to user requesting choice (1, 2 or 3)
  Wait for user input from user
  User Choice 1: action 1 (jump to location 1)
  User Choice 2: action 2 (jump to location 2)
  User Choice 3: action 3 (jump to location 3)
  No User Input: No-input action
End Menu
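The menu outline above corresponds roughly to the following VoiceXML 2.0 sketch of the weather, sports, and traffic example. The audio URLs are hypothetical, and the sketch assumes standard <menu> and <choice> behavior as defined in [13].

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <!-- Play prompt to user requesting choice (1, 2 or 3). -->
    <prompt>
      <audio src="http://host.company.com/audio/main_menu.wav"/>
    </prompt>
    <!-- Each choice jumps to another location in this document. -->
    <choice dtmf="1" next="#weather">weather</choice>
    <choice dtmf="2" next="#sports">sports</choice>
    <choice dtmf="3" next="#traffic">traffic</choice>
    <!-- No-input action: replay the menu prompt. -->
    <noinput><reprompt/></noinput>
    <!-- Input that matches no choice: replay the menu prompt. -->
    <nomatch><reprompt/></nomatch>
  </menu>

  <form id="weather">
    <block>
      <prompt><audio src="http://host.company.com/audio/weather.wav"/></prompt>
    </block>
  </form>
  <form id="sports">
    <block>
      <prompt><audio src="http://host.company.com/audio/sports.wav"/></prompt>
    </block>
  </form>
  <form id="traffic">
    <block>
      <prompt><audio src="http://host.company.com/audio/traffic.wav"/></prompt>
    </block>
  </form>
</vxml>
```

The text inside each <choice> doubles as the spoken phrase for that choice when a speech grammar is in use; for DTMF-only operation the dtmf attribute is what matters.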
Mixed-Initiatives
A mixed initiative is a <form> element containing one or more <form>-level grammars, where both the user and the application can define the direction the dialog will take. A common mechanism for implementing this is to use an <initial> element that prompts the user for general information. The result of the user's input then directs the user to specific fields with specific prompts and possibly other grammars defined. This mechanism is most commonly used in voice-based applications.
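A mixed-initiative form of this kind might look like the sketch below. The grammar files, audio files, and script URL are all hypothetical, the form-level grammar is assumed to return values for both fields, and actual recognition behavior depends on the external speech server deployed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="trip">
    <!-- Form-level grammar: one utterance may fill origin, destination, or both. -->
    <grammar src="http://host.company.com/grammars/trip.grxml"
             type="application/srgs+xml"/>

    <!-- General opening question; skipped once user input has filled any field. -->
    <initial name="start">
      <prompt>
        <audio src="http://host.company.com/audio/trip_question.wav"/>
      </prompt>
      <!-- After a second failed attempt, mark the initial item as done so that
           the form falls back to prompting for each field in turn. -->
      <nomatch count="2">
        <assign name="start" expr="true"/>
      </nomatch>
    </initial>

    <!-- Specific fields with specific prompts and their own grammars. -->
    <field name="origin">
      <grammar src="http://host.company.com/grammars/city.grxml"
               type="application/srgs+xml"/>
      <prompt>
        <audio src="http://host.company.com/audio/ask_origin.wav"/>
      </prompt>
    </field>
    <field name="destination">
      <grammar src="http://host.company.com/grammars/city.grxml"
               type="application/srgs+xml"/>
      <prompt>
        <audio src="http://host.company.com/audio/ask_destination.wav"/>
      </prompt>
    </field>

    <!-- Runs only when both fields have been filled. -->
    <filled mode="all" namelist="origin destination">
      <submit next="http://host.company.com/scripts/book_trip"
              namelist="origin destination"/>
    </filled>
  </form>
</vxml>
```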
Elements
A VoiceXML element invokes an action. For example, the <prompt> element defines the output to be played to a user. The scope of an element is from its opening tag to its closing tag, as in the following example:
<prompt>....</prompt>
An element can have child elements nested within its scope or can itself be a child nested within the scope of a parent element. Elements can have attributes associated with them with values that can be set. The media server's support for elements is summarized in the section Protocol Support on page 7. Each supported element is described in detail in Chapter 4: VoiceXML 2.0 Elements and Chapter 5: VoiceXML 2.1 Elements.
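As an illustration of nesting and attributes, the sketch below places an <audio> element inside a <prompt>, which is itself nested inside a <block> and a <form>. The audio URL is hypothetical, and the attributes shown are standard VoiceXML 2.0 attributes; the chapters referenced above remain the authority on which attributes this release supports.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="welcome">
    <block>
      <!-- <prompt> is a child of <block> and the parent of <audio>.
           bargein is an attribute of <prompt>. -->
      <prompt bargein="false">
        <!-- src is an attribute of <audio>; the URL is hypothetical. -->
        <audio src="http://host.company.com/audio/welcome.wav"/>
      </prompt>
    </block>
  </form>
</vxml>
```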
Subdialogs
A subdialog allows a user to enter into another dialog. Upon returning from the subdialog, the original dialog continues from the place where it left off. Parameters can be passed into the subdialog and the subdialog can return values to the calling control logic. The subdialog mechanism is much like a subroutine in a standard programming language.
Subdialogs are useful in creating and organizing commonly used dialog functions as libraries, which can be reused by many applications.
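The subroutine analogy can be sketched as follows: a calling form passes one parameter into a subdialog and reads back the value it returns. The form names, parameter name, and URLs are hypothetical, and the sketch assumes <subdialog>, <param>, and <return> behave as defined in [13].

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Calling dialog -->
  <form id="main">
    <var name="pin"/>
    <subdialog name="auth" src="#collect_pin">
      <!-- Parameter passed into the subdialog (hypothetical audio URL). -->
      <param name="promptUrl"
             expr="'http://host.company.com/audio/enter_pin.wav'"/>
      <filled>
        <!-- The returned value is available as a property of the subdialog variable. -->
        <assign name="pin" expr="auth.pin"/>
        <submit next="http://host.company.com/scripts/check_pin" namelist="pin"/>
      </filled>
    </subdialog>
  </form>

  <!-- Reusable subdialog -->
  <form id="collect_pin">
    <!-- Declares the variable that receives the caller's <param>. -->
    <var name="promptUrl"/>
    <field name="pin" type="digits?length=4">
      <prompt><audio expr="promptUrl"/></prompt>
      <filled>
        <!-- Hand the collected digits back to the calling dialog. -->
        <return namelist="pin"/>
      </filled>
    </field>
  </form>
</vxml>
```

Keeping the subdialog free of references to the caller's variables is what makes it reusable as a library function.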
Scope
Whenever a supporting document is loaded by the VoiceXML interpreter, the root document is also loaded. This provides the interpreter with all the global information necessary to properly apply values to variables, links, and events. However, a value may be redefined within a different scope. The concept of scope applies to grammars, variables, links, and event handling. Scope determines the order of precedence for VoiceXML tags. Scope allows developers to:
Control the global behavior of an application
Group logically related tasks into documents
Break down large applications into more manageable, faster-loading modules
VoiceXML has a number of scopes, listed here in order of decreasing scope and increasing precedence.
Session. Session variables are declared by the platform on which the voice application is deployed. Session variables apply to an entire user session. They are read-only, which means they cannot be modified within any VoiceXML document, either the root document or a supporting document.
Application. Application values are declared within the <vxml> tag of the root document. Values assigned at the application level are initialized when the root document is loaded, and apply as long as it remains loaded. These values are available to any element within the root document or any supporting document referenced by the root document.
Document. Values within documents are assigned within the <vxml> tag of a supporting document. Document values are initialized when the supporting document is loaded, and remain available as long as the document is loaded. Document values are available to any dialog within the document. Document values are not available across documents.
Dialog. Values for dialogs are declared within the <form> or <menu> tags in a document. Values for dialogs are available only to the elements within the dialog for which they are declared. For executable content, values are initialized when the content is executed and are released when execution terminates. For form/field items, values are initialized when the form item is collected.
Elements. Values for an element apply to any of its child elements.
Precedence of values increases as the scope becomes more local. That is, the session scope has the least precedence, and values within a dialog have the greatest precedence. Another way to say this is that global scoping behavior can be overridden by declaring parameters at a lower level; locally defined values always override values defined at a higher level. For example, the scope of variables from broadest to narrowest is as follows:
Session > Application > Document > Dialog > Anonymous
On the other hand, the precedence of variables from highest to lowest is as follows:
Anonymous > Dialog > Document > Application > Session
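The rule that a more local declaration takes precedence can be seen in a small sketch. The variable name and audio URL are hypothetical; the behavior shown is that of variable scoping as defined in [13].

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: visible to every dialog in this document. -->
  <var name="retries" expr="3"/>

  <form id="login">
    <!-- Dialog scope: hides the document-scope value inside this form only. -->
    <var name="retries" expr="1"/>
    <block>
      <!-- The unqualified name resolves to the most local declaration (dialog: 1).
           A scope prefix still reaches the outer declaration (document: 3). -->
      <if cond="retries == 1 &amp;&amp; document.retries == 3">
        <prompt>
          <!-- Hypothetical audio URL -->
          <audio src="http://host.company.com/audio/scope_ok.wav"/>
        </prompt>
      </if>
    </block>
  </form>
</vxml>
```

Any other form in the same document that refers to retries without declaring its own copy sees the document-scope value.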
Protocol Support
This section describes the media server's support for the following protocols:
VoiceXML 2.0 Elements
VoiceXML 2.1 Elements
SRGS Elements
SSML Elements
General XML Handling
Use of an unsupported VoiceXML element results in an error.unsupported event. Use of an unsupported SRGS element results in an error.badfetch or an error.grammar, depending on when it is encountered. An unsupported SSML element (and its content) is ignored in order to maximize compatibility with documents that include SSML elements as alternatives to prerecorded audio files. SSML elements are not supported within SRGS grammars.
<foreach>
Yes. The media server does not support and rejects the following child elements of <foreach> in this release: <aws> and <enumerate>. The media server ignores the following child elements of <foreach> in this release: <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <value>, and <voice>.
SRGS Elements
The SRGS specification is given in [11]. Table 1-3 shows which elements from that Recommendation are supported. Note that, while the media server supports all SRGS elements for voice grammars, the actual support for voice is a function of the specific support provided by the external speech server deployed. Whether the external server supports all the elements supported by the media server depends on the server deployed.
Note also that even if supported or ignored, if used illegally, an element will be rejected with an error. For example, an SRGS element that would be ignored if used correctly will be rejected with an error if enclosed directly within a VoiceXML element.
Table 1-3 SRGS Supported Elements
Element <example> Description [SRGS] Provides an example phrase that matches the input specification. Defines user input rules for DTMF or voice. [SRGS] Defines valid user input, as part of a DTMF or voice grammar rule. [SRGS] Defines valid user input, as part of a DTMF or voice grammar rule. Defines page information. Supported DTMF: Ignored Voice: Yes Yes Yes DTMF: Ignored Voice: Yes DTMF: Ignored Voice: Yes DTMF: Ignored Voice: Yes Yes Yes DTMF: Rejected Voice: Yes DTMF: Rejected Voice: Yes
<meta>
<metadata>
[SRGS] Defines information about a document using a metadata schema. [SRGS] Allows one selection from a list of alternatives. [SRGS] Defines a grammar rule for an inline DTMF or voice grammar. [SRGS] Allows another voice grammar rule to be included.
<token>
SSML Elements
The SSML specification is given in [5]. Table 1-4 shows supported elements from that Working Draft plus supported VoiceXML extensions as per [13]. Please note that, for SSML elements, supported means that the media server passes the request to the external speech server. The behavior for the element depends on the behavior of the speech server and this can vary. That is, from the point of view of the media server, all SSML elements except <speak> may be included in a VoiceXML document; whether the external server supports them is an independent matter.
Note also that even if an element is supported or ignored, if used illegally, it is rejected with an error. For example, an SSML element that would be ignored if used correctly will be rejected with an error if used illegally within an SRGS grammar.
Table 1-4 SSML Supported Elements
Element <break> <desc> <emphasis> <enumerate> Description Inserts a pause or silence into audio. [SSML] Provides a textual description of audio content. [SSML] Directs the speech server to add emphasis to surrounded text. [VoiceXML extension] This element is defined in [13]. Its behavior is determined by the external speech server, and is not described in this guide. [SSML] This element is defined in [5]. Its behavior is determined by the external speech server, and is not described in this guide. [SSML] Places a marker into a text or tag sequence. [SSML] This element is defined in [5]. Its behavior is determined by the external speech server, and is not described in this guide. [SSML] This element is defined in [5]. Its behavior is determined by the external speech server, and is not described in this guide. [SSML] Represents a paragraph. [SSML] Provides a phonemic/phonetic pronunciation for the contained text. [SSML ] Permits control of the pitch, speaking rate and volume of the speech output [SSML] Represents a sentence. [SSML] Defines a text string to be rendered as an audio clip. [SSML] The root element of SSML. [SSML] Replaces the contained text with a substitute. [VoiceXML extension] [SSML] Requests a change in speaking voice. Supported Yes Yes Yes Yes
<lexicon>
Yes
<mark> <meta>
Yes Yes
<metadata>
Yes
a. The <speak> element is not supported directly in VoiceXML scripts. All TTS scripts are rendered into <speak> SSML XML scripts which are then passed to an external server for playing if an external server is active. A parse error results if a <speak> element with TTS text is included in a VoiceXML file.
If the media server is unable to retrieve or successfully parse the document, it returns a 404 (Not found) response. The SIP Request-URI delay parameter is measured in milliseconds rather than in 100-millisecond increments, and can be set to a maximum of 99999 msec.
sip:[email protected];voicexml=http://host.company.com/scripts/ivr.vxml
The URL must not exceed 1024 characters. The HTTP URI can include a query component, to allow the document to be dynamically generated by the server.
Note: The query delimiter character (?) must be escaped as %3f, since ? is a reserved character within a SIP URI. Similarly, when it is not used to separate a parameter name from its value, the equals sign (=) must be escaped as %3d. In general, to determine the escaped equivalent of a character on Linux or Solaris, look up the ASCII value of the character in question, then replace the character with that value in hexadecimal, preceded by a %.
Example 1-2 shows an example of a VoiceXML dialog Request-URI containing a query string passing multiple parameters.
Example 1-2 Request-URI for dialog with Query
Original Query:
sip:[email protected];voicexml=http://host.company.com/scripts/ivr?caller=usera&callee=userb
Send to Media Server as:
sip:[email protected];voicexml=http://host.company.com/scripts/ivr%3fcaller%3dusera&callee%3duserb
When the document is expressed as a stand-alone URI, the voicexml keyword should be omitted. The SIP Request-URI should remain otherwise unchanged from that shown in Example 3-7.
The media server passes only one URI parameter from the Request-URI to the VoiceXML interpreter. If additional URI request parameters are included in the Request-URI, the media server treats the Request-URI as a bad request. The HTTP URI can include a query component instead of a direct request for a specific VoiceXML document. This allows the server to select a VoiceXML script for execution dynamically, for example on the basis of the called number. Example 1-3 shows a SIP INVITE that dynamically selects a script according to the number that was dialled. In this example, the DialledNumber parameter is sent to the HTTP server as a URL-encoded request to fetch the VoiceXML document. The HTTP server dynamically generates the associated script and returns it to the media server for execution.
Example 1-3 SIP INVITE with Query in URI
Example 1-4 shows a SIP dialog that dynamically selects a script based on the callers in the session.
Example 1-4 SIP Dialog with Query in URI
sip:[email protected];voicexml=http://host.company.com/scripts/ivr?caller=usera&callee=userb
This VoiceXML script has access to the following session variables. The values are all derived from header fields within the SIP INVITE request:
session.connection.local.uri     Derived from the To: header field
session.connection.remote.uri    Derived from the From: header field
session.connection.callid        Derived from the Call-ID: header field
would create a session variable named session.user.x with a value of y. For example, the following SIP INVITE request invokes the ivr.vxml VoiceXML script:
sip:[email protected];voicexml=http://host.company.com/scripts/ivr.vxml;appvara=786;appvarmsg=hi there;appvarnumber=604-555-1234
In this request, values for application-specific variables appvara, appvarmsg, and appvarnumber are explicitly passed to the ivr.vxml script. Within the context of the script, these variables are defined as follows:
session.user.appvara         Value: 786
session.user.appvarmsg       Value: hi there
session.user.appvarnumber    Value: 604-555-1234
The second method uses two session variable arrays to hold all URI parameters (including those defined for SIP in RFC 3261) and values. The first session variable array:
session.connection.protocol.sip.parameter[N].name
contains the names of all URI parameters. The first array element [0] always contains the string voicexml (regardless of where it appears in the SIP Request-URI) since that is the first and only required URI parameter for the dialog service context. The second array element contains the second URI parameter (if present), and so on. The second session variable array:
session.connection.protocol.sip.parameter[N].value
VoiceXML Overview
contains the corresponding values for the URI parameters. For example, a Request-URI of:
sip:[email protected];voicexml=http://server.example.com/script.vxml;x=y
Any escaped characters in the SIP Request-URI that are used as the name or value of VoiceXML session variables will be replaced with their unescaped representation. For example, a Request-URI of:
sip:[email protected];voicexml=http://server.example.com/script.vxml;%78=%79
creates and populates session variables exactly the same as in the preceding examples.
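The two mechanisms and the unescaping rule described above can be modelled with a short Python sketch. This is an illustration only, not the media server's implementation: the function name parse_request_uri, the layout of the returned dictionary, and the address sip:dialog@ms.example.com are assumptions made for the example, and the sketch assumes that the voicexml value itself contains no semicolons. The text above does not say whether RFC 3261 parameters also produce session.user entries; the sketch simply creates one for every parameter other than voicexml.

```python
from urllib.parse import unquote


def parse_request_uri(request_uri):
    """Model how URI parameters might map to VoiceXML session variables."""
    # Everything after the first ';' is treated as the URI parameter list.
    _, _, param_str = request_uri.partition(";")

    pairs = []
    for item in param_str.split(";"):
        if not item:
            continue
        name, _, value = item.partition("=")
        # Escaped characters (%HEXHEX) are replaced by their unescaped form.
        pairs.append((unquote(name), unquote(value)))

    # Element [0] always holds voicexml, wherever it appeared in the URI.
    # sorted() is stable, so the other parameters keep their original order.
    pairs = sorted(pairs, key=lambda pair: pair[0] != "voicexml")

    return {
        # Stand-ins for session.connection.protocol.sip.parameter[N].name/.value
        "parameter_names": [name for name, _ in pairs],
        "parameter_values": [value for _, value in pairs],
        # Assumption: every parameter except voicexml becomes session.user.<name>.
        "user": {name: value for name, value in pairs if name != "voicexml"},
    }


plain = parse_request_uri(
    "sip:dialog@ms.example.com;voicexml=http://server.example.com/script.vxml;x=y"
)
escaped = parse_request_uri(
    "sip:dialog@ms.example.com;%78=%79;voicexml=http://server.example.com/script.vxml"
)
```

In this sketch, both calls yield the same result: voicexml occupies element [0] of the name array, and a user variable x holds the value y, whether or not the parameter was escaped and regardless of where voicexml appeared.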
[Figure: call flow between User A, Media Server, and Server. Messages shown: HTTP GET, 200, 200, ACK, RTP, HTTP GET, 200, BYE. Annotation: Optional additional server interactions to submit results and/or fetch other documents, grammars, or audio clips.]
Request-URI contains a voicexml= value which specifies the URI for the root document on an external server.
Note: The media server does not support VoiceXML 2.0 scripts in INVITE message bodies.
2 The media server sends a GET request with the HTTP URI to the server to retrieve the initial VoiceXML document.
3 The external server responds with a 200 message to the media server and sends the document.
4 The media server responds with a 200 message to the control agent.
5 The control agent then sends an ACK to the media server indicating that the RTP connection is ready.
6 Upon receipt of the ACK, the media server sends the appropriate audio prompts to the user.
7 The user should reply with a DTMF input. This may trigger other dialogs to be acquired and sent.
8 When the IVR dialog ends or the user terminates his or her connection, the session is terminated either by the media server, which sends a BYE to the control agent, or by the control agent, which sends a BYE to the media server.
#!/usr/bin/perl
# vxml requesthandler perl script
# Read the request data: from standard input for POST, otherwise from the query string.
$input_st = "";
if ($ENV{'REQUEST_METHOD'} eq "POST") {
    $input_st = <STDIN>;
} else {
    $input_st = $ENV{'QUERY_STRING'};
}
# Emit the HTTP header, followed by a minimal VoiceXML document that disconnects.
print "Content-type: text/html\n\n";
print '<?xml version="1.0"?>';
print "\n";
print '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">';
print "\n\n";
print '<form>';
print "\n";
print '<block>';
print "\n";
print '<disconnect/>';
print "\n";
print '</block>';
print "\n";
print '</form>';
print "\n";
print '</vxml>';
print "\n";
HTTP Cookies
The media server supports HTTP cookies in VoiceXML HTTP transport, as defined in the Netscape Persistent Client State HTTP Cookies specification [13]. A server returning a document or other HTTP object to a client can include a cookie, which contains state information plus the range of URIs to which that state information applies. The client stores the information in the cookie, and for any future HTTP requests to the server falling within that URI range, the client will transmit the state information along with the request. This allows the HTTP server to maintain state for a VoiceXML session.
The media server permits or denies the use of VoiceXML cookies using a configuration parameter in the VoiceXML configuration file vxml.cfg. The media server generates a VoiceXML log message when it receives a request to set a cookie and the OAMP configuration parameter has been set to deny cookies. By default, cookies are enabled on the media server.
Cookies are deleted when the associated SIP session expires, regardless of any expiration time specified.
The media server supports a maximum cookie size (that is, NAME=VALUE combination) of 4096 bytes. The media server silently discards any cookies over the maximum. The media server will support up to 10 cookies per session. After the system maximum is reached, the media server deletes the least recently used cookie when a new cookie is created.
expires=DATE
path=PATH
domain=DOMAIN_NAME
Attributes are separated by semi-colons. The media server does not support any other cookie attributes.
However, the media server does not support HTTPS at this time. Any cookies specifying the secure attribute are ignored. The following is an example of a Set-Cookie header in an HTTP response:
The media server supports lists of cookies in the Set-Cookie header. Cookies are separated in a list by commas (,). In addition, the media server accepts multiple Set-Cookie headers within a single HTTP response. Cookies are uniquely identified by the combination of domain-path-name. So long as cookies have different path or domain attributes, they can have the same name. If the media server receives a cookie with the same domain-path-name as an existing cookie, it overwrites the old cookie. If the media server receives a cookie with the same domain-path-name as an expired cookie, it deletes the cookie.
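The storage rules described in this section (the 4096-byte limit on NAME=VALUE, the limit of 10 cookies per session, least-recently-used eviction, and identification by domain-path-name) can be illustrated with a small Python model. This is a sketch under stated assumptions, not the media server's code: the class name SessionCookieJar and its methods are hypothetical, "used" is assumed to mean either set or read, and expiry handling is deliberately omitted.

```python
from collections import OrderedDict

MAX_COOKIE_BYTES = 4096        # limit on the NAME=VALUE combination
MAX_COOKIES_PER_SESSION = 10   # per-session maximum


class SessionCookieJar:
    """Hypothetical per-session cookie store; not the media server's code."""

    def __init__(self):
        # Keys are (domain, path, name); order runs from least to most recently used.
        self._cookies = OrderedDict()

    def set_cookie(self, domain, path, name, value):
        """Store a cookie; return False if it is silently discarded."""
        if len(f"{name}={value}".encode("utf-8")) > MAX_COOKIE_BYTES:
            return False
        key = (domain, path, name)
        if key not in self._cookies and len(self._cookies) >= MAX_COOKIES_PER_SESSION:
            # Evict the least recently used cookie to make room for the new one.
            self._cookies.popitem(last=False)
        self._cookies[key] = value       # overwrites a cookie with the same key
        self._cookies.move_to_end(key)   # mark as most recently used
        return True

    def get_cookie(self, domain, path, name):
        """Return the stored value, or None if no such cookie exists."""
        key = (domain, path, name)
        if key not in self._cookies:
            return None
        self._cookies.move_to_end(key)
        return self._cookies[key]

    def __len__(self):
        return len(self._cookies)


jar = SessionCookieJar()
for i in range(10):
    jar.set_cookie("host.company.com", "/scripts", f"c{i}", "v")
jar.get_cookie("host.company.com", "/scripts", "c0")        # c0 is now most recently used
jar.set_cookie("host.company.com", "/scripts", "c10", "v")  # evicts c1, the oldest unused
```

Keying the store on the (domain, path, name) tuple is what lets two cookies share a name as long as their path or domain differs.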
regardless of the input mode defined. Similarly, if no TTS servers are defined, then TTS strings found within VoiceXML scripts are ignored throughout the entire session. If one or more servers are enabled and brought online, subsequent new sessions will be able to utilize these servers; however, existing sessions will not.
User Input
In the VoiceXML applications supported by the media server, user input comes in the form of DTMF key presses or speech utterances. The way in which user input is collected, buffered, and validated depends on whether it is DTMF or speech input.
For DTMF, all processing of digits is handled within the media server: digits are matched against active grammars according to the Form Interpretation Algorithm (FIA) defined by [13].
For speech, voice processing is performed by external speech servers. These servers validate the speech against the grammar and determine whether it matches. The results of the collection are returned to the interpreter as NLSML [8] scripts, which are then processed by the media server.
DTMF
Within the media server, DTMF digits are detected on received RTP streams (inband DTMF). The media server also recognizes out-of-band (RFC 2833) DTMF digits.
Speech
Speech detection processing is performed by the external speech server. Once the media server determines that a grammar defined in a VoiceXML script is a speech grammar, it sends the grammar to the speech server and performs no other processing on the grammar: the speech server assumes responsibility for detecting and processing the user input. Voice input is received from the user's RTP stream and routed to the speech server. The speech server makes all determinations of whether more input is required or whether the current input produces a match or no-match event. The result of the collection is returned to the media server as an NLSML script. The media server then interprets the results of the collection and determines the next action based on the FIA.
System Output
The output of VoiceXML applications is either the playback of recorded audio or video files or synthesized text-to-speech (TTS) played to the user by the external speech server. Audio playback to the user is invoked using the <audio> element. Audio files may be stored internally in the media server or on an external HTTP server. In either case, the source of the audio file is specified as a URI.
Audio files may be stored internally on the media server or on an external HTTP or NFS server. In either case, the source of the audio file is specified as a URI. The URI can be explicitly specified, or specified as the evaluation of an ECMAScript expression. The latter mechanism allows playing of audio files based on application-defined logic. Different methods of specifying audio files are described in detail in the section Working with Media Files and TTS Strings on page 28.
If desired, the VoiceXML application can allow the user to interrupt (barge in on) audio playback with a DTMF key press, by enabling the bargein attribute of the <audio> element.
TTS clips are specified by embedding strings into VoiceXML scripts. The media server supports plain text strings, Speech Synthesis Markup Language (SSML) strings, or a combination of the two. All strings, however specified, are converted to SSML strings, which are subsequently passed to an external speech server.
Control Flow
Control flow within a VoiceXML application can be manipulated using any of the following mechanisms:
- Application-defined variables: for example, using variables defined using <var>, <assign>, or <clear>
- Predefined system variables: for example, using variables defined using <var>, <assign>, or <clear>
- Event generation and handling: for example, using <throw>, <catch>, <error>, <help>, <noinput>, <nomatch>, or user-defined events
- Conditional execution: that is, using <if>, <else>, and <elseif>
- Control transfer and jumps: for example, using <goto>, <subdialog>, <submit>, <exit>, <return>, and <disconnect>
- Prompts: for example, using <prompt> and <reprompt>
- Scripts: that is, using ECMAScript, either embedded inline or externally fetched
Session Termination
A session terminates because a <disconnect> element is executed within a script, because a user hangs up, or because a fatal error occurs during the execution of a script. Regardless of the cause, when a session terminates, the script enters a state in which the set of operations that can be executed is restricted, so that the script can clean up resources (for example, post the current state of collections, recordings, and so on to the HTTP server) before the session terminates.
A script that is terminating cannot queue or play prompts or recordings, or collect DTMF. There is a limit of two HTTP access operations and a maximum of six iterations; that is, the script can move between forms and other scripts a maximum of six times before the session is terminated. These restrictions are intended to prevent unnecessary processing while in this clean-up state.
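The clean-up restrictions described above amount to a small budget that is consumed as the terminating script runs. The following Python sketch models that budget; it is illustrative only. The class name CleanupState, the operation labels ("prompt", "record", "collect_dtmf", "http", "transition"), and the decision to report a refusal as a False return value are all assumptions made for the example, not media server behavior.

```python
MAX_HTTP_OPERATIONS = 2   # HTTP access operations allowed during clean-up
MAX_ITERATIONS = 6        # moves between forms and other scripts

# Operations that a terminating script is not able to perform.
BLOCKED_OPERATIONS = {"prompt", "record", "collect_dtmf"}


class CleanupState:
    """Hypothetical model of the restricted state entered at session termination."""

    def __init__(self):
        self.http_operations = 0
        self.iterations = 0

    def allow(self, operation):
        """Return True if the operation may run, consuming budget where relevant."""
        if operation in BLOCKED_OPERATIONS:
            return False
        if operation == "http":
            if self.http_operations >= MAX_HTTP_OPERATIONS:
                return False
            self.http_operations += 1
        elif operation == "transition":
            if self.iterations >= MAX_ITERATIONS:
                return False
            self.iterations += 1
        # Anything else (for example, assigning a variable) is not budgeted here.
        return True


state = CleanupState()
http_results = [state.allow("http") for _ in range(3)]
transition_results = [state.allow("transition") for _ in range(7)]
```

In the sketch, the third HTTP operation and the seventh transition are refused, which is the point at which the real session would be terminated.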
Shadow Variables
Some VoiceXML elements have associated shadow variables. Shadow variables are variables that are automatically assigned values when the elements are used. The media server supports shadow variables for the following:
- Announcements. Shadow variables for announcements are provided through the <prompt> element. These provide information about prompt completion and information resulting from DTMF collection. For information about shadow variables for announcements, please see the Shadow Variables section of the <prompt> element. Note that if the session terminates as the result of a SIP BYE, the shadow variables are not updated with information about the prompt and do not contain correct values.
- Recordings. Shadow variables for recordings are provided through the <record> element. These provide information about the duration of the recording and the reason for its termination. For information about shadow variables for recordings, please see the Shadow Variables section of the <record> element.
Shadow variables cannot be modified by a user or an application. They are returned from a VoiceXML document. Supported shadow variables are summarized in the List of Shadow Variables on page xiii.
Events
Some events and errors are automatically generated by the media server; others are generated under direct control of the VoiceXML application. For each event type or error type, the VoiceXML application can specify specific handling. Some event and error handling has a predetermined default implementation provided by the media server. In most cases, the system default event or error handlers can be overridden by the VoiceXML application to provide a more tailored mechanism.
Event Support
Description
Supported. Thrown whenever a disconnect distinct from hangup occurs. This event, if caught within an application, allows the application the opportunity to perform final processing before terminating the session. This may include posting of data (such as a recording), or submitting variables to an HTTP server using the <submit>, <goto>, or <link> element. Processing allowed before session termination is restricted to posting information to an HTTP server, setting variable values, and executing simple if-else-elseif statements. Execution requests to play audio, perform recording, or define grammars will not be honored. In addition, a maximum of two HTTP post events are allowed in the catch handler and subsequent VXML documents. Note that this limitation applies equally to events executed within a particular <catch> handler and to subsequent VXML documents returned to the application. All attempts to perform restricted operations are terminated without incident; that is, even though a request to play an audio clip after a disconnect will fail, a second error will not be generated.
Supported. Thrown whenever a disconnect occurs. This event, if caught within an application, allows the application the opportunity to perform final processing before terminating the session. This may include posting of data (such as a recording), or submitting variables to an HTTP server using the <submit>, <goto>, or <link> element. Processing allowed before session termination is restricted to posting information to an HTTP server, setting variable values, and executing simple if-else-elseif statements. Execution requests to play audio, perform recording, or define grammars will not be honored. In addition, a maximum of two HTTP post events are allowed in the catch handler and subsequent VXML documents. Note that this limitation applies equally to events executed within a particular <catch> handler and to subsequent VXML documents returned to the application. All attempts to perform restricted operations are terminated without incident; that is, even though a request to play an audio clip after a disconnect will fail, a second error will not be generated. This event is thrown regardless of how the disconnect is initiated; that is, whether the <disconnect> element was executed or encountered, or whether the disconnect was on account of a user hang-up.
Supported. Thrown if fax tone detection is enabled (by setting the com.cvd.faxdetect property to true) and a fax tone is detected.
Supported. This event will be processed either by the appropriate catch handler, or by the <exit> element. Please see page 88 for details on the <exit> element.
Supported. Indicates a match event for DTMF collection. Allows an application to define specific behavior relative to DTMF match events. This event is not actually thrown in the sense that it can be caught by a catch handler such as the <catch> element.
Supported. This event will be processed either by the appropriate catch handler, or by the <help> element. Please see page 100 for details on the <help> element. If an application-specific <help> handler is not defined in the document, the default <help> handler executes 5 times before exiting the session.
connection.disconnect. hangup
help
Table 1-5 Event Support
Event: noinput
Description: Supported. Used to catch no-input events relative to DTMF collection and recording. This event allows an application to override the default <noinput> handler. Please see page 114 for details on the <noinput> element. If an application-specific <noinput> handler is not defined in the document, the default <noinput> handler executes 5 times before exiting the session.
Event: nomatch
Description: Supported. Used to catch no-match events relative to DTMF collection. Allows an application to override the default <nomatch> handler. Please see page 115 for details on the <nomatch> element. If an application-specific <nomatch> handler is not defined in the document, the default <nomatch> handler executes 5 times before exiting the session.
Ignored. Ignored. Ignored.
Errors
All VoiceXML errors are fatal to the current session, and the session terminates in all cases. Table 1-6 shows VoiceXML errors supported by the media server. Note that unsupported errors are not listed.
Table 1-6 Error Support
Error: error.badfetch
Description: Thrown when the specified resource could not be fetched, or the resource was specified incorrectly.
Error: error.grammar
Description: Thrown for incorrectly formatted grammars, or unsupported attributes used within a grammar.
Error: error.max_loop_count_exceeded
Description: Thrown if either of the following occurs:
1. The maximum number of document fetches set for this session has been exceeded. This includes VXML document fetches for submit, subdialog, goto, link, and the initial application document. Root documents and external SRGS grammars are not counted. The default is 100 fetches.
2. The number of iterations (loops) exceeds 400 for a session. This includes all document fetches and transitions between forms within a document.
Error: error.noresource
Description: Thrown when a request (for example, to play a clip or to enable fax detection) is rejected because available resources have been exceeded and overload protection is in effect. Like all VoiceXML errors, this error is fatal and the session will be terminated.
Table 1-6 Error Support (continued)
Error: error.semantic
Description: Thrown when incorrect or invalid values are assigned to properties. For information about supported properties, please see Chapter 2: Properties Overview. Also thrown for unsupported or undefined ECMAScript objects; for example, when an undefined variable is evaluated. For information about ECMAScript support, please see ECMAScript Support on page 27.
Thrown when an unsupported language has been specified for sets and variables.
Thrown when an unsupported element is specified.
ECMAScript Support
The media server's ECMAScript support is fully compliant with ECMA-262, Edition 3, based on JavaScript 1.5.
The length of any variable in VoiceXML, however specified, is limited to 256 characters. In addition to the 256-character maximum enforced by the media server, ECMAScript may apply additional constraints in its own handling of variables. Any string longer than 256 characters, or longer than ECMAScript supports, results in session termination with an error.semantic being thrown, except in the following cases:
- The variable is a URI, Remote Address, or Local Address session variable. These are default session variables available to all applications. The maximum length of a VoiceXML URI that starts a session is 1024. Rather than rejecting the call with an exception, the media server truncates such values to 256 characters and stores them in the shortened form.
- A user-defined session variable is longer than 256 characters. In this case, the session terminates, but an exception is not thrown.
All other cases result in an error.semantic and session termination.
Note that the media server does not throw an error.semantic for division-by-0 errors. Instead, the media server returns a value Inf, INF, or inf, representing infinity.
Escape Characters
There are essentially four classifications of data received or sent by the media server in which checking for escape characters (in the form %HEXHEX) may or may not be required, or in which the media server may need to escape characters deemed to be special by the protocol. These four types of data are the following:
The media server assumes that this URI has been successfully extracted by the SIP layer. The URI may or may not include escape characters, so the media server processes the URI to remove any escape characters from the string.
2 Session variables appended to the end of the Request-URI.
The session variables are removed by the SIP layer based on the rules as defined in SIP RFC 3261. The session variables are presented in a list and are individually processed, with each escape character being converted into its ASCII equivalent.
3 URIs received within a VoiceXML document.
URIs received within a VoiceXML document are processed by the XML parser, which unescapes all characters based on the rules of XML. Subsequent to this operation, there is no other escape checking required or performed.
4 URIs including namelist data sent from the media server to an HTTP server.
These are URIs compiled by the media server and sent to the HTTP server. All characters in these URIs must be escaped, according to the HTTP protocol, and the media server processes them accordingly.
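Two of the cases above, unescaping parameters received on the Request-URI and escaping namelist data sent to an HTTP server, can be sketched with Python's standard urllib.parse module. The function names below are hypothetical and the sketch is not the media server's implementation. In particular, encoding a space as %20 (rather than +) is an assumption made for the example.

```python
from urllib.parse import quote, unquote, urlencode


def unescape_parameter(text):
    """Convert each %HEXHEX escape in a Request-URI parameter to its character."""
    return unquote(text)


def build_submit_uri(target_uri, namelist):
    """Append namelist data to a URI as an escaped query component."""
    # quote_via=quote encodes a space as %20 rather than '+'.
    query = urlencode(namelist, quote_via=quote)
    return f"{target_uri}?{query}"


submit_uri = build_submit_uri(
    "http://host.company.com/scripts/ivr",
    {"appvarmsg": "hi there", "appvara": "786"},
)
```

With these inputs, the space in the appvarmsg value is escaped before the request is sent, while unescape_parameter performs the reverse conversion for data arriving on the Request-URI.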
NFS
HTTP
a. The software media server, which does not support AMR, does not use 3GP for audio, only for video-only clips.
Audio clips must have the following characteristics:
- 8 kHz
- Mono (number of channels is 1)
- 8-bit
Alternate clips are sent as separate requests and are not included with primary clips. TTS clips are not included with audio or multimedia clips and are handled separately by the media server. Audio and TTS clips can be grouped together. Clips containing video must be grouped separately: either all the clips must contain video or none of them may. In addition, VCR controls are not supported for video clips. If a clip containing video is discovered within a set of clips of another type, or if VCR controls are applied to video clips, the media server reports an error and terminates the playing of any other queued and requested clips. Clips that have been queued but are not yet requested are not affected.
The following table shows how to specify named media files in a VoiceXML document.
Table 1-9 Referencing Named Media Files in VoiceXML
Identifier Type: Internal Announcement
Syntax: [file:/]/provisioned/path/filename
Provisioned clips with alphanumeric names can be structured in up to nine levels of hierarchical directories or paths (with the level /provisioned forming a tenth level where applicable). Levels are delimited with the slash character (/). If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Syntax restrictions are as follows:
- Up to 128 characters can be used in total for path/filename.
- File names are case-sensitive. File extensions are not case-sensitive.
- Numbers, letters, and the underscore character are supported.
- Slash (/) is supported only to delimit levels of hierarchy.
- One period (.) is supported to delimit the file name from the file extension.
Examples:
file://provisioned/audioclips/hello.wav
/provisioned/audioclips/hello.wav
Identifier Type: Internal Recording
Syntax: [file:/]/transient/filename
Transient recordings do not support hierarchical paths. If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Syntax restrictions are as follows:
- Up to 128 characters can be used in total for filename.
- Numbers, letters, and the underscore character are supported.
- One period (.) is supported to delimit the file name from the file extension.
Examples:
file://transient/user1_name.wav
/transient/user1_name.wav
file://transient/intro.mov
/transient/intro.QT
Identifier Type: NFS server
Syntax: An absolute URL consisting of the file://mnt header (representing the mount point or exported directory) plus a valid NFS URI as per RFC 2224. The syntax is as follows:
[file://]mnt/nfs_server_ip/path/filename
where nfs_server_ip is the IP address of the external NFS server, path is the path fragment to be appended to the exported directory, and filename is the media file.
If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Syntax restrictions are as follows:
- Up to 255 characters can be used in total.
- Numbers, letters, and the underscore character are supported.
- Slash (/) is supported only to delimit levels of hierarchy.
- One period (.) is supported to delimit the file name from the file extension.
The following table shows how to specify indexed audio files in a VoiceXML document.
Table 1-10 Referencing Indexed Audio Files in VoiceXML
Identifier Type: Internal Announcement
Syntax: [file://]index, where index is the numeric index of the clip. The range for indexes is 1 to 50000. Indexed clips do not support hierarchical paths. If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Examples:
file://729
729
Note: When using the VoiceXML interface, avoid using spaces in media file names; instead, encode the space as the escape character %20. The media server does accept file names that include spaces, but replaces them with the escape character %20 before passing the file along for further processing.
HTTP Queries
An HTTP URI can include a query component, instead of a straight request for a specific audio or multi-media resource, as in the following example:
<audio src="http://10.0.0.132/wavs/audio_handler?id=1234&amp;sub=999">
This example allows the server to dynamically select an audio file to play.
Relative URIs
VoiceXML documents are always stored on HTTP servers. References to VoiceXML documents are similar to those for clips stored on external HTTP servers. VoiceXML documents can be referenced by either an absolute URI or a relative URI. A relative URI is recognized by the absence of the protocol scheme (http:// or file://) before the path fragment and file specification.
The media server converts a relative URI to an absolute URI by concatenating it with a base URI, which is either the URI of the fetching document or the value declared by using the xml:base attribute. A declared value takes precedence over the URI of the fetching document. The declaration can be made in multiple documents; the innermost declaration takes precedence.
For example, suppose a VoiceXML document has a base URI of http://server2/path1/path2. Then consider the document reference in Example 1-6:
Example 1-6 Relative URI
<goto next="record.vxml">
This reference is a URI fragment. Accordingly, the VoiceXML interpreter considers it to be a relative URI. Thus, the base URI is concatenated with record.vxml, resulting in an HTTP GET to http://server2/path1/path2/record.vxml. In accordance with the precedence rules for determining the base URI, within record.vxml (that is, while record.vxml is executing), the following applies:
- If record.vxml itself has xml:base specified, the value of xml:base is used as the base URI while record.vxml is executing. In that case, the base URI of the calling document is ignored.
- If xml:base is not specified, the base URI for record.vxml is the URI that was used to fetch record.vxml. In this case, that is http://server2/path1/path2.
In Example 1-7, suppose a VoiceXML document again has a base URI of http://server2/path1/path2. Then consider the following document reference:
Example 1-7 Absolute URI
<goto next="http://newserver/path1/path2/path3/record.vxml">
This reference is to an absolute URI. The base URI is bypassed, resulting in an HTTP GET to http://newserver/path1/path2/path3/record.vxml. As in Example 1-6, within record.vxml, the following applies:
- If record.vxml itself has xml:base specified, the value of xml:base is used as the base URI while record.vxml is executing. In that case, the base URI of the calling document is ignored.
- If xml:base is not specified, the base URI for record.vxml is the URI that was used to fetch record.vxml. In this case, that is http://newserver/path1/path2/path3.
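The resolution rules in this section can be approximated with the standard urljoin function in Python. This is a sketch, not the media server's algorithm: the function name resolve_reference and the document name main.vxml are hypothetical. Note also that urljoin follows generic URI resolution rules, in which the last path segment of the base is replaced, so the sketch passes the full URI of the fetching document (or an xml:base value ending in a slash) rather than the directory-style base URI used in the examples above.

```python
from urllib.parse import urljoin


def resolve_reference(reference, fetching_document_uri, xml_base=None):
    """Resolve a document reference to an absolute URI."""
    # A declared xml:base takes precedence over the URI of the fetching document.
    base = xml_base if xml_base is not None else fetching_document_uri
    # An absolute reference bypasses the base; a relative one is joined to it.
    return urljoin(base, reference)


relative = resolve_reference(
    "record.vxml", "http://server2/path1/path2/main.vxml"
)
absolute = resolve_reference(
    "http://newserver/path1/path2/path3/record.vxml",
    "http://server2/path1/path2/main.vxml",
)
declared = resolve_reference(
    "record.vxml",
    "http://server2/path1/path2/main.vxml",
    xml_base="http://otherserver/base/",
)
```

Here relative resolves against the directory of the fetching document, absolute is returned unchanged, and declared resolves against the xml:base value instead of the fetching document.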
Chapter 2:
VOICEXML PROPERTIES
This chapter describes the media server's support for VoiceXML properties. This chapter presents the following information:
- Properties Overview
- Generic Speech Recognizer Properties
- Generic DTMF Recognizer Properties
- Prompt Properties
- Fetching Properties
- Fax Detection Property
VoiceXML Properties
Properties Overview
Properties are variable settings that can be used to affect the behavior of the VoiceXML interpreter, such as DTMF recognition, timeout intervals, caching policy, and so on. VoiceXML properties are set using the <property> element. In some cases, global properties can be overridden using an attribute. For example, the bargein property can be updated by setting the bargein attribute in the <prompt> element. When values are not specifically assigned, properties inherit the platform defaults defined in this chapter. Any malformed property will result in an error.semantic exception being thrown and session termination. Table 2-1 summarizes the media server's support for VoiceXML properties.
Table 2-1 Property Support Summary
Property Class                          Example                                                                Reference
Generic DTMF Recognizer Properties      Interdigit timeout values                                              Page 41
Prompt Properties                       Prompt barge-in (interrupt)                                            Page 43
Fetching Properties                     Fetch timeout value for retrieving documents, grammars, and scripts    Page 43
Fax Detection Property                  CED fax tone                                                           Page 46
Generic Speech Recognizer Properties    inputmodes= voice                                                      Page 38
Object Fetching Properties              N/A                                                                    Ignored
These values can be configured on the external speech server. Alternatively, the media server can be configured to set these values through the control protocol by configuring the media server through the management interface. If VoiceXML is the control protocol, these values are set through the properties described in the following table. If MSML 1.1 is the control protocol, the default value listed for the property is set. Table 2-2 shows the mappings between VoiceXML properties and their equivalent MRCP header fields.
Table 2-2
Property             MRCP Header Field
confidencelevel
sensitivity          sensitivity-level
speedvsaccuracy      speed-vs-accuracy
completetimeout      speech-complete-timeout
incompletetimeout    speech-incomplete-timeout
                     no-input-timeout
                     recognition-timeout
fetchtimeout         fetch-timeout
maxnbest             n-best-list-length
In addition to properties that have specific mappings to MRCP header fields, the media server implements the following properties to support voice:
Table 2-3 General Speech Property Elements
Property: inputmodes
Description: Specifies the input mode that is currently active within the defined scope. Supported values are as follows:
- dtmf: DTMF only is accepted as input.
- voice: Voice only is accepted as input.
- dtmf voice: Both DTMF and voice are accepted as input.
The default value for this property is configured in the media server's management interface. Note that these values are case-sensitive.
Property: externalserver
Description: A RadiSys extension. An application-defined property that can be used to specify the external ASR server for the current call. Limitations on this value are outlined below. The value is intended to match the External Server name as defined in the management interface and applies only to ASR servers.
Note that the externalserver property is external to the VoiceXML specification. It is defined to support applications that may want to access a specific server, or a set of servers using a load balancer, based on server capabilities or ownership of servers. Limitations on this value are as follows:
- The current value of the property is included in all set-up requests. This includes ASR servers only; there is no equivalent for accessing TTS servers.
- The property applies to only one type of server.
- If the property is set after the ASR connection has been established, it has no effect.
- The media server does not validate the value specified in the property. If the external server is specified and does not map to an existing server, the service request fails.
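The following is a minimal sketch of how an application might select an ASR server by setting externalserver before any recognition is requested. The server name asr-pool-1 and the prompt URL are hypothetical; the name would have to match an External Server name defined in the management interface.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical server name. The media server does not validate it;
       an unknown name causes the service request to fail. -->
  <property name="externalserver" value="asr-pool-1"/>
  <property name="inputmodes" value="voice"/>

  <form id="main">
    <field name="answer" type="boolean">
      <prompt>
        <!-- Hypothetical prompt clip -->
        <audio src="http://example.com/prompts/continue.wav"/>
      </prompt>
    </field>
  </form>
</vxml>
```

Because the property has no effect once the ASR connection is established, the sketch sets it at document level, ahead of the first field.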
interdigittimeout
termchar
longdigitduration
The inter-digit timer (IDT) is stopped and restarted with the new value whenever a reset timer event is issued by the media server. Once the IDT expires, it does not restart until either a digit is received or it is explicitly requested to run through a reset timer event.
Prompt Properties
Table 2-5 shows media server support for prompt properties.
Table 2-5
Property bargein
bargeintype
Fetching Properties
fetchhint Properties
In the VoiceXML specification, the fetchhint property defines when the interpreter context should retrieve the corresponding content from the server. A value of prefetch indicates that a file is to be downloaded when the page is loaded. A value of safe indicates that a file is only to be downloaded when actually needed.
The VoiceXML default value for fetchhint properties is prefetch. However, prefetching is not supported on the media server, which always behaves as if the value is safe.
Table 2-6
Property fetchhint audiofetchhint documentfetchhint grammarfetchhint scriptfetchhint
maxage Properties
In the VoiceXML specification, maxage properties ensure that the type of document the property governs does not use content whose age is greater than specified. These maxage properties are used in conjunction with corresponding maxstale properties to determine document fetching behavior. The maxage property is not supported. In general, the media server checks the date of the content on the server and fetches the content if it is newer than that on the media server.
Table 2-7
Property maxage audiomaxage documentmaxage grammarmaxage scriptmaxage
maxstale Properties
In the VoiceXML specification, maxstale properties indicate that the document is willing to use content that has exceeded its expiration time. These maxstale properties are used in conjunction with corresponding maxage properties to determine document fetching behavior.
Table 2-12 shows the interaction between the bargein property (or attribute) and the fax detection setting with respect to audio announcements.
Table 2-12 Interaction of bargein and Fax Tone Detection
bargein  Fax Tone Detection  Behavior
True     Disabled            Any DTMF digit will interrupt the announcement, but a fax tone will not. Fax tones are ignored.
True     Enabled             Any DTMF digit will interrupt the announcement, and a fax tone will also interrupt the announcement.
False    Disabled            The announcement is not interruptible: neither DTMF digits nor a fax tone will interrupt the announcement.
False    Enabled             No DTMF digit will interrupt the announcement; DTMF digits are ignored. However, a fax tone will interrupt the announcement.
Table 2-13 shows the interaction between the dtmfterm attribute of the <record> element and the fax detection setting with respect to recordings.
Table 2-13 Interaction of dtmfterm and Fax Tone Detection
dtmfterm  Fax Tone Detection  Behavior
True      Disabled            Any DTMF digit will interrupt the recording, but a fax tone will not. Fax tones are ignored.
True      Enabled             Any DTMF digit will interrupt the recording, and a fax tone will also interrupt the recording.
False     Disabled            The recording is not interruptible: neither DTMF digits nor a fax tone will interrupt the recording.
False     Enabled             No DTMF digit will interrupt the recording; DTMF digits are ignored. However, a fax tone will interrupt the recording.
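As an illustration of the two attributes covered by Tables 2-12 and 2-13, the sketch below plays a non-interruptible prompt and then records with dtmfterm enabled. The form name, prompt URL, and maxtime value are assumptions chosen for the example; the effect of fax tones depends on the fax detection setting, which is not set in the markup.

```xml
<form id="leave_message">
  <record name="msg" dtmfterm="true" maxtime="60s">
    <!-- bargein="false": DTMF digits do not interrupt this prompt.
         A fax tone still interrupts it if fax detection is enabled. -->
    <prompt bargein="false">
      <!-- Hypothetical prompt clip -->
      <audio src="http://example.com/prompts/record_after_tone.wav"/>
    </prompt>
    <!-- dtmfterm="true": any DTMF digit ends the recording. -->
  </record>
</form>
```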
Chapter 3:
This chapter describes the media server's support for DTMF and voice grammars in VoiceXML. The following information is presented:
- Overview
- Input Mode
- Menu-Choice Grammars
- Option Grammars
- SRGS Grammars
- Arbitrary Grammars
- Built-In Grammars
- Maximum Length of Grammars
Overview
This section presents the following topics:
- DTMF Grammars
- Speech Grammars

The grammar definitions of a VoiceXML application provide a standard mechanism for validating user input. A grammar defines a set of rules, which are applied to user input to validate it. User input can take the form of either DTMF key presses or speech (voice) utterances. Each form of input has its own grammars.

Grammars can be defined according to the XML-based W3C Speech Recognition Grammar Specification (SRGS) language [11]. SRGS supports both grammars for speech recognition and grammars for DTMF user input validation.

In addition, the media server has a set of general-purpose grammars built into the VoiceXML interpreter. These built-in grammars simplify application development because they do not have to be defined using SRGS.

As with VoiceXML dialogs and subdialogs, a set of commonly used grammar rules can be maintained as a library. A grammar definition can be embedded within the application or referenced from an externally located file.

The concept of scope also applies to grammars. Multiple grammars may be active at the same time. For instance, when a grammar is defined with a scope that applies to the entire VoiceXML document, the grammar is active during all input collection phases. This mechanism is useful when defining common or global user input action items, such as "Press *9 at any time to receive help."
DTMF Grammars
DTMF grammars define rules for collecting and validating user input supplied as DTMF key presses. DTMF grammars can be specified in either of the following ways:
- As a built-in grammar (see page 55)
- As an XML-based SRGS grammar (see page 53)
Speech Grammars
Speech grammars define rules for collecting and validating user input supplied as speech utterances. The processing of speech grammars is performed by the external speech server, not by the media server. Speech grammars can be specified in either of the following ways:
- As a built-in grammar (see page 55)
- As an XML-based SRGS grammar (see page 53)

The actual support for speech grammars depends on the external speech server deployed. Provided the input mode (however defined) is voice, all grammars are passed directly to the speech server for evaluation. The determination of support for these grammars is then made by the external speech server.
Input Mode
The media server supports three modes of input:
- DTMF
- Voice
- DTMF and voice

The VoiceXML Specification [13] defines different input defaults for different grammar types. These are shown in Table 3-1.
Table 3-1 Default Input Modes for VoiceXML Grammars
Grammar Type  Default Input Mode
XML-SRGS      Voice
Built-In      DTMF and Voice
Menu-Choice   No explicit default. The way in which the grammar is defined determines the input mode for the grammar.
Option        DTMF and Voice
Table 3-2 shows how the input mode is determined on the media server.
Table 3-2 Mechanisms for Setting Input Mode Scope
Mechanism: Configured input mode
Description: The default input mode as configured using the media server's management interface. The default is DTMF.
Scope/Precedence: Scope: Session. Precedence: Lowest.

Mechanism: inputmodes attribute
Description: An attribute of the <property> element that defines the input mode. If this attribute is not set, the value configured through the management interface is used. The starting (default) value is that set through the management interface.
Scope/Precedence: Scope: Depends on the scope of the <property> element. May be as high as Application or as low as Dialog. Precedence: Higher than the configured input mode.
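The precedence rules above can be sketched as follows: a document-level property overrides the configured default, and a dialog-level property overrides the document-level one. The form name, field name, and prompt URL are assumptions made for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: overrides the input mode configured in the
       management interface. Values are case-sensitive. -->
  <property name="inputmodes" value="dtmf voice"/>

  <form id="collect_pin">
    <!-- Dialog scope: this form accepts DTMF only. -->
    <property name="inputmodes" value="dtmf"/>
    <field name="pin" type="digits?length=4">
      <prompt>
        <!-- Hypothetical prompt clip -->
        <audio src="http://example.com/prompts/enter_pin.wav"/>
      </prompt>
    </field>
  </form>
</vxml>
```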
Table 3-3 shows how the input mode interacts with the mode of the grammar, as defined by the mode attribute of the <grammar> element.
Table 3-3 Interaction of Input Mode and Grammar Mode
Input Mode: DTMF
Grammar Mode: DTMF
Behavior: The media server detects, collects, and parses DTMF input.

Input Mode: DTMF
Grammar Mode: Voice
Behavior: No grammars are active. The media server behaves as if no grammars were present in the script. Digits cannot barge clips and are not buffered. NOINPUT is reported for all collections.

Input Mode: Voice
Grammar Mode: DTMF
Behavior: No grammars are active. The media server behaves as if no grammars were present in the script. Digits cannot barge clips and are not buffered. NOINPUT is reported for all collections.

Input Mode: Voice
Grammar Mode: Voice
Behavior: The voice grammar is passed to the external speech server. The external speech server detects, collects, and parses voice input and passes the results to the media server as an NLSML script. DTMF digits are ignored.

Input Mode: DTMF and Voice
Grammar Mode: DTMF
Behavior: No voice grammar is activated. Only DTMF collection is valid.

Input Mode: DTMF and Voice
Grammar Mode: Voice
Behavior: Only the voice grammar is activated. DTMF digits are ignored.

Input Mode: DTMF and Voice
Grammar Mode: DTMF and Voice (in some cases two grammars would be required)
Behavior: Both DTMF and voice grammars are active. DTMF input cancels the voice grammar and any voice input received until that point.
Menu-Choice Grammars
Menu-choice grammars are a simple mechanism for letting the user make a choice and transitioning application control to another location based on that choice. Using audio prompts, the menu offers the user a set of choices and then waits for user input. The dialog transitions based on the user input. A menu-choice grammar can concurrently define both a DTMF and a speech grammar. Menu-choice grammars are implemented using the <menu> element and the <choice> element; please see those elements for details.
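A minimal sketch of a menu-choice grammar is shown below. Each <choice> defines a DTMF key and a spoken phrase at the same time. The menu name, target dialog names, and prompt URL are assumptions for illustration only.

```xml
<menu id="main_menu">
  <prompt>
    <!-- Hypothetical prompt clip: "For sales press 1, for support press 2" -->
    <audio src="http://example.com/prompts/main_menu.wav"/>
  </prompt>
  <!-- dtmf gives the DTMF grammar; the element text gives the speech grammar -->
  <choice dtmf="1" next="#sales">sales</choice>
  <choice dtmf="2" next="#support">support</choice>
</menu>
```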
Option Grammars
Option grammars are a relatively simple way to specify grammars for collecting and processing user input. Simple DTMF or speech sequences are specified within the <option> element. The value attribute of the matched option is assigned as the result of the collection. An option grammar can concurrently define both a DTMF and a speech grammar. Option grammars are implemented using the <option> element; please see that element for details.
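The sketch below shows one way an option grammar might look. The field name, option values, and prompt URL are hypothetical; the <log> in <filled> is only there to show that the field variable receives the value attribute of the matched option.

```xml
<field name="department">
  <prompt>
    <!-- Hypothetical prompt clip -->
    <audio src="http://example.com/prompts/choose_department.wav"/>
  </prompt>
  <!-- Pressing 1 or saying "sales" sets department to "sales" -->
  <option dtmf="1" value="sales">sales</option>
  <option dtmf="2" value="support">support</option>
  <filled>
    <log>Selected department: <value expr="department"/></log>
  </filled>
</field>
```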
SRGS Grammars
SRGS grammars are grammars defined according to the XML-based W3C Speech Recognition Grammar Specification (SRGS) language [11]. The SRGS standard supports grammars both for DTMF user input and for speech recognition. SRGS grammars consist of SRGS elements. The section SRGS Elements on page 9 shows the SRGS elements supported by the media server.

The scope of a grammar rule can be either private or public. If the rule's scope is private, the rule can be referenced only from other rules in the local grammar. If the rule's scope is public, and the rule is activated for recognition, the rule can also be referenced from other grammars.

XML-SRGS grammars can be defined either inline (that is, internal to the VoiceXML document) or external.
<grammar mode="dtmf">
  <one-of>
    <item>
      <one-of>
        <item> 0 </item>
        <item> 1 </item>
        <item> 2 </item>
        <item> 3 </item>
        <item> 4 </item>
      </one-of>
    </item>
    <item>
      <one-of>
        <item> * 9 </item>
        <item> # 9 </item>
      </one-of>
    </item>
  </one-of>
</grammar>
In Example 3-2, the grammar produces a match if the user utters exactly one of zero, one, two, three, four, star nine, or pound nine. Any other form of user input generates a nomatch event.
Example 3-2 Inline SRGS Voice Grammar
<grammar mode="voice">
  <one-of>
    <item>
      <one-of>
        <item> zero </item>
        <item> one </item>
        <item> two </item>
        <item> three </item>
        <item> four </item>
      </one-of>
    </item>
    <item>
      <one-of>
        <item> star nine </item>
        <item> pound nine </item>
      </one-of>
    </item>
  </one-of>
</grammar>
If the configured input mode is dtmf and voice, the document is fetched by the media server and the decision of how to parse is made by determining whether or not a <grammar> element is present. If it is, the input mode is assumed to be voice; otherwise, the input mode is assumed to be DTMF.
Arbitrary Grammars
The media server has internal support for menu-choice, option, XML-SRGS, and built-in grammars. Some speech servers also support arbitrary grammars, such as ABNF grammars, if specified within the <grammar> element. Provided that the input mode is voice, all grammars are passed directly to the speech server for evaluation. The determination of support for the arbitrary grammar is then made by the speech server.
Built-In Grammars
In addition to SRGS grammars, there is a set of grammars built into the media server's VoiceXML interpreter. These are designed to facilitate development by eliminating the need to use SRGS for simple, general-purpose grammars. No XML definition is required to use these grammars. For speech grammars, the built-in grammars are converted to XML-SRGS grammars before being passed to the speech server.

Built-in grammars are specified by using either the type attribute of the <field> element or the src attribute of the <grammar> element.

Built-in grammars are implicitly active for both DTMF and speech user input; however, some built-in grammar types (for example, Date or Currency grammars) are designed specifically for DTMF. For these built-in grammar types, collection and interpretation by the speech server may yield unpredictable results. If a built-in grammar type does not explicitly specify valid input values for voice, you should assume that the built-in grammar is valid for DTMF only.

For any built-in DTMF grammar, the media server can accumulate at most 30 DTMF digits. If not otherwise constrained, all grammars terminate upon receipt of the 30th digit. The received digits are then evaluated based on the specific grammar associated with the collection. Limitations on the length of speech depend on the external speech servers deployed.

The following built-in grammar types are defined:
- Boolean
- Date
- Digits
- Currency
- Number
- Phone
- Time

All speech grammars defined within a <field> element are converted to the equivalent <grammar> element grammar (that is, all built-in grammars are converted to an SRGS grammar) before the grammar is passed to the speech server. Table 3-4 shows examples of this conversion.
Table 3-4 Conversion of Built-In Speech Grammars to XML-SRGS Grammars
Grammar Type  Built-In Representation
Boolean       <field type="boolean">          <grammar src="builtin:dtmf/boolean"/>
Currency      <field type="currency">         <grammar src="builtin:dtmf/currency"/>
Date          <field type="date">             <grammar src="builtin:dtmf/date"/>
Digits        <field type="digits?length=1">  <grammar src="builtin:dtmf/digits?length=1"/>
Number        <field type="number">           <grammar src="builtin:dtmf/number"/>
Phone         <field type="phone">            <grammar src="builtin:dtmf/phone"/>
Time          <field type="time">             <grammar src="builtin:dtmf/time"/>
Boolean
Boolean grammars accept a string of one or more DTMF digits and assign a string value of true or false based on the digits entered. By default, the key 1 corresponds to true and 2 corresponds to false for DTMF grammars. For voice grammars, "Yes" corresponds to true and "No" corresponds to false. DTMF bindings may be changed by appending an HTTP URI-style keyword=value query syntax to the grammar type. The keywords y and n are accepted as alternatives to true and false, respectively. The use of variable-length digits is supported.
Example 3-3 Boolean Built-In Grammar
boolean?y=4;n=5, boolean?n=31;y=32
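As a sketch of how the first form in Example 3-3 might be used in a field, the fragment below rebinds true to key 4 and false to key 5. The field name, target dialogs, and prompt URL are hypothetical. Because the text above describes the result as a string value, the condition compares String(confirm) with 'true' rather than testing the variable directly; this is a defensive assumption, not documented behavior.

```xml
<field name="confirm" type="boolean?y=4;n=5">
  <prompt>
    <!-- Hypothetical prompt clip: "Press 4 for yes or 5 for no" -->
    <audio src="http://example.com/prompts/confirm.wav"/>
  </prompt>
  <filled>
    <!-- String() makes the test work whether the result is the string
         "true" or the ECMAScript Boolean true. -->
    <if cond="String(confirm) == 'true'">
      <goto next="#accepted"/>
    <else/>
      <goto next="#rejected"/>
    </if>
  </filled>
</field>
```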
Currency
Currency grammars accept entry of a variable number of DTMF or voice digits and the asterisk (*) key. Entries are assigned to a string in the format mm.nn, where mm corresponds to zero or more digits in the major currency unit, and nn corresponds to zero or more digits in the minor currency unit. The asterisk key is used as the decimal point to separate the major and minor currency units. With the exception of leading zeros, which are removed from the string, all entered digits are included in the resulting string.
Example 3-4 Currency Built-In DTMF Grammar
builtin:dtmf/currency
Date
Date grammars accept 2-, 4-, 6-, and 8-character DTMF or voice collections. These represent, respectively, day (dd), month and day (mmdd), year and month (yyyymm), and year, month, and day (yyyymmdd). The digits entered must form a valid date. The year component can be any four digits; that is, no validation is performed. The month must be between 01 and 12. The day must be between 01 and 31. Days are not checked for validity against the specified month. An error.nomatch event is thrown for invalidly entered dates. Note that, unlike the specification [13], no question mark characters (?) are used to pad the input. Only the digits received are returned.
Example 3-5 Date Built-In DTMF Grammar
builtin:dtmf/date
Digits
Digits grammars accept entry of a variable number of DTMF or speech digits. The number of digits accepted may be constrained by appending an HTTP URI-style keyword=value query syntax to the grammar type. Keywords accepted are minlength, maxlength, and length, where length specifies the exact number of digits accepted. An error.badfetch is thrown if there is a conflict between the keyword values. All digits are included in the resulting string.
Example 3-6 Digits Built-In Grammar
digits?minlength=2;maxlength=8
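The constraint in Example 3-6 could be applied in either of the two ways that built-in grammars are specified. The sketch below shows both; the field names and prompt URL are hypothetical.

```xml
<form id="account_entry">
  <!-- Form 1: built-in grammar selected through the type attribute -->
  <field name="account" type="digits?minlength=2;maxlength=8">
    <prompt>
      <!-- Hypothetical prompt clip -->
      <audio src="http://example.com/prompts/enter_account.wav"/>
    </prompt>
  </field>

  <!-- Form 2: the same grammar referenced through the src attribute -->
  <field name="account_again">
    <grammar src="builtin:dtmf/digits?minlength=2;maxlength=8"/>
    <prompt>
      <audio src="http://example.com/prompts/enter_account.wav"/>
    </prompt>
  </field>
</form>
```

A conflicting combination, for example a minlength greater than maxlength, would be expected to raise error.badfetch as described above.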
Number
Number grammars are identical to currency grammars, except that:
- The asterisk (*) key is interpreted as a decimal point, rather than a currency separator, and
- None of minlength, maxlength, and length are specified.

Leading zeros are removed from the resulting string. This allows the result to be used in an ECMAScript expression, as ECMAScript would interpret a leading 0 as representing an octal value.
Example 3-7 Number Built-In Grammar
builtin:dtmf/number
Phone
Phone grammars behave identically to number grammars, except that:
- All digits entered are included in the resulting string.
- The asterisk key (*) is interpreted as representing an extension. For example, 8005551212*123 results in a returned string of 8005551212x123.
Example 3-8 Phone Built-In Grammar
builtin:dtmf/phone
Time
Time grammars accept entry of three or four DTMF digits representing a time, and return a five-character string in the format hhmmx, where hh is the hours between 00 and 24, mm is minutes between 00 and 59, and x is either h (for a 24-hour clock) or ? if the entry is ambiguous between a 12- and 24-hour clock. Because morning (AM) cannot be unambiguously expressed in DTMF, ? will be a common termination. If only three digits are entered, the media server adds a leading zero to the string.
Example 3-9 Time Built-In Grammar
builtin:dtmf/time
<grammar version="1.0" type="application/srgs+xml" mode="dtmf" root="root">
  <rule id="root" scope="public">
    <one-of>
      <item> 1 <item> 3 </item> </item>
      <item repeat="0"> 3 </item>
      <item repeat="3"> 4 </item>
      <item repeat="3-5"> 5 </item>
      <item repeat="4-"> 6 </item>
      <item repeat="0-1"> 8 </item>
      <item> 9 </item>
    </one-of>
  </rule>
</grammar>
In this example, the possible maximum lengths are 1, 2, 3, and 30 digits. Since only one maximum length can be associated with a grammar at any time, the longest maximum length (in this case 30) is used.
This means that, for a given grammar, digit collection may not end immediately at the first input that satisfies one possible maximum length. In Example 3-10, the fourth item has a length of 3; as such, an input string of 444 might be expected to end collection immediately. Instead, the maximum length is the longest possible maximum length, 30. In this case, the inter-digit timer is started and the system waits to see whether additional input is forthcoming. If no additional input is received within the inter-digit timeout interval, collection ends and the input string 444 is accepted as satisfying the fourth grammar item. Thus, for a grammar with variable-length items, collection is terminated only by either the longest possible maximum length or an inter-digit timeout.
Grammar Evaluation
All DTMF collections are evaluated in real time as digits are received. All matches and no-matches (for example, if the current digit results in a match or an impossible match) are recognized and reported as soon as the current digit is evaluated. For voice grammars the evaluation of incoming speech against the currently defined grammar is performed by the external speech server. The specific behavior will depend on the speech server employed.
Chapter4:
This chapter describes the VoiceXML 2.0 elements currently supported by the Convedia Media Server, including SRGS and SSML elements. The VoiceXML 2.0 language is defined by the W3C Candidate Recommendation specifying the language [13]. Any features of VoiceXML specified in the Recommendation but not in this guide are not supported in this release of the Convedia Media Server. Any features of VoiceXML specified in this guide but not in the Recommendation are extensions to the specification.
<assign>
Assigns a value to a variable.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
name
Mandatory. The name of the variable being updated.
expr
Mandatory. An ECMAScript expression representing the new value of the variable.
Usage Guidelines
Use this element to assign a value to a variable. Note that the maximum size of a variable name (whether assigned, or newly created using the <var> element) is 256 characters. If the variable name exceeds this length, an error.semantic is thrown.
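A minimal sketch of <assign> in use is shown below. The form and variable names are hypothetical; the variable is declared with <var> before it is updated.

```xml
<form id="counter_demo">
  <!-- Declare the variable first -->
  <var name="attempts" expr="0"/>
  <block>
    <!-- expr is an ECMAScript expression evaluated at assignment time -->
    <assign name="attempts" expr="attempts + 1"/>
  </block>
</form>
```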
<audio>
Plays an audio clip or multimedia file or renders a text-to-speech clip.
Parent element: <audio>, <block>, <catch>, <desc>, <emphasis>, <error>, <field>, <filled>, <help>, <if>, <initial>, <mark>, <menu>, <noinput>, <nomatch>, <p>, <phoneme>, <prompt>, <prosody>, <record>, <s>, <say-as>, <sub>, <subdialog>, <voice>
Child elements: <audio>, <break>, <emphasis>, <mark>, <p>, <prosody>, <s>, <say-as>, <value>, <voice>
Note that the SSML elements <break>, <desc>, <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, and <voice> do not appear as children or parents of the <audio> element within the XML schema.
Attributes
src
A URI or numeric index representing the media clip or TTS string to be played. A URI must comply with the XML anyURI format. In addition, the URI or numeric index must comply with the constraints described in the section Working with Media Files and TTS Strings on page 28. Exactly one of src and expr must be specified; otherwise, an error.badfetch is thrown.
expr
An ECMAScript expression evaluating to the URI or numeric index of the media clip or TTS string to be played. A URI resulting from the expression must comply with the XML anyURI format. In addition, the URI or numeric index resulting from the expression must comply with the constraints described in the section Working with Media Files and TTS Strings on page 28. Exactly one of src and expr must be specified; otherwise, an error.badfetch is thrown.
Usage Guidelines
The <audio> element requests the media server to play an audio clip or multimedia clip, or to render text-to-speech strings.
- Text with embedded SSML elements
- A URI to an external SSML file, expressed using either the src or expr attribute. For external URIs, the file extensions SSML, CSSML, and TXT are supported. Any other file extension is assumed to be a media clip.

An <audio> element defined (that is, embedded) within a TTS string is only valid if there are active external TTS servers. For systems that do not deploy TTS servers, the entire string, including any embedded <audio> elements, is ignored.
Alternate Audio
The media server supports the use of alternate audio or silence. Alternate audio gives an application the means to specify an audio clip, multimedia clip, or TTS string to be played in case the primary clip fails. The primary clip can be either an audio or multimedia clip; it cannot be a TTS clip embedded within the VoiceXML document for the purpose of playing alternate audio. However, a TTS clip can be specified as an external URI. In that case, if the external speech server fails to play the clip (for example, because the file was not found or a parse error occurred), and alternate audio is defined, the alternate audio is queued and requested to be played.

Alternate audio is specified by including a second <audio> element nested within the first. Silence is played by including a <break> element nested within the <audio> element.

Alternate audio is played, when specified, if:
- The requested primary clip(s) are not found. However, if the clip was started but failed prematurely, the alternate audio is not played. If a series of clips is specified as the primary clips and at least one of them plays, the alternate audio is not played.
- The primary clip is specified as an ECMA expression using the expr attribute and the ECMA variable does not exist. In this case, an ECMA error is thrown after evaluating the expr attribute. If this occurs, and an alternate <audio> or <break> element has been defined, that element is queued and played (assuming that it is validly specified). Otherwise, the error is treated as non-fatal and the session transitions to the next element defined in the script.

The <audio> element src and expr attributes support only prerecorded audio clips and not TTS strings. Alternate audio, however, does not have this restriction. Alternate audio can be:
- A <break> element, which plays the specified silence
- An internal pre-recorded audio clip
- An external pre-recorded audio clip
- A TTS string

The <audio> element supports only two levels of nesting. Thus, at most one level of alternate audio clip(s) can be specified. Anything below the second level is ignored.
Example 4-1 Alternate Audio Examples
<audio src="file://ClipstoPlay">
  <audio src="file://AlternateAudioClip"/>
</audio>

<audio src="file://Welcome">
  Welcome to your life
</audio>

<prompt>
  This is a TTS string.
  <audio src="file://nextClip"/>
  This is another string to play
</prompt>
With respect to alternate audio, whether <audio> or <break> elements are used, only one alternate element can be defined. All others are ignored. Note that the <break> element will not appear as a child of <audio> in the XML schema. The <break> element is not defined as a standard element in the schema and as such does not appear in the normal child-parent relationships.
Encoding
The media server ignores the length specified in WAV file headers. The media server first uses the HTTP Content-Type header to determine the codec. The Content-Type header is analyzed in this order:
1 If the content type is audio/basic, audio/x-alaw-basic, or audio-x-g729-basic, the media server assumes that
relies on the WAV header to determine the codec type. If a WAV header is not found, the media server fails the announcement.
3 If the value is audio/vnd.wave; codec=xxx, where xxx is any number, the media server interprets the file as a WAV file but uses the codec=xxx encoding in preference to any defined within the file.
4 If the media server cannot determine the file type from the header (for example, only audio is specified), the media server examines the file extension. If the extension is .wav, the media server uses the encoding specified in the file. If the extension is not .wav, the media server assumes the file is a .wav file, interprets the contents of the file as containing a WAV header, and accepts or rejects the request accordingly.
Interoperability Notes
For some speech servers:
- No audio output is heard if PCMA is the configured codec. If the configured codec is PCMU, the speech is heard. For PCMA, nothing is heard and there is no error indicating that there was an issue.
- The speech server fails to play a mixture of SSML, plain text, and CSSML text. All scripts are requested as external URIs. All three scripts can be heard being played separately but not as a group.
- The generated TTS speech is choppy and garbled.
- On some speech servers, a clicking sound occurs between the playing of TTS clips and local audio clips. This does not occur with other external servers.
- The speech server generates speech in English when Mandarin is specified in some scripts.
- A date speech grammar generates a mixture of Chinese and English when the xml:lang attribute is set to en-US.
<block>
Allows execution of code within a form.
Parent element: <form>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
name
Optional. The name of the form item variable used to track whether this block is eligible to be executed. The default is an inaccessible internal variable.
expr
Optional. An ECMAScript expression representing the initial value of the form item variable. If initialized to a value, the form item will not be visited unless the form item variable is cleared. The default is the ECMAScript value undefined.
cond
Optional. A Boolean ECMAScript expression. The form item is visited if and only if this expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
Usage Guidelines
The <block> element is a form item. It contains executable content that is executed if:
- The block's form item variable has a value of undefined, AND
- The block's cond attribute (if any) evaluates to true. If cond is not specified, the behavior is as if cond is set to true.
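The two conditions above can be sketched as follows. The form, block, and variable names and the clip URL are hypothetical. The first block runs only when its cond evaluates to true; the second is skipped because its form item variable is initialized to a value other than undefined.

```xml
<form id="greeting">
  <var name="is_returning" expr="true"/>

  <!-- Visited: form item variable is undefined and cond is true -->
  <block name="welcome_back" cond="is_returning">
    <!-- Hypothetical clip -->
    <audio src="http://example.com/prompts/welcome_back.wav"/>
  </block>

  <!-- Not visited unless the variable is cleared: expr gives the
       form item variable an initial value -->
  <block name="skipped" expr="true">
    <log>This block is not executed on the first pass.</log>
  </block>
</form>
```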
<break>
Inserts a pause or silence into audio.
Parent element: <audio>, <prompt>
Attributes
time
Optional. The length of the interval of silence to be inserted, in seconds or milliseconds. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31 - 1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range are reset to 2^31 - 1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The time attribute takes precedence over both size and strength. If nothing is specified, the default interval is 200 milliseconds.
strength
Optional. The length of the interval of silence to be inserted, in predefined intervals. Supported values are as follows:
- x-weak: 50 milliseconds
- weak: 100 milliseconds
- medium: 200 milliseconds
- strong: 500 milliseconds
- x-strong: 2000 milliseconds
- none: 0 milliseconds

The time attribute takes precedence over both size and strength. The size attribute takes precedence over strength. If nothing is specified, the default interval is 200 milliseconds. The strength attribute is always present in all <break> elements, whether specified or not. However, because of its low precedence, it is only used if it is specified and neither time nor size is specified.
size
Deprecated in favor of strength, which is compliant with [4]; however, this attribute is still accepted for backwards compatibility.
Usage Guidelines
The <break> element allows silence intervals to be played within a VoiceXML script. The element is essentially treated like an <audio> element, where the clip played is silence. Instead of specifying an actual audio clip, the <break> element specifies the interval of silence. Up to one <break> element is supported within an <audio> element; others are ignored. Note that the <break> element will not appear as a child of <audio> in the XML schema. The <break> element is not defined as a standard element in the schema and as such does not appear in the normal child-parent relationships.
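As a minimal sketch of the precedence rules described above, the following document plays two clips separated by silence. The form name and the clip URLs are hypothetical placeholders. The first <break> specifies both time and strength, so time should win; the second relies on strength alone.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="pause_demo">
    <block>
      <prompt>
        <!-- Hypothetical clip URL -->
        <audio src="http://example.com/prompts/please_hold.wav"/>
        <!-- time takes precedence over strength: 1.5 seconds of silence -->
        <break time="1.5s" strength="weak"/>
        <audio src="http://example.com/prompts/thank_you.wav"/>
        <!-- strength only: "strong" maps to 500 milliseconds -->
        <break strength="strong"/>
      </prompt>
    </block>
  </form>
</vxml>
```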
<catch>
Handles (catches) events.
Parent element: <field>, <form>, <menu>, <record>, <subdialog>, <vxml>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
event
Optional. The event or events to be caught by this event handler. The format is a space-separated list of event names, where an event name is one of the supported events listed in the section Events on page 24. If more than one event is specified, a separate event counter (that is, a separate count attribute) is maintained for each event.
count
Optional. The number of occurrences of the event. The count attribute allows an application to handle different occurrences of the same event in different ways. Each <form>, <menu>, and form item maintains a counter for each event that occurs while it is being visited. These counters are reset each time the <menu> or the form item's <form> is re-entered. The form-level counters are used in the selection of an event handler for events thrown in a form-level <filled>. Counters are incremented against the full event name and every prefix-matching event name; for example, the occurrence of the event event.foo.1 increments the counters associated with handlers for event.foo.1, event.foo, and event. The count may not exceed the maximum value of a 32-bit unsigned integer. The default is 1.
cond
Optional. A Boolean ECMAScript expression. The catch handling routine is invoked if and only if this expression evaluates to true. The default is true.
Usage Guidelines
The <catch> element allows executable content to be defined for a number of events that the interpreter can generate. The cond attribute is used to test for event conditions. The special variable _event is supported to store the name of the event that is thrown. The special variable _message is also supported. This variable holds an optional message string, which may be set within the <throw> element. If a message has not been specified, then the variable will be set to the ECMAScript value undefined.
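The interaction between event, count, and the _event variable can be sketched as follows. This is an illustrative example only: the form, field, and variable names are hypothetical, it assumes noinput and nomatch are among the supported events, and it assumes the standard VoiceXML 2.0 selection rule in which the handler with the highest count that does not exceed the current event counter is chosen.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="pin_entry">
    <!-- Hypothetical variable used to record the last event name -->
    <var name="last_event"/>
    <field name="code" type="digits">
      <prompt>Please enter your code.</prompt>

      <!-- count defaults to 1: handles the first and second occurrences -->
      <catch event="noinput nomatch">
        <assign name="last_event" expr="_event"/>
        <prompt>Sorry, I did not get that.</prompt>
        <reprompt/>
      </catch>

      <!-- Selected once either event's counter reaches 3 -->
      <catch event="noinput nomatch" count="3">
        <assign name="last_event" expr="_event"/>
        <prompt>We are unable to continue. Goodbye.</prompt>
        <disconnect/>
      </catch>
    </field>
  </form>
</vxml>
```

Because a separate counter is kept for each event in the list, two noinput events followed by one nomatch event would not trigger the count="3" handler in this sketch.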
<choice>
Provides menu choices.
Parent element: <menu>
Child elements: <emphasis>, <grammar>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>. <break> is accepted but ignored.
Attributes
dtmf
Optional. Specifies a simple DTMF sequence which, when matched, will result in this choice. White space is permitted in the DTMF sequence specification; for example 1234# and 1 2 3 4 # are treated as equivalent. There is no default. Generic DTMF recognition properties (that is, interdigittimeout, termtimeout, and termchar) apply. For more information about DTMF properties, please see the section Generic DTMF Recognizer Properties on page 41.
accept
Optional in speech grammars; ignored for DTMF grammars. The only valid value is exact. An accept value specified in a <menu> element overrides the value set here.
next
Fetches the document at the specified URI. The URI must comply with the XML anyURI format. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
expr
Fetches the document at the URI resulting from evaluation of the specified ECMAScript expression. The URI must comply with the XML anyURI format. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
event
Throws the specified event when this choice is made. For a list of supported events, please see the section Events on page 24. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
eventexpr
Throws the event resulting from evaluation of the specified ECMAScript expression when this choice is made. For a list of supported events, please see the section Events on page 24. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
message
Optional. Returns the specified message string to the event handler, along with the event name. There is no default. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
messageexpr
Optional. Returns the message string resulting from evaluation of the specified ECMAScript expression to the event handler, along with the event name. There is no default. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
fetchaudio
Ignored.
fetchhint
Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31-1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31-1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element) the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <choice> element defines a menu item and allows the application to define a simple DTMF sequence or voice specification to indicate this menu choice. It also allows specification of a destination URI for fetching the next document when the menu choice has been made. Optionally, the element can be set to throw an event when the choice is made. All <choice> elements defined for voice are converted into an XML-SRGS format, which is then passed to an external speech server for processing. Note that although <break> is a valid child of <choice> in the VoiceXML schema, it is ignored (though accepted) in this implementation, and no action is taken if specified.
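A DTMF-only menu using the three ways of resolving a choice (next, expr, and event) might look like the sketch below. The menu name and destination URIs are hypothetical, and the example assumes that help is one of the supported events. Each <choice> specifies exactly one of next, expr, event, and eventexpr, as required.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main_menu">
    <prompt>Press 1 for sales, 2 for support, or 0 for help.</prompt>

    <!-- Static destination URI (hypothetical) -->
    <choice dtmf="1" next="http://example.com/vxml/sales.vxml"/>

    <!-- Destination computed from an ECMAScript expression -->
    <choice dtmf="2" expr="'http://example.com/vxml/' + 'support.vxml'"/>

    <!-- Throws an event instead of fetching a document -->
    <choice dtmf="0" event="help" message="caller pressed zero"/>
  </menu>
</vxml>
```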
Interoperability Notes
For some speech servers: Saying a subphrase of the <choice> element in a menu grammar results in a match being returned, even in some cases that should be a nomatch.
<clear>
Clears or resets form items (form fields).
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
namelist
Optional. Resets the specified variable(s), including any form item variables. The format is a space-separated list of variable names. By default, all form items for the current form are reset.
Usage Guidelines
The <clear> element resets the specified variable(s), including form item variables. When form items are cleared, the prompt and event counters are reinitialized and the form item variable is set to the ECMAScript value undefined.
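A typical use of <clear> is to re-collect input after the caller rejects a confirmation. The following is a minimal sketch with hypothetical form and field names; it assumes the built-in boolean grammar fills the field with an ECMAScript true or false value, as in standard VoiceXML 2.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="transfer">
    <field name="amount" type="digits">
      <prompt>Enter the amount.</prompt>
    </field>
    <field name="confirmed" type="boolean">
      <prompt>Is that correct?</prompt>
      <filled>
        <if cond="!confirmed">
          <!-- Reset both form item variables to undefined and
               reinitialize their prompt and event counters -->
          <clear namelist="amount confirmed"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Omitting namelist would reset every form item in the current form rather than only the two named here.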
<controlcmd>
Specifies the actions associated with DTMF key presses for prompt controls.
Parent element: <promptcontrol>. Ignored if specified as a child of <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, or <nomatch>.
Child elements: None.
Attributes
dtmf
Mandatory. Specifies a single DTMF key with the associated audio control action. Supported digits are 0-9, *, #, A, B, C, and D. (Note that a through d are not supported.) Whitespace is not permitted. Any other value, or use of whitespace, will cause an error.badfetch to be thrown. You may specify the same DTMF key for both the pause and resume actions, in order to achieve a toggle action. Also, you may specify the same DTMF key for a single action defined multiple times. Otherwise, you cannot specify the same DTMF key for different actions, and doing so will cause session termination with an error.semantic.
action
Mandatory. The audio control action to be performed when the specified DTMF digit is pressed. Supported values are as follows:
  pause: Pause the stream for an indefinite period of time.
  resume: Resume the paused stream.
  seek: Stream audio beginning at the location specified by the combination of the from and to attributes.
  volume: Adjust the volume by the amount specified by the combination of the from and to attributes. The default volume is 0dB.
from
Optional with seek and volume actions; ignored otherwise. The starting value for the seek and volume actions. Supported values are as follows:
  begin: When used with seek, measure the change of location specified by the to attribute relative to the beginning of the file. When used with volume, interpret the volume specified by the to attribute as an absolute volume.
  current: When used with seek, measure the change in location specified by the to attribute relative to the current position. When used with volume, interpret the volume specified by the to attribute as a change relative to the current volume.
The default is current.
to
Mandatory with seek and volume actions; ignored otherwise.
When used with seek, this attribute represents the offset interval in seconds or milliseconds from the starting point specified by the from attribute. The format is <number><unit>, where <number> is an integer, and may optionally be preceded by a plus sign (+) or a minus sign (-), and where a plus sign moves the location forward (fast-forward) and a minus sign moves the location backward (rewind). <unit> may be one of ms (for milliseconds) or s (for seconds). Spaces between the numeric value and the unit are not permitted. The range is -(2^31-1) milliseconds to +(2^31-1) milliseconds, with a precision of 10 milliseconds. If the specified value exceeds the range in either direction, then the media server automatically applies the offset limit (either positive or negative). Specifying a forward location past the end of the audio file results in audio stream completion. Specifying a rewind amount past the beginning of the file results in play starting at the beginning of the file. Examples of to values are: 100ms, 50s, and +600ms.
When used with volume, this attribute represents a volume change. As an absolute volume specification (from=begin), the range is -96dB to +96dB, where the plus sign (+) is optional. Exceeding the range will cause an error.semantic to be thrown. As a change in volume relative to the current volume (from=current), the range is -192dB to +192dB, where the plus sign (+) is optional. Exceeding the range will cause an error.semantic to be thrown. If you specify a change of volume that is within the valid range, but which results in an absolute volume lower than the negative limit of -96dB or greater than the positive limit of +96dB, then the media server automatically applies the volume limit (either positive or negative).
Note that all units are required. Omitting units will cause an error.semantic to be thrown.
Usage Guidelines
The <controlcmd> element specifies DTMF keys and associates them with actions for audio prompt controls. This element is valid only for pre-recorded audio clips. TTS clips specified within a <controlcmd> element are ignored. Audio controls are limited to single DTMF keys, which are specified by the dtmf attribute. DTMF grammars (inline or external, and built-in or SRGS) are currently not supported in specifying audio controls.
While a prompt control is active, the DTMF keys and associated control actions override any currently active grammars or prompt barge-in. DTMF digits not consumed by <controlcmd> action keys are used by currently active grammars or prompt barge-in. Actions specified in the <controlcmd> element are active during the play of a prompt only if the <prompt> element's cvd:vcrprompt attribute is set to true.
The same DTMF key can be defined for pause and resume actions, so that the user can toggle between pausing and resuming a clip. These are the only actions that may use the same DTMF key. Also, the same action can be defined multiple times using the same key. In this case, the most recent definition overrides the previous ones. Any other combination of actions that uses the same key results in an error.semantic being thrown and session termination.
Control actions specified in <controlcmd> apply to a single <prompt> element. If several <prompt> elements are played back to back, with control commands enabled, then each is treated independently.
All errors in specifying media controls result in the session terminating. An error.badfetch is thrown for any errors detected by the parser. These are generally cases where the value assigned to the attribute does not conform to the regular expression for that attribute (for example, a value for the dtmf attribute that is not a valid DTMF digit). For all other errors, an error.semantic is thrown. Possible error cases include the following:
  Omitting the to attribute for volume or seek actions.
  Specifying a value for from that is neither begin nor current.
  Specifying a time value for the to attribute when the action is volume.
  Specifying a volume-based value for the to attribute when the action is seek.
  Failing to include units (s, ms, or dB) for the to attribute.
  Including a space between the value and the unit for the to attribute (for example, 3 s).
  Specifying a value that is out of range for the to attribute for an absolute volume specification; that is, specifying a value that is less than -96 dB or greater than +96 dB when from=begin.
  Using the same DTMF key for two different actions which are not pause and resume. Pause and resume are the only actions that may use the same key for the toggle function. Otherwise the same DTMF key cannot be used for different actions (although the same action can be defined multiple times using the same digit).
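The rules above can be illustrated with a fragment that defines a small set of playback controls. This is a sketch only, not a complete document: the key assignments are arbitrary, the enclosing <promptcontrol> element is shown without attributes (any it requires are described in its own section), and the controls take effect only while a <prompt> with cvd:vcrprompt set to true is playing.

```xml
<promptcontrol>
  <!-- Same key for pause and resume gives a toggle -->
  <controlcmd dtmf="5" action="pause"/>
  <controlcmd dtmf="5" action="resume"/>

  <!-- Relative seek: skip forward or back 10 seconds from the current position -->
  <controlcmd dtmf="6" action="seek" from="current" to="+10s"/>
  <controlcmd dtmf="4" action="seek" from="current" to="-10s"/>

  <!-- Absolute seek: restart from the beginning of the file -->
  <controlcmd dtmf="1" action="seek" from="begin" to="0s"/>

  <!-- Relative volume: step up or down by 3 dB -->
  <controlcmd dtmf="9" action="volume" from="current" to="+3dB"/>
  <controlcmd dtmf="7" action="volume" from="current" to="-3dB"/>

  <!-- Absolute volume: return to the 0 dB default -->
  <controlcmd dtmf="8" action="volume" from="begin" to="0dB"/>
</promptcontrol>
```

Every to value carries its unit with no intervening space, and no key other than 5 is shared between different actions, which avoids the error.semantic cases listed above.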
<desc>
[SSML] Provides a textual description of audio content.
Parent element: <audio>
Child elements: None.
Attributes
xml:lang
Optional. Indicates that content of this element is in a different language from that surrounding the element.
Usage Guidelines
The <desc> element provides a textual description of audio source (for example, door slamming). The <desc> element can only occur within the content of the <audio> element. If text-only output is being produced by the synthesis processor, the content of the <desc> element(s) should be rendered instead of other alternative content in audio. The optional xml:lang attribute can be used to indicate that the content of the element is in a different language from that of the content surrounding the element. Unlike all other uses of xml:lang in this document, the presence or absence of this attribute will have no effect on the output in the normal case of audio (rather than text) output.
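For illustration, a <desc> element nested in an <audio> element might be written as in the following fragment. The clip URL and the language tag are hypothetical; in the normal audio case the description has no effect on what the caller hears.

```xml
<prompt>
  <!-- Hypothetical clip URL -->
  <audio src="http://example.com/sounds/door.wav">
    <!-- Rendered only when text-only output is produced -->
    <desc xml:lang="en-US">door slamming</desc>
  </audio>
</prompt>
```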
Interoperability Notes
For some speech servers: The <desc> element is only supported as content of the <audio> element. The expected behavior of the VoiceXML script and the subsequent SSML TTS body is that the request be rejected; however, the speech is generated and played.
<disconnect>
Terminates the VoiceXML application, sending a SIP BYE.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
The <disconnect> element allows the VoiceXML interpreter context to disconnect the user. Execution of the disconnect element causes the connection.disconnect.hangup event to be thrown, which may optionally specify some clean-up actions. The current session is terminated, a SIP BYE is sent to the control agent, and all associated media port resources are released by the platform. See also the related elements <exit> and <return>.
<else>
Provides alternative logic for an <if> condition.
Parent element: <if>
Child elements: None.
Attributes
The <else> element is an optional element. It defines the beginning of an else clause specifying the code to be executed if the conditions specified in the associated <if> element are not satisfied.
<elseif>
Provides alternative logic for an <if> condition.
Parent element: <if>
Child elements: None.
Attributes
cond
Mandatory. A Boolean ECMAScript expression. The associated clause is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
Usage Guidelines
The <elseif> element is an optional element. It defines a new conditional clause specifying the code to be executed if the conditions specified in the associated <if> element are not satisfied. The new clause is entered only if the conditions specified by the cond attribute are satisfied.
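The way <if>, <elseif>, and <else> divide a block of executable content can be sketched as follows. The form and field names are hypothetical, and the example assumes that the built-in digits grammar returns the collected digits as a string, as in standard VoiceXML 2.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="route">
    <field name="dept" type="digits?minlength=1;maxlength=1">
      <prompt>Press 1 for sales or 2 for support.</prompt>
      <filled>
        <if cond="dept == '1'">
          <goto next="#sales"/>
        <elseif cond="dept == '2'"/>
          <!-- Entered only if the <if> condition failed and this cond is true -->
          <goto next="#support"/>
        <else/>
          <!-- Any other digit: collect the field again -->
          <clear namelist="dept"/>
        </if>
      </filled>
    </field>
  </form>
  <form id="sales">
    <block><prompt>Connecting you to sales.</prompt></block>
  </form>
  <form id="support">
    <block><prompt>Connecting you to support.</prompt></block>
  </form>
</vxml>
```

Note that <elseif> and <else> are empty elements that mark where a new clause begins inside the <if>; they do not wrap the content of their clause.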
<emphasis>
[SSML] Directs the speech server to add emphasis to surrounded text. Parent element: Child elements:
<speak>
Attributes
level
Optional. Indicates the strength of emphasis to be applied. Defined values are as follows:
  strong
  moderate
  none
  reduced
The default level is moderate. The meaning of strong and moderate emphasis is interpreted according to the language being spoken (languages indicate emphasis using a possible combination of pitch change, timing changes, loudness and other acoustic differences). The reduced level is effectively the opposite of emphasizing a word. For example, when the phrase going to is reduced it may be spoken as gonna. The none level is used to prevent the synthesis processor from emphasizing words that it might typically emphasize. The values "none", "moderate", and "strong" are monotonically non-decreasing in strength.
Usage Guidelines
The <emphasis> element requests that the contained text be spoken with emphasis (also referred to as prominence or stress). The synthesis processor determines how to render emphasis since the nature of emphasis differs between languages, dialects or even voices. The emphasis element can only contain text to be rendered.
Interoperability Notes
For some speech servers: The <emphasis> element with the level attribute has no effect.
<error>
Handles (catches) all error events.
Parent element: <field>, <form>, <menu>, <record>, <subdialog>, <vxml>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
count
Optional. The number of times an error event may be thrown within its scope (form or menu), after which error handling is invoked. The count may not exceed the maximum value of a 32-bit unsigned integer. The default is 1.
cond
Optional. A Boolean ECMAScript expression. The error handling routine is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
Usage Guidelines
The <error> element catches all events of type error. If multiple error handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13]. This element is equivalent to <catch event="error">. For a list of supported events, please see the section Events on page 24.
<example>
[SRGS] Provides an example phrase that matches the input specification.
Parent element: None.
Child elements: None.
Attributes
None.
Usage Guidelines
This SRGS element can be used within a grammar rule definition to illustrate an example of user input complying with the specification. No associated action for this element is performed within the interpreter context or the grammar engine; it is ignored by these components.
<exit>
Terminates the VoiceXML application, while keeping the port open.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
expr
Optional. An ECMAScript expression (such as field1 or Finished) to be returned to the interpreter context. By default, no expression is returned. Only one of expr and namelist may be specified; if both are specified, an error.badfetch is thrown. No error is generated if neither is specified.
namelist
Optional. A space-separated list of variables to be returned to the interpreter context. By default, no variables are returned. Only one of expr and namelist may be specified; if both are specified, an error.badfetch is thrown. No error is generated if neither is specified.
Usage Guidelines
The <exit> element allows control to be returned back to the interpreter context. Unlike session termination as a result of a <disconnect>, <exit> allows the media server to retain media port resources. Other resources (documents, variables, and so on) associated with the session are released; however, the media port resources are not released by the platform. A SIP BYE is not sent to the control agent. The port resources are kept on hold pending further direction from the control agent.
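The difference between <exit> and <disconnect> can be sketched with a form that ends the session in one of two ways. The form, field, and variable names are hypothetical. In the first branch a variable is handed back to the interpreter context and the port is retained; in the second the call is torn down.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="wrapup">
    <!-- Hypothetical result variable returned on exit -->
    <var name="status" expr="'done'"/>
    <field name="stay" type="boolean">
      <prompt>Would you like another service?</prompt>
      <filled>
        <if cond="stay">
          <!-- Port resources are kept; no SIP BYE is sent -->
          <exit namelist="status"/>
        <else/>
          <!-- Session ends, SIP BYE is sent, port resources are released -->
          <disconnect/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Only namelist is given on <exit> here; adding expr as well would cause an error.badfetch.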
<field>
Collects user input.
Parent element: <form>
Child elements: <audio>, <catch>, <error>, <filled>, <grammar>, <help>, <link>, <noinput>, <nomatch>, <option>, <prompt>, <promptcontrol>, <property>
Attributes
name
Optional. Defines a variable with the specified name, which will hold the result of the user collection defined by the <field> element. The variable name must be unique among all form items defined within the form; otherwise, an error.badfetch is thrown. The format is an XML restrictedVariableName token, which is composed of alphabetic characters, digits, colon, and hyphen. The name may not begin with underscore (_) or contain a period (.). In addition, the name must follow ECMAScript variable naming conventions and may not include ECMAScript reserved words. There is no default.
expr
Optional. An ECMAScript expression assigning the initial value of the form item variable defined by name. If the initial value is set using this attribute, the form item will not be executed until the variable is cleared (for example, by using the <clear> element). The default is the ECMAScript value undefined.
cond
Optional. A Boolean ECMAScript expression. The field is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
type
Optional. Provides the definition of a built-in grammar. Instead of using this attribute, a grammar can be specified using the <grammar> element.
slot
modal
Optional. Allows you to disable all other grammars while the field is being executed, so that only the grammar associated with the field is active. Supported values are as follows:
  true: Disable all other grammars, leaving only this one active.
  false: Keep all grammars enabled.
The default is false.
Usage Guidelines
The <field> element prompts the user to provide input based on the specified grammar. The grammar can be DTMF and/or voice. The type attribute takes one of the defined built-in grammars as an argument. Built-in grammars implicitly support DTMF and voice inputs unless the input mode is explicitly specified using the inputmodes attribute of the <property> element. As an alternative to specifying the grammar in the type attribute, the grammar for a <field> element can be specified using the <grammar> element. All voice grammars defined using the type attribute are converted into their <grammar> equivalent before being passed to the external speech server. Table 4-1 shows the conversion that takes place between a type-specified grammar and its <grammar> equivalent, and shows whether or not that representation is supported for DTMF and voice.
Table 4-1 Conversion of <field> type Attribute to <grammar>
<field> Representation | <grammar> Equivalent | Supported Mode
<field type="boolean"> | <grammar src="builtin:grammar/boolean"/> | DTMF and voice
<field type="boolean?y=5;n=6"> | <grammar src="builtin:dtmf/boolean?y=5;n=6"/> | DTMF only
<field type="digits"> | <grammar src="builtin:grammar/digits"/> | DTMF and voice
<field type="digits?minlength=3;maxlength=5"> | <grammar src="builtin:grammar/digits?minlength=3;maxlength=5"/> | DTMF and voice
<field type="date"/> | <grammar src="builtin:grammar/date"/> | DTMF and voice
For more information about DTMF and voice grammars, please see Chapter 3: DTMF and Voice Grammars.
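The equivalence shown in Table 4-1 can be seen by writing the same collection twice, once with the type attribute and once with a <grammar> child. This is a minimal sketch with hypothetical form and field names; the two fields are expected to behave the same way.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="collect">
    <!-- Built-in grammar selected through the type attribute -->
    <field name="account" type="digits?minlength=3;maxlength=5">
      <prompt>Enter your account number.</prompt>
    </field>

    <!-- The same built-in grammar referenced through <grammar>;
         modal="true" disables all other grammars while this field runs -->
    <field name="pin" modal="true">
      <grammar src="builtin:grammar/digits?minlength=3;maxlength=5"/>
      <prompt>Enter your PIN.</prompt>
    </field>
  </form>
</vxml>
```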
Interoperability Notes
For some speech servers:
  The match rate for voice inputs is very low.
  There is an inconsistent match rate across match tests.
  A match is returned when a no-match is expected in some test cases. This occurs with different grammar types.
  A special SRGS rule (which is matched without the user speaking any word) does not work. The expected behavior is that the grammar can be used to match zero or silence. However, currently the rule is not matched.
  A special SRGS rule (which matches any speech up until the next rule match, the next token, or the end of spoken input) does not work. Currently the rule is not matched as expected.
  0229 is recognized as 0529 for date grammars. Entering the values zero, two, two, nine causes the speech server to return 0529.
  Entering an invalid leap date returns a date. The expected behavior is to return an error or a nomatch.
  Some digits are dropped or mismatched for ASR digit grammar. 13456 was entered but 12345 was returned.
  Currency input of 100.798 drops the final digit and returns 100.79.
  The speech server accepts only up to 2 decimal places for number grammar. 98.765 was entered and 98.76 was returned.
  The speech server returns nomatch for a number grammar if the leading digits are zeros.
  The point character (.) is recognized as 1 instead of dot for number grammars.
  Phone grammars are incorrectly recognized. An input of 6044202978 returned 6004123457.
  Saying a subphrase of the <choice> element in a menu grammar results in a match being returned, even in some cases that should be a nomatch.
  Noinput was returned when a match was expected for the input: zero six zero six zero six zero six zero six.
  Speech input in a date grammar is incorrectly interpreted. Entering june, nineteen seventy eight results in a returned string of 780619.
  For MRCP v1, saying or generating the speech twelve o'clock results in a no match being returned in a time grammar. For MRCP v2 this test succeeded.
  A grammar completion failure occurs setting up an ABNF grammar.
  Speech server running MRCP v1 does not return PCMA as the lead codec when only PCMA is offered. As a result, the external server actually uses the PCMU codec while the media server is streaming PCMU. When running MRCP v2 the speech server works as expected.
<filled>
Defines the code to be executed when user input is complete.
Parent element: <field>, <form>, <record>, <subdialog>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
mode
Optional. Specifies when execution of this element should take place. Supported values are as follows:
  any: Execute when any of the input items has been filled by the user.
  all: Execute only when all of the input items have been filled by the user.
The default is all.
namelist
Optional. A space-separated list of variable names representing the input items that must be filled in order for this element to be executed. When this element occurs within a form, this list defaults to the names (both implicit and explicit) of the form's input items; otherwise, there is no default.
Usage Guidelines
The <filled> element specifies actions to be executed when the associated <field> has been completed by the user.
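A form-level <filled> that waits for several input items can be sketched as follows. The form and field names are hypothetical. With mode set to all, the content runs only once both fields named in namelist have been filled; the comparison relies on ECMAScript converting the collected digit strings to numbers.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="birthday">
    <field name="month" type="digits?minlength=1;maxlength=2">
      <prompt>Enter the month as one or two digits.</prompt>
    </field>
    <field name="day" type="digits?minlength=1;maxlength=2">
      <prompt>Enter the day as one or two digits.</prompt>
    </field>

    <!-- Runs only after both month and day have been filled -->
    <filled mode="all" namelist="month day">
      <if cond="month &gt; 12 || day &gt; 31">
        <!-- Out-of-range input: collect both values again -->
        <clear namelist="month day"/>
      <else/>
        <prompt>Thank you.</prompt>
      </if>
    </filled>
  </form>
</vxml>
```

Changing mode to any would run the same content as soon as either field is filled, which is usually not what a cross-field check like this one needs.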
<form>
Defines a dialog for collecting user input.
Parent element: <vxml>
Child elements: <block>, <catch>, <error>, <filled>, <grammar>, <link>, <noinput>, <nomatch>, <promptcontrol>, <property>, <record>, <script>, <subdialog>, <var>
Attributes
id
Optional. A unique identifier for the form. The format is an XML name token without colons (:). The name token may be composed of alphabetic letters, digits, period (.), underscore (_), and hyphen (-). The name must begin with a letter or underscore. If specified, the identifier can be used within the current document or within another document to pass control to the form (for example, this-form in <goto next="#this-form">).
scope
Optional. The default scope of this form's grammar. Supported values are as follows:
  dialog: This grammar applies only to the current form.
  document: This grammar is active over the entire document. If the document is the root document, then the grammar scope applies to all documents referenced from the root document.
The default is dialog.
Usage Guidelines
The <form> element is a key mechanism in VoiceXML for presenting information to the user and collecting user input. A form consists of form items, which can be visited during the execution of the form. Form items can either be input items (which are visited as a result of user input) or control items (which are independent of user input). A form allows variable declarations and an event handler to be associated with the form. Additionally, the child element <filled> allows you to specify procedural logic that can be executed when user input is completed and a particular field item (or field) is filled.
<goto>
Transfers control to another dialog, abandoning the current dialog.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
next
The URI of the document to which to transition. The URI must comply with the XML anyURI format. Exactly one of next, expr, nextitem, and expritem must be specified; otherwise, an error.badfetch is thrown.
expr
An ECMAScript expression evaluating to the URI of the document to which to transition. The URI resulting from the expression must comply with the XML anyURI format. Exactly one of next, expr, nextitem, and expritem must be specified; otherwise, an error.badfetch is thrown.
nextitem
The name of the next item to transition to within the form. Exactly one of next, expr, nextitem, and expritem must be specified; otherwise, an error.badfetch is thrown.
expritem
An ECMAScript expression evaluating to the name of the next item to transition to within the form. Exactly one of next, expr, nextitem, and expritem must be specified; otherwise, an error.badfetch is thrown.
fetchaudio
Ignored.
fetchhint
Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31-1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31-1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element) the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <goto> element provides the ability to transition control to another dialog, either within the current document, or within another document.
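The following fragment is a minimal sketch of the three most common transition styles. The form names, item names, and the document name menu.vxml are hypothetical and serve only as illustration. Each <goto> specifies exactly one of next, expr, nextitem, and expritem.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <!-- Hypothetical variable holding a destination URI -->
    <var name="target" expr="'#goodbye'"/>
    <block name="first">
      <!-- nextitem: transition to another item within the same form -->
      <goto nextitem="last"/>
    </block>
    <block name="skipped">
      <log>This block is not visited.</log>
    </block>
    <block name="last">
      <!-- expr: the destination URI is computed at run time -->
      <goto expr="target"/>
    </block>
  </form>
  <form id="goodbye">
    <block>
      <!-- next: a literal URI, with an explicit fetch timeout -->
      <goto next="menu.vxml" fetchtimeout="5s"/>
    </block>
  </form>
</vxml>
```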
<grammar>
Defines user input rules for DTMF or voice.
Parent element: <choice>, <field>, <form>, <link>, <record>
Child elements: <rule>
Attributes
src
The URI of the grammar, if the grammar is to be fetched externally. The URI must comply with the XML anyURI format. This attribute can also be used to directly specify a built-in grammar, using the notation builtin:grammar/type?parameters (where grammar=dtmf). Either way, this attribute is mandatory if an inline grammar is not specified, and forbidden if an inline grammar is specified; that is, exactly one of src or an inline grammar must be specified. If both or neither are specified, an error.badfetch is thrown.
scope
Optional. The default scope of this grammar. Supported values are as follows:
dialog: This grammar applies only to the current form.
document: This grammar is active over the entire document. If the document is the root document, then the grammar scope applies to all documents referenced from the root document.
If not specified, the grammar scope is inherited from the parent element.
type
Optional. Identifies the MIME type of the grammar. If specified, this value takes precedence over file types or the HTTP Content-type header. If not specified and the grammar is fetched externally, then the file extension type or the media Content-type is used to determine the grammar type. If not specified and the grammar is inline, the type is assumed to be XML; that is, application/SRGS+xml.
weight
Ignored.
fetchhint
Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31-1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31-1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element) the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Ignored.
maxstale
Ignored.
This element inherits the following SRGS attributes for inline grammars.
version
Mandatory for an inline XML grammar; forbidden otherwise. Identifies the W3C specification version of the grammar. The only supported value is 1.0.
xml:lang
Optional for voice grammars; ignored for DTMF grammars. The language to be used for the entire grammar. The interpretation of the value associated with xml:lang is managed and verified by the speech server. A value for xml:lang specified at the <item> level overrides a value specified here.
mode
Optional. The type of the current grammar. Supported values are as follows:
dtmf: The grammar is a DTMF-based grammar.
voice: The grammar is a voice-based grammar.
This attribute differs from the inputmodes property, which represents the type of input that will be accepted. For a valid grammar (that is, a grammar that will be activated and can receive input), this attribute must align with the value of the inputmodes property. Grammars whose mode attribute does not match the inputmodes property are ignored. The default for this attribute in the specification is voice. For backwards compatibility, the default for the media server is dtmf.
root
Optional for an inline grammar; forbidden otherwise. Identifies the grammar's root rule. If not specified, the grammar's default rule is used.
tag-format
Optional for an inline grammar; forbidden otherwise. A URI identifying the content type and version of the semantic processor to use. Defines the tag content format for all tags within the grammar.
xml:base
Optional. Allows a base URI to be defined. If set, any relative URIs within the inline grammar are resolved using this base URI. Otherwise, any relative URIs are resolved using the base URI specified within the <vxml> element.
Usage Guidelines
The <grammar> element specifies the rules for a valid set of user inputs or utterances. The grammar definition can be inline, external, or built-in, and can be specified for DTMF, voice, or both. The grammar specification must be in the XML form of the notation specified by [11]. Exactly one of src or an inline grammar must be specified. If both or neither are specified, an error.badfetch is thrown.
External grammars that are voice grammars are fetched, parsed, and processed by the external speech server. In this case the URI is passed directly (as-is) to the speech server. For this reason, the media server must determine the grammar type (that is, the input mode) before it can pass the URI. The input mode can be defined in any of the following ways:
By specifying the default input mode as a VoiceXML parameter using the media server's management interface
Using the inputmodes property, set with the <property> element
Using the mode attribute of the <grammar> element
To be valid, a grammar must evaluate to at least one digit sequence. Grammars that evaluate to be empty (that is, no valid collection sequence is specified) are rejected with an error.grammar event.
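As an illustration of the inline, built-in, and external forms, the following sketch defines one field for each. The field names and the external grammar URI are hypothetical, and the digits type with its length parameter is an assumption based on the builtin:grammar/type?parameters notation; consult the built-in grammar documentation for the types actually supported. Each <grammar> specifies exactly one of src or an inline definition.

```xml
<form id="collect">
  <field name="answer">
    <!-- Inline DTMF grammar: version is mandatory, root names the root rule -->
    <grammar mode="dtmf" version="1.0" root="choice">
      <rule id="choice" scope="public">
        <one-of>
          <item>1</item>
          <item>2</item>
        </one-of>
      </rule>
    </grammar>
  </field>
  <field name="pin">
    <!-- Built-in DTMF grammar (type and parameter are assumptions) -->
    <grammar src="builtin:dtmf/digits?length=4"/>
  </field>
  <field name="city">
    <!-- External voice grammar: the hypothetical URI is passed as-is to the
         speech server, so the mode is stated explicitly -->
    <grammar src="http://example.com/grammars/city.grxml" mode="voice"/>
  </field>
</form>
```

Because the media server defaults mode to dtmf, the external voice grammar sets mode="voice" rather than relying on the default from the specification.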
Interoperability Notes
For some speech servers:
The match rate for voice inputs is very low.
There is an inconsistent match rate across match tests.
A match is returned when a no-match is expected in some test cases. This occurs with different grammar types.
A special SRGS rule (which is matched without the user speaking any word) does not work. The expected behavior is that the grammar can be used to match zero or silence. However, currently the rule is not matched.
A special SRGS rule (which matches any speech up until the next rule match, the next token, or the end of spoken input) does not work. Currently the rule is not matched as expected.
0229 is recognized as 0529 for date grammars. Entering the values zero, two, two, nine causes the speech server to return 0529.
Entering an invalid leap date returns a date. The expected behavior is to return an error or a nomatch.
Some digits are dropped or mismatched for ASR digit grammar. 13456 was entered but 12345 was returned.
Currency input of 100.798 drops the final digit and returns 100.79.
The speech server accepts only up to 2 decimal places for number grammar. Entered 98.765 and 98.76 was returned.
The speech server returns nomatch for a number grammar if the leading digits are zeros.
The point character (.) is recognized as 1 instead of dot for number grammars.
Phone grammars are incorrectly recognized. An input of 6044202978 returned 6004123457.
Saying a subphrase of the <choice> element in a menu grammar results in a match being returned, even in some cases that should be a nomatch.
Noinput was returned when a match was expected for the input: zero six zero six zero six zero six zero six.
Speech input in a date grammar is incorrectly interpreted. Entering june, nineteen seventy eight results in a returned string of 780619.
For MRCP v1, saying or generating the speech twelve o'clock results in a no match being returned in a time grammar. For MRCP v2 this test succeeded.
A grammar completion failure occurs setting up an ABNF grammar.
Speech server running MRCP v1 does not return PCMA as the lead codec when only PCMA is offered. As a result, the external server actually uses the PCMU codec while the media server is streaming PCMU. When running MRCP v2 the speech server works as expected.
<help>
Handles (catches) help events.
Parent element: <field>, <form>, <menu>, <record>, <subdialog>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
count
Optional. The number of times a help event may be thrown, after which the help handling routine is invoked. Regardless of the value set for count, after 5 occurrences the session terminates. The default is 5.
cond
Optional. A Boolean ECMAScript expression. The help handling routine is executed if and only if the expression evaluates to true. The default is true.
Usage Guidelines
The <help> element catches all events of type help. If multiple help handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13]. This element is equivalent to <catch event="help">. For a list of supported events, please see the section Events on page 24.
<if>
Defines conditional logic.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <else>, <elseif>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
cond
Mandatory. A Boolean ECMAScript expression. The associated clause is executed if and only if the expression evaluates to true.
Usage Guidelines
The <if> element defines procedural logic that is to be executed on satisfaction of a condition. The <if> element may have associated <else> and/or <elseif> clauses, which define alternate logical flows.
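A minimal sketch of conditional routing inside a <filled> element follows. The field name, option values, and the #sales and #support destinations are hypothetical. Note that <elseif> and <else> are empty elements that divide the content of the enclosing <if>.

```xml
<form id="route">
  <field name="dept">
    <option dtmf="1" value="sales"> Sales </option>
    <option dtmf="2" value="support"> Support </option>
    <option dtmf="3" value="other"> Other </option>
    <filled>
      <if cond="dept == 'sales'">
        <goto next="#sales"/>
      <elseif cond="dept == 'support'"/>
        <goto next="#support"/>
      <else/>
        <!-- Any other value: forget the answer and ask again -->
        <clear namelist="dept"/>
        <reprompt/>
      </if>
    </filled>
  </field>
</form>
```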
<initial>
Provides the initial prompt in a form.
Parent element: <form>
Child elements: <audio>, <catch>, <link>, <noinput>, <nomatch>, <prompt>, <property>
Attributes
name
Optional. The name of the form item variable used to track whether the <initial> element is eligible for execution. The default is an inaccessible internal variable.
expr
Optional. An ECMAScript expression representing the initial value of the form item variable. If initialized to a value, the form item will not be visited unless the form item variable is cleared. The default is the ECMAScript value undefined.
cond
Optional. A Boolean ECMAScript expression. The form item is visited if and only if this expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
Usage Guidelines
The <initial> element defines procedural logic that is to be executed on satisfaction of a condition. In a typical mixed initiative form, the <initial> element is visited when the user is initially being prompted for form-wide information, and has not yet entered into the directed mode where each field is visited individually. Like input items, the <initial> element has prompts, catches, and event counters. Unlike input items, the <initial> element has no grammars, and no <filled> action.
<item>
[SRGS] Defines valid user input, as part of a DTMF or voice grammar rule.
Parent element: <item>, <one-of>, <rule>
Child elements: <item>, <one-of>
Attributes
repeat
Optional. Specifies additional user detection repeat rules for a match to be declared. Supported formats are as follows:
repeat="n": Repeat n times.
repeat="m-n": Repeat between m and n times, where m is less than or equal to n, and m and n are both greater than or equal to 0.
repeat="m-": Repeat m or more times, where m is greater than or equal to 0.
repeat="0-1": Indicates that expansion is optional.
repeat-prob
Optional for voice grammars; ignored for DTMF grammars. Sets the probability that the repeat attribute will succeed. Valid only for speech grammars and only if the repeat attribute is defined. The range is 0.0 to 1.0.
weight
Ignored.
xml:lang
Optional for voice grammars; ignored for DTMF grammars. The language to be used for the entire grammar. The interpretation of the value associated with xml:lang is managed and verified by the speech server. A value for xml:lang set here overrides a value specified at the <grammar> level.
Usage Guidelines
The <item> element is used in XML grammar specification rules to define valid user inputs. For DTMF items, grammars as defined in Appendix E of [11] may be used. These are the digits 0-9, #, *, and the digits A-D. For voice-based grammars, any input acceptable by the external speech server may be used. Tokens not enclosed in <item> elements are ignored. A grammar that has no valid <item> elements defined is rejected with an error.grammar event. (Note that this deviates slightly from [13], which states that empty grammars should be allowed.) For information on how this differs for voice-based grammars, please see Chapter 3: DTMF and Voice Grammars. The <item> element can be nested at most three levels deep.
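The repeat formats can be illustrated with a short DTMF grammar. This is a sketch only; the rule name is hypothetical. It accepts an optional leading star key followed by four to six digits, and keeps <item> nesting within the three-level limit.

```xml
<grammar mode="dtmf" version="1.0" root="code">
  <rule id="code" scope="public">
    <!-- repeat="0-1": the leading star key is optional -->
    <item repeat="0-1">*</item>
    <!-- repeat="4-6": between four and six digits -->
    <item repeat="4-6">
      <one-of>
        <item>0</item>
        <item>1</item>
        <item>2</item>
        <item>3</item>
        <item>4</item>
        <item>5</item>
        <item>6</item>
        <item>7</item>
        <item>8</item>
        <item>9</item>
      </one-of>
    </item>
  </rule>
</grammar>
```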
Interoperability Notes
For some speech servers: The repeat attribute used in a nested <item> element returns nomatch for input that should generate a match.
<link>
Specifies a destination URL when a grammar activates a match.
Parent element: <field>, <form>, <vxml>
Child elements: <grammar>
Attributes
next
Goes to the specified URI. The URI must comply with the XML anyURI format. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
expr
Goes to the URI resulting from evaluation of the specified ECMAScript expression. The URI must comply with the XML anyURI format. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
event
Throws the specified event when one of the link grammars is matched. For a list of supported events, please see the section Events on page 24. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
eventexpr
Throws the event resulting from evaluation of the specified ECMAScript expression when one of the link grammars is matched. For a list of supported events, please see the section Events on page 24. Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
message
Optional. Returns the specified message string to the event handler, along with the event name. There is no default. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
messageexpr
Optional. Returns the message string resulting from evaluation of the specified ECMAScript to the event handler, along with the event name. There is no default. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
dtmf
Optional. Specifies a simple DTMF sequence which, when matched, activates the specified link. White space is permitted in the DTMF sequence specification; for example 1234# and 1 2 3 4 # are treated as equivalent. There is no default. Generic DTMF recognition properties (that is, interdigittimeout, termtimeout, and termchar) apply. For more information about DTMF properties, please see the section Events on page 24.
fetchaudio
Ignored.
fetchhint
Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31-1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31-1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element) the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <link> element provides a mechanism for transitioning to a new document or dialog. Alternatively, it can be used to throw an event instead of transitioning to a new document. The <link> element is activated when the grammar contained or specified within the element is matched. For this reason, grammars specified within the <link> element are not able to have a scope specified.
Grammars active for a link at the root document level are active throughout all documents referenced from the root document. Grammars active for a link at the <vxml> level are active throughout the document. Grammars active for a link at the <form> level are active while the user is in the form.
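The following sketch shows two document-level links, one that transitions and one that throws an event. The form names, key assignments, and audio file names are hypothetical. Each <link> specifies exactly one of next, expr, event, and eventexpr, and neither link grammar specifies a scope.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Active throughout the document: pressing 0 transitions to the operator form -->
  <link next="#operator" dtmf="0"/>
  <!-- Active throughout the document: pressing * throws a help event -->
  <link event="help">
    <grammar mode="dtmf" version="1.0" root="star">
      <rule id="star" scope="public">
        <item>*</item>
      </rule>
    </grammar>
  </link>
  <form id="main">
    <field name="dept">
      <prompt><audio src="prompts/choose_dept.wav"/></prompt>
      <option dtmf="1" value="sales"> Sales </option>
      <option dtmf="2" value="support"> Support </option>
      <!-- Handles the help event thrown by the second link -->
      <help>
        <prompt><audio src="prompts/dept_help.wav"/></prompt>
        <reprompt/>
      </help>
    </field>
  </form>
  <form id="operator">
    <block>
      <prompt><audio src="prompts/transferring.wav"/></prompt>
      <exit/>
    </block>
  </form>
</vxml>
```

The link keys (0 and *) are deliberately chosen so that they do not overlap with the field's own DTMF options.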
<log>
Generates messages for logging and troubleshooting.
Parent element: <block>, <catch>, <filled>, <form>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
label
Optional. A string that can be used to label the log; for example, to indicate the purpose of the log.
expr
Optional. An ECMAScript expression evaluating to a string that can be used to label the log; for example, to indicate the purpose of the log.
Usage Guidelines
The <log> element allows an application to generate messages for the purpose of logging and debugging. The messages can include events, text information, and/or results from a VoiceXML script. This facility aids application developers in debugging an application by examining its flow control and variable contents. The element may contain any combination of text and <value> elements. The <value> element is used to de-reference ECMAScript expressions and include them as a string in the message. The generated message consists of the concatenation of the text message and the string form of the value of the expr attribute in the <value> element. All log messages generated by the <log> element are written to syslog at a severity level of INFO.
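As a brief sketch, the fragment below mixes literal text with <value> elements. The variable names and the label are hypothetical.

```xml
<form id="login">
  <var name="attempts" expr="2"/>
  <var name="account" expr="'A-1001'"/>
  <block>
    <!-- Text and <value> content are concatenated into one message -->
    <log label="login">Account <value expr="account"/> has made <value expr="attempts"/> attempts.</log>
  </block>
</form>
```

The resulting message combines the literal text with the string form of each expression and is written to syslog at severity INFO.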
<mark>
[SSML] Places a marker into a text or tag sequence.
Parent element: <speak>
Child elements: None.
Attributes
name
Mandatory. A token providing a unique name for the marked location; for example here.
Usage Guidelines
Use the <mark> element to reference a specific location in the text/tag sequence, or to insert a marker into an output stream for asynchronous notification. When processing a <mark> element, a synthesis processor does one or both of the following:
Informs the hosting environment of the value of the name attribute, together with information allowing the platform to retrieve the corresponding position in the rendered output.
When audio output of the SSML document reaches the mark, issues an event that includes the required name attribute of the element. The hosting environment defines the destination of the event.
The <mark> element does not affect the speech output process.
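A minimal SSML sketch follows, with <mark> placed directly under <speak>. The marker name and the spoken text are hypothetical, and whether an event is actually delivered depends on the speech server.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
  Welcome to the booking service.
  <!-- The marker is not spoken; it only identifies this position -->
  <mark name="after_welcome"/>
  Please listen carefully to the following options.
</speak>
```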
Interoperability Notes
For some speech servers: The TTS server does not send MARK event to SPM when it reaches <mark> element in spoken text.
<menu>
Provides a fixed set of menu selections.
Parent element: <vxml>
Child elements: <audio>, <catch>, <choice>, <error>, <help>, <noinput>, <nomatch>, <prompt>, <promptcontrol>, <property>, <script>
Attributes
id
Optional. A unique identifier for the menu. The format is an XML name token without colons (:). The name token may be composed of alphabetic letters, digits, period (.), underscore (_), and hyphen (-). The name must begin with a letter or underscore. If specified, the identifier can be used within the application to pass control to the menu; for example, from a <goto> or a <submit>.
scope
Optional. The default scope of this menu's grammar. Supported values are as follows:
dialog: This grammar applies only to the current menu.
document: This grammar is active over the entire document. If the document is the root document, then the grammar scope applies to all documents referenced from the root document.
The default is dialog.
dtmf
Optional. Defines whether <choice> elements that have not explicitly assigned DTMF key press attribute values are automatically assigned a corresponding DTMF key press. Supported values are as follows:
true: <choice> elements not explicitly set are automatically assigned a DTMF key press.
false: <choice> elements not explicitly set are not assigned a DTMF key press.
The default is false.
accept
Ignored for DTMF and speech grammars; optional for speech recognition. For speech recognition, specifies whether user input must be exact or may be approximate. Menu grammars that specify speech are converted to XML-SRGS grammars. The supported value is exact; there is currently no mapping for approximate in XML-SRGS grammars. The default is exact.
Usage Guidelines
The <menu> element provides a relatively simple mechanism (as compared to, say, a form) for allowing the user to make a choice and then transitioning to another location based on that choice. Using audio prompts, the menu offers the user a set of choices, after which it waits for user input. The dialog transitions based on the user input.
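The following sketch shows a three-way menu. The menu identifier, destinations, and audio file name are hypothetical. It assumes that, with dtmf="true", choices without an explicit dtmf attribute are assigned the keys 1, 2, and so on in document order, as described in the VoiceXML 2.0 specification.

```xml
<menu id="main_menu" dtmf="true">
  <prompt><audio src="prompts/main_menu.wav"/></prompt>
  <!-- No dtmf attribute: assumed to be assigned key 1 and key 2 -->
  <choice next="#sales"> Sales </choice>
  <choice next="#support"> Support </choice>
  <!-- Explicit key assignment is left unchanged -->
  <choice dtmf="0" next="#operator"> Operator </choice>
  <noinput>
    <reprompt/>
  </noinput>
  <nomatch>
    <reprompt/>
  </nomatch>
</menu>
```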
<meta>
Defines page information.
Parent element: <vxml>
Child elements: None.
Attributes
name
A name for the metadata property describing page information. Exactly one of name or http-equiv must be specified; otherwise an error.badfetch is thrown.
content
Mandatory. A value for the metadata; that is, the page information to be recorded. This value can supply the value of an HTTP response header. This value can be accessed later by the session variable session.meta.name. If this attribute is omitted, an error.badfetch is thrown.
http-equiv
Ignored. The name of an HTTP header for which the content attribute is supplying the response value. Exactly one of name or http-equiv must be specified; otherwise an error.badfetch is thrown.
Usage Guidelines
The <meta> element allows specification of information about a grammar document. This element is allowed but ignored by the media server.
Interoperability Notes
For some speech servers: Providing both the name and http-equiv attributes within the <meta> element is illegal and an error is expected; however, the speech server accepted the grammar, although it eventually returned a noinput event. In a test to verify that the <meta> element is accepted in an ABNF grammar, the grammar fails when being activated (that is, in the define grammar request). The expected behavior is for the grammar to be accepted and processed.
<metadata>
[SRGS] Defines information about a document using a metadata schema.
Parent element: <speak>
Child elements: None.
Attributes
None.
Usage Guidelines
Use the <metadata> element to act as a container in which information about the document can be placed using a metadata schema. Although any metadata schema can be used with metadata, it is recommended that the XML syntax of the Resource Description Framework (RDF) [RDF-XMLSYNTAX] be used in conjunction with the general metadata properties defined in the Dublin Core Metadata Initiative [DC]. Document properties declared with the metadata element can use any metadata schema.
<noinput>
Handles (catches) a user input timeout event.
Parent element: <field>, <form>, <menu>, <record>, <subdialog>, <vxml>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
count
Optional. The number of times a noinput event may be thrown, after which the no-input handling routine is invoked. Regardless of the value set for count, after 5 occurrences the session terminates. The default is 5.
cond
Optional. A Boolean ECMAScript expression. The no-input handling routine is executed if and only if the expression evaluates to true. The default is true.
Usage Guidelines
The <noinput> element catches all events of type noinput. If multiple no-input handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13]. This element is equivalent to <catch event="noinput">. For a list of supported events, please see the section Events on page 24.
<nomatch>
Handles (catches) an invalid user input event.
Parent element: <field>, <form>, <menu>, <record>, <subdialog>, <vxml>
Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Attributes
count
Optional. The number of times a nomatch event may be thrown, after which the no-match handling routine is invoked. Regardless of the value set for count, after 5 occurrences the session terminates. The default is 5. (Note that [13] sets the termination value to 4. This was changed to match the value for <noinput>, and to provide backward compatibility with a previous release of the software.)
cond
Optional. A Boolean ECMAScript expression. The no-match handling routine is executed if and only if the expression evaluates to true. The default is true.
Usage Guidelines
The <nomatch> element catches all events of type nomatch. If multiple no-match handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13]. This element is equivalent to <catch event="nomatch">. For a list of supported events, please see the section Events on page 24.
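A sketch combining <noinput> and <nomatch> handlers in one field is shown below. The variable name, field name, and audio file names are hypothetical, and the escalation by count assumes the handler selection procedure described in [13], in which the handler with the highest count not exceeding the number of occurrences is chosen.

```xml
<form id="confirm">
  <!-- Hypothetical flag that enables the no-match handler -->
  <var name="allow_retry" expr="true"/>
  <field name="answer">
    <prompt><audio src="prompts/confirm.wav"/></prompt>
    <option dtmf="1" value="yes"> Yes </option>
    <option dtmf="2" value="no"> No </option>
    <!-- First and second timeouts: remind the caller and prompt again -->
    <noinput count="1">
      <prompt><audio src="prompts/no_input.wav"/></prompt>
      <reprompt/>
    </noinput>
    <!-- Third timeout: end the dialog before the session limit of 5 is reached -->
    <noinput count="3">
      <prompt><audio src="prompts/goodbye.wav"/></prompt>
      <exit/>
    </noinput>
    <!-- Runs only while allow_retry evaluates to true -->
    <nomatch count="1" cond="allow_retry">
      <prompt><audio src="prompts/invalid.wav"/></prompt>
      <reprompt/>
    </nomatch>
  </field>
</form>
```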
<one-of>
[SRGS] Allows one selection from a list of alternatives.
Parent element: <item>, <rule>
Child elements: <item>
Attributes
xml:lang
Optional for voice grammars; ignored for DTMF grammars. The language to be used for the entire grammar. The interpretation of the value associated with xml:lang is managed and verified by the speech server. A value for xml:lang specified here overrides any value for xml:lang that may have been specified at a higher level and applies to all elements below this element.
Usage Guidelines
The <one-of> element identifies a set of alternative options that are mutually exclusive. The media server supports at most two levels of nested <one-of> elements. Deeper nesting results in the grammar being rejected, in which case an error.badfetch is thrown and the session terminated.
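As an illustration of the nesting limit, the sketch below uses two levels of <one-of>. The rule name and key sequences are hypothetical, and it assumes that an outer <one-of> containing one inner <one-of> counts as two levels. The grammar accepts 0, 11, or 12.

```xml
<grammar mode="dtmf" version="1.0" root="code">
  <rule id="code" scope="public">
    <!-- Level 1: either the single key 0, or a two-key sequence -->
    <one-of>
      <item>0</item>
      <item>
        <item>1</item>
        <!-- Level 2: the second key is either 1 or 2 -->
        <one-of>
          <item>1</item>
          <item>2</item>
        </one-of>
      </item>
    </one-of>
  </rule>
</grammar>
```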
<option>
Provides a simple method for specifying grammars.
Parent element: <field>
Child elements: None.
Attributes
accept
Ignored.
dtmf
Optional. Specifies a simple DTMF sequence for user input collection and handling. White space is permitted in the DTMF sequence specification; for example 1234# and 1 2 3 4 # are treated as equivalent. There is no default. Generic DTMF recognition properties (that is, interdigittimeout, termtimeout, and termchar) apply. For more information about DTMF properties, please see Chapter 2: VoiceXML Properties.
value
Optional. Specifies a string to be assigned to the <field> name variable when this option is selected. By default, the value of the dtmf attribute is used.
Usage Guidelines
The <option> element provides a relatively simple way to specify grammars for collecting and processing user input. Simple DTMF or speech sequences can be specified within this element, rather than specifying a complex grammar. An <option> grammar can concurrently define both a DTMF and a speech grammar in much the same way a <choice> element does. The value attribute is assigned to the result of the collection, based on the option that was matched. Example 4-2 shows a VoiceXML script defining an <option> grammar enabled for both DTMF and speech. For DTMF, the values 1, 2, and 3 will result in the <filled> element being executed. For speech, the words Vancouver, New York, or Paris will result in the <filled> element being executed.
Example 4-2 <option> Grammar Example
<form>
  <field name="city">
    <prompt>
      Please select a city you would like to visit.
      <enumerate/>
    </prompt>
    <option dtmf="1" value="vancouver"> Vancouver </option>
    <option dtmf="2" value="newyork"> New York </option>
    <option dtmf="3" value="paris"> Paris </option>
    <filled>
      <submit next="/cgi-bin/flyto.cgi" method="post" namelist="city"/>
    </filled>
  </field>
</form>
Example 4-3 shows an XML-SRGS grammar that is equivalent to the one shown in Example 4-2. The grammar shown in Example 4-3 would be passed to the external speech server for evaluation while the grammar shown in Example 4-2 would be parsed and processed within the media server.
Example 4-3 XML-SRGS Grammar
<grammar mode="voice" version="1.0" root="optionRoot">
  <rule id="optionRoot" scope="public">
    <one-of>
      <item> Vancouver </item>
      <item> New York </item>
      <item> Paris </item>
    </one-of>
  </rule>
</grammar>
<p>
[SSML] Represents a paragraph.
Parent element: <speak>
Child elements: <audio>, <break>, <emphasis>, <mark>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>
Attributes
xml:lang
Usage Guidelines
The use of the <p> element is optional. Where text occurs without an enclosing <p> or <s> element, the synthesis processor attempts to determine the structure using language-specific knowledge of the format of plain text.
Interoperability Notes
For some speech servers: Some TTS servers running MRCP v1 ignore the xml:lang language attribute. The server always speaks English regardless of the value of attribute xml:lang in <speak>, <p>, <s>, and <voice> elements.
<param>
Defines a parameter to a subdialog.
Parent element: <subdialog>
Child elements: <param>
Attributes
name
Mandatory. Specifies the name of the parameter to be used in the <subdialog> element.
value
The value to be assigned to the parameter within the <subdialog> element. Exactly one of value and expr must be specified.
expr
An ECMAScript expression resulting in the value to be assigned to the parameter within the <subdialog> element. Exactly one of value and expr must be specified.
valuetype
Optional. Specifies, only to an <object> within a <subdialog> element, whether the value is of type data or type ref. Since the media server only supports type data, any other value is ignored.
type
Optional. Specifies the media type, if the valuetype is ref. Since the media server only supports a valuetype of data, the only supported value for type is data; any other value is ignored.
Usage Guidelines
The <param> element allows parameters to be passed to subdialogs. Nesting of <param> elements is not supported.
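The sketch below passes two parameters to a subdialog in the same document. All form, field, and parameter names are hypothetical. It follows the VoiceXML 2.0 convention that each parameter is received through a form-level <var> of the same name in the called dialog; each <param> specifies exactly one of value and expr.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="caller">
    <subdialog name="result" src="#ask_city">
      <!-- value: a literal string -->
      <param name="region" value="west"/>
      <!-- expr: an ECMAScript expression evaluated in the calling dialog -->
      <param name="max_tries" expr="1 + 2"/>
      <filled>
        <log label="caller">City selected: <value expr="result.city"/></log>
      </filled>
    </subdialog>
  </form>
  <form id="ask_city">
    <!-- Each parameter is received through a matching form-level variable -->
    <var name="region"/>
    <var name="max_tries"/>
    <field name="city">
      <option dtmf="1" value="vancouver"> Vancouver </option>
      <option dtmf="2" value="paris"> Paris </option>
      <filled>
        <return namelist="city"/>
      </filled>
    </field>
  </form>
</vxml>
```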
<phoneme>
[SSML] Provides a phonemic/phonetic pronunciation for the contained text.
Parent element: <speak>
Child elements: None.
Attributes
ph
Mandatory. Specifies the phoneme/phone string.
alphabet
Optional. Specifies the phonemic/phonetic alphabet, which in this context refers to a collection of symbols to represent the sounds of one or more human languages. Supported values are vendor-specific.
Usage Guidelines
The <phoneme> element provides a phonemic/phonetic pronunciation for the contained text. The phoneme element may be empty. However, it is recommended that the element contain human-readable text that can be used for non-spoken rendering of the document. For example, the content may be displayed visually for users with hearing impairments.
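A short SSML sketch follows. The alphabet value ipa is the one defined by the SSML specification and is an assumption here, because supported alphabets are vendor-specific; the pronunciation strings are illustrative. The element content supplies the human-readable text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
  <!-- ph carries the pronunciation; the content is the readable fallback -->
  You say <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>,
  I say <phoneme alphabet="ipa" ph="təˈmɑːtəʊ">tomato</phoneme>.
</speak>
```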
Interoperability Notes
For some speech servers: The ph attribute is specified as a mandatory parameter for the <phoneme> element. However, the speech server accepts and processes the element within an SSML string without the ph attribute.
<prompt>
Specifies media output to be played to a user.
Parent element: <block>, <catch>, <error>, <field>, <filled>, <help>, <if>, <menu>, <noinput>, <nomatch>, <record>, <subdialog>
Child elements: <audio>, <break>, <say-as>
Attributes
bargein
Optional. Specifies whether the audio prompt can be interrupted (barged) by DTMF or speech input. Supported values are as follows:
true: The prompt is bargeable, and DTMF or speech input will interrupt play. If any digits remain in the digit buffer at the time this element is executed, the clip is barged immediately and will not play.
false: The prompt is not bargeable. Any digits currently in the digit buffer are cleared, and any digits received while clip(s) are playing are discarded.
If not set, the value set for the bargein property applies. For information on the bargein property, please see Chapter 2: VoiceXML Properties. The setting of the bargein attribute can interact with the setting of the fax detection property com.cvd.faxdetect. For that information, please see Chapter 2: VoiceXML Properties.
bargeintype
Ignored.
cond
Optional. A Boolean ECMAScript expression. The prompt is played if and only if this expression evaluates to true. The default is true.
count
Optional. The number of times the form item can be visited for the prompt to be played. The default is 1.
timeout
Optional. An interval after which, if initial DTMF user input has not been received, a noinput event is thrown. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31-1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31-1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The default is 10s.
cvd:vcrprompt
RadiSys extension. Optional for audio clips only; forbidden for TTS or multimedia clips. Specifies whether <promptcontrol> actions are active for this prompt. Supported values are as follows:
true: Prompt controls are active for this prompt.
false: Prompt controls are not active for this prompt.
When prompt controls are active, any specified TTS prompts are ignored; they are neither queued nor played. Multimedia clips are played but return an error on play. Any other value results in the session terminating with an error.semantic event. The default is false.
cvd:cleardb
RadiSys extension. Optional. Flushes digits from the digit buffer. Supported values are as follows:
true: All digits currently in the digit buffer will be cleared prior to playing the requested prompt. Digits are cleared independent of any value set for the bargein attribute. For details about the interaction between the cvd:cleardb attribute and the bargein attribute, please see Table 4-4 in the Usage Guidelines.
false: The digit buffer is not cleared before playing the requested prompt.
Any other value results in the session terminating with an error.semantic event. The default is false. This parameter does not apply to speech; only DTMF input is buffered.
cvd:varprompt
RadiSys extension. Mandatory if the prompt contains a <say-as> element; ignored otherwise. Specifies whether the variable prompt specified in the <say-as> element is to be played by the external TTS server or using the media server's built-in sets and variables processor. Supported values are as follows:
tts: An external TTS server plays the prompt. If this value is specified but no external TTS server is configured, the varprompt attribute is ignored.
sv: The media server plays the prompt using its internal sets and variables processor.
xml:lang
Optional if the prompt contains a <say-as> child element; ignored otherwise. Specifies the language to be used in rendering the prompt. If not specified, the value specified in the xml:lang attribute within the VoiceXML root document is used. If the media server's sets and variables processor is to be used to render the variable, the only supported value is en (English). In this case, specifying an unsupported language (that is, any language other than en) causes an error.unsupported.language event. If an external TTS server is to be used to render the variable, the language value is not inspected by the media server, but is passed directly to the external TTS server.
xml:base
Optional. Allows a base URI to be defined. If set, any relative URIs within the prompt specification are resolved using this base URI. Otherwise, any relative URIs are resolved using the base URI specified within the <vxml> element. Note that a base URI can only be applied to the relative URI specified within a src attribute. It cannot be applied to a URI resulting from evaluation of an ECMAScript expression (that is, an expr attribute).
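The sketch below combines several of the standard attributes described above (count, cond, bargein, and timeout) on two prompts in one field. The audio file names, the variable useLongPrompt, and the use of the built-in digits field type are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="getpin">
    <var name="useLongPrompt" expr="true"/>
    <field name="pin" type="digits">
      <!-- Bargeable prompt, played only while useLongPrompt is true;
           a noinput event is thrown after 20.5 seconds without DTMF input. -->
      <prompt count="1" cond="useLongPrompt" bargein="true" timeout="20.5s">
        <audio src="enter_pin_long.wav"/>
      </prompt>
      <!-- Non-bargeable prompt for later visits; the timeout is given in
           milliseconds, with no space between the number and the unit. -->
      <prompt count="2" bargein="false" timeout="5000ms">
        <audio src="enter_pin_short.wav"/>
      </prompt>
    </field>
  </form>
</vxml>
```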
Shadow Variables
Whenever a prompt completes (with the exception of a user hang-up), a number of application-level scoped shadow variables are populated. These shadow variables provide the VoiceXML application with information about the last prompt played. Note that if the session terminates as the result of a SIP BYE, the shadow variables are not updated with information about the prompt. In this case, numeric variables report 0 and the lasturl variable reports undefined. For TTS clips, only the bargein variable is populated. All numeric variables report 0, and string variables report undefined.
Table 4-2 shows the shadow variables defined to provide information about prompt completion.
Table 4-2 Prompt Completion Shadow Variables
Shadow Variable
Description
application.cvd_lastprompt$.bargein
RadiSys extension. Indicates whether the prompt was barged or not. Supported values are as follows:
true: The prompt was barged.
false: The prompt was not barged.
application.cvd_lastprompt$.duration
RadiSys extension. The amount of time, in milliseconds, consumed by the last prompt played. This is the total amount of time for the last clip, or set of clips played. If the prompt was barged, then this represents the time up to the point of being barged. Although the duration includes all clips specified for the prompt, it does not include pauses that are a result of user-defined pause/resume sequences. It does, however, include any silence included as a result of using the <break> element. For multimedia clips containing both audio and video components, the duration represents the larger of the video or audio components. For example, if audio played for 5400 milliseconds and video played for 4800 milliseconds, the duration parameter reports 5400 milliseconds. If the clip fails to start playing for any reason, the value of this variable is 0. If the prompt terminates because the user hangs up, the value of this variable is 0.
application.cvd_lastprompt$.lasturl
RadiSys extension. A string identifying the URL of the last audio or multimedia file played. If the prompt consisted of a set of multiple clips, and the sequence was interrupted as a result of a DTMF digit, then the value of this variable will be the URL of the file that was playing when the digit was received. Note that the value of this variable will be undefined if no clips have been played. This includes the case where a clip is barged before it starts, and the case where a clip is stopped immediately after starting because of type-ahead digits remaining in the digit buffer. If the prompt terminates because the user hung up, the value of this variable will be undefined.
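As an illustration of how an application might read these shadow variables, the sketch below copies the duration and lasturl values into its own variables once the field is filled. The form, field, variable, and file names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="menu">
    <var name="playedMs"/>
    <var name="lastClip"/>
    <field name="choice" type="digits">
      <prompt bargein="true">
        <audio src="intro.wav"/>
        <audio src="options.wav"/>
      </prompt>
      <filled>
        <!-- Milliseconds consumed by the last prompt (up to the barge point, if barged). -->
        <assign name="playedMs" expr="application.cvd_lastprompt$.duration"/>
        <!-- URL of the clip that was playing when the digit arrived, or undefined. -->
        <assign name="lastClip" expr="application.cvd_lastprompt$.lasturl"/>
      </filled>
    </field>
  </form>
</vxml>
```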
There is also a family of shadow variables (the application.lastresult$.value shadow variables) defined in [13], which can be used to reference the information resulting from DTMF collection. These are shown in Table 4-3.
Table 4-3 DTMF Collection Variables
Shadow Variable
Description
application.lastresult$.interpretation
Contains the last set of collected input with the following exceptions:
For Boolean type, the variable contains true or false if there was a match. Contains the digits otherwise.
For Currency type and DTMF, the asterisk (*) is converted to a period (.) in match cases. For example, input 1*23 is converted to 1.23. For no match cases the literal digits are assigned.
For speech, contains the value returned from the speech server interpreting the input.
application.lastresult$.utterance
Contains the raw input that was received. In the example given above for Currency, the variable would report 1*23 and not 1.23. For Boolean variables, the variable reports what was entered, not true or false. For most other cases utterance and interpretation will be the same.
application.lastresult$.inputmode
Contains the input mode. This is either dtmf or voice, depending on which grammars were active and which one produced the event. This shadow variable is initialized with a value of undefined and updated later with the actual value (dtmf or voice) only when the <grammar> element is executed.
application.cvd_lastresult$.termcond
application.cvd_lastresult$.faxtype
Usage Guidelines
The <prompt> element queues recorded audio, multimedia, or Text to Speech (TTS) clips as prompts to be played to the user. Recorded media prompts are played by embedding the <audio> element within the <prompt> element. TTS clips can be specified as SSML, or as plain text strings with embedded SSML elements in the string. The variable prompt specified in the <say-as> element is treated as a TTS string. All TTS strings (except those variable prompts to be played using the media server's built-in sets and variables subsystem) are compiled within the media server into SSML scripts and passed to the TTS speech synthesizer to be played, provided an active speech synthesizer server is configured. If no server is configured, the string is simply ignored. All attributes of the <prompt> element, with the exception of the vcrprompt attribute, apply to TTS clips in the same way that they do to prerecorded audio clips.
Prompt Controls
The vcrprompt attribute and the associated prompt controls are not supported for TTS or multimedia clips. If a TTS clip (including variable prompts to be played using the media server's built-in sets and variables subsystem) is specified within a <prompt> element that has prompt controls enabled (that is, vcrprompt is true), it is ignored and will be neither queued nor played. If a multimedia clip is specified within a <prompt> element that has prompt controls enabled, an error is returned.
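Table 4-4 (interaction of cvd:cleardb and bargein; column headings and behavior descriptions not recoverable)
True, False, Empty
True, False, Contains digits
False, True, Empty
False, True, Contains digits
False, False, Empty
False, False, Contains digits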
<promptcontrol>
Specifies media controls for user prompt manipulation.
Parent element: <field>, <form>, <menu>, <vxml>
Child elements: <controlcmd>
The <promptcontrol> element allows you to define VCR-like controls for playing of audio files. Prompt controls are not supported for TTS clips. The <promptcontrol> element encloses the <controlcmd> element, which specifies a set of DTMF inputs and associated actions controlling the play of the specified audio. Voice inputs for prompt controls are not supported. The scope of the <promptcontrol> element and the setting of the vcrprompt attribute of the <prompt> element determine when prompt control actions are in effect. The media server supports the following prompt controls:
Pause/resume
Skip forward/skip backward
Volume up/volume down
<property>
Sets the value of a property.
Parent element: <field>, <form>, <menu>, <record>, <subdialog>, <vxml>
Child elements: None.
Attributes
name
Mandatory. The name of the property being updated. Unrecognized properties are ignored. There is no default.
value
Mandatory. The new value for the property. The range of values depends on the property. Specifying an invalid value for the property will result in an error.semantic. For information about the valid values for supported VoiceXML properties, please see Chapter 2: VoiceXML Properties.
Usage Guidelines
The <property> element allows an application to modify the value associated with a property. For a description of supported properties, please see Chapter 2: VoiceXML Properties. The scope of the property's value is inherited from the parent element, and applies to all child elements. The lowest-level value assignment for the property overrides all higher-level assignments. If no values are explicitly assigned, then the default property value will be used whenever required.
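The inheritance rule can be sketched as follows, using the bargein and timeout properties mentioned elsewhere in this chapter. The values shown, the file names, and the built-in field types are assumptions; valid values for each property are listed in Chapter 2: VoiceXML Properties.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level assignment: inherited by every dialog below. -->
  <property name="bargein" value="false"/>
  <form id="survey">
    <field name="rating" type="digits">
      <!-- Field-level assignments override the document-level value for this field only. -->
      <property name="bargein" value="true"/>
      <property name="timeout" value="5s"/>
      <prompt><audio src="rate_us.wav"/></prompt>
    </field>
    <field name="confirm" type="boolean">
      <!-- No local assignment: bargein=false is inherited from <vxml>. -->
      <prompt><audio src="confirm.wav"/></prompt>
    </field>
  </form>
</vxml>
```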
<prosody>
[SSML] Permits control of the pitch, speaking rate, and volume of the speech output.
Parent element: <speak>,
Child elements: None.
Attributes
pitch
Optional. The baseline pitch for the contained text.
contour
Optional. Sets the actual pitch contour for the contained text.
range
Optional. The pitch range (variability) for the contained text.
rate
Optional. The change in the speaking rate for the contained text.
duration
Optional. The desired time to take to read the element contents.
volume
Optional. The volume for the contained text in the range 0.0 to 100.0.
Usage Guidelines
The <prosody> element permits control of the pitch, speaking rate, and volume of the speech output. Although each attribute individually is optional, it is an error if no attributes are specified when the prosody element is used.
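A brief SSML sketch of the <prosody> element follows. It sets only rate and volume; the text content and the label slow (a standard SSML rate label) are illustrative, and whether a given speech server honors each attribute may vary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Your balance is
  <!-- At least one attribute must be present; volume is in the range 0.0 to 100.0. -->
  <prosody rate="slow" volume="80.0">one hundred dollars</prosody>.
</speak>
```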
Interoperability Notes
For some speech servers: All values associated with the pitch attribute are ignored in elements supporting this attribute. All values associated with the duration attribute are ignored in elements supporting this attribute. The contour, duration, pitch, and range attributes of the <prosody> element are ignored.
<record>
Records user audio, video, or multimedia to a file.
Parent element: <form>
Child elements: <audio>, <catch>, <error>, <filled>, <grammar>, <help>, <noinput>, <nomatch>, <prompt>, <property>
Attributes
name
Mandatory. Specifies the name of a variable that will hold the recording. This name will be used as an internal reference to the file after the recording is complete. To play the recorded file, reference this variable name. The format is an XML restrictedVariableName token, which is composed of alphabetic characters, digits, colon, and hyphen. The name may not begin with underscore (_) or contain a period (.). In addition, the name must follow ECMAScript variable naming conventions and may not include ECMAScript reserved words. The name must be unique across all <record> elements within the same scope. Note that recordings stored internally are transient, and are deleted at the end of the session. To store recorded audio persistently, you must specify an external NFS or HTTP server. Unless you specify otherwise (using the cvd:dest or cvd:destexpr attribute), all recordings are internal and transient.
expr
Optional. An ECMAScript expression representing the initial value of the name variable. If initialized to a value, the recording will not start unless the name variable is cleared. The default is the ECMAScript value undefined.
cond
Optional. A Boolean ECMAScript expression. The recording is started if and only if this expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
modal
Ignored.
beep
Optional. Specifies whether to play a short fixed beep tone just prior to beginning the recording. The location of this beep tone is configurable using the media server's management interface. Supported values are as follows:
true: The beep tone will be played just prior to recordings.
false: No beep tone is played before recordings.
The default is false.
maxtime
Optional. Specifies a maximum recording time. If reached, the recording is terminated. In this case, the shadow variable name$.maxtime is set to true.
finalsilence
Optional. Specifies the duration of post-speech silence time which, if exceeded, will terminate the recording. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31-1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31-1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The default is 5s. Note that a finalsilence value of 0 specifies that no post-speech trimming should be performed on the recording. This applies to both externally and internally recorded files.
dtmfterm
Optional. Specifies whether a DTMF key press can terminate the recording. Supported values are as follows:
true: The recording will be terminated if any DTMF key is pressed, provided that the inputmodes property is set to dtmf or dtmf voice. If the inputmodes property is set to voice, DTMF key presses are ignored and the recording is not stopped.
false: DTMF key presses will not terminate the recording.
The default is true, so that by default, any DTMF key press will terminate the recording. The setting of the dtmfterm attribute can interact with the setting of the fax detection property com.cvd.faxdetect. For that information, please see Chapter 2: VoiceXML Properties.
format
Mandatory. Specifies the file type and encoding scheme for the recording. Supported formats are shown in Table 4-6 on page 137. The default is audio/wav.
cvd:dest
Optional. RadiSys extension. Specifies the destination for a recording for either of two cases:
1 External recording. Specifies the URI of an external NFS or HTTP server where the audio recording is to be stored persistently. The URI must conform to the guidelines given for specifying external recordings described in Working with Media Files and TTS Strings on page 28. The recording is made in real time to the specified URI.
2 Appending to an existing recording. If cvd:append is set to true, the media server appends the recording to an existing recording referenced by cvd:dest. The existing recording may be either internal or external. For details on appending recordings, please see the Usage Guidelines, below.
Only one of cvd:dest and cvd:destexpr may be specified.
cvd:destexpr
Optional. Specifies an ECMAScript expression that evaluates to the URI of an external NFS or HTTP server where the audio recording is to be stored persistently. The URI must conform to the guidelines given for specifying external recordings described in Working with Media Files and TTS Strings on page 28. If the expression evaluates to the ECMAScript value undefined, an error.semantic is thrown and the session terminates. Only one of cvd:dest and cvd:destexpr may be specified.
cvd:append
Optional. Directs the media server to append this recording to the existing recording specified by cvd:dest. For details on appending recordings, please see the Usage Guidelines, below. Valid only for audio files; append is not supported for files containing video. If this attribute is specified for a file containing multimedia, an error.badfetch is thrown.
Shadow Variables
A shadow ECMAScript variable is created for each recording. The shadow variable is name$ where name is the name specified by the name attribute. At the end of the recording, information about the recording, such as its total length, is available to the VoiceXML application. Table 4-5 shows the shadow variables defined to provide information about recordings.
Table 4-5 Recording Shadow Variables
Shadow Variable
Description
name$.duration
Contains the length of the recording in milliseconds. The length reported includes the length of all announcements played plus any silence played between them. When appending to an existing audio recording, the duration amount indicates the length of just the appended portion.
name$.size
Contains the length of the recording in bytes. When appending to an existing recording, the size amount indicates the size of just the appended portion.
name$.termchar
Contains the DTMF termination key, if a DTMF termination key was specified at the time of start of recording and if the recording was terminated as a result of detecting the termination key. Detection of a fax tone terminating the record results in the termchar shadow variable being set to F.
name$.maxtime
Indicates whether the recording was terminated as a result of reaching maximum recording time. Supported values are as follows:
true: The recording terminated as a result of reaching the maximum allowed time.
false: Reaching the maximum allowed time was not the reason for termination.
Usage Guidelines
The <record> element allows a user audio, video, or multimedia recording to be made. Recorded audio is assigned a variable name using the name attribute. This name can be referenced within the <audio> element to play back the recorded media.
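The following sketch shows an internal recording that is played back immediately, along with a read of the name$.duration shadow variable. The names message and messageMs and the prompt file are hypothetical, and playback through the expr attribute of <audio> follows standard VoiceXML 2.0 usage rather than anything stated in this section.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="leaveMessage">
    <var name="messageMs"/>
    <!-- Internal (transient) recording: no cvd:dest or cvd:destexpr is given. -->
    <record name="message" beep="true" maxtime="60s" finalsilence="3s"
            dtmfterm="true" format="audio/wav">
      <prompt><audio src="record_after_beep.wav"/></prompt>
      <filled>
        <!-- Length of the recording in milliseconds. -->
        <assign name="messageMs" expr="message$.duration"/>
        <!-- Play the recording back by referencing its variable name. -->
        <prompt><audio expr="message"/></prompt>
      </filled>
    </record>
  </form>
</vxml>
```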
Memory for internal recordings is limited, and it is recommended that longer recordings be streamed to an external server. External recordings use the HTTP PUT method, which permits real-time transfer while the recording session is in progress. The destination is specified using either the cvd:dest or the cvd:destexpr attributes.
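A longer recording streamed to an external HTTP server might look like the sketch below. Both the destination URL and the namespace URI bound to the cvd prefix are placeholders, not values taken from this guide; use the namespace and URI format documented for your media server.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The cvd namespace URI and the destination URL below are placeholders. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      xmlns:cvd="http://example.com/placeholder-cvd-namespace">
  <form id="voicemail">
    <!-- cvd:dest makes this an external recording, transferred while recording. -->
    <record name="voicemail1" format="audio/wav" maxtime="300s"
            cvd:dest="http://recordings.example.com/vm/voicemail1.wav">
      <prompt><audio src="leave_message.wav"/></prompt>
    </record>
  </form>
</vxml>
```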
Encoding of Recordings
Recordings are encoded as either G.711 or G.729 for audio files, and as QuickTime or 3GP format for video files. The format of the recording can be specified using the format attribute. If not specified, the format of the recording will be that configured as default, which is set using the media server's management interface. Table 4-6 shows the encoding formats supported for recordings. The format must be entered exactly as shown; in particular, no spaces or other characters are permitted other than those shown. The codecs parameter used by 3GPP MIME types is defined in RFC 4281 [13].
Table 4-6 Supported Encoding Formats for Recordings
Format
Description
audio/wav
PCMU-encoded WAV file. (Audio-only.)
audio/x-wav
PCMU-encoded WAV file. (Audio-only.)
audio/vnd.wave; codec=1
PCM. (Audio-only.)
audio/vnd.wave; codec=6
G.711 a-law-encoded WAV file. (Audio-only.)
audio/vnd.wave; codec=7
G.711 u-law-encoded WAV file. (Audio-only.)
audio/vnd.wave; codec=83
G.729 Annex A-encoded WAV file. (Audio-only.)
video/quicktime; codecs=h263
QuickTime file with H.263-encoded video. (Video-only.)
video/quicktime; codecs=h263, alaw
QuickTime file with H.263-encoded video and G.711 a-law-encoded audio. (Multimedia.)
(The format strings for the next four entries are not recoverable; only the descriptions remain.)
QuickTime file with H.263-encoded video and G.711 u-law-encoded audio. (Multimedia.)
QuickTime file with G.711 a-law-encoded audio. (Audio-only.)
QuickTime file with G.711 u-law-encoded audio. (Audio-only.)
3GPP file with H.263-encoded video and AMR-encoded audio. (Multimedia.) Note that order matters and extra spaces are not allowed.
video/3gpp;codecs=s263
3GPP file with H.263-encoded video. (Video-only.)
audio/3gpp;codecs=samr
3GPP file with AMR-encoded audio. (Audio-only.)
For recordings containing videos, including multimedia recordings, the timeout property represents the time to wait for the first video I-frame. A noinput event thrown for a multimedia recording always means that the I-frame was not received in the time specified by the timeout property at the time the recording was made.
Appending to a Recording
The media server supports appending to an existing recording, for internal files or files stored on NFS servers. This mechanism is not supported for recordings on HTTP servers and it is not supported for files containing video; if attempted for either of these, an error.badfetch is thrown. The append function is enabled by setting the cvd:append attribute to true. When you append to an existing recording, you essentially make a request to create a new recording, which consists of the original recording plus the appended audio. Recording names must be unique within a session, so the name of the original recording cannot be reused in the request to append; the appended file must be given a new unique name. For example, suppose the original recording is given the name record1.
The request to append must use a new identifier for the file that will result after appending, such as record2. The file to append to (that is, the original recording record1) is specified using the cvd:dest attribute.
The cvd:dest value in conjunction with the cvd:append=true expression notifies the VXML interpreter to record to an existing file and not to a new file. In this example, the shadow variables associated with record1 will reflect values associated with the original recording, while shadow values associated with the appended recording will be referenced using the name record2.
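The record1 and record2 requests described above could be sketched as follows. This is an illustration under stated assumptions, not the guide's original example: it assumes that the internal recording is referenced in cvd:dest by its variable name, and the namespace URI bound to the cvd prefix is a placeholder. The prompt file names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The cvd namespace URI below is a placeholder, not the real one. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      xmlns:cvd="http://example.com/placeholder-cvd-namespace">
  <form id="greeting">
    <!-- Original internal recording. -->
    <record name="record1" format="audio/wav" beep="true">
      <prompt><audio src="record_greeting.wav"/></prompt>
    </record>
    <!-- Append request: a new name (record2), with cvd:dest pointing at the
         original recording (assumed reference syntax) and cvd:append set to true. -->
    <record name="record2" format="audio/wav" beep="true"
            cvd:dest="record1" cvd:append="true">
      <prompt><audio src="continue_greeting.wav"/></prompt>
    </record>
  </form>
</vxml>
```

Both <record> elements sit in the same document so that the record1 variable stays in scope when the append request runs.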
Also, note that in this example, the code would need to be executed in the same VXML script to ensure that the record1 variable does not go out of scope. If these operations are to span multiple documents, the value of this variable must be assigned to an application scope variable. Table 4-7 shows the recording behavior depending on the various values for cvd:append and cvd:dest (or cvd:destexpr). Note that all these cases assume that the value set for the name attribute is unique (that is, unfilled) for this session. If the name attribute is defined (filled) then the recording does not occur, as specified in [13].
Table 4-7 Summary of append Behavior
cvd:append = False, cvd:dest or cvd:destexpr = Undefined
The recording is treated as a normal internal recording.
cvd:append = False, cvd:dest or cvd:destexpr = Internal recording
An error.badfetch is thrown. The cvd:dest attribute must specify an external recording unless cvd:append is true.
cvd:append = False, cvd:dest or cvd:destexpr = External recording
The recording is treated as a normal external recording. If the external recording already exists, it is overwritten.
cvd:append = True, cvd:dest or cvd:destexpr = Undefined
Creates a new recording. This is equivalent to cvd:append missing or false.
cvd:append = True, cvd:dest or cvd:destexpr = Internal recording
File exists: The current recording is appended to the existing file, provided that the internal recording variable evaluates to valid recording content. If the variable used to represent the existing internal recording is undefined or incorrectly formatted, the call is rejected with an error.semantic.
File does not exist: A new file is created and recording occurs on that new file, provided that the internal recording variable evaluates to valid recording content. If the variable used to represent the existing internal recording is undefined or incorrectly formatted, the call is rejected with an error.semantic.
cvd:append = True, cvd:dest or cvd:destexpr = External recording
File exists: The current recording is appended to the existing file.
File does not exist: A new file is created and recording occurs on that new file.
<reprompt>
Repeats a prompt for user input.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
The <reprompt> element allows the application to revisit an originating prompt from an event handler, such as the <catch> element. This mechanism, along with incrementing prompt counters, can be used to vary prompts to the user when user input does not match expected results.
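As a sketch of varying prompts through <reprompt> and prompt counters, the field below plays a longer prompt on the second visit. The file names and the built-in digits field type are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="account">
    <field name="accountNumber" type="digits">
      <!-- Played on the first visit to the field. -->
      <prompt count="1"><audio src="enter_account.wav"/></prompt>
      <!-- Played on the second and later visits. -->
      <prompt count="2"><audio src="enter_account_help.wav"/></prompt>
      <noinput>
        <!-- Revisit the field so that its prompt is queued again. -->
        <reprompt/>
      </noinput>
      <nomatch>
        <prompt><audio src="not_understood.wav"/></prompt>
        <reprompt/>
      </nomatch>
    </field>
  </form>
</vxml>
```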
<return>
Returns from a subdialog to the calling dialog.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
event
Optional. Throws the specified event in the calling dialog after the return from the subdialog. For a list of supported events, please see the section Events on page 24. There is no default. Only one of event, eventexpr, and namelist may be specified. Otherwise, an error.badfetch is thrown.
eventexpr
Optional. Throws the event resulting from evaluation of the specified ECMAScript expression in the calling dialog after the return from the subdialog. For a list of supported events, please see the section Events on page 24. There is no default. Only one of event, eventexpr, and namelist may be specified. Otherwise, an error.badfetch is thrown.
namelist
Optional. Returns the specified list of variable names to the calling dialog. Format is a space-separated list of variable names. By default, the calling context receives an empty ECMAScript object back. Note that specifying a namelist does not cause an event to be thrown.
message
Optional. Returns the specified message string, along with the event name, to the calling dialog when an event is thrown. There is no default. The message string can be accessed within the <catch> element of the calling dialog using the _message implicit variable. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
messageexpr
Optional. Returns the message string resulting from evaluation of the specified ECMAScript expression to the event handler when an event is thrown, along with the event name. There is no default. The message string can be accessed within the <catch> element of the calling dialog using the _message implicit variable. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
Usage Guidelines
The <return> element terminates the execution of a subdialog, and returns control, and optionally data, to the calling dialog. The <return> element can also be used to throw an event in the calling dialog, such as a nomatch event. For example, <return event="nomatch"/> will trigger the nomatch event handler in the calling dialog. In addition, the <return> element can be used to return results to the calling dialog. For example, suppose the variable cardnumber is defined within a subdialog and populated by user input. Then <return namelist="cardnumber"/> returns the cardnumber to the calling dialog, which can access its value using subdialog-name.cardnumber, where subdialog-name is the name specified for the subdialog.
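The two uses of <return> described above can be combined in one subdialog document, sketched below. The form id, the prompt file, the digits field type, and the choice to give up on the third nomatch are illustrative assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="getcard">
    <field name="cardnumber" type="digits">
      <prompt><audio src="enter_card.wav"/></prompt>
      <nomatch count="3">
        <!-- Third failure: throw nomatch in the calling dialog. -->
        <return event="nomatch"/>
      </nomatch>
      <filled>
        <!-- Success: hand cardnumber back to the calling dialog. -->
        <return namelist="cardnumber"/>
      </filled>
    </field>
  </form>
</vxml>
```

If the calling <subdialog> element is named card, the caller reads the result as card.cardnumber.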
<rule>
[SRGS] Defines a grammar rule for an inline DTMF or voice grammar.
Parent element: <grammar>
Child elements: <item>, <one-of>
Attributes
id
Mandatory. An identifier for the rule. The identifier must be unique within the grammar. The format is an XML name token without colons (:). The name token may be composed of alphabetic letters, digits, period (.), underscore (_), and hyphen (-). The name must begin with a letter or underscore.
scope
The scope of this rule's grammar. The supported value is as follows:
public: This rule may be referenced by other rules within the current grammar, and by rules in other grammars.
private: Not supported.
Strictly speaking, this attribute is optional. However, the default defined in the VoiceXML 2.0 specification is private, which is not supported by the media server. Therefore, the application should explicitly include the scope attribute with a value of public (scope="public"). This will ensure correct interworking with the media server if full grammar scoping capabilities are implemented.
Usage Guidelines
The <rule> element defines an inline XML grammar rule for DTMF or voice. All SRGS grammars must have a valid set of rules or items to be considered a valid grammar. Grammars that evaluate to empty (that is, grammars that have no defined items) are rejected with session termination and an error.grammar. Grammars that contain tokens not enclosed in <item> elements are ignored. Only one rule may be active at any time. Thus, for inline grammars that could be active concurrently, only one grammar will actually be active. The second grammar that defines its own rule or omits the rule is ignored. To enable concurrent DTMF and voice grammars, two grammars must be defined at the same level of scope within a VoiceXML script.
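A minimal inline DTMF grammar that follows these guidelines might look like the sketch below. The field name and rule id are hypothetical, and the mode, version, and root attributes on <grammar> are taken from the SRGS specification; this example assumes the media server accepts them in this form.

```xml
<field name="choice">
  <!-- prompt omitted for brevity -->
  <grammar mode="dtmf" version="1.0" root="choice">
    <!-- scope is set explicitly because the specification default (private) is not supported -->
    <rule id="choice" scope="public">
      <one-of>
        <!-- every token is enclosed in an item element -->
        <item>1</item>
        <item>2</item>
        <item>3</item>
      </one-of>
    </rule>
  </grammar>
</field>
```

The grammar defines exactly one rule, so there is no second rule to be ignored.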
<ruleref>
[SRGS] Allows another voice grammar rule to be included.
Parent element: <grammar>
Child elements: <item>, <one-of>
Attributes
id
Mandatory. An identifier for the voice grammar rule. The identifier must be unique within the grammar. The format is an XML name token without colons (:). The name token may be composed of alphabetic letters, digits, period (.), underscore (_), and hyphen (-). The name must begin with a letter or underscore.
scope
The scope of this rule's grammar. The supported value is as follows:
public: This rule may be referenced by other rules within the current grammar, and by rules in other grammars.
private: Not supported.
Strictly speaking, this attribute is optional. However, the default defined in the VoiceXML 2.0 specification is private, which is not supported by the media server. Therefore, the application should explicitly include the scope attribute with a value of public (scope="public"). This will ensure correct interworking with the media server if full grammar scoping capabilities are implemented.
Usage Guidelines
The <ruleref> element defines an inline XML grammar rule. Currently, only voice grammar rules are supported; DTMF grammar rules are not supported. All SRGS grammars must have a valid set of rules or items to be considered a valid grammar. Grammars that evaluate to empty (that is, grammars that have no defined items) are rejected with session termination and an error.grammar. Grammars that contain tokens not enclosed in <item> elements are ignored. Only one rule may be active at any time. Thus, for inline grammars that could be active concurrently, only one grammar will actually be active. The second grammar that defines its own rule or omits the rule is ignored. To enable concurrent DTMF and voice grammars, two grammars must be defined at the same level of scope within a VoiceXML script.
<s>
[SSML] Represents a sentence.
Parent element: <speak>
Child elements: <audio>, <break>, <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>
Attributes
xml:lang
Usage Guidelines
The use of the <s> element is optional. Where text occurs without an enclosing <p> or <s> element, the synthesis processor attempts to determine the structure using language-specific knowledge of the format of plain text.
Interoperability Notes
For some speech servers: Some TTS servers running MRCP v1 ignore the xml:lang language attribute. The server always speaks English regardless of the value of attribute xml:lang in <speak>, <p>, <s>, and <voice> elements.
<say-as>
[SSML] Defines a text string to be rendered as an audio clip.
Parent element: <prompt>
Child elements: <value>
Attributes
interpret-as
Mandatory. Used for VoiceXML variables to indicate the type of the variable. Supported values are date, time and digits. Supported variable types for VoiceXML are described in the Convedia Media Server Sets and Variables Interface Reference Guide.
format
Optional or mandatory, depending on the variable type specified by the interpret-as attribute. (Currently, all supported variable types have mandatory subtypes.) Used for VoiceXML variables to indicate the subtype of the variable. Supported variable subtypes for VoiceXML are described in the Convedia Media Server Sets and Variables Interface Reference Guide.
detail
Ignored.
Usage Guidelines
The media server uses the SSML <say-as> element to allow the control agent to use a subset of the media server's sets and variables processing subsystem. For general information on the media server's sets and variables feature, please see the Convedia Media Server Sets and Variables Interface Reference Guide.
The <say-as> element is used as a child of the <prompt> element to specify the variable to be rendered in the prompt. The media server uses its built-in sets and variables processing subsystem; no TTS server is required. However, all the clips to be played must be internally provisioned on the media server, and an audio segment configuration file (the sets and variables configuration file) must also be provisioned on the media server. See the Convedia Media Server Sets and Variables Interface Reference Guide for this information.
The <say-as> element can contain either a child <value> element specifying the variable to be rendered or a plain text string specifying the variable. The variable type is indicated by the interpret-as attribute and the variable subtype is indicated by the format attribute. Supported variable types and subtypes are described in the Convedia Media Server Sets and Variables Interface Reference Guide. If the value of the variable is out of the supported range, the media server terminates the call without throwing an error.semantic event.
In general, the language in which the variable is to be rendered is specified by the xml:lang attribute at either the document level (that is, within the <vxml> element) or within the <prompt> element. Currently, the only supported value is en (English).
<script>
Executes ECMAScript (JavaScript) code.
Parent element: <block>, <catch>, <error>, <filled>, <form>, <help>, <if>, <menu>, <noinput>, <nomatch>, <vxml>
Child elements: None.
Attributes
src
Optional. Specifies the URI to the script, if the script is external. If not specified, the media server expects the script to be defined inline.
charset
Ignored.
fetchhint
Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31 - 1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31 - 1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element), the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <script> element specifies ECMAScript client-side logic. The results of the computation performed by the script can be returned to the caller and stored in a variable. The contents of the variable can be used later by the VoiceXML application for general use, such as conditional logic or dialogs utilizing the variable. The script can be fetched externally or it can be specified in-line.
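As an illustration of inline script logic feeding later conditional logic, consider the sketch below. The form id, the function isMorning, and the variable morning are hypothetical names, and the log text is arbitrary.

```xml
<form id="Greeting">
  <script>
    <![CDATA[
      // Hypothetical helper: true when the hour (0 to 23) is before noon
      function isMorning(hour) {
        return hour < 12;
      }
    ]]>
  </script>
  <block>
    <!-- Store the result of the script function in a variable -->
    <var name="morning" expr="isMorning(new Date().getHours())"/>
    <!-- Use the stored result in conditional logic -->
    <if cond="morning">
      <log>Morning call</log>
    <else/>
      <log>Afternoon or evening call</log>
    </if>
  </block>
</form>
```

The CDATA section keeps the less-than sign in the script from being parsed as markup. An external script would instead use the src attribute and leave the element empty.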
<speak>
[SSML] The root element of SSML.
Parent element: <?xml>
Child elements: <audio>, <break>, <emphasis>, <mark>, <meta>, <metadata>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>
Attributes
xml:lang
Mandatory. Specifies the language of the root document.
xml:base
Optional. Specifies the base URI of the root document.
version
Mandatory. The SSML version. The only supported value is 1.0.
Usage Guidelines
The <speak> element is the root element of the Speech Synthesis Markup Language (SSML), which is an XML application for speech synthesis. The <speak> element is not supported directly in VoiceXML scripts. Rather, all TTS scripts are rendered into <speak> SSML XML scripts, which are then passed to an external server for playing. Including a <speak> element with TTS text in a VoiceXML document will cause a parse error.
Interoperability Notes
For some speech servers: Some TTS servers running MRCP v1 ignore the xml:lang language attribute. The server always speaks English regardless of the value of attribute xml:lang in <speak>, <p>, <s>, and <voice> elements.
<sub>
[SSML] Replaces the contained text with a substitute.
Parent element: <speak>
Child elements: None.
Attributes
alias
Usage Guidelines
The <sub> element is employed to indicate that the text in the alias attribute value replaces the contained text for pronunciation. This allows a document to contain both a spoken and a written form. The required alias attribute specifies the string to be spoken instead of the enclosed string. The processor should apply text normalization to the alias value. The <sub> element can only contain text to be rendered.
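For example, an abbreviation can be kept in its written form while the expansion is spoken. The fragment below is a sketch that assumes the prompt is rendered by an external TTS server; the prompt text is illustrative only.

```xml
<prompt>
  This recommendation is published by the
  <!-- Written form: W3C. Spoken form: the alias value. -->
  <sub alias="World Wide Web Consortium">W3C</sub>.
</prompt>
```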
Interoperability Notes
For some speech servers: The specification states that the alias attribute of the <sub> element is mandatory; however, this is not enforced by the speech server.
<subdialog>
Invokes another dialog, from which control will eventually return.
Parent element: <block>, <catch>, <error>, <filled>, <form>, <help>, <noinput>, <nomatch>
Child elements: <audio>, <catch>, <error>, <filled>, <help>, <noinput>, <nomatch>, <param>, <prompt>, <property>
Attributes
name
Optional. Defines a variable with the specified name, which will hold the return values returned by the <subdialog> element. The scope of the returned value is limited to the form. The values are returned from the subdialog in the namelist specified in the <return> element. The return values can be accessed using the shadow variable name$.ReturnedVariableName. The format is an XML restrictedVariableName token, which is composed of alphabetic characters, digits, colon, and hyphen. The name may not begin with underscore (_) or contain a period (.). In addition, the name must follow ECMAScript variable naming conventions and may not include ECMAScript reserved words. There is no default.
expr
Optional. An ECMAScript expression assigning the initial value of the form item variable defined by name. If the initial value is set using this attribute, the form item will not be executed until the variable is cleared (for example, by using the <clear> element). The default is the ECMAScript value undefined.
cond
Optional. A Boolean ECMAScript expression. The subdialog is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
namelist
Optional. Specifies a set or list of variables to submit to the subdialog. Any declared VoiceXML or ECMAScript variable, including shadow variables, can be included in the list. By default, no variables are submitted.
src
The URI of the subdialog. The URI must comply with the XML anyURI format. If the subdialog is contained within the current document, the format is #dialog-name; for example #SubdialogX. Exactly one of src and srcexpr must be specified. Otherwise, an error.badfetch is thrown.
srcexpr
An ECMAScript expression evaluating to the URI of the subdialog. The URI resulting from the expression must comply with the XML anyURI format. Exactly one of src and srcexpr must be specified. Otherwise, an error.badfetch is thrown.
method
Optional. Specifies the HTTP method to be used in submitting. Supported values are as follows: get: An HTTP GET method will be used. post: An HTTP POST method will be used. The default is get.
enctype
Optional. The MIME encoding method to be used in submitting. The only supported value is application/x-www-form-urlencoded. This is the default.
fetchaudio
Ignored.
fetchhint
Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31 - 1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31 - 1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element), the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <subdialog> element provides the ability to transition to a new interaction, much like a function call. Subdialogs are useful in creating and organizing commonly used dialog functions as libraries, which can be reused by many applications. When the subdialog is complete, control is returned to the calling dialog. The state of the calling dialog (active grammars, variables, event handlers, and so on) is preserved when the called dialog is invoked, and restored when the called dialog returns control to the calling dialog.
The calling dialog can pass variables to the called dialog using the namelist attribute of the <subdialog> element. The called dialog returns control to the calling dialog by executing the <return> element, and the <return> element can also return variables from the subdialog to the calling dialog.
Unlike a subroutine, the called dialog does not have access to any information from the context of the calling dialog. This is because the calling and the called dialogs execute in two separate and independent execution contexts. Thus, for example, events thrown in the called dialog must be handled in that dialog; they cannot invoke event handlers in the calling dialog. In addition, variables scoped by the calling dialog are not accessible by the called dialog, and any variables scoped by the called dialog are not accessible when control returns to the calling dialog.
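The calling side of this interaction can be sketched as follows. The URI, the form id, and the variable names accountid and card are hypothetical, and the called document is assumed to finish with <return namelist="cardnumber"/>. The returned value is read here as card.cardnumber, following the access form given in the <return> usage guidelines.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level variable passed to the called dialog through namelist -->
  <var name="accountid" expr="'12345'"/>
  <form id="Payment">
    <!-- fetchtimeout of 20.5s is 20500 milliseconds -->
    <subdialog name="card" src="http://example.com/dialogs/getcard.vxml"
               namelist="accountid" method="post" fetchtimeout="20.5s">
      <filled>
        <!-- Runs after the called dialog returns -->
        <log>Returned card number: <value expr="card.cardnumber"/></log>
      </filled>
    </subdialog>
  </form>
</vxml>
```

Because the two dialogs run in separate execution contexts, accountid reaches the called dialog only because it is listed in namelist, and only the variables named in the <return> namelist come back.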
<submit>
Submits application values and fetches a new document, transitioning to a new dialog.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
next
Submits to the specified URI. The URI must comply with the XML anyURI format. Exactly one of next and expr must be specified. Otherwise, an error.badfetch is thrown.
expr
Submits to the URI resulting from evaluation of the specified ECMAScript expression. The URI must comply with the XML anyURI format. Exactly one of next and expr must be specified. Otherwise, an error.badfetch is thrown.
namelist
Optional. The variables to submit as data. Format is a space-separated list of variable names. Both VoiceXML and ECMAScript variables can be included. By default, all named input item variables are submitted.
method
Optional. Specifies the HTTP method to be used in submitting. Supported values are as follows:
get: An HTTP GET method will be used.
post: An HTTP POST method will be used.
The default is get.
enctype
Optional. The MIME encoding method to be used in submitting. The only supported value is application/x-www-form-urlencoded. This is the default.
fetchaudio
Ignored.
fetchhint
Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31 - 1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31 - 1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element), the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Ignored.
maxstale
Ignored.
Usage Guidelines
The <submit> element allows the application to submit variables to an external HTTP server and transition control to a new VoiceXML document. The variables to be sent are listed in the namelist attribute. This data is sent as URI-encoded parameters to the HTTP server. Data can be sent using either the HTTP GET or the HTTP POST method. The values submitted can be fixed strings, internal variables (for example, field items or property variables), or ECMAScript expressions. Expressions are evaluated first and then converted to strings before submitting. The execution of a <submit> element will always result in a document fetch. The document specified by the next or the expr attribute is returned by the HTTP server, and application control transitions to this document.
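A typical use is to post a collected field value to an application server once the field is filled. In the sketch below, the URI, the form id, and the field name quantity are hypothetical, and the prompt and grammar for the field are omitted.

```xml
<form id="Order">
  <field name="quantity">
    <!-- prompt and grammar omitted for brevity -->
    <filled>
      <!-- Sends quantity as a URL-encoded parameter in an HTTP POST;
           the document returned by the server becomes the new dialog -->
      <submit next="http://example.com/order" namelist="quantity"
              method="post" fetchtimeout="10s"/>
    </filled>
  </field>
</form>
```

If namelist were omitted, all named input item variables would be submitted by default; listing the variable explicitly keeps the request limited to what the server needs.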
<throw>
Generates an event to be handled by <catch>.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
event
Throws the specified event. The event may be predefined, or application-specific. For a list of supported events, please see the section Events on page 24. Exactly one of event and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
eventexpr
Throws the event resulting from evaluation of the specified ECMAScript expression. The event may be predefined, or application-specific. For a list of supported events, please see the section Events on page 24. Exactly one of event and eventexpr must be specified. Otherwise, an error.badfetch is thrown.
message
Optional. Returns the specified message string to the event handler, along with the event name. There is no default. The message string can be accessed using the _message implicit variable. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
messageexpr
Optional. Returns the message string resulting from evaluation of the specified ECMAScript to the event handler, along with the event name. There is no default. The message string can be accessed using the _message implicit variable. Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.
Usage Guidelines
The <throw> element throws the specified event to be caught by the <catch> element. The event can be pre-defined (for example, a nomatch event), or it may be application-specific.
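The following sketch throws an application-specific event with a message and handles it in the same form. The event name com.example.limit.exceeded and the message text are hypothetical.

```xml
<form id="Check">
  <block>
    <!-- Application-specific event with an accompanying message string -->
    <throw event="com.example.limit.exceeded" message="Daily limit reached"/>
  </block>
  <catch event="com.example.limit.exceeded">
    <!-- The message string is available through the _message variable -->
    <log>Limit event received: <value expr="_message"/></log>
    <exit/>
  </catch>
</form>
```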
<value>
Inserts the value of an expression into a log message or prompt.
Parent element: <log>, <say-as>
Child elements: None.
Attributes
expr
Mandatory. An ECMAScript expression, the value of which will be inserted into the log message.
Usage Guidelines
The <value> element is used in the <log> element to insert the text of the log message into the log. In this context, the <value> element can be used to de-reference ECMAScript expressions and include them in the output of the <log> message. Note that all <log> messages are written to syslog at a severity level of ERROR.
The <value> element is used in the <say-as> element to insert the value of an expression into a prompt.
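For instance, a variable can be de-referenced inside a log message as sketched below. The variable name attempts and the message text are illustrative only.

```xml
<block>
  <var name="attempts" expr="3"/>
  <!-- The expression is evaluated and its value is inserted into the log text -->
  <log>Login attempts so far: <value expr="attempts"/></log>
</block>
```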
<var>
Declares a variable and assigns it a value.
Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>
Child elements: None.
Attributes
name
Mandatory. The name of the variable. The format is an XML restrictedVariableName token, which is composed of alphabetic characters, digits, colon, and hyphen. The name may not begin with underscore (_) or contain a period (.). In addition, the name must follow ECMAScript variable naming conventions and may not include ECMAScript reserved words.
expr
Optional. An ECMAScript expression representing the value of the variable. If not specified, then if the variable was previously declared, it retains its original value. Otherwise, the ECMAScript value undefined is assigned to the variable.
Usage Guidelines
The <var> element declares a variable and assigns it a value. Proper scoping rules are observed as defined in [13]. The naming of user-defined variables adheres to the naming convention specified in Section 5.1 of [13]. The maximum length of a variable is 256 characters. In general, naming errors result in an error.semantic being thrown. The exception is a variable name that ends in the dollar sign ($); this error results in an error.badfetch.
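The declarations below sketch the cases described above. All variable names are hypothetical.

```xml
<block>
  <!-- Declared with an initial value -->
  <var name="retries" expr="0"/>
  <!-- Declared without expr: a new variable receives the ECMAScript value undefined -->
  <var name="lastdigit"/>
  <!-- Not valid: a name ending in the dollar sign results in error.badfetch -->
  <!-- <var name="total$" expr="0"/> -->
</block>
```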
<voice>
[SSML] Requests a change in speaking voice.
Parent element: <speak>
Child elements: <audio>, <break>, <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>
Attributes
xml:lang
Optional. Specifies the language of the paragraph.
gender
Optional. Indicates the preferred gender of the voice to speak the contained text. Supported values are as follows:
male: Use a male voice.
female: Use a female voice.
neutral: Use a neutral voice.
age
Optional. Indicates the preferred age since birth, in years, of the voice to speak the contained text. The range is a non-negative integer.
variant
Optional. Indicates a preferred variant of the other voice characteristics to speak the contained text (for example, the second male child voice). Valid values are of the type positive integer.
name
Optional. Indicates a processor-specific voice name to speak the contained text. The value may be a space-separated list of names ordered from most-preferred to least-preferred. Consequently, a name may not contain any white space.
Usage Guidelines
The <voice> element is a production element that requests a change in speaking voice. Although each attribute individually is optional, it is an error if no attributes are specified when the <voice> element is used. The <voice> element is commonly used to change the language. When there is not a voice available that exactly matches the attributes specified in the document, or there are multiple voices that match the criteria, a voice selection algorithm must be used. Approximately speaking, the xml:lang attribute has the highest priority and all other attributes are equal in priority but below xml:lang.
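A prompt that switches voices might be written as in the sketch below. It assumes an external TTS server that honors the gender and age attributes; the prompt text is illustrative only.

```xml
<prompt>
  <!-- At least one attribute must be present on each voice element -->
  <voice gender="female">Welcome back.</voice>
  <voice gender="male" age="40">Your order has shipped.</voice>
</prompt>
```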
Interoperability Notes
For some speech servers: Some TTS servers running MRCP v1 ignore the xml:lang language attribute. The server always speaks English regardless of the value of attribute xml:lang in <speak>, <p>, <s>, and <voice> elements. All attributes of the <voice> element are ignored.
<vxml>
The root element for VoiceXML. Defines the set of actions that form a VoiceXML dialog.
Parent element: None. The root element for VoiceXML.
Child elements: <catch>, <error>, <form>, <help>, <link>, <menu>, <noinput>, <nomatch>, <promptcontrol>, <property>, <script>, <var>
Attributes
version
Mandatory. The W3C specification of the enclosed VoiceXML document. Supported values are 2.0 and 2.1.
xmlns
Mandatory. The namespace of the VoiceXML document. The only supported value is http://www.w3.org/2001/vxml.
xml:base
Optional. Allows a base URI to be defined. If set, any relative URIs within the document are resolved using this base URI.
xml:lang
Optional. Specifies the language identifier for this document. If not specified, the default is en (English). If specified, the language identifier is inherited by all elements in the document that use the xml:lang attribute. Note that a value specified for xml:lang within an element overrides that specified at the document level. Specifying an unsupported language results in an error.unsupported.language event.
application
Optional. The URI of this document's root document, if any. If specified, the implication is that this document is a leaf document.
xmlns:cvd
Optional. The namespace of the XML schema defined for the cvd prefix, which indicates a RadiSys extension. This is optional for any VoiceXML script that uses a cvd prefix, such as cvd:append, cvd:dest, or cvd:destexpr.
The media server accepts 2.1 as the value of the version attribute of the <vxml> element. Where VoiceXML version 2.1 differs from version 2.0, the media server complies with version 2.0, with the following exceptions:
The media server supports the elements described in Chapter 4: VoiceXML 2.0 Elements, as defined in [13].
The media server supports the ECMAScript binding Level 2 subset of the Document Object Model (DOM) as described in Chapter 4: VoiceXML 2.0 Elements.
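A minimal leaf document using the mandatory attributes might look like the sketch below. The application URI and the form content are hypothetical, and the xmlns:cvd attribute is left out because no RadiSys extension is used.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- version and xmlns are mandatory; xml:lang defaults to en when omitted -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en"
      application="http://example.com/app/root.vxml">
  <form id="Main">
    <block>
      <log>Leaf document started</log>
    </block>
  </form>
</vxml>
```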
Chapter 5:
This chapter describes the VoiceXML 2.1 elements currently supported by the Convedia Media Server. The VoiceXML 2.1 language is defined by the W3C Recommendation specifying the language [14]. Any features of VoiceXML specified in the Recommendation but not in this guide are not supported in this release of the Convedia Media Server. Any features of VoiceXML specified in this guide but not in the Recommendation are extensions to the specification.
<data>
Fetches XML data from a document server without transitioning to a new VoiceXML document.
Parent element: <block>, <catch>, <error>, <filled>, <foreach>, <help>, <if>, <noinput>, <nomatch>, <vxml>
Child elements: None.
Attributes
src
The URI representing the location of the XML data to retrieve. Only HTTP URIs are supported. The URI must comply with the XML anyURI format. If a relative URI is specified, it is qualified using the base URI. Exactly one of src and srcexpr must be specified; otherwise, an error.badfetch is thrown.
name
Optional. The name of a variable exposing the Document Object Model (DOM). If this attribute is not specified, the retrieved content is ignored.
srcexpr
An ECMAScript expression representing the new value of the variable. This value dynamically determines the URI at the time that the data needs to be fetched. A URI resulting from the expression must comply with the XML anyURI format. Exactly one of src and srcexpr must be specified; otherwise, an error.badfetch is thrown.
method
Optional. The request method. Supported values are get and post. The default value is get.
namelist
Optional. The list of variables to submit. Supported values are as follows:
Individual variable references, which are submitted with the same qualification used in the namelist. Declared VoiceXML and ECMAScript variables can be referenced.
The media server supports ECMAScript objects in namelist with the following restrictions:
The value of the method attribute must be post. If the value of the method attribute is get, the media server raises an error.badfetch exception.
The maximum nesting level is four if the ECMAScript object contains other objects.
The body of the post request contains the ECMAScript object as an XML file. The XML file contains all nested objects, each contained within an XML element. Properties of all objects are each represented as an XML element, for which the property name is the element name and the property value is the content.
When the enctype is application/x-www-form-urlencoded, the XML is sent in the post body as a single line using standard escaping rules and without whitespace.
When the enctype is text/xml, the XML is sent in the post body in standard XML format.
By default, no variables are submitted.
enctype
Optional. The media encoding type of the submitted document. Supported values are as follows:
application/x-www-form-urlencoded
text/xml (only when the namelist is an ECMAScript object)
The media server returns an error.badfetch if an unsupported value is specified (for example, multipart/form-data) or if text/xml is specified when the namelist is not an ECMAScript object. The default value is application/x-www-form-urlencoded.
fetchaudio
Optional. The maximum length of the URI string is 255 characters. The supported fetchaudio source is internal provisioned clips and external NFS or HTTP. Clip type must be audio-only, video-only, or multimedia. TTS, RTSP media, and sets and variables are not supported. The playing of the audio clip is governed by the fetchaudiodelay and fetchaudiominimum properties in effect at the time of the fetch.
fetchhint
Optional. Ignored.
fetchtimeout
Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out. The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (.) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (+). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted. For time values, the media server supports a range from 0 milliseconds to 2^31 - 1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31 - 1. Examples of time are: 100ms, 50s, 20.5s, and +600ms. The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element), the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.
maxage
Optional. Ignored.
maxstale
Optional. Ignored.
Usage Guidelines
The <data> element fetches XML data without transitioning to a new XML document. The XML data fetched by the <data> element is bound to ECMAScript through the variable named by the name attribute; this variable exposes a read-only subset of the W3C Document Object Model (DOM). If the content cannot be retrieved, the media server raises an error.badfetch exception. If the retrieved content is not well-formed XML, the media server raises an error.semantic exception.
The media server supports only US-ASCII characters in UTF-8 encoding format in XML documents retrieved with the <data> element. The media server does not support the access-control feature of the <data> element.
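The sketch below fetches a small XML document and reads one element through the DOM variable. The URI, the variable names, and the shape of the returned XML are hypothetical, and the DOM calls shown (documentElement, getElementsByTagName, item, firstChild, data) are assumed to fall within the subset supported by the media server.

```xml
<form id="Quote">
  <block>
    <var name="symbol" expr="'RSYS'"/>
    <!-- Assumed response body: <quote><price>30.00</price></quote> -->
    <data name="quote" src="http://example.com/quote.xml"
          namelist="symbol" method="get" fetchtimeout="5s"/>
    <!-- Walk the read-only DOM to the text inside the price element -->
    <var name="price"
         expr="quote.documentElement.getElementsByTagName('price').item(0).firstChild.data"/>
    <log>Price: <value expr="price"/></log>
  </block>
  <catch event="error.badfetch">
    <!-- Raised when the content cannot be retrieved -->
    <log>Quote fetch failed</log>
  </catch>
</form>
```

The dialog stays in the same VoiceXML document throughout; only the data is fetched.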
<foreach>
Allows a VoiceXML application to iterate through an ECMAScript array, executing the contained content once for each array item.
Parent and child elements for a <foreach> element used within executable content:
Parent elements:   <block>, <catch>, <error>, <filled>, <foreach>, <help>, <if>, <noinput>, <nomatch>
Child elements:    <audio>, <assign>, <clear>, <data>, <disconnect>, <exit>, <foreach>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>
Parent and child elements for a <foreach> element used within a <prompt> element:
Parent elements:   <foreach>, <prompt>
Child elements:    <audio>, <break>, <foreach>
Attributes
array   Mandatory. An ECMAScript expression that must evaluate to an ECMAScript array.
item    Mandatory. The variable that stores each array item upon each iteration of the loop. If the variable is not already defined within the parent's scope, a new variable is declared.
Usage Guidelines
The <foreach> element lets a VoiceXML application execute its content once for each item in an ECMAScript array. Both the array and item attributes must be specified; otherwise, the media server raises an error.badfetch exception. If the value of the array attribute does not evaluate to an ECMAScript Array (that is, it does not satisfy an instanceof Array test), the media server raises an error.semantic exception.
The <foreach> element operates on a shallow copy of the array specified by the array attribute; this means that only the references are copied. For example, a shallow copy of an array of pointers to strings copies only the pointers, leaving the underlying character strings as the actual data (not copies).
The <foreach> element may appear within executable content and as a child element of the <prompt> element. When the <foreach> element is within executable content, it may itself contain elements of executable content. When the <foreach> element is within a <prompt> element, it can contain only elements that are valid in the <enumerate> element; that is: <audio>, <break>, and <foreach>.
The media server supports up to two levels of nesting with a <foreach> element. If the level of nesting is greater than two, the media server raises an error.semantic exception.
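The shallow-copy behavior of <foreach> can be pictured with the following ECMAScript sketch. The function foreachItems is a hypothetical model written for this illustration, not the media server's implementation, and the thrown Error merely stands in for the error.semantic event. It assumes that the copy is taken once, before the first iteration.

```javascript
// Hypothetical model of <foreach>, for illustration only.
function foreachItems(array, body) {
  if (!(array instanceof Array)) {
    // Stands in for the error.semantic exception raised by the media server.
    throw new Error("error.semantic");
  }
  // Shallow copy: the element references are copied, not the objects they refer to.
  const snapshot = array.slice();
  for (const item of snapshot) {
    body(item);
  }
}

const prompts = [{ text: "one" }, { text: "two" }];
const seen = [];

foreachItems(prompts, (item) => {
  // Growing the original array does not lengthen the loop,
  // because the loop runs over the copy.
  prompts.push({ text: "extra" });
  // Changing an item is visible through the original array,
  // because both arrays refer to the same object.
  item.text = item.text.toUpperCase();
  seen.push(item.text);
});

console.log(seen.length, prompts.length, prompts[0].text);
```

Under these assumptions the loop body runs once per original item, while changes made to an item during the loop remain visible in the original array afterward.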
Chapter 6:
This chapter describes the ECMAScript binding for the subset of Level 2 of the Document Object Model (DOM) exposed by the <data> element. The binding is specified in Appendix D of the W3C Recommendation Voice Extensible Markup Language (VoiceXML) 2.1 [14]. The media server supports the following objects from this specification; specific support is described in this chapter.
Attr Object
CDATASection Object
CharacterData Object
Comment Object
Document Object
DOMException Prototype Object
Element Object
EntityReference Object
NamedNodeMap Object
Node Prototype Object
NodeList Object
ProcessingInstruction Object
Text Object
Attr Object
For the Attr object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property          Type           Read-Only
name              String         Yes
specified         Boolean        Yes
value             String         No
ownerElement      Element        Yes
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Type
Parameter Name
CDATASection Object
For the CDATASection object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property          Type           Read-Only
data              String         No
length            Number         Yes
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name    Parameter Type
offset, count     Number, Number
CharacterData Object
For the CharacterData object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property          Type           Read-Only
data              String         No
length            Number         Yes
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name    Parameter Type
offset, count     Number, Number
Usage Guidelines
If a DOMException object is raised on retrieval of the CharacterData.data property or the CharacterData.substringData method and not caught by an ECMAScript execution handler, the media server raises an error.semantic exception.
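One way to keep such an exception from reaching the media server is to catch it in the script itself. The following ECMAScript sketch uses a hypothetical stand-in object (makeCharacterData) in place of a real CharacterData node, because the real object is supplied by the media server; the stand-in throws on an out-of-range offset, following the INDEX_SIZE_ERR condition of DOM Level 2. The helper name safeSubstring is also an assumption made for this illustration.

```javascript
// Hypothetical stand-in for a CharacterData node, for illustration only.
function makeCharacterData(data) {
  return {
    data,
    length: data.length,
    substringData(offset, count) {
      if (offset < 0 || offset > data.length || count < 0) {
        const err = new Error("INDEX_SIZE_ERR");
        err.code = 1; // INDEX_SIZE_ERR in DOM Level 2
        throw err;
      }
      // If offset + count runs past the end, everything up to the end is returned.
      return data.substring(offset, offset + count);
    },
  };
}

// Hypothetical helper: catches the exception so that it is not left uncaught.
function safeSubstring(node, offset, count, fallback) {
  try {
    return node.substringData(offset, count);
  } catch (e) {
    return fallback;
  }
}

const node = makeCharacterData("hello world");
const first = safeSubstring(node, 0, 5, "");
const missing = safeSubstring(node, 99, 2, "n/a");
console.log(first, missing);
```

Returning a fallback value is only one possible policy; a script could equally inspect the code property of the caught exception and branch on it.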
Comment Object
For the Comment object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property          Type           Read-Only
data              String         No
length            Number         Yes
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name    Parameter Type
offset, count     Number, Number
Document Object
For the Document object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property          Type           Read-Only
documentElement   Element        Yes
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name    Parameter Type
tagname           String
DOMException Prototype Object
Properties
Property   Type     Read-Only
code       Number   No
Methods
None.
Usage Guidelines
If a DOMException object is raised and not caught by an ECMAScript execution handler on retrieval of the Node.nodeValue property, the CharacterData.data property, or the CharacterData.substringData method, the media server raises an error.semantic exception.
Element Object
For the Element object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property          Type           Read-Only
tagName           String         Yes
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name             Parameter Type
name                       String
name                       String
name                       String
namespaceURI, localName    String, String
namespaceURI, localName    String, String
namespaceURI, localName    String, String
name                       String
EntityReference Object
For the EntityReference object, the media server supports the following constants, properties, and methods.
Constants
Constant                      Type     Value
ELEMENT_NODE                  Number   1
ATTRIBUTE_NODE                Number   2
TEXT_NODE                     Number   3
CDATA_SECTION_NODE            Number   4
ENTITY_REFERENCE_NODE         Number   5
PROCESSING_INSTRUCTION_NODE   Number   7
COMMENT_NODE                  Number   8
DOCUMENT_NODE                 Number   9
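As an illustration of how a script might use these constants, the following ECMAScript sketch maps a nodeType value back to its constant name. The NODE_TYPES table simply restates the constants listed above, and the function nodeTypeName is a hypothetical helper, not part of the DOM binding. Note that the value 6 does not appear among the listed constants, so the sketch treats it as unknown.

```javascript
// Restates the node type constants listed in this section.
const NODE_TYPES = {
  ELEMENT_NODE: 1,
  ATTRIBUTE_NODE: 2,
  TEXT_NODE: 3,
  CDATA_SECTION_NODE: 4,
  ENTITY_REFERENCE_NODE: 5,
  PROCESSING_INSTRUCTION_NODE: 7,
  COMMENT_NODE: 8,
  DOCUMENT_NODE: 9,
};

// Hypothetical helper: returns the constant name for a nodeType value,
// or null if the value is not one of the listed constants.
function nodeTypeName(nodeType) {
  for (const [name, value] of Object.entries(NODE_TYPES)) {
    if (value === nodeType) return name;
  }
  return null;
}

console.log(nodeTypeName(3));
console.log(nodeTypeName(6)); // not among the listed constants
```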
Properties
Property          Type           Read-Only
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name
Parameter Type
NamedNodeMap Object
For the NamedNodeMap object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property   Type     Read-Only
length     Number   Yes
Methods
Parameter Name    Parameter Type
name              String
index             Number
Node Prototype Object
Properties
Property          Type           Read-Only
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name Parameter Type
Usage Guidelines
If a DOMException object is raised on retrieval of the Node.nodeValue property and not caught by an ECMAScript execution handler, the media server raises an error.semantic exception.
NodeList Object
For the NodeList object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property   Type     Read-Only
length     Number   Yes
Methods
Method   Parameter Name   Parameter Type   Returns
item     index            Number           Node
ProcessingInstruction Object
For the ProcessingInstruction object, the media server supports the following constants, properties, and methods.
Constants
Constant                      Type     Value
ELEMENT_NODE                  Number   1
ATTRIBUTE_NODE                Number   2
TEXT_NODE                     Number   3
CDATA_SECTION_NODE            Number   4
ENTITY_REFERENCE_NODE         Number   5
PROCESSING_INSTRUCTION_NODE   Number   7
COMMENT_NODE                  Number   8
DOCUMENT_NODE                 Number   9
Properties
Property          Type           Read-Only
target            String         Yes
data              String         No
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name
Parameter Type
Usage Guidelines
The media server parses and interprets the xml processing instruction. The media server does not support any other processing instruction. Unsupported processing instructions cause the media server to raise an error.semantic exception. The media server does not generate any processing instruction objects.
Text Object
For the Text object, the media server supports the following constants, properties, and methods.
Constants
None.
Properties
Property          Type           Read-Only
data              String         No
length            Number         Yes
nodeName          String         Yes
nodeValue         String         No
nodeType          Number         Yes
parentNode        Node           Yes
childNodes        NodeList       Yes
firstChild        Node           Yes
lastChild         Node           Yes
previousSibling   Node           Yes
nextSibling       Node           Yes
attributes        NamedNodeMap   Yes
ownerDocument     Document       Yes
namespaceURI      String         Yes
prefix            String         No
localName         String         Yes
Methods
Parameter Name    Parameter Type
offset, count     Number, Number
Appendix A:
This appendix describes some development practices that can help you maximize performance and capacity of your VoiceXML applications.
The coding practices recommended in this appendix guide developers who write code for the RadiSys Convedia Media Server VoiceXML interpreter. They are intended to help development partners achieve optimal performance on the media server's VoiceXML interface.
1 Store permanent audio clips on the media server.
Provisioning permanent audio clips internally on the media server, rather than on an external NFS or HTTP server, allows more efficient clip retrieval. In addition, storing clips internally removes any issues relating to interconnectivity with the NFS or HTTP server that could occur, reducing debugging time.
2 If you store permanent clips externally, use NFS.
If you must store provisioned audio clips on an external server, RadiSys recommends using an NFS server. RadiSys currently does not recommend using HTTP for recording and playing back permanent audio clips. If you must record to an external HTTP server, use the <submit> element. This element records the file internally until it completes, and then uses the HTTP POST method to post the file to the HTTP server.
3 Record temporary audio clips on the media server.
If the application records audio clips for temporary use, it is most efficient to store the temporary clips internally on the media server. Clips that are recorded on the media server are transient: they are deleted when the connection with which they are associated is closed. They are also volatile: they will not survive a reset cycle.
4 Consolidate VoiceXML documents.
Document transitions have a high CPU overhead, and the number of transitions varies from application to application. To achieve higher capacity, consolidate the VoiceXML logic or flow to minimize the number of document transitions. In calculating performance characteristics, RadiSys assumes the average number of transitions in a voicemail-type application to be 2 to 3.
5 Reduce application root document size.
The application root document size can grow large if several variables and several catch handlers are defined. Since root documents may be called with every document fetch, having a large root document can cause high CPU consumption, impacting performance. Remove any unused or unnecessary variables and catch handlers from the application root document, and define them within the VoiceXML leaf document where they are required. This guideline interacts with the previous guideline. Since root documents are called with every document fetch, a large number of VoiceXML documents calling a large root document can exacerbate CPU consumption.
6 Reduce the number of subdialogs.
REFERENCES
[1] 3GPP TS 26.244. 3GPP File Format (3GP) Specification. V7.1.0.
[2] Audio-Video Transport Working Group, Casner, S., and P. Hoschka. MIME Type Registration of RTP Payload Formats. Internet Draft, Internet Engineering Task Force, November 2001.
[3] Bos, B., et al. (eds). Cascading Style Sheets, Level 2 (CSS2) Specification. W3C Candidate Recommendation, World Wide Web Consortium, May 1998.
[4] Bray, T., et al. (eds). Extensible Markup Language (XML) 1.0 (Third Edition). W3C Recommendation 04, World Wide Web Consortium, February 2004.
[5] Burnett, D., et al. (eds). Speech Synthesis Markup Language Specification. W3C Working Draft, World Wide Web Consortium, April 2002.
[6] Burnett, D., et al. (eds). SSML 1.0 say-as attribute values. W3C Working Note 26, World Wide Web Consortium, May 2005.
[7] Cable Television Laboratories. PacketCable Audio Server Protocol Specification, PKT-SP-ASP-I02-010620. June 2001.
[8] Dahl, D. (ed). Natural Language Semantics Markup Language for the Speech Interface Framework. W3C Working Recommendation, World Wide Web Consortium, November 2000.
[9] Freed, N., and Borenstein, N. Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. RFC 2046, Internet Engineering Task Force, November 1998.
[10] Gellens, R., Singer, D., and P. Frodjh. The Codecs Parameter for "Bucket" Media Types. RFC 4281, Internet Engineering Task Force, November 2005.
[11] Hunt, A., and S. McGlashan. Speech Recognition Grammar Specification Version 1.0. W3C Candidate Recommendation, World Wide Web Consortium, June 2002.
[12] International Organization for Standardization. Codes for the representation of names and languages - Part 2: Alpha-3 code. ISO 639-2:1998, October 1998.
[13] McGlashan, S. et al. (eds.). Voice Extensible Markup Language: VoiceXML, Version 2.0. W3C Candidate Recommendation, World Wide Web Consortium, March 2004.
[14] Oshry, Matt et al. (eds.) Voice Extensible Markup Language: VoiceXML, Version 2.1. W3C Recommendation 19, World Wide Web Consortium, June 2007.
[15] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler. SIP: Session Initiation Protocol. RFC 3261, Internet Engineering Task Force, June 2002.
[16] Schulzrinne, H., and S. Petrack. RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals. RFC 2833, Internet Engineering Task Force, May 2000.
[17] Shanamugham, S. and D. Burnett. Media Resource Control Protocol Version 2 (MRCPv2). Internet Draft, Internet Engineering Task Force, November 2008.
[18] Shanamugham, S., Monaco, P., and B. Eberman. A Media Resource Control Protocol (MRCP). RFC 4463, Internet Engineering Task Force, April 2006.
[19] Sjoberg, J., Westerlund, M., and Q. Xie. Real-Time Transfer Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs. Internet Engineering Task Force, January 2005. Work in progress.
GLOSSARY OF ACRONYMS
3G       Third-Generation Wireless
3GP      A file format standardized by the 3GPP.
3GPP     Third-Generation Partnership Project
3PCC     Third Party Call Control
°C       Degrees Centigrade
°F       Degrees Fahrenheit
AEC      Acoustic Echo Cancellation
ARP      Address Resolution Protocol
ASR      Automatic Speech Recognition
BITS     Building Integrated Timing Source
BTU      British Thermal Unit. A measure of heat energy. The amount of heat required to raise 1 pound of water by one degree Fahrenheit.
CA       Control Agent
CALEA    Communications Assistance for Law Enforcement Act
CSA      Canadian Standards Association
CD       Compact Disk
CE       Conformité Européenne
CED      Fax called station identification tone
CISPR    International Special Committee for Radio Interference
CMS      Convedia Media Server
CNG      Fax calling tone
CO       Central Office
CPA      Call progress analysis
CPAMD    Call progress answering machine detection
CPTD     Call progress call detection
CPVAD    Call progress voice activity detection
DC       Direct Current
DNS      Domain Name System
DSP      Digital Signal Processor
DTMF     Dual Tone Multi Frequency
EMI      Electromagnetic Interference
FCC      Federal Communications Commission
FRU      Field-Replaceable Units
FQDN     Fully qualified domain name
FTP      File Transfer Protocol
GUI      Graphical User Interface
HTTP     HyperText Transport Protocol
ID       Identifier
IMMS     Integrated Mobile Media Server
IMS      IP Media Subsystem
I/O      Input/Output
IP       Internet Protocol
IPBCP    IP Bearer Control Protocol
IuFP     Iu Framing Protocol
IuUP     Iu Interface User Plane
IPCC     IP Call Center
ITU      International Telecommunications Union
IVR      Interactive Voice Response
kbps     Kilobits per second
kg       Kilogram(s)
LAN      Local Area Network
lb       Pound(s) (weight)
LEA      Law Enforcement Agency
LED      Light Emitting Diode
Mbps     Megabits per second
MIB      Management Information Base
MGCP     Media Gateway Control Protocol
MOML     Media Objects Markup Language
MPC      Media Processor Card.
MPI      Minimum Picture Interval. The minimum time that can occur between pictures selected for encoding.
MRFP     Multimedia Resource Function Processor
MRCP     Media Resource Control Protocol
MS       Media Server [except when used in conjunction with a Microsoft product, where it represents Microsoft]
MSC      Mobile Switch Controller
MSML     Media Sessions Markup Language
MTBF     Mean Time Between Failures
MTTR     Mean Time to Restore
NbUP     Nb Interface User Plane
NEBS     Network Equipment-Building System
NFS      Network File System
NR       Noise Reduction
OAMP     Operations, Administration, Maintenance, and Provisioning
OO       Object-Oriented
PDF      Portable Document Format
PLMN     Public Mobile Land Network
POTS     Plain Old Telephone System
PSTN     Public Switched Telephone Network
QoS      Quality of Service
RF       Radio Frequency
RFC      Request for Comments
RFI      Radio Frequency Interference
RJ-45    Registered Jack 45
RPC      Remote Procedure Call
RS-232   Recommended Standard 232
RTCP     Real Time Control Protocol
RTP      Real Time Protocol
RU       Rack Unit. 1.75 in (4.4 cm) in height.
SCC      Shelf Control Card.
SDP      Session Description Protocol
SIP      Session Initiation Protocol
SIT      Special Information Tone
SNMP     Simple Network Management Protocol
SRGS     Speech Recognition Grammar Specification
SSRC     Synchronization source
TAC      Technical Assistance Center
TCP      Transmission Control Protocol
TCP/IP   Transmission Control Protocol/Internet Protocol
TFTP     Trivial File Transfer Protocol
ToS      Type of Service
TTS      Text to Speech
UAC      User Agent Client
UDP      User Datagram Protocol
UL       Underwriters Laboratory
URL      Uniform Resource Locator
UTC        Universal Time Coordinated [formerly GMT]
VAC        Volts, Alternating Current
VAD        Voice Activity Detector
VDC        Volts, Direct Current
VoiceXML   Voice eXtensible Markup Language: An XML language designed for defining voice segments and enabling access to the Internet via telephones and other voice-activated devices
VoIP       Voice over Internet Protocol
VRU        Voice Response Unit
W          Watts
XML        eXtensible Markup Language