The Mobile VIFF Interactive Voice Guide

A key application for the Mobile VIFF project was the Interactive Voice Guide, a VXML-based, fully dynamic IVR (Interactive Voice Response) film and event guide.  Users can call a dedicated toll-free number and navigate the film guide and other festival information using voice commands and/or touch-tone control.


Essentially, the system provides a voice-driven front-end to the VIFF’s on-line database of festival films and events that allows users to quickly retrieve a wide variety of information using a combination of pre-recorded prompts and descriptions, and text-to-speech.  It can be used from any phone, not just mobiles, although certain functions are clearly only applicable to mobile.


General system features:


can be controlled using touch-tone and/or voice commands

is dynamic in the sense that changes made to the back-end database are reflected immediately

is available 24 hours a day and operates in an unattended mode

allows the user to request that certain information be sent to them via text messages

provides a simple, well-understood interface


It provides functions to:

have the system send the user a text-message containing a link to the WAP guide

access schedule by day

access schedule by venue

hear general film and program information (pre-recorded prompt file)

vote for films (see my blog posting about the mobile voting system)

retrieve general, pre-recorded and categorized festival information (rules, venue procedures and will-call, parking facilities, etc)

hear schedule change information

and other features


Simple and logical drill-down menus are provided to allow the user to navigate up and down hierarchies of information.  For example, if you select the option to hear “Festival Schedule By Day”, the system responds with:


“Schedule By Day.
Make your selection at any time by pressing a digit or SAYING the option number:
ONE for today's schedule, TWO for Tomorrow's schedule, THREE To select another day. To return to the previous menu say ‘previous’ or press the number sign.”


Then, if for example you selected THREE, you would hear:


"Schedule By Day.
Enter, or tell me the day you want. For example say or enter "twenty-nine" for September twenty ninth or ONE for October first."


Once a particular day is selected (or if you used one of the first two options), you hear:


"Please listen carefully before making your selection.
Enter or say the number for the hour of show. For example, SEVEN for shows starting
at seven PM or later.  For shows starting BEFORE one PM, press or say ZERO.
For shows starting at nine PM or later, press or say NINE."


Then, the system reads screening information to you and allows you to drill-down further including hearing a short description of a given film or event.
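
The two input steps quoted above (a day number, then an hour digit) boil down to a pair of small mappings. Here is a minimal sketch of that logic in Python; it is an illustration rather than the production code, and the function names, the festival dates used in the usage note, and the exact bucket boundaries are my assumptions based on the wording of the prompts.

```python
from datetime import date, datetime, timedelta
from typing import Optional


def resolve_day(day_number: int, first_day: date, last_day: date) -> Optional[date]:
    """Map a spoken or keyed day number to a date inside the festival window.

    Assumes the festival is shorter than a month, so each day number
    matches at most one date (e.g. 29 -> September 29, 1 -> October 1).
    """
    current = first_day
    while current <= last_day:
        if current.day == day_number:
            return current
        current += timedelta(days=1)
    return None  # no such day in the festival: the caller should re-prompt


def in_hour_bucket(start: datetime, digit: int) -> bool:
    """Decide whether a screening belongs to the bucket chosen by `digit`.

    Assumed reading of the prompt: ZERO means "starts before one PM",
    and any other digit N means "starts at N PM or later".
    """
    if not 0 <= digit <= 9:
        raise ValueError("digit must be between 0 and 9")
    if digit == 0:
        return start.hour < 13
    return start.hour >= 12 + digit
```

With a hypothetical festival window of `date(2001, 9, 27)` to `date(2001, 10, 12)`, `resolve_day(29, ...)` picks September 29 and `resolve_day(1, ...)` picks October 1, which mirrors the example in the prompt.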


A feature we’re particularly proud of is the ability to have users simply SAY the title of the film they are interested in knowing about.  The system has about a ninety percent hit rate in finding the right film, but provides alternative ways of retrieving that information if it doesn’t understand the user.  If it finds a perfect match for what the user says, it reads back the event title and presents more options:


"To hear the film description, press or say 1.
To hear this film's screening schedule, press or say 2.
To have the description and schedule for this film text-messaged to your
mobile phone, press or say 3"
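
To make the "perfect match, otherwise fall back" behaviour concrete, here is a rough sketch in Python. I should stress that this is not how the voice platform itself matches speech (that is done by its recognizer and grammars); it only illustrates the decision on the application side, using the standard library's `difflib` as a stand-in. The function names, the cutoff value and the film titles are all made up for the example.

```python
import difflib


def normalize(text):
    """Lower-case the text and replace punctuation with spaces."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())


def match_title(heard, titles, cutoff=0.6):
    """Return (exact_match, candidates) for a recognized utterance.

    exact_match is the title to read back when the utterance matches
    perfectly; otherwise it is None and candidates holds up to three
    near matches that could be offered to the caller as alternatives.
    """
    by_normalized = {normalize(title): title for title in titles}
    key = normalize(heard)
    if key in by_normalized:
        return by_normalized[key], []
    close = difflib.get_close_matches(key, list(by_normalized), n=3, cutoff=cutoff)
    return None, [by_normalized[c] for c in close]


# Hypothetical titles, not taken from the festival programme.
TITLES = ["The Quiet Harbour", "Midnight Ferry", "Paper Lanterns"]
exact, candidates = match_title("midnight fairy", TITLES)
```

An empty candidate list is the cue to send the caller down one of the alternative routes, such as the alphabetic listing or the schedule menus.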


I won't discuss the above features in more detail here, as they are described elsewhere; instead, I'd like to talk about the challenges in developing an application of this scale.


The Future is...Voice?


While it may seem to some that this is really "older" tech, the fact is that IVR systems are only now coming into their own.  Previous generations of VR (voice recognition) software were unreliable, slow and extremely expensive, and were consequently used only by large businesses for large-scale customer-service apps.  The current generation of systems - based largely on VXML (Voice XML) - are available cheaply as hosted solutions and are extremely powerful.  For a relatively small investment, you can build a complex VR application and easily integrate it with existing data-centric systems.  The primary reasons for this are the development of the open standard VXML and the proliferation of VXML hosting providers competing in the market.


It is worth mentioning that for most people, voice is a much more comfortable interface than a typical mobile device's keyboard.  As such, I believe there is a huge and growing market for IVR systems now that the primary barrier to entry - systems cost - has been effectively eliminated.  A system that a few years ago might have cost $150,000 can now perhaps be done for less than $25,000.


Size does matter…


The VIFF screened 386 films this year at more than 620 screenings, which is a very large data-set for this kind of application.  We realized early on that the performance of the contracted provider’s platform degraded as the number of prompts in a given prompt-tree branch grew beyond a certain point and so we had to be careful of the number of items in any given branch.


This posed two intertwined challenges for the development team: extrapolating the optimal ranges for the various hierarchies (i.e. balancing navigability with speed) and ensuring that system performance was not degraded by large prompt trees. 


A careful analysis of the various data-sets (schedules, alphabetic lists, etc.) and of the provider’s system capabilities was instrumental in ensuring that the system responded well under all conditions.  The corollary is that developers need to examine the specific functions of an application with respect to the data-set it operates on and the environment it will operate in, in order to determine the design of the prompt trees.
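
The trade-off between branch size and tree depth is easy to put into numbers. The sketch below is a back-of-the-envelope aid, not our actual design tool: the limit of nine items per branch is my assumption (one touch-tone digit per option), since the real threshold depended on the provider's platform, and the function names are invented for the example.

```python
def levels_needed(item_count, max_items):
    """Smallest menu depth that can address item_count leaves
    when no branch may hold more than max_items options."""
    if max_items < 2:
        raise ValueError("max_items must be at least 2")
    levels, capacity = 1, max_items
    while capacity < item_count:
        capacity *= max_items
        levels += 1
    return levels


def split_into_branches(titles, max_items=9):
    """Group alphabetically sorted titles into branches of at most
    max_items entries, labelled by their first-letter range."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    ordered = sorted(titles, key=str.lower)
    branches = []
    for start in range(0, len(ordered), max_items):
        chunk = ordered[start:start + max_items]
        label = f"{chunk[0][0].upper()} to {chunk[-1][0].upper()}"
        branches.append((label, chunk))
    return branches
```

Under that assumed limit, `levels_needed(386, 9)` comes out at 3: two levels of nine options address only 81 films, so an alphabetic list of all 386 needs a third level. Every extra level costs the caller another prompt, which is exactly the navigability-versus-speed balance described above.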


VXML Service Providers


As we discovered, however, much of the VXML spec is "optional" - at least in the eyes of service providers.  I cannot stress enough the importance of finding the right provider; going cheaper is - as is often the case - perhaps not the best course.  The provider that MUSE contracted promised complete VXML 2.0 compliance.  This includes implementing a function known as "voice barge-in" that allows a user to interrupt a playing prompt by speaking a command.  This was obviously important to our application, as many of the prompt trees are long and no one really wants to wait until the end to initiate an action.  What we discovered was that while touch-tone barge-in worked fine, voice barge-in did not work at all.  This is a much bigger problem than you might think, because we didn't realize it until after all our prompts had been recorded by a very expensive professional voice talent!  That in turn meant that, to launch, we had to re-record many system prompts to remove references to the user being able to speak a command while a prompt was playing.  Caveat emptor, as always.
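
In hindsight, a throwaway test page would have exposed the problem before any studio time was booked. Here is a minimal sketch of how such a page could be generated in Python. The `bargein` and `bargeintype` attributes on `<prompt>`, and the `<menu>` and `<choice>` elements, come from the VoiceXML 2.0 specification; the function name, the menu id and the `#today` style targets are placeholders of mine, and the test prompt is meant to be rendered with text-to-speech rather than a recording.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_bargein_test_menu(options, prompt_text):
    """Build a one-menu VoiceXML 2.0 document with speech barge-in enabled.

    options is a list of (spoken phrase, target URI) pairs; each one is
    also given a touch-tone digit so both input modes can be compared.
    """
    root = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    menu = ET.SubElement(root, "menu", {"id": "bargein_test"})
    prompt = ET.SubElement(
        menu, "prompt", {"bargein": "true", "bargeintype": "speech"}
    )
    prompt.text = prompt_text
    for digit, (phrase, target) in enumerate(options, start=1):
        choice = ET.SubElement(menu, "choice", {"dtmf": str(digit), "next": target})
        choice.text = phrase
    return ET.tostring(root, encoding="unicode")


# Placeholder targets: a real test page would also define these dialogs.
document = build_bargein_test_menu(
    [("today", "#today"), ("tomorrow", "#tomorrow")],
    "This is a deliberately long test prompt. Say today or tomorrow at any time.",
)
```

Call the test number, speak over the prompt, and see whether playback stops. If only the touch-tone keys interrupt it, you know what to write into the prompt scripts before the voice talent arrives.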


Another issue that will not be obvious to the neophyte VXML applications developer is the cost of professionally recorded prompts; this can be a major barrier to entry for such applications and was a significant expense in developing the Mobile VIFF voice guide.  The key is to apply good development methodologies and design the prompt trees very carefully so that you record only the minimum number of prompts and avoid re-recording.  You should re-use prompts wherever possible, split voice prompt files into smaller pieces, and use code to stitch them together where appropriate.
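
As an illustration of the stitching idea, the sketch below assembles a screening announcement from small reusable clips and then works out which recordings are actually needed. It is a simplified sketch, not our production code, and every file name and function name in it is hypothetical.

```python
def screening_prompt(title_clip, hour, venue_clip):
    """Return the ordered list of audio clips to play for one screening.

    Only the title and venue clips are specific; the connecting phrases
    and the number clips are shared by every announcement.
    """
    return [
        title_clip,
        "starts_at.wav",
        f"number_{hour}.wav",
        "pm.wav",
        "at_venue.wav",
        venue_clip,
    ]


def recordings_needed(prompts):
    """Collect the distinct clips used across all stitched prompts."""
    return sorted({clip for prompt in prompts for clip in prompt})


# Hypothetical screenings: (title clip, hour PM, venue clip).
screenings = [
    ("title_a.wav", 7, "venue_x.wav"),
    ("title_b.wav", 7, "venue_x.wav"),
    ("title_a.wav", 9, "venue_y.wav"),
]
prompts = [screening_prompt(*s) for s in screenings]
to_record = recordings_needed(prompts)
```

The list of recordings grows with the number of distinct titles, venues and hours rather than with the number of screenings, and a schedule change means re-ordering clips in code instead of going back into the studio.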


How did it fare?


The VIFF Voice Guide received a reasonable amount of traffic during its operational period, although perhaps not as much as we would have liked.  While several hundred calls were made to the system, we would have liked to see several thousand, and we hope that future implementations will see larger usage.