Voices, whether in our heads or playing on headphones, have the ability to move us in a profound way. If you were in your senses during the 2000s, you would have witnessed the power of a deep baritone voice uttering ‘sahi jawab’ on primetime TV; magically, that unmistakable voice would cure the palpitations of millions of viewers watching KBC.
Voice has always been the most natural form of human interaction; historically, we have even domesticated wild animals to understand our voices. In this decade, we can summon computers and devices to listen and respond to us with just our voice. We call this Voice First, where a visual display is completely absent or secondary.
Voice User Interface (VUI) is the art and science of designing and programming for man and machine to have meaningful and actionable interactions using voice as the primary input. VUI is what makes Voice First possible, and creating a great VUI needs a fundamental understanding of voice and a tenacious unlearning of how we design for GUIs (graphical UIs, like apps and computers).
Voice-powered assistants are all the rage, and here’s a guide to get your brand’s voice heard:
In fiction, JARVIS, the virtual butler helping his master Tony Stark in the movie Iron Man, is a great example of a voice-powered assistant. In reality, you can get a voice assistant on Amazon or Flipkart for as low as Rs. 3500; they are called Amazon Echo and Google Home.
Echo and Home are hardware devices, like our mobile phones, and their ‘app’ equivalents are called ‘Alexa Skills’ and ‘Actions on Google’. Building for voice starts with imagining how to turn an idea into a Skill or an Action.
What’s so different about voice and why must I imagine?
Converting between voice and text has been around for four decades now; Stephen Hawking’s speech synthesizer was a great example. But such technology had great limitations for lesser mortals like us as a daily input mechanism for instructing computers and devices. That is because it lacked context and only converted text to voice and vice versa.
New-age voice assistants like Alexa and Google Assistant run on the cloud, powered by advanced computing, and combine capabilities such as Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Machine Learning (ML) and, of course, Text to Speech (TTS).
For example, when you say ‘for ti timez’ it can mean ‘for tee times’ in a golf skill, or ‘for tea times’ in a recipe skill or ‘forty times’ in a math skill. With ASR and ML, voice assistants can contextually tell the difference and respond accordingly. That leads us to wake words and intents.
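To make the idea concrete, here is a minimal sketch of picking a transcription based on the skill in use. It is a toy illustration only: real assistants rely on statistical speech and language models, and every name below (CANDIDATES, SKILL_VOCABULARY, pick_transcription) is a hypothetical one invented for this example.

```python
# Toy illustration only: real assistants use statistical ASR and language
# models, not a lookup table. All names here are hypothetical.

# Candidate transcriptions that might be proposed for the same sounds.
CANDIDATES = ["for tee times", "for tea times", "forty times"]

# Words each (hypothetical) skill expects to hear.
SKILL_VOCABULARY = {
    "golf": {"tee", "course", "hole"},
    "recipe": {"tea", "cup", "boil"},
    "math": {"forty", "times", "plus"},
}


def pick_transcription(skill, candidates):
    """Return the candidate sharing the most words with the skill's vocabulary."""
    vocabulary = SKILL_VOCABULARY[skill]
    return max(candidates, key=lambda text: len(vocabulary & set(text.split())))


for skill in ("golf", "recipe", "math"):
    print(skill, "->", pick_transcription(skill, CANDIDATES))
```

The same sounds resolve to a different phrase for each skill, which is the kind of contextual choice described above.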
Wake word, utterance, and intent are fundamental elements in any voice assistant. When you ask your smart assistant a question like ‘Alexa, when is sunrise’, a combination of processes is triggered both on the device and in the cloud.
Detection of a wake word like ‘Alexa’ or ‘Ok Google’ is what wakes the device from its dormant state; this is similar to unlocking a phone. The words that follow, ‘when is sunrise’, are the utterance. The utterance is sent to the cloud, where Alexa actually resides, and is converted from speech to text by ASR. The text then passes through an NLU filter, which identifies the intent behind the command and passes the baton to a ‘Skill’, the app equivalent. The Skill computes the result based on the intent (time of sunrise), and the response is spoken back on the device.
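The flow from wake word to spoken response can be sketched roughly as below. This is a simplified sketch that assumes the speech has already been turned into text; the intent names, the phrase table and the placeholder sunrise time are all made up for illustration and are not taken from any real assistant.

```python
# Simplified sketch: the input is text, standing in for speech that ASR has
# already transcribed. Intent names and phrases are hypothetical.

WAKE_WORDS = ("alexa", "ok google")

INTENT_PHRASES = {
    "when is sunrise": "GetSunriseTime",
    "when is sunset": "GetSunsetTime",
}


def split_wake_word(heard):
    """Return (wake_word, utterance), or (None, None) if no wake word leads."""
    text = heard.lower().strip()
    for wake_word in WAKE_WORDS:
        if text.startswith(wake_word):
            utterance = text[len(wake_word):].strip(" ,.?!")
            return wake_word, utterance
    return None, None


def recognise_intent(utterance):
    """Stand-in for NLU: map the utterance text to an intent name."""
    return INTENT_PHRASES.get(utterance, "Unknown")


def sunrise_skill(intent, sunrise_time="6:00 AM"):
    """Stand-in for a Skill. The default time is a placeholder, not real data."""
    if intent == "GetSunriseTime":
        return f"Sunrise is at {sunrise_time}."
    return "Sorry, I did not understand that."


def handle(heard):
    """Run the whole flow; return None when the device should stay dormant."""
    wake_word, utterance = split_wake_word(heard)
    if wake_word is None:
        return None
    return sunrise_skill(recognise_intent(utterance))


print(handle("Alexa, when is sunrise"))
```

Without a wake word, handle returns None, mirroring a device that stays dormant until it is addressed.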
Waking your brand up on Alexa:
Invocation is a crucial piece of VUI strategy. An invocation is a word or phrase that triggers your skill, somewhat like an app’s icon or a website’s URL. It must be easy, memorable and preferably the same as your brand name. Users wanting to invoke a skill usually start with ‘Alexa, open India Panchang’ or ‘Alexa, talk to Ola cabs’. Here Ola and India Panchang are both brand names and invocations.
Optimising your brand for a single faculty:
One of the greatest challenges in optimizing a VUI is not how much you can learn about it, but how quickly you can unlearn what you know about building for graphically aided interfaces like apps and the web.
When building programs in a voice first world, empathy plays a crucial role. The team building the skill may know its features, but an end user will have no clue about them and cannot rely on visual hints either, for they may be using the voice assistant in an ambient environment while doing another task.
Imagine hailing a cab through a voice assistant placed in the living room as you hurriedly munch your breakfast at the adjacent table. The voice assistant must offer verbal cues and confirmations at every step, including pick-up location, destination, cab fare, surge fare if any, type of ride and estimated time of arrival.
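A rough sketch of such step-by-step prompting and confirmation might look like the following. The slot names, prompts and the wording of the confirmation are hypothetical, chosen only to mirror the cab example; a real skill would use the platform’s own dialogue tools and live fare data.

```python
# Hypothetical sketch of prompting for missing details and confirming a ride.
# Slot names, prompts and wording are invented for illustration.

REQUIRED_SLOTS = ["pickup", "destination", "ride_type"]

PROMPTS = {
    "pickup": "Where should the cab pick you up?",
    "destination": "Where are you going?",
    "ride_type": "Which type of ride would you like?",
}


def next_prompt(slots):
    """Return the question for the first missing detail, or None if complete."""
    for name in REQUIRED_SLOTS:
        if not slots.get(name):
            return PROMPTS[name]
    return None


def confirmation(slots, fare_rupees, eta_minutes, surge_multiplier=1.0):
    """Build the spoken summary the user hears before the ride is booked."""
    parts = [
        f"Booking a {slots['ride_type']} from {slots['pickup']} "
        f"to {slots['destination']}.",
        f"The estimated fare is {fare_rupees} rupees.",
    ]
    if surge_multiplier > 1.0:
        parts.append(
            f"This includes a surge of {surge_multiplier} times the usual fare."
        )
    parts.append(f"Your cab should arrive in {eta_minutes} minutes. Shall I confirm?")
    return " ".join(parts)


ride = {"pickup": "home", "destination": "the airport", "ride_type": "sedan"}
print(next_prompt({"pickup": "home"}))
print(confirmation(ride, fare_rupees=450, eta_minutes=7, surge_multiplier=1.5))
```

Because nothing is shown on a screen, every detail the user would normally glance at is spoken aloud and the booking waits for an explicit yes.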
Giving a real voice to your brand:
While smart assistants are adding more voices and accents, brands can narrate their stories in an immersive way by adding sound effects, or even have their skill speak in the pre-recorded voice of their brand ambassador. Content-heavy brands such as movie studios, sports, children’s programs and games have a huge advantage in creating amazing experiences and compelling use cases on voice assistants.
A brand manager must ask these essential questions before taking their brand into a voice first world. Value: what can the Skill do for customers? Roles: are the roles of your skill and Alexa clearly defined in a mutually inclusive way? Empathy: is the skill based on the end user’s needs and challenges, and how does your skill fit in?
This is a series Sreeraman publishes on The Economic Times to help brand managers and CXOs embrace a voice first world.