Where Voice Command Is Falling Short (But Not for Long)

As originally published in the American City Business Journals.

Are you talking to your technology yet?

For a few years now, we’ve seen a steady uptick in speech recognition technology. We’ve had Siri in our pockets since 2011 (hands-free since 2014), Google Voice Search since about the same time, and Cortana as part of Windows 10 since 2015.

While mildly useful, this functionality hasn’t stirred up a heck of a lot of enthusiasm; a Creative Strategies study found that 70% of iOS users and 62% of Android users called upon their voice assistants “rarely or sometimes.” We’re also quite shy about it all: only 6% of participants felt comfortable enough to speak to their devices in public, and a mere 1.3% used Siri, OK Google, or Cortana at work.

Enter: the Echo

What’s proving to be a far more lucrative application of speech recognition is the voice command device (VCD) we can talk to in the privacy of our own homes. The main contender here is Amazon’s Echo (Alexa), though Google Home joined the race in November 2016.

Consumer Intelligence Research Partners reports that Amazon sold over 5.1 million Echo devices in 2016, more than double what it sold in 2015.

From a development perspective, Alexa’s “skills” (capabilities comparable to a smartphone app) have skyrocketed from a modest 135 in January 2016, to over 3,000 in September, to over 7,000 this January.

As you can see, this technology is gaining some serious traction.

Still, devices like the Echo aren’t yet compelling to the point where the majority of consumers need one the way they need the latest smartphone; for now, it’s mostly early adopters who are dropping money on them.

What is it that’s holding these devices back? I see three main obstacles:

1. People are still freaked out by devices listening to their every word.

By design, these devices have to “listen” at all times for their wake words (“Alexa,” “OK Google”). While the device only records what you say after it hears those wake words, there’s already been an instance where police obtained a warrant for data from a murder suspect’s Echo on the (admittedly unlikely) chance that audio was captured, accidentally or otherwise. (Amazon has so far refused to release the data.) For many, this is unsettling enough to be a total deal-breaker.

2. The skills/capabilities are kind of boring.

I can listen to music, play games, pull up recipes, run through workout routines, and of course reorder items from Amazon with my Echo. If you happen to have Wi-Fi-enabled devices in your home, like the Nest thermostat, you get some (limited) control there as well. But remember when apps on your phone were more gimmicky than practical (the classic iBeer, for example)? That’s about where we are with Alexa’s skills, despite their growing numbers.

3. The speech recognition capabilities are limited.

If you don’t say exactly the right words in exactly the right order, Alexa won’t understand what you mean. I get why this is the case from a technical perspective, but many consumers don’t have the patience to memorize a ton of stilted phrases just to make their devices work. The user experience must be intuitive if you want to attract the masses.
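To make the problem concrete, here’s a rough Python sketch of how a skill’s fixed vocabulary boxes you in. The intent names and phrases below are hypothetical, and a real skill relies on Amazon’s interaction model rather than a simple lookup table like this, but the effect on the user is much the same: only the phrasings the developer anticipated get understood.

```python
# Hypothetical intents and sample utterances for illustration only;
# this is a simplified sketch, not Amazon's actual matching logic.
SAMPLE_UTTERANCES = {
    "GetCommuteIntent": [
        "how long is my commute",
        "how long will it take to get to work",
    ],
    "ReorderItemIntent": [
        "reorder paper towels",
        "buy more paper towels",
    ],
}

def match_intent(spoken_text):
    """Naive matcher: succeeds only if the phrase was anticipated verbatim."""
    phrase = spoken_text.lower().strip()
    for intent, utterances in SAMPLE_UTTERANCES.items():
        if phrase in utterances:
            return intent
    return None

for heard in ("how long is my commute", "what's my drive time looking like"):
    intent = match_intent(heard)
    print(heard, "->", intent or "Sorry, I didn't understand that.")
```

The first phrase maps cleanly to an intent; the second, perfectly reasonable paraphrase falls through. Until the platforms get better at mapping everyday phrasing onto those intents, users carry the burden of remembering the magic words.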

All told, this technology isn’t quite mature enough to appeal beyond self-professed early adopters.

But if developers can effectively address obstacles two and three, obstacle one will be out the window in short order. If we can start our cars, pull up a movie on our TVs within seconds, or control the other electronics in our homes without it being a big clunky production, sheer convenience will override any underlying suspicions and sales will all but explode.

And once we get there, it’ll be high time for our businesses to start finding ways to ride the wave before our competitors do.

In fact, it wouldn’t be a bad idea to start thinking about it now.
