Voice is everywhere.
People have always used their own voices (supplemented with facial expressions and other movement) as the natural UI for real-time person-to-person communication. However, while human-to-machine user interfaces have changed and progressed through different paradigms over the decades of computing, voice until only recently was merely a cute toy which didn’t work well enough.
But now the race is on between Apple and Amazon to make voice the primary input & output everywhere:
- Amazon’s strategy is to put voice I/O in every conceivable location in your home (and outside it). Echo’s Alexa started in your kitchen, is now multiplying to every room in your house, and is becoming portable.
- Apple’s strategy is to put voice I/O as close to you as possible, moving Siri from your pocket, to your wrist, and now directly into your ear.
Yes, in theory Google and Microsoft are running the race, too, but their current offerings are buried too deep in your pocket on your phone (or even worse, on your desktop)*. The past few years of consumer behavior have shown that it’s too much friction for most consumers to always pull their phone out of their pocket or purse and initiate a conversational routine. (Of course, not all people complete the same computing tasks the same way: for example, some power-users find keyboard shortcuts faster than a mouse/touchpad, even though the latter is perhaps more “intuitive” and therefore easier.) But the rule of thumb is that people employ the UI which is the easiest, fastest, and least prone to error… for them.
So I believe that one of the most impactful but underappreciated announcements from last week is that, with the upcoming release of iOS 10, Siri will be enabled to control third-party apps, not just Apple’s. Voice can now direct everything on an Apple iPhone, opening voice control to the entire distributed app universe rather than just the features handed down by top-down development fiat. Couple that with the new AirPods, which will certainly become more miniaturized with subsequent product generations, and people will just inconspicuously leave them in their ears during their waking hours. Instantly you have a ubiquitous voice app store.
Amazon already moved to create a ubiquitous voice app store by opening up its Alexa Skills Kit (ASK) and Alexa Voice Service (AVS) offerings about a year ago. And we’re on the verge of realizing the promise of this “voice-first” app store. There are about 2,000 skills on the Echo thus far. Many are pretty mundane, like suggesting Powerball lottery numbers for you or sharing facts about cats. And just as the Apple iPhone App Store was showcased by ordering a pizza from an iPhone app in 2008, so too can you voice-order a pizza from your Echo in 2016. However, bringing rich computing to a mobile phone created a realm of apps which weren’t likely even conceived of when the iPhone App Store opened — like revolutionizing transportation with a distributed set of drivers and riders enabled by a mobile phone. (By contrast, it seems pretty ho-hum that of course you can order an Uber ride by voice on the Echo today.) Yet bringing rich computing to seamless voice control has similar potential to revolutionize. A few of today’s voice-first apps are interesting, and I think it’s about to get a lot more interesting.
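To make the “skills” concept concrete: a custom Alexa skill is essentially a small web service (commonly an AWS Lambda function) that receives a JSON request from Alexa when a user speaks an intent and returns a JSON response telling the device what to say. Below is a minimal sketch for a hypothetical cat-facts skill — the intent name `GetCatFactIntent` is an invented example, but the response envelope follows the shape of Alexa’s custom-skill JSON format:

```python
import random

# A hypothetical list of facts for the example skill.
CAT_FACTS = [
    "Cats sleep for roughly two thirds of their lives.",
    "A group of cats is called a clowder.",
]

def handler(event, context=None):
    """Lambda-style entry point: Alexa POSTs a JSON request here."""
    request = event.get("request", {})
    intent_name = request.get("intent", {}).get("name")

    if request.get("type") == "IntentRequest" and intent_name == "GetCatFactIntent":
        speech = random.choice(CAT_FACTS)
    else:
        # LaunchRequest, unknown intents, etc. get a gentle prompt.
        speech = "Try asking me for a cat fact."

    # Alexa custom-skill response envelope: the device speaks `text` aloud.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```

Even a “mundane” skill like this illustrates the low barrier to entry: the developer writes a handful of intents and a small handler, and Amazon handles the speech recognition and natural-language mapping on its side.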
Last year Amazon launched a $100M venture “fund” for startups working on apps which leverage the Echo. As an outside observer looking at the list of funded companies detailed on their website, most of these startups just make a hardware device that controls some aspect of your home by leveraging Amazon’s AVS. Think: use your voice to open your garage, control your thermostat, or monitor your pool. Does the list go on? Yes: set your sprinklers’ schedule, find your keys, and feed your cat. OK, I get it. Extremely practical (more so than cat facts), though not really revolutionary.
There are, however, a few notable exceptions to the litany of companies on their public list funded over the past year:
- Invoxia (Alexa-enabled portable speaker) and Luma (distributed wifi connectivity with devices specific for room coverage). Both of these investments signal that, on the path to voice ubiquity, Amazon doesn’t believe it needs to be the sole hardware provider — it merely needs to make AVS pervasive.
- Kitt’s “conversational understanding as a service.” Though the website is sparse brochureware, this enabling-layer technology seems poised to further enable the “Echosystem.”
Meanwhile, outside the Amazon venture capital realm, much of the related entrepreneurial startup activity and corresponding venture funding has been only tangentially focused on voice. It’s been a wild ride the past two years watching the rise of “chatbots” permeate our conversations (and our VC checkbooks). But the majority of chatbot companies are text-based (including our investment here at NextView in Troops, which rides upon Slack as the conversational platform). Yes, texting is pervasive, especially with the SnapChat generation, but I suspect chatbots will soon veer into, you know, actual chatting with voice.
But there has already been some promising activity in voice startupland with both voice-centric apps (e.g. virtual voice assistants) and enabling-layer technologies (e.g. analytics for Alexa Skills). And I believe we’re just at the very beginning. The rate of Alexa skills creation is increasing 40% per month, and Jeff Bezos has said there are more than 1,000 people working on Amazon Echo and Alexa. Before too long there will be a “Pokemon moment” for voice, like that application was for augmented reality (AR) — a truly innovative creation which will delight consumers while showcasing the platform’s core capabilities.
As a seed VC, I have my money on this forthcoming platform shift. Sure, most of these Alexa Skills may seem like toys or trivialities now, but soon some will be gold. And the proverbial picks & shovels businesses which help facilitate these applications will be lucrative as well. (We’re a while off from discussing dinner with your Echo and having Alexa suggest ordering a pizza from Papa John’s instead of Domino’s… but not too far.)
Alexa and Siri will increasingly have a louder voice and will be screaming soon. Because voice is everywhere and so are they.
* Don’t count Google out just yet. Android is quite pervasive, much more modular at the OS level, and Google has been cooking up something.