Digital sight will change the smart home forever

In a remarkably short period, we’ve become accustomed to phones and gadgets that interpret and react to our voices. Google Home’s Voice Match can even go as far as recognizing different people, switching between entire accounts based on who it thinks it is hearing.

With the speed at which this technology has iterated, it’s strange to even imagine that in 2011 Popular Science and others were bearish on it having truly ubiquitous success.

Yet, now, nearly every big tech company worth its weight in silicon has launched a voice-controlled device, or integrated voice tech deeply into one of its core products.

And while voice adds usability and functionality to these devices, the next big transformation will come when our devices can truly recognize us, and that requires them to open not just their ears, but their eyes, too. Computer vision has the potential to radically change how smart our homes, and our lives, are.

Presence in technology

The facial recognition capabilities now proliferating in high-end smartphones succeed in a magical, almost invisible way. They recognize you by mapping your face with a spray of infrared light, a process that is seamless to the user.

These facial recognition systems largely perform binary tasks at present — allowing a payment or not, unlocking your phone or not — but they give us a hint at what’s to come when even smarter computer vision-based systems become more pervasive.

Though Microsoft has stopped selling Kinect to consumers, it created an early and impressive version of consumer-ready computer vision that can recognize different users and bring up different accounts based on who is seen. And having seen how Google Home reacts to voices and changes accounts, it’s easy to imagine a Samsung TV that adapts the viewing experience based on who it sees.

It might change the color of your Philips Hue bulbs and close the blinds when it senses you’re ready to watch a movie. And complaints from your children about limited TV hours and choices can now be conveniently shifted to your TV. “It’s not my fault, son. The Samsung TV just doesn’t want you to watch after 7PM!”

At some point in the future, computer vision-powered recognition could even become completely decoupled from specific devices. It’s not hard to imagine a dedicated user detection camera that sees and reacts to you walking through your front door.

It could then turn on the lights in the living room to a setting you’ve previously chosen, flip your Nest Thermostat from Away to your “Home” setting, and play your favorite post-work playlist on your Sonos — all based entirely on recognizing you when you enter.
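To make the arrival scenario above concrete, here is a minimal sketch in Python of how a recognition event might fan out to several devices. Everything in it is an assumption made for illustration: the `PresenceHub` class, the `device_action` helper, and the string results are hypothetical stand-ins, and no real Philips Hue, Nest, or Sonos API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An action takes no arguments and returns a short description of what it did.
# In a real system this is where a device-specific call would go (assumption).
Action = Callable[[], str]


@dataclass
class PresenceHub:
    """Hypothetical hub mapping a recognized person to their arrival routine."""
    routines: Dict[str, List[Action]] = field(default_factory=dict)

    def on_arrival(self, person: str, action: Action) -> None:
        # Append the action to this person's routine, creating it if needed.
        self.routines.setdefault(person, []).append(action)

    def person_recognized(self, person: str) -> List[str]:
        # Run the person's actions in the order they were registered.
        # Someone with no routine triggers nothing.
        return [action() for action in self.routines.get(person, [])]


def device_action(device: str, setting: str) -> Action:
    # Stand-in for a real device command; it only reports what it would do.
    return lambda: f"{device} -> {setting}"


def build_demo_hub() -> PresenceHub:
    hub = PresenceHub()
    hub.on_arrival("jeff", device_action("living room lights", "evening scene"))
    hub.on_arrival("jeff", device_action("thermostat", "Home"))
    hub.on_arrival("jeff", device_action("speaker", "post-work playlist"))
    return hub


if __name__ == "__main__":
    for line in build_demo_hub().person_recognized("jeff"):
        print(line)
```

The point of the sketch is that the routine itself is simple bookkeeping. The hard part is whatever produces the call to `person_recognized` in the first place, which is exactly the recognition step the camera would have to supply.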

We effectively sit at a time when the triggers already exist in the software; they are simply waiting for the technology to provide the 1 to their 0. While voice recognition requires relatively simple analysis of user voices and speech patterns, reliable vision-based recognition is a larger challenge: ambient lighting, the physical orientation of the user, and even changes in hairstyle and clothing make a person’s appearance far less consistent from one encounter to the next. Yet the technology required to perform these tasks reliably is here now and getting better daily.

A camera-equipped system that recognizes you offers a natural progression from what we’ve known. While voice recognition has led more people to trust the devices they use to perform an ever-increasing set of convenient tasks for them, once we have sophisticated computer vision-based user recognition, seeing will truly be believing.

Story by Jeff Powers

Jeff Powers is co-founder and CEO of Occipital, a San Francisco & Boulder based company focused on mobile computer vision. With co-founder Vikas Reddy, Jeff led the development of RedLaser, a popular barcode-scanning application that was acquired by eBay. In 2013, Jeff and the Occipital team launched Structure Sensor, the first 3D sensor for mobile devices. Most recently, Occipital introduced Bridge, a mixed reality (MR) headset for iPhone, and Structure Core, a next-generation embeddable depth sensor. Jeff is also an angel investor in several tech startups including via the seed-stage accelerator TechStars.