Dogs can’t talk. I know. It sucks, because we talk to them all the time, and although they learn to listen, there’s not much we can do to help them articulate themselves. Dogs can do things like smell and hear things from very far away, and provide a warning bark to people, and people are smart enough to learn how to use those signals.
As Artificial Intelligence becomes a more integral part of most software products, we seem to have forgotten that we don’t need every tool of ours to talk to us in English. Perhaps demos made in our own image capture our imaginations better than demos of machines grunting through menial tasks.
Yet we would all agree that a machine is a better fit than a person for digging through 37 hours of GoPro footage to find a bird that appeared for a few seconds, even if we can’t bark our orders to the machine. We often see startups trying to build the equivalent of C-3PO (human cyborg relations) even when a conversational interface may not be necessary. Last year, Rex Sorgatz wrote at length about the perils of building C-3PO over R2D2.
R2D2 “aspires to be a great computer” and is the vastly preferred droid because he just does his job and makes funny noises, and we all think he’s the cutest. C-3PO “aspires to be a mediocre human,” and since his only skill is talking to things, the defining skill that people have, we all find C-3PO to be a weirdo and don’t know what to do with him.
We already know that talking to computers is awkward, because we’ve all tried Siri and then stopped using it. Tim Tuttle of Expect Labs believes the user experience problem is that we’re usually required to address the computer directly by saying ‘Siri’ or ‘Ok Google’, and it’s just too much pressure.
Instead, Tuttle looks to end the brittle nature of AI by keeping the microphone on at all times, so the user never has to address the machine. If the microphone is always on, Expect Labs’ MindMeld API can work silently in the background on suggestions based on what it hears.
The first use cases for the MindMeld API were to support calls between customers and service representatives at call centers, where the MindMeld API ingests product manuals, then listens in and learns how to be useful by providing suggestions to the service rep. Letting the computer just take notes, like a child or a junior employee, may be a better idea than making it sell the products.
It’s rare that talking to a device feels right, but in another robot-in-the-middle sort of use case last week, Google showed a preview of Now on Tap for Android M, where we ask Google for help with what we are looking at on the phone. In cases where we give the computer the data upfront, like what we’re looking at on the screen, paired with the time of day and where we are in the world, computers can make good educated guesses.
Talk is expensive. Without visual aids it’s quite difficult to speak in enough detail to provide the context a computer needs to be useful. Lars Hard, the CTO of Expertmaker.com, has figured out that interacting with buttons, sliders and images is a much cheaper way to get people to communicate metadata about boring products.
For example, you may be the type of person who joyfully edits Wikipedia or Genius. Media seems to inspire people to interact and edit content for free, but for non-media products, asking someone to describe the properties of a product category like packaged food sounds like work.
The only metadata that you get for free in that case is government-mandated nutritional data, not the adjectives describing how a human would categorize what’s in the cup of Fage yogurt. You won’t get ‘creamy’ or ‘healthy’ or… ‘best if eaten with smaller spoon because of its crescent-shaped compartments’.
By presenting the answers in a Pinterest-meets-Jeopardy! style with simple on/off switches, Expertmaker can quickly get that metadata from people, because people can answer questions of taste as fast as a computer can add numbers.
Sometimes the only communication necessary between a human and a machine is the upload button. Google also launched Google Photos last week, and unlike Apple, which makes a conscious decision not to leverage identity in its software, Google started including facial recognition to improve sorting options.
If we grant that technology can help organize the many trillions of photos we upload each year, there’s a lot more that machines can help us with, and recognition is only the start. Clarifai CEO Matthew Zeiler thinks “People shouldn’t have to tag things or sit through video. Usually, professional photographers use 20-80 tags when they post stock photos and that’s lots of work. What Clarifai does is automatically tags those photos, which not only speeds up the work but it standardizes the indexing.”
To remove the terrible task of tagging, Clarifai extracts a vector of numbers for each image it processes as a reference. With that reference, Clarifai aims to match photos against 10,000 concepts that could be as simple as colors and objects, or as abstract as adjectives like ‘exciting’.
Clarifai sees the future of visual search as much different from the old search model, where you put in one query and you’re done. Instead, the computer needs to remember what you’re looking for across sets of searches, more like a dialogue where new queries refine the results.
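To make the idea concrete, here is a minimal sketch of matching an image vector against concept vectors, and of a search "dialogue" in which each new query nudges the previous one. It is not Clarifai's API or method: the three-number vectors, the cosine-similarity scoring, the query-blending rule, and every name in the code (`tag_image`, `refine`, `search`, `CONCEPTS`, `IMAGES`) are assumptions made up for illustration.

```python
import math

# Hypothetical toy data: real systems use vectors with hundreds of dimensions
# and thousands of concepts. These numbers are invented for illustration only.
CONCEPTS = {
    "red": [1.0, 0.0, 0.0],
    "bird": [0.0, 1.0, 0.0],
    "exciting": [0.0, 0.0, 1.0],
}
IMAGES = {
    "cardinal.jpg": [0.9, 0.8, 0.1],
    "sparrow.jpg": [0.1, 0.9, 0.0],
    "fireworks.jpg": [0.7, 0.0, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tag_image(image_vec, concepts, top_k=2):
    """Return the names of the top_k concepts most similar to the image vector."""
    ranked = sorted(concepts, key=lambda name: cosine(image_vec, concepts[name]), reverse=True)
    return ranked[:top_k]


def refine(query_vec, new_vec, weight=0.5):
    """Blend a follow-up query into the running query instead of starting over."""
    return [(1 - weight) * q + weight * n for q, n in zip(query_vec, new_vec)]


def search(query_vec, images, top_k=1):
    """Return the names of the top_k images most similar to the running query."""
    ranked = sorted(images, key=lambda name: cosine(query_vec, images[name]), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    print(tag_image(IMAGES["cardinal.jpg"], CONCEPTS))  # automatic tags for one photo
    query = CONCEPTS["bird"]                            # first query: "bird"
    print(search(query, IMAGES))
    query = refine(query, CONCEPTS["red"])              # follow-up: "make it red"
    print(search(query, IMAGES))
```

In this toy setup the second search does not discard the first: the follow-up query is averaged into the running one, so the results shift toward photos that satisfy both. A production system would likely use a more careful scheme, but the shape of the interaction is the same.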
When we start to think of interactions with AI not as single queries but as many queries over time, we can see how computers are much better than people at finding trends in big datasets. “Language lessons,” as Lane Meyer of Better Off Dead would agree, are a great way to get to know your counterpart.
And in the case of Duolingo, it’s a great way for the computer to get to know you. Luis von Ahn’s team in Pittsburgh has designed Duolingo as a mobile game for learning languages that currently personalizes the lesson order and difficulty based on Candy Crush-like addiction algorithms.
But what’s most interesting about Duolingo is not the reptilian-brain tactics to keep people playing, but Duolingo’s goal of fully personalizing the learning experience for each user to eventually become the free alternative to having a 1-on-1 tutor.
“Imagine having a conversation about a relatively open-ended topic with your computer or mobile phone, and having it periodically correct your pronunciation or ask you to clarify things it didn’t understand. That sort of practice can really help build your confidence on your journey to conversations with fluent human speakers. This is the future of computer-assisted language education we are working toward, and we’ve a dedicated speech team here at Duolingo working to push these boundaries.” – Burr Settles, Data Dude at Duolingo
CB Insights shows there has been $1.1 billion in venture capital invested in AI in the past 2 years (Q2 2013 to Q1 2015), and Intelligence.org estimates 10 percent of all computer science research is devoted to AI as well. Shivon Zilis of Bloomberg Beta has shown that the landscape of Machine Intelligence, which includes nearly all definitions of AI and Machine Learning, is much more than just personal assistants that require direct communication between people and computers.
In fact, personal assistants are only one of 35 categories of investment. The silent majority of artificial intelligence is made up of selfless machines that can perform superhuman tasks but are too meek to address us personally. They may just be man’s new best friends.
Image credit: Shutterstock