A recently developed neural network can caption a series of images in a way that imitates human storytelling. Rather than simply identifying and describing objects, the AI makes inferences about what’s happening in a picture. And it’s eerily good at its job.
The team, researchers from UC Santa Barbara, developed the AI to determine if a neural network could be used to deduce novel, abstract stories from images.
According to a whitepaper published by the team:
Different from captions, stories have more expressive language styles and contain many imaginary concepts that do not appear in the images.
The neural network the researchers developed is called an Adversarial REward Learning (AREL) framework. What sets it apart from similar AI is that it doesn’t rely on an automatic evaluation system, so it avoids cloning (and regurgitating) human efforts.
Teaching a neural network to come up with abstract stories that actually make sense is no small feat, but AREL has taken things a step further. Not only can it make up its own stories, but those tales are convincing enough to fool humans into thinking a fellow person wrote them.
To test AREL, the team employed the humans of Amazon’s Mechanical Turk to conduct two separate tests. The first was a Turing test, which simply asked the Turk workers to determine whether a story was created by a person or a computer.
According to the research, AREL passed the Turing test three out of five times.
In a separate test, the researchers asked Turk workers to choose among three stories: one by AREL, one written by a human, and one created by a previous state-of-the-art AI. Nearly half the time, the human workers chose AREL.
The implications of a storytelling AI are exciting. As developers figure out how to make the outputs generated by a neural network better align with human thinking, we’ll begin to see far-reaching advantages to plain language processors.
Sports referees, for example, could either be replaced or augmented with an AI capable of understanding and explaining a series of events. Do we really need to pay someone $188,322 to determine if Tom Brady is cheating or not?
It stands to reason that once AI is robust enough to explain its decision-making by telling ‘stories’ about images in real time, like “Number 66, defense, offsides, the play results in a 5-yard penalty. Repeat first down,” we won’t need people to do rules-based jobs that require an agent to do nothing more than observe and report.
And let’s not forget that there’s an actual market for on-the-fly storytelling. If this technology ever fell into the hands of the developers at Telltale Games, or the designers at Wizards of the Coast (the company that makes Dungeons and Dragons), it could be used to generate a never-ending stream of unique, personal entertainment.
AREL isn’t quite ready for prime time yet, though. This research merely lays the groundwork for future endeavors to create a better neural network. According to the researchers:
We believe there are still lots of improvement space in the narrative paragraph generation tasks, like how to better simulate human imagination to create more vivid and diversified stories.
But eventually, barring an undiscovered dead-end, neural networks like AREL are going to mature and gain a level of social intelligence that could become universally comparable to that of an average human.
If this AI can fool half-to-most people right now, imagine what it’ll do in five years.