Google’s London-based sister company, DeepMind, recently developed a training method for teaching AI how to play video games. Rather than painstakingly feeding it data, they just leave it alone with some YouTube videos.
DeepMind’s latest training method is designed to solve a problem AI faces with exploration. AI pretty much sucks at exploring new places or figuring out which way to go, and developers struggle to find ways to reward an AI in environments where there’s little to achieve. DeepMind’s AlphaGo, for example, was designed to win at Go — a game with very specific rules. But drop an AI into a game like Pitfall or Montezuma’s Revenge, both of which require exploration, and it’s difficult for the machine to determine what it’s supposed to do.
Humans take our ability to make decisions for granted. If we’re playing a game, it’s relatively easy to figure out which way to go based on what we perceive as obstacles. For an AI, the same challenge can be astronomically harder, according to the team’s whitepaper:
Such tasks are practically impossible using naive ε-greedy exploration methods, as the number of possible action trajectories grows exponentially in the number of frames separating rewards. For example, reaching the first environment reward in MONTEZUMA’S REVENGE takes approximately 100 environment steps, equivalent to 100^18 possible action sequences.
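For context, the “naive ε-greedy exploration” the whitepaper dismisses is only a few lines of code: act randomly a small fraction of the time, act greedily otherwise. A minimal sketch (the Q-values and epsilon here are illustrative placeholders, not values from the paper):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Naive exploration: with probability epsilon pick a random action,
    otherwise pick the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Why this fails in Montezuma's Revenge: per the paper's figure, the
# first reward is ~100 environment steps away, equivalent to 100^18
# possible action sequences -- far too many to stumble through blindly.
print(100 ** 18)  # 10**36, a 37-digit number
```

With no reward signal for dozens of steps, every one of those random choices has to line up by luck before the agent ever sees feedback, which is why the trajectory count above makes blind exploration hopeless.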
Right now, developers solve this problem by handing the AI huge datasets of perfectly formatted demonstration frames. That approach isn’t much help in situations where such wonderfully labelled datasets aren’t available.
With DeepMind’s new method, the AI takes noisy video frames and figures out how to turn them into something it can generate training data from. Show it a video of a human playing Pitfall or Montezuma’s Revenge, and it can isolate the movements that made the human successful and imitate them.
This allows researchers to set rewards (do it faster, get more points) while simultaneously giving the AI’s training a baseline to start from. And it’s as simple as loading a few YouTube videos into the neural network, because the method supports one-shot training.
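The reward-setting idea can be sketched roughly. This is an illustrative approximation, not DeepMind’s implementation: assume some learned `embed()` has already mapped both the YouTube demo frames and the agent’s own frames into a shared space. Every Nth demo embedding becomes a checkpoint, and the agent earns a bonus each time it gets close enough to the next one in order. The cosine-similarity matching, threshold, and bonus value below are all assumptions for the sketch.

```python
def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def make_checkpoints(demo_embeddings, every=16):
    """Keep every `every`-th embedded demo frame as an ordered checkpoint."""
    return demo_embeddings[every - 1::every]

class CheckpointReward:
    """Pays a small imitation bonus when the agent's current embedding
    matches the next unvisited checkpoint along the demonstration."""

    def __init__(self, checkpoints, threshold=0.9, bonus=0.5):
        self.checkpoints = checkpoints
        self.next_idx = 0
        self.threshold = threshold
        self.bonus = bonus

    def __call__(self, agent_embedding):
        if self.next_idx >= len(self.checkpoints):
            return 0.0  # demo fully traced; no more imitation reward
        if cosine(agent_embedding, self.checkpoints[self.next_idx]) > self.threshold:
            self.next_idx += 1
            return self.bonus
        return 0.0
```

The design point is that the sparse environment reward gets densified: instead of one payoff 100 steps away, the agent collects a trail of small bonuses for retracing the human’s successful route, which gives its training a baseline to improve on.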
Once developed properly, this technology could allow a robot to study new environments – like the surface of Mars – using landmarks provided by rover footage, or to train for a working environment simply by watching a walk-through video on YouTube.
One thing’s for certain: if anyone asks me to record an orientation video for the robots that are going to one day replace me, they’re gonna get a very human response.