This article was published on June 8, 2018

DeepMind’s AI mastered Go, now it’s playing Atari

Google’s London-based sister company, DeepMind, recently developed a training method for teaching AI how to play video games. Rather than painstakingly feeding it curated data, the researchers just leave it alone with some YouTube videos.

DeepMind’s latest training method is designed to solve a problem AI faces in exploration. AI pretty much sucks at exploring new places or figuring out which way to go, and developers struggle to find ways to reward it in environments where the payoff only arrives after a long string of actions. DeepMind’s AlphaGo AI, for example, was designed to win at Go, a game with very specific rules and a clear objective. But drop an AI into a game like Pitfall or Montezuma’s Revenge, both of which demand exploration before any points show up, and it’s difficult for the machine to figure out what it’s supposed to do.

We humans take our ability to make decisions for granted. If we’re playing a game, it’s relatively easy to figure out which way to go based on what we perceive as obstacles. For an AI, the same challenge can be astronomical, as the team’s whitepaper explains:

Such tasks are practically impossible using naive ε-greedy exploration methods, as the number of possible action trajectories grows exponentially in the number of frames separating rewards. For example, reaching the first environment reward in MONTEZUMA’S REVENGE takes approximately 100 environment steps, equivalent to 100^18 possible action sequences.
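
For context, the “ε-greedy exploration” the researchers mention simply means the agent occasionally picks an action at random and otherwise follows its current best guess. A minimal sketch in Python (the action count matches Atari’s 18-button layout; everything else here is illustrative, not DeepMind’s code):

```python
import random

N_ACTIONS = 18   # the full Atari joystick/button action set
EPSILON = 0.05   # small chance of acting at random

def epsilon_greedy(q_values):
    """Pick a random action with probability EPSILON, otherwise the greedy one.

    q_values holds the agent's current estimate of how good each action is.
    With the first reward hundreds of steps away, those estimates stay
    uninformative for a long time, so the agent is effectively wandering at random.
    """
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q_values[a])
```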

Right now, developers typically work around this by giving the AI huge datasets of perfectly formatted frames of information, which isn’t much help in situations where such wonderfully labelled datasets aren’t available.

With DeepMind’s new method, the AI takes noisy, differently recorded gameplay footage and learns to map it into a common representation it can actually learn from. Show it a video of a human playing Pitfall or Montezuma’s Revenge, and it can pick out the moves that made the human successful and imitate them.
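
Under the hood, the paper describes learning a shared embedding across videos and then rewarding the agent for hitting “checkpoints” taken along a single demonstration. The sketch below captures that broad idea only; the embedding function, checkpoint spacing, similarity threshold and bonus value are illustrative assumptions, not DeepMind’s published implementation:

```python
import numpy as np

class CheckpointReward:
    """Turn a single demonstration video into a dense reward signal.

    `checkpoints` are embeddings of frames sampled at regular intervals from
    one YouTube playthrough, and `embed` maps a raw game frame into the same
    learned embedding space. The similarity threshold and bonus value here
    are illustrative assumptions, not the paper's exact settings.
    """

    def __init__(self, checkpoints, embed, threshold=0.9, bonus=0.5):
        self.checkpoints = checkpoints
        self.embed = embed
        self.threshold = threshold
        self.bonus = bonus
        self.next_idx = 0  # which checkpoint the agent should reach next

    def __call__(self, agent_frame):
        if self.next_idx >= len(self.checkpoints):
            return 0.0  # the whole demo has been matched; nothing left to reward
        z = self.embed(agent_frame)
        target = self.checkpoints[self.next_idx]
        # Cosine similarity between the agent's current frame and the next checkpoint.
        similarity = float(np.dot(z, target) /
                           (np.linalg.norm(z) * np.linalg.norm(target) + 1e-8))
        if similarity > self.threshold:
            self.next_idx += 1   # checkpoint reached; aim for the following one
            return self.bonus    # small bonus for staying on the demonstrated path
        return 0.0
```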

This allows researchers to set rewards (do it faster, get more points) while simultaneously giving the AI’s training a baseline to start from. And it’s as simple as feeding a few YouTube videos to the neural network, because the method works as one-shot imitation: a single demonstration is enough to get going.
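
To give a sense of how that plugs into ordinary reinforcement learning, here is a rough sketch of a training episode in which the imitation bonus is simply added to the game’s own score. The Gym-style env interface and the agent’s act/learn methods are assumptions for illustration, not DeepMind’s training code:

```python
def train_episode(env, agent, demo_reward):
    """Run one episode in which the agent is paid both by the game itself
    and for tracking the YouTube demonstration (via a CheckpointReward-style
    callable like the sketch above)."""
    obs = env.reset()
    done = False
    while not done:
        action = agent.act(obs)
        obs, game_reward, done, info = env.step(action)
        # Combine the game's own sparse score with the dense imitation bonus.
        total_reward = game_reward + demo_reward(obs)
        agent.learn(obs, action, total_reward)
```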

Once developed properly, this technology could allow a robot to study new environments, like the surface of Mars, using landmarks provided by rover footage, or to train for a working environment simply by watching a walk-through video on YouTube.

One thing’s for certain: if anyone asks me to record an orientation video for the robots that are going to one day replace me, they’re gonna get a very human response.
