DeepMind's AI mastered Go, now it's playing Atari

Google’s London-based sister company, DeepMind, recently developed a training method for teaching AI how to play video games. Rather than painstakingly feed it data, they just leave it alone with some YouTube videos.

DeepMind’s latest training method is designed to solve a problem AI faces in exploration. AI pretty much sucks at exploring new places or figuring out which way to go. And AI developers struggle to find ways to reward AI in environments where rewards are few and far between. DeepMind’s AlphaGo AI, for example, was designed to win at Go, a game with very specific rules. But when you introduce an AI to a game like Pitfall or Montezuma’s Revenge, both games that require exploration, it’s difficult for a machine to determine what it’s supposed to do.

Humans take for granted our ability to make decisions. If we’re playing a game, it’s relatively easy for us to figure out which way to go based on what we perceive as obstacles. For AI, the same challenge can be astronomical, according to the team’s whitepaper:

Such tasks are practically impossible using naive ε-greedy exploration methods, as the number of possible action trajectories grows exponentially in the number of frames separating rewards. For example, reaching the first environment reward in MONTEZUMA’S REVENGE takes approximately 100 environment steps, equivalent to 100^18 possible action sequences.
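
To get a feel for why random exploration struggles here, the sketch below shows a bare-bones ε-greedy action picker and a helper that counts action sequences. It is an illustration only, not code from the paper: the function names are made up, the 18-action figure is an assumption (the size of the full Atari action set), and counting sequences as actions raised to the number of steps is this sketch's own convention, so its output should not be read as the paper's figure.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated value.
    return max(range(len(q_values)), key=q_values.__getitem__)


def count_action_sequences(num_actions, num_steps):
    """Number of distinct action sequences of a given length."""
    return num_actions ** num_steps


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical value estimates for three actions.
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng))

    # Assumed: 18 actions, roughly 100 steps before the first reward.
    total = count_action_sequences(18, 100)
    print(f"{len(str(total))} digits")
```

With no reward signal along the way, every one of those sequences looks equally good to the agent, which is why blind trial and error rarely reaches the first reward.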

Right now, developers solve this problem by giving the AI huge datasets of perfectly formatted frames of information. This approach isn’t very helpful in situations where such wonderfully labelled datasets aren’t available.

With DeepMind’s new method, the AI basically takes noisy images and figures out how to format them into something it can then generate data from. If you show it a video of a human playing Pitfall, or Montezuma’s Revenge, it can isolate the movements that made the human successful and imitate them.

This allows researchers to set rewards (do it faster, get more points) while simultaneously providing a baseline for an AI’s training to start from. And it’s as simple as loading a few YouTube videos into the neural network, because the method supports one-shot imitation: the agent can learn to copy a behavior from a single demonstration.
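
The article doesn’t spell out how a video becomes a reward, so here is one hypothetical way to picture it, not DeepMind’s implementation. The sketch assumes each frame has already been turned into a vector of numbers (an "embedding") by some model that is not shown. It then drops checkpoints along the demonstration and pays the agent a small bonus each time its own frame looks close enough to the next checkpoint. The class name, `stride`, `threshold`, and `bonus` values are all invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class CheckpointReward:
    """Hypothetical imitation reward built from one embedded demonstration."""

    def __init__(self, demo_embeddings, stride=16, threshold=0.5, bonus=0.5):
        # Keep every `stride`-th demo frame as a checkpoint to reach in order.
        self.checkpoints = demo_embeddings[::stride]
        self.threshold = threshold
        self.bonus = bonus
        self.next_index = 0

    def __call__(self, agent_embedding, env_reward=0.0):
        reward = env_reward  # the game's own score still counts
        if self.next_index < len(self.checkpoints):
            target = self.checkpoints[self.next_index]
            if cosine_similarity(agent_embedding, target) > self.threshold:
                reward += self.bonus
                self.next_index += 1  # move on to the next checkpoint
        return reward


if __name__ == "__main__":
    # Toy two-frame "demonstration" with made-up 2-D embeddings.
    demo = [[1.0, 0.0], [0.0, 1.0]]
    reward_fn = CheckpointReward(demo, stride=1, threshold=0.9)
    print(reward_fn([1.0, 0.0]))
    print(reward_fn([0.0, 1.0], env_reward=1.0))
```

The point of the design is that the demonstration supplies the breadcrumb trail while the game’s own score stays in the mix, so the agent can eventually outdo the human it copied.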

Once developed properly, this technology could allow a robot to study new environments – like the surface of Mars – using landmarks provided by rover footage, or train for a working environment simply by watching a walkthrough video on YouTube.

One thing’s for certain: if anyone asks me to record an orientation video for the robots that are going to one day replace me, they’re gonna get a very human response.

Story by Tristan Greene

Editor, Neural by TNW

Tristan is a futurist covering human-centric artificial intelligence advances, quantum computing, STEM, physics, and space stuff. Pronouns: He/him