
DeepMind today unveiled a new multi-modal AI system capable of performing more than 600 different tasks.
Dubbed Gato, it's arguably the most impressive all-in-one machine learning kit the world's seen yet.
According to a DeepMind blog post:
The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.

And while it remains to be seen exactly how well it'll do once researchers and users outside the DeepMind labs get their hands on it, Gato appears to be everything GPT-3 wishes it could be and more.
Here's why that makes me sad: GPT-3 is a large language model (LLM) produced by OpenAI, the world's most well-funded artificial general intelligence (AGI) company.
Before we can compare GPT-3 and Gato, however, we need to understand where both OpenAI and DeepMind are coming from as businesses.
OpenAI is Elon Musk's brainchild: it has billions in support from Microsoft, and the US government couldn't care less what it's doing when it comes to regulation and oversight.
Keeping in mind that OpenAI's sole purpose is to develop and control an AGI (that's an AI capable of doing and learning anything a human could, given the same access), it's a bit scary that all the company's managed to produce is a really fancy LLM.
Don't get me wrong, GPT-3 is impressive. In fact, it's arguably just as impressive as DeepMind's Gato, but that assessment requires some nuance.
OpenAI's gone the LLM route on its path to AGI for a simple reason: nobody knows how to make AGI work.
Just like it took some time between the discovery of fire and the invention of the internal combustion engine, figuring out how to go from deep learning to AGI won't happen overnight.
GPT-3 is an example of an AI that can at least do something that appears human: it generates text.
What DeepMind's done with Gato is, well, pretty much the same thing. It's taken something that works a lot like an LLM and turned it into an illusionist capable of more than 600 forms of prestidigitation.
As Mike Cook, of the Knives and Paintbrushes research collective, recently told TechCrunch's Kyle Wiggers:
It sounds exciting that the AI is able to do all of these tasks that sound very different, because to us it sounds like writing text is very different to controlling a robot.
But in reality this isn't all too different from GPT-3 understanding the difference between ordinary English text and Python code.
This isn't to say this is easy, but to the outside observer this might sound like the AI can also make a cup of tea or easily learn another ten or fifty other tasks, and it can't do that.
Basically, Gato and GPT-3 are both robust AI systems, but neither is capable of general intelligence.
Here's my problem: unless you're gambling on AGI emerging as the result of some random act of luck (the movie Short Circuit comes to mind), it's probably time for everyone to reassess their timelines on AGI.
I wouldn't say "never," because that's one of science's only cursed words. But this does make it seem like AGI won't be happening in our lifetimes.
DeepMind's been working on AGI for over a decade, and OpenAI since 2015. And neither has been able to address the very first problem on the way to solving AGI: building an AI that can learn new things without training.
I believe Gato could be the world's most advanced multi-modal AI system. But I also think DeepMind's taken the same dead-end-for-AGI concept that OpenAI has and merely made it more marketable.
Final thoughts: What DeepMind's done is remarkable and will probably pan out to make the company a lot of money.
If I'm the CEO of Alphabet (DeepMind's parent company), I'm either spinning Gato out as a pure product, or I'm pushing DeepMind toward development over research.
With the right marketing and applicable use cases, Gato could prove more lucrative on the consumer market than Alexa, Siri, or Google Assistant.
But Gato and GPT-3 are no more viable entry points for AGI than those virtual assistants.
Gato's ability to perform multiple tasks is more like a video game console that can store 600 different games than a single game you can play 600 different ways. It's not a general AI; it's a bunch of pre-trained, narrow models bundled together neatly.
That's not a bad thing, if that's what you're looking for. But there's simply nothing in Gato's accompanying research paper to indicate this is even a glance in the right direction for AGI, much less a stepping stone.
At some point, the goodwill and capital that companies such as DeepMind and OpenAI have generated through their steely-eyed insistence that AGI was just around the corner will have to show even the tiniest of dividends.