Deep learning, the main innovation that has renewed interest in artificial intelligence in the past years, has helped solve many critical problems in computer vision, natural language processing, and speech recognition. However, as deep learning matures and moves from the peak of hype to the trough of disillusionment, it is becoming clear that it is missing some fundamental components.
This is a reality that many of the pioneers of deep learning and its main component, artificial neural networks, have acknowledged at various AI conferences in the past year. Geoffrey Hinton, Yann LeCun, and Yoshua Bengio, the three “godfathers of deep learning,” have all spoken about the limits of neural networks.
The question is, what is the path forward?
At NeurIPS 2019, Bengio discussed system 2 deep learning, a new generation of neural networks that can handle compositionality, out-of-distribution generalization, and causal structures. At the AAAI 2020 Conference, Hinton discussed the shortcomings of convolutional neural networks (CNNs) and the need to move toward capsule networks.
But for cognitive scientist Gary Marcus, the solution lies in developing hybrid models that combine neural networks with symbolic artificial intelligence, the branch of AI that dominated the field before the rise of deep learning. In a paper titled “The Next Decade in AI: Four Steps Toward Robust Artificial Intelligence,” Marcus discusses how hybrid artificial intelligence can solve some of the fundamental problems deep learning faces today.
Connectionists, the proponents of pure neural network-based approaches, reject any return to symbolic AI. Hinton has compared hybrid AI to combining electric motors and internal combustion engines. Bengio has also shunned the idea of hybrid artificial intelligence on several occasions.
But Marcus believes the path forward lies in putting aside old rivalries and bringing together the best of both worlds.
What’s missing in deep neural networks?
The limits of deep learning have been comprehensively discussed. But here, I would like to talk about the generalization of knowledge, a topic that has been widely discussed in the past few months. While human-level AI is at least decades away, a nearer goal is robust artificial intelligence.
Here’s how Marcus defines robust AI: “Intelligence that, while not necessarily superhuman or self-improving, can be counted on to apply what it knows to a wide range of problems in a systematic and reliable way, synthesizing knowledge from a variety of sources such that it can reason flexibly and dynamically about the world, transferring what it learns in one context to another, in the way that we would expect of an ordinary adult.”
Those are key features missing from current deep learning systems. Deep neural networks can ingest large amounts of data and exploit huge computing resources to solve very narrow problems, such as detecting specific kinds of objects or playing complicated video games in specific conditions.
However, they’re very bad at generalizing their skills. “We often can’t count on them if the environment differs, sometimes even in small ways, from the environment on which they are trained,” Marcus writes.
Case in point: An AI trained on thousands of chair pictures won’t be able to recognize an upturned chair if such a picture was not included in its training dataset. A super-powerful AI trained on tens of thousands of hours of StarCraft 2 gameplay can play at a championship level, but only under limited conditions. As soon as you change the map or the units in the game, its performance will take a nosedive. And it can’t play any game that is similar to StarCraft 2, such as Warcraft or Command & Conquer.
The current approach to solving AI’s generalization problem is to scale the models: Create bigger neural networks, gather larger datasets, use larger server clusters, and train the reinforcement learning algorithms for longer hours.
“While there is value in such approaches, a more fundamental rethink is required,” Marcus writes in his paper.
In fact, the “bigger is better” approach has yielded modest results at best while creating several other problems that remain unsolved. For one thing, the huge cost of developing and training large neural networks is threatening to centralize the field in the hands of a few very wealthy tech companies.
When it comes to dealing with language, the limits of neural networks become even more evident. Language models such as OpenAI’s GPT-2 and Google’s Meena chatbot each have more than a billion parameters (the basic unit of neural networks) and have been trained on gigabytes of text data. But they still make some of the dumbest mistakes, as Marcus has pointed out in an article earlier this year.
“When sheer computational power is applied to open-ended domain—such as conversational language understanding and reasoning about the world—things never turn out quite as planned. Results are invariably too pointillistic and spotty to be reliable,” Marcus writes.
What’s important here is the term “open-ended domain.” Open-ended domains can be general-purpose chatbots and AI assistants, roads, homes, factories, stores, and many other settings where AI agents interact and cooperate directly with humans. As the past years have shown, the rigid nature of neural networks prevents them from tackling problems in open-ended domains. In his paper, Marcus discusses this topic in detail.
Why do we need to combine symbolic AI and neural networks?
Connectionists believe that approaches based on pure neural network structures will eventually lead to robust or general AI. After all, the human brain is made of physical neurons, not physical variables and class placeholders and symbols.
But as Marcus points out in his essay, “Symbol manipulation in some form seems to be essential for human cognition, such as when a child learns an abstract linguistic pattern or the meaning of a term like sister that can be applied in an infinite number of families, or when an adult extends a familiar linguistic pattern in a novel way that extends beyond a training distribution.”
Marcus’ premise is backed by research from several cognitive scientists over the decades, including his own book The Algebraic Mind and the more recent Rebooting AI. (Another great read in this regard is the second chapter of Steven Pinker’s book How the Mind Works, in which he lays out evidence that symbol manipulation is an essential part of the brain’s functionality.)
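To make the “sister” example concrete, here is a minimal Python sketch of a relation defined over variables rather than over specific training examples. The Person type, the names, and the working definition (a female who shares at least one parent with the other person) are assumptions made for illustration and do not come from Marcus’ paper.

```python
from typing import NamedTuple


class Person(NamedTuple):
    """Hypothetical record type used only for this illustration."""
    name: str
    gender: str
    parents: frozenset


def is_sister(x: Person, y: Person) -> bool:
    """x is a sister of y if x is female, x is not y, and they share a parent.

    The rule mentions only the variables x and y, so it applies to any
    family, including ones it has never been shown before.
    """
    return (
        x.gender == "female"
        and x.name != y.name
        and bool(x.parents & y.parents)
    )


# Made-up families; any other names would work the same way.
ana = Person("Ana", "female", frozenset({"Maria", "Jorge"}))
luis = Person("Luis", "male", frozenset({"Maria", "Jorge"}))
zoe = Person("Zoe", "female", frozenset({"Kim"}))

print(is_sister(ana, luis))  # Ana and Luis share both parents
print(is_sister(zoe, ana))   # Zoe and Ana share no parent
```

Because the rule is stated over variables, adding a new family requires no retraining: the same three conditions are simply evaluated on the new values.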
We already have proof that symbolic systems work. It’s everywhere around us. Our web browsers, operating systems, applications, games, etc. are based on rule-based programs. “The same tools are also, ironically, used in the specification and execution of virtually all of the world’s neural networks,” Marcus notes.
Decades of computer science and cognitive science have proven that being able to store and manipulate abstract concepts is an essential part of any intelligent system. And that is why symbol-manipulation should be a vital component of any robust AI system.
“It is from there that the basic need for hybrid architectures that combine symbol manipulation with other techniques such as deep learning most fundamentally emerges,” Marcus says.
Examples of hybrid AI systems
Despite the heavy dismissal of hybrid artificial intelligence by connectionists, there are plenty of examples that show the strengths of these systems at work. As Marcus notes in his paper, “Researchers occasionally build systems containing the apparatus of symbol-manipulation, without acknowledging (or even considering the fact) that they have done so.” Marcus lists several examples where hybrid AI systems are quietly solving vital problems.
One example is the Neuro-Symbolic Concept Learner (NSCL), a hybrid AI system developed by researchers at MIT and IBM. The NSCL combines neural networks with symbolic reasoning to solve visual question answering (VQA) problems, a class of tasks that is especially difficult to tackle with pure neural network-based approaches. The researchers showed that the NSCL was able to solve the VQA dataset CLEVR with impressive accuracy. Moreover, the hybrid AI model achieved the feat using much less training data and while producing explainable results, addressing two fundamental problems plaguing deep learning.
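The general idea of pairing a perception step with a symbolic program can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the NSCL’s actual implementation: the perceive function is a hard-coded stand-in for a neural network, and the operation names (filter_color, filter_shape, count) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

Scene = List[Dict[str, str]]
Step = Tuple[str, Optional[str]]


def perceive(image_id: str) -> Scene:
    """Stand-in for a neural perception module (assumption for this sketch).

    A real system would run a trained network on pixels; here the detected
    objects and their attributes are simply hard-coded.
    """
    scenes = {
        "img_001": [
            {"shape": "cube", "color": "red"},
            {"shape": "sphere", "color": "red"},
            {"shape": "cube", "color": "blue"},
        ]
    }
    return scenes[image_id]


def run_program(program: List[Step], scene: Scene) -> Union[int, Scene]:
    """Execute a sequence of symbolic operations over the detected objects."""
    result = scene
    for op, arg in program:
        if op == "filter_color":
            result = [obj for obj in result if obj["color"] == arg]
        elif op == "filter_shape":
            result = [obj for obj in result if obj["shape"] == arg]
        elif op == "count":
            return len(result)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# "How many red cubes are there?" written as a symbolic program.
program = [("filter_color", "red"), ("filter_shape", "cube"), ("count", None)]
answer = run_program(program, perceive("img_001"))
print(answer)
```

Each step of the program can be inspected on its own, which is one way to read the claim about explainable results: the answer comes with the chain of operations that produced it.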
Google’s search engine is a massive hybrid AI that combines state-of-the-art deep learning techniques such as Transformers and symbol-manipulation systems such as knowledge-graph navigation tools.
AlphaGo, one of the landmark AI achievements of the past few years, is another example of combining symbolic AI and deep learning.
“There are plenty of first steps towards building architectures that combine the strengths of the symbolic approaches with insights from machine learning, in order to develop better techniques for extracting and generalizing abstract knowledge from large, often noisy data sets,” Marcus writes.
The paper goes into much more detail about the components of hybrid AI systems, and the integration of vital elements such as variable binding, knowledge representation and causality with statistical approximation.
“My own strong bet is that any robust system will have some sort of mechanism for variable binding and for performing operations over those variables once bound. But we can’t tell unless we look,” Marcus writes.
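As a rough illustration of what variable binding and operations over bound variables can look like in code, here is a minimal Python sketch of a pattern matcher over facts. The fact format, the “?name” convention for variables, and the function names are assumptions chosen for this example, not something specified in the paper.

```python
from typing import Dict, List, Optional, Tuple

Fact = Tuple[str, ...]
Bindings = Dict[str, str]


def match(pattern: Fact, fact: Fact) -> Optional[Bindings]:
    """Bind the pattern's variables (strings starting with '?') to a fact.

    Returns the bindings on success, or None if the fact does not fit.
    A variable that appears twice must bind to the same value both times.
    """
    if len(pattern) != len(fact):
        return None
    bindings: Bindings = {}
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if p in bindings and bindings[p] != f:
                return None
            bindings[p] = f
        elif p != f:
            return None
    return bindings


def query(pattern: Fact, facts: List[Fact]) -> List[Bindings]:
    """Collect the bindings for every fact that matches the pattern."""
    results = []
    for fact in facts:
        bindings = match(pattern, fact)
        if bindings is not None:
            results.append(bindings)
    return results


def substitute(template: Fact, bindings: Bindings) -> Fact:
    """An operation over bound variables: fill a template with their values."""
    return tuple(bindings.get(term, term) for term in template)


# Made-up facts for illustration.
FACTS = [
    ("parent", "Maria", "Ana"),
    ("parent", "Maria", "Luis"),
    ("parent", "Kim", "Zoe"),
]

for b in query(("parent", "?p", "?c"), FACTS):
    # Derive a new fact from each binding: if ?p is a parent of ?c,
    # then ?c is a child of ?p.
    print(substitute(("child", "?c", "?p"), b))
```

The same query works unchanged on facts about people it has never seen, because the pattern refers to variables, not to particular names.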
Lessons from history
One thing to commend Marcus on is his persistence in arguing that the field must bring together all of AI’s achievements in order to advance. He has done so almost single-handedly in the past years, against overwhelming odds, while most of the prominent voices in artificial intelligence have dismissed the idea of revisiting symbol manipulation.
Marcus sticking to his guns is reminiscent of how Hinton, Bengio, and LeCun continued to push neural networks forward during the decades when there was no interest in them. Their faith in deep neural networks eventually bore fruit, triggering the deep learning revolution in the early 2010s and earning them a Turing Award in 2019.
It will be interesting to see where Marcus’ quest to create robust, hybrid AI systems will lead.