Ben Dickson is the founder of TechTalks. He writes regularly about business, technology and politics. Follow him on Twitter and Facebook.
This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.
Consider the animal in the following image. If you recognize it, a quick series of neuron activations in your brain will link its image to its name and other information you know about it (habitat, size, diet, lifespan, etc.). But if, like me, you’ve never seen this animal before, your mind is now racing through your repertoire of animal species, comparing tails, ears, paws, noses, snouts, and everything else to determine which bucket this odd creature belongs to. Your biological neural network is reprocessing your past experience to deal with a novel situation.
Our brains, honed through millions of years of evolution, are very efficient processing machines, sorting out the ton of information we receive through our sensory inputs, associating known items with their respective categories.
That picture, by the way, is an Indian civet, an endangered species that has nothing to do with cats, dogs, and rodents. It should be placed in its own separate category (viverrids). There you go. You now have a new bucket to place civets in, which includes this variant that was sighted recently in India.
While we have yet to learn much about how the mind works, we are in the midst (or maybe still at the beginning) of an era of creating our own version of the human brain. After decades of research and development, researchers have managed to create deep neural networks that sometimes match or surpass human performance in specific tasks.
But one of the recurring themes in discussions about artificial intelligence is whether artificial neural networks used in deep learning work similarly to the biological neural networks of our brains. Many scientists agree that artificial neural networks are a very rough imitation of the brain’s structure, and some believe that ANNs are statistical inference engines that do not mirror the many functions of the brain. The brain, they believe, contains many wonders that go beyond the mere connection of biological neurons.
A paper recently published in the peer-reviewed journal Neuron challenges the conventional view of the functions of the human brain. Titled “Direct Fit to Nature: An Evolutionary Perspective on Biological and Artificial Neural Networks,” the paper argues that, contrary to the beliefs of many scientists, the human brain is a brute-force big data processor that fits its parameters to the many examples that it experiences. That’s the kind of description usually given to deep neural networks.
Authored by researchers at Princeton University, the thought-provoking paper provides a different perspective on neural networks, analogies between ANNs and their biological counterparts, and future directions for creating more capable artificial intelligence systems.
AI’s interpretability challenge
Neuroscientists generally believe that the complex functionalities of the brain can be broken down into simple, interpretable models.
For instance, I can explain the complex mental process of my analysis of the civet picture (before I knew its name, of course), as such: “It’s definitely not a bird because it doesn’t have feathers and wings. And it certainly isn’t a fish. It’s probably a mammal, given the furry coat. It could be a cat, given the pointy ears, but the neck is a bit too long and the body shape a bit weird. The snout is a bit rodent-like, but the legs are longer than most rodents…” and finally I would come to the conclusion that it’s probably an esoteric species of cat. (In my defense, it is a very distant relative of felines if you insist.)
Artificial neural networks, however, are often dismissed as uninterpretable black boxes. They do not provide rich explanations of their decision process. This is especially true of complex deep neural networks, which are composed of hundreds (or thousands) of layers and millions (or billions) of parameters.
During their training phase, deep neural networks review millions of images and their associated labels, and then they mindlessly tune their millions of parameters to the patterns they extract from those images. These tuned parameters then allow them to determine which class a new image belongs to. They don’t understand the higher-level concepts that I just mentioned (neck, ear, nose, legs, etc.) and only look for consistency between the pixels of an image.
The authors of “Direct Fit to Nature” acknowledge that neural networks—both biological and artificial—can differ considerably in their circuit architecture, learning rules, and objective functions.
“All networks, however, use an iterative optimization process to pursue an objective, given their input or environment—a process we refer to as ‘direct fit,’” the researchers write. The term “direct fit” is inspired by the blind fitting process observed in evolution, an elegant but mindless optimization process in which organisms adapt to their environment through a series of random genetic changes accumulated over a very long period.
“This framework undercuts the assumptions of traditional experimental approaches and makes unexpected contact with long-standing debates in developmental and ecological psychology,” the authors write.
Another problem that the artificial intelligence community faces is the tradeoff between interpretability and generalization. Scientists and researchers are constantly searching for new techniques and structures that can generalize AI capabilities across vaster domains. And experience has shown that, when it comes to artificial neural networks, scale improves generalization. Advances in processing hardware and the availability of large compute resources have enabled researchers to create and train very large neural networks in reasonable timeframes. And these networks have proven to be remarkably better at performing complex tasks such as computer vision and natural language processing.
The problem with artificial neural networks, however, is that the larger they get, the more opaque they become. With their logic spread across millions of parameters, they become much harder to interpret than a simple regression model that assigns a single coefficient to each feature. Simplifying the structure of artificial neural networks (e.g., reducing the number of layers or variables) will make it easier to interpret how they map different input features to their outcomes. But simpler models are also less capable of dealing with the complex and messy data found in nature.
“We argue that neural computation is grounded in brute-force direct fitting, which relies on over-parameterized optimization algorithms to increase predictive power (generalization) without explicitly modeling the underlying generative structure of the world,” the authors of “Direct Fit to Nature” write.
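To make the gap between an interpretable regression model and an over-parameterized network concrete, here is a minimal sketch that simply counts parameters. The input size (10,000 features) and the layer widths are assumptions chosen for illustration; they do not come from the paper.

```python
def linear_param_count(n_features):
    """One coefficient per feature plus a single intercept."""
    return n_features + 1


def mlp_param_count(layer_sizes):
    """Weights plus biases for each pair of consecutive fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Assumed input size: 10,000 features (hypothetical).
n_features = 10_000

linear_params = linear_param_count(n_features)
# Assumed architecture: two hidden layers (512 and 256 units) and 10 outputs.
mlp_params = mlp_param_count([n_features, 512, 256, 10])

print(linear_params)  # 10001
print(mlp_params)     # 5254410
```

Each of the regression model's coefficients can be read as the effect of one feature. Even this small hypothetical network spreads its logic over more than five million interacting weights, which is why it is so much harder to inspect.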
AI’s generalization problem
Say you want to create an AI system that detects chairs in images and videos. Ideally, you would provide the algorithm with a few images of chairs, and it would be able to detect all types of normal as well as wacky and funky ones.
This is one of the long-sought goals of artificial intelligence: creating models that can “extrapolate” well. This means that, given a few examples of a problem domain, the model should be able to extract the underlying rules and apply them to a vast range of novel examples it hasn’t seen before.
When dealing with simple (mostly artificial) problem domains, it might be possible to reach extrapolation level by tuning a deep neural network to a small set of training data. For instance, such levels of generalization might be achievable in domains with limited features such as sales forecasting and inventory management. (But as we’ve seen in these pages, even these simple AI models might fall apart when a fundamental change comes to their environment.)
But when it comes to messy and unstructured data such as images and text, small data approaches tend to fail. In images, every pixel effectively becomes a variable, so analyzing a set of 100×100 pixel images becomes a problem with 10,000 dimensions, each of which can take hundreds of values (256 levels in 8-bit grayscale) or millions (about 16.7 million colors in 24-bit RGB).
“In cases in which there are complex nonlinearities and interactions among variables at different parts of the parameter space, extrapolation from such limited data is bound to fail,” the Princeton researchers write.
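The size of that input space can be worked out with a few lines of arithmetic. The sketch below assumes 8-bit grayscale pixels (256 levels) and a hypothetical training set of one million images; both numbers are illustrative assumptions.

```python
import math

width, height = 100, 100
levels = 256  # assumption: 8-bit grayscale pixels

# Every pixel is one variable, so the image is a point in a 10,000-dimensional space.
dimensions = width * height

# The number of distinct images is levels ** dimensions. That is far too large
# to print comfortably, so we work with its base-10 logarithm instead.
log10_images = dimensions * math.log10(levels)

# Assumption: a training set of one million images.
log10_training_set = math.log10(1_000_000)
log10_coverage = log10_training_set - log10_images

print(dimensions)
print(f"distinct images: about 10^{log10_images:.0f}")
print(f"fraction covered by the training set: about 10^{log10_coverage:.0f}")
```

Natural images occupy only a tiny, highly structured part of this space, so the raw count overstates the difficulty. It still shows why a handful of examples cannot pin down the behavior of a model with 10,000 interacting inputs.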
The human brain, many cognitive scientists believe, can rely on implicit generative rules without being exposed to rich data from the environment. Artificial neural networks, according to the popular view, lack this capability. This is the belief that the authors of “Direct Fit to Nature” challenge.
Direct fitting neural networks to the problem domain
“Dense sampling of the problem space can flip the problem of prediction on its head, turning an extrapolation-based problem into an interpolation-based problem,” the researchers note.
In essence, with enough samples, you will be able to capture a large enough area of the problem domain. This makes it possible to interpolate between samples with simple computations without the need to extract abstract rules to predict the outcome of situations that fall outside the domain of the training examples.
“When the data structure is complex and multidimensional, a ‘mindless’ direct-fit model, capable of interpolation-based prediction within a real-world parameter space, is preferable to a traditional ideal-fit explicit model that fails to explain much variance in the data,” the authors of “Direct Fit to Nature” write.
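The shift from extrapolation to interpolation can be illustrated with a toy model. The sketch below is not the authors' method: it uses a hypothetical one-dimensional target (a sine wave) and a "mindless" predictor that stores samples and linearly interpolates between its two nearest neighbors, with no knowledge of the rule that generated the data.

```python
import bisect
import math


def make_direct_fit(xs, ys):
    """Return a predictor that interpolates linearly between stored samples.

    Outside the sampled range it can only repeat the nearest boundary value.
    """
    def predict(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, x)  # xs[i - 1] <= x < xs[i]
        t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return predict


def sample(lo, hi, n):
    """n evenly spaced points from lo to hi."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def max_error(predict, target, points):
    return max(abs(predict(x) - target(x)) for x in points)


target = math.sin  # hypothetical "world" the model has to predict

sparse_x = sample(0.0, 2 * math.pi, 5)    # sparse sampling of the domain
dense_x = sample(0.0, 2 * math.pi, 500)   # dense sampling of the same domain
sparse = make_direct_fit(sparse_x, [target(x) for x in sparse_x])
dense = make_direct_fit(dense_x, [target(x) for x in dense_x])

inside = sample(0.1, 6.0, 200)    # queries inside the sampled range
outside = sample(7.0, 12.0, 200)  # queries outside the sampled range

err_sparse_in = max_error(sparse, target, inside)
err_dense_in = max_error(dense, target, inside)
err_dense_out = max_error(dense, target, outside)

print(f"sparse samples, inside the range: {err_sparse_in:.4f}")
print(f"dense samples, inside the range:  {err_dense_in:.6f}")
print(f"dense samples, outside the range: {err_dense_out:.4f}")
```

With dense sampling, the predictor is accurate everywhere inside the sampled range without ever representing the sine function. Outside that range it fails regardless of how many samples it holds, which is the extrapolation problem described above.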
In tandem with advances in computing hardware, the availability of very large data sets has enabled the creation of direct-fit artificial neural networks in the past decade. The internet is rich with all sorts of data from various domains. Scientists create vast deep learning data sets from Wikipedia, social media networks, image repositories, and more. The advent of the internet of things (IoT) has also enabled rich sampling from physical environments (roads, buildings, weather, bodies, etc.).
In many types of applications (i.e., supervised learning algorithms), the gathered data still requires a lot of manual labor to associate each sample with its outcome. But nonetheless, the availability of big data has made it possible to apply the direct-fit approach to complex domains that can’t be represented with few samples and general rules.
One argument against this approach is the “long tail” problem, often described in terms of “edge cases.” In image classification, for instance, popular training data sets such as ImageNet provide millions of pictures of different types of objects. But since most of the pictures were taken under ideal lighting conditions and from conventional angles, deep neural networks trained on these data sets fail to recognize those objects in rare positions.
“The long tail does not pertain to new examples per se, but to low-frequency or odd examples (e.g. a strange view of a chair, or a chair shaped like an unrelated object) or riding in a new context (like driving in a blizzard or with a flat tire),” co-authors of the paper Uri Hasson, Professor at Department of Psychology and Princeton Neuroscience Institute, and Sam Nastase, Postdoctoral researcher at Princeton Neuroscience Institute, told TechTalks in written comments. “Note that biological organisms, including people, like ANNs, are bad at extrapolating to contexts they never experienced; e.g. many people fail spectacularly when driving in snow for the first time.”
Many developers try to make their deep learning models more robust by blindly adding more samples to the training data set, hoping to cover all possible situations. This usually doesn’t solve the problem, because the sampling techniques don’t widen the distribution of the data set, and edge cases remain uncovered by the easily collected data samples. The solution, Hasson and Nastase argue, is to expand the interpolation zone by providing a more ecological, embodied sampling regime for artificial neural networks that currently perform poorly in the tail of the distribution.
“For example, many of the oddities in classical human visual psychophysics are trivially resolved by allowing the observer to simply move and actively sample the environment (something essentially all biological organisms do),” they say. “That is, the long-tail phenomenon is in part a sampling deficiency. However, the solution isn’t necessarily just more samples (which will in large part come from the body of the distribution), but will instead require more sophisticated sampling observed in biological organisms (e.g. novelty seeking).”
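A small simulation can illustrate why more samples alone do not cover the tail. The sketch below is a toy stand-in, not the sampling regime the authors propose: the "environment" is an assumed standard normal distribution, the tail is arbitrarily defined as values more than three standard deviations from the mean, and novelty seeking is mimicked by a hypothetical rule that stops collecting from any region that is already well sampled.

```python
import math
import random


def tail_fraction(samples, threshold=3.0):
    """Share of samples lying more than `threshold` away from the center."""
    return sum(abs(x) > threshold for x in samples) / len(samples)


def novelty_seeking(candidates, bin_width=0.5, cap=50):
    """Keep a candidate only while its bin is still under-sampled."""
    counts = {}
    kept = []
    for x in candidates:
        key = math.floor(x / bin_width)
        if counts.get(key, 0) < cap:
            counts[key] = counts.get(key, 0) + 1
            kept.append(x)
    return kept


rng = random.Random(0)  # fixed seed so the run is repeatable
stream = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

small = stream[:1_000]             # a modest data set
large = stream                     # 20 times more data from the same source
curated = novelty_seeking(stream)  # same source, novelty-seeking selection

for name, data in [("small", small), ("large", large), ("curated", curated)]:
    print(f"{name:8s} n={len(data):6d} tail share={tail_fraction(data):.4f}")
```

Blindly collecting 20 times more data leaves the tail share of the data set essentially unchanged, because the extra samples come mostly from the body of the distribution. The novelty-seeking rule ends up with a smaller data set in which rare cases make up a much larger share.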
This observation is in line with recent research that shows employing a more diverse sampling methodology can in fact improve the performance of computer vision systems.
In fact, the need for sampling from the long tail also applies to the human brain. For instance, consider one of the oft-mentioned criticisms against self-driving cars which posits that their abilities are limited to the environments they’ve been trained in.
“Even the most experienced drivers can find themselves in a new context where they are not sure how to act. The point is to not train a foolproof car, but a self-driving car that can drive, like humans, in 99 percent of the contexts. Given the diversity of driving contexts, this is not easy, but perhaps doable,” Hasson and Nastase say. “We often overestimate the generalization capacity of biological neural networks, including humans. But most biological neural networks are fairly brittle; consider for example that raising ocean temperatures 2 degrees will wreak havoc on entire ecosystems.”
Challenging old beliefs
Many scientists criticize AI systems that rely on very large neural networks, arguing that the human brain is very resource-efficient. The brain is a three-pound mass of matter that uses a little over 10 watts of power. Deep neural networks, however, often require very large servers that can consume megawatts of power.
But hardware aside, comparing the components of the brain to artificial neural networks paints a different picture. The largest deep neural networks are composed of a few billion parameters. The human brain, in contrast, is constituted of approximately 1,000 trillion synapses, the biological equivalent of ANN parameters. Moreover, the brain is a highly parallel system, which makes it very hard to compare its functionality to that of ANNs.
“Although the brain is certainly subject to wiring and metabolic constraints, we should not commit to an argument for scarcity of computational resources as long as we poorly understand the computational machinery in question,” the Princeton researchers write in their paper.
Another argument is that, in contrast to ANNs, the biological neural network of the human brain has very poor input mechanisms and doesn’t have the capacity to ingest and process very large amounts of data. By this reasoning, human brains cannot learn new tasks without learning the underlying rules.
To be fair, calculating the input entering the brain is complicated. But we often underestimate the huge amount of data that we process. “For example, we may be exposed to thousands of visual exemplars of many daily categories a year, and each category may be sampled at thousands of views in each encounter, resulting in a rich training set for the visual system. Similarly, with regard to language, studies estimate that a child is exposed to several million words per year,” the authors of the paper write.
Beyond System 1 neural networks
One thing that can’t be denied, however, is that humans do in fact extract rules from their environment and develop abstract thoughts and concepts that they use to process and analyze new information. This complex symbol manipulation enables humans to compare and draw analogies between different tasks and perform efficient transfer learning. Understanding and applying causality remain among the unique features of the human brain.
“It is certainly the case that humans can learn abstract rules and extrapolate to new contexts in a way that exceeds modern ANNs. Calculus is perhaps the best example of learning to apply rules across different contexts. Discovering natural laws in physics is another example, where you learn a very general rule from a set of limited observations,” Hasson and Nastase say.
These are the kinds of capabilities that emerge not from the activations and interactions of a single neural network but from knowledge accumulated across many minds and generations.
This is one area where direct-fit models fall short, Hasson and Nastase acknowledge. In cognitive science, the distinction is known as System 1 and System 2 thinking. System 1 refers to the kind of tasks that can be learned by rote, such as recognizing faces, walking, running, and driving. You can perform most of these tasks subconsciously while also doing something else (e.g., walking and talking to someone at the same time, or driving and listening to the radio). System 2, however, requires concentration and conscious thinking (can you solve a differential equation while jogging?).
“In the paper, we distinguish fast and automatic System 1 capacities from the slow and deliberate cognitive functions,” Hasson and Nastase say. “While direct fit allows the brain to be competent while being blind to the solution it learned (similar to all evolved functional solutions in biology), and while it explains the ability of System 1 to learn to perceive and act across many contexts, it still doesn’t fully explain a subset of human functions attributed to System 2 which seems to gain some explicit understanding of the underlying structure of the world.”
So what do we need to develop AI algorithms that have System 2 capabilities? This is one area where there’s much debate in the research community. Some scientists, including deep learning pioneer Yoshua Bengio, believe that pure neural network-based systems will eventually lead to System 2 level AI. New research in the field shows that advanced neural network structures manifest the kind of symbol manipulation capabilities that were previously thought to be off-limits for deep learning.
In “Direct Fit to Nature,” the authors support the pure neural network–based approach. In their paper, they write: “Although the human mind inspires us to touch the stars, it is grounded in the mindless billions of direct-fit parameters of System 1. Therefore, direct-fit interpolation is not the end goal but rather the starting point for understanding the architecture of higher-order cognition. There is no other substrate from which System 2 could arise.”
An alternative approach is to create hybrid systems that combine classic symbolic AI with neural networks. The area has drawn much attention in the past year, and several projects show that rule-based AI and neural networks can complement each other to create systems that are stronger than the sum of their parts.
“Although non-neural symbolic computing—in the vein of von Neumann’s model of a control unit and arithmetic logic units—is useful in its own right and may be relevant at some level of description, the human System 2 is a product of biological evolution and emerges from neural networks,” Hasson and Nastase wrote in their comments to TechTalks.
In their paper, Hasson and Nastase expand on some of the possible components that might develop higher capabilities for neural networks. One interesting suggestion is providing a physical body for neural networks to experience and explore the world like other living beings.
“Integrating a network into a body that allows it to interact with objects in the world is necessary for facilitating learning in new environments,” Hasson and Nastase said. “Asking a language model to learn the meaning of words from the adjacent words in text corpora exposes the network to a highly restrictive and narrow context. If the network has a body and can interact with objects and people in a way that relates to the words, it is likely to get a better sense of the meaning of words in context. Counterintuitively, imposing these sorts of ‘limitations’ (e.g. a body) on a neural network can force the neural network to learn more useful representations.”
This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech and what we need to look out for. You can read the original article here.