Why the future of AI is flexible, reusable foundation models

Content provided by IBM and TNW

When learning a different language, the easiest way to get started is with fill-in-the-blank exercises. “It’s raining cats and …”

By making mistakes and correcting them, your brain (which many linguists believe is hardwired for language learning) starts discovering patterns in grammar, vocabulary, and word order. Those patterns can be applied not only to filling in blanks, but also to conveying meaning to other humans (or computers, dogs, etc.).

That last bit is important when talking about so-called ‘foundation models,’ one of the hottest (but underreported) topics in artificial intelligence right now.

According to a 2021 review paper, foundation models are “trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks.”

In non-academic language: much like someone working through fill-in-the-blank exercises, foundation models learn things in a way that can later be applied to other tasks, which makes them more flexible than current AI models.

Why are foundation models different?

The way foundation models are trained solves one of the biggest bottlenecks in AI: labeling data.

When a website asks you to select “all the pictures containing a boat” (to prove you’re not a robot), you’re essentially labeling data. These labels can then be used to feed images of boats to an algorithm so that it can, at some point, reliably recognize boats on its own. This is traditionally how AI models are trained: using data labeled by humans. It’s a time-consuming process that requires many people.

Foundation models don’t need this type of labeling. Instead of relying on human annotation, they use the fill-in-the-blank method and self-generated feedback to continuously learn and improve performance, without the need for human supervision.
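As a rough illustration of the fill-in-the-blank idea, the Python sketch below turns raw, unlabeled sentences into training pairs by hiding one word at a time, so the hidden word itself serves as the label. The function name `make_blank_examples` and the `____` placeholder are assumptions made for this example; real foundation models use far more elaborate tokenization and masking schemes.

```python
def make_blank_examples(sentence, blank="____"):
    """Turn one unlabeled sentence into (prompt, answer) pairs.

    Each word is hidden in turn and becomes the label for its prompt,
    so no human annotation is needed. (Hypothetical helper for illustration.)
    """
    words = sentence.split()
    examples = []
    for i, word in enumerate(words):
        # Rebuild the sentence with the i-th word replaced by the blank.
        prompt = " ".join(words[:i] + [blank] + words[i + 1:])
        examples.append((prompt, word))
    return examples


if __name__ == "__main__":
    for prompt, answer in make_blank_examples("It's raining cats and dogs"):
        print(f"{prompt!r} -> {answer!r}")
```

Every sentence produces as many training examples as it has words, which is why large amounts of plain text can stand in for hand-labeled data.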

This makes foundation models more accessible for industries that don’t already have a wide range of data available. In fact, according to Dakshi Agrawal, IBM Fellow and CTO at IBM AI, depending on the domain you’re training a foundation model in, a few gigabytes of data can suffice.

These complex models might sound far removed from a user like you, but you’ve almost certainly seen a foundation model at work at some point online. Two of the more famous ones are the GPT-3 language model, which, after being fed works by famous writers, can produce remarkable imitations, and DALL-E, which produces stunning images based on users’ prompts.

Beyond creating new entertainment, the flexibility that foundation models bring could help accelerate groundbreaking medical research, scientific advances, engineering, architecture, and even programming.

Emergent properties

Foundation models are characterized by two very interesting properties: emergence and homogenization.

Emergence refers to new, unexpected capabilities that a model shows and that previous generations of models did not have. It typically happens as models grow in size. A language model that can do basic arithmetic reasoning is one example of such a somewhat unexpected emergent property.

Homogenization is a complicated term for using a single model, for instance one trained to understand and use the English language, to perform many different tasks. These could include summarizing a piece of text, writing a poem in the style of a famous writer, or interpreting a command given by a human (the GPT-3 language model is a good example of this).

But foundation models are not limited to human language. In essence, what we’re teaching a computer to do is to find patterns in processes or phenomena that it can then replicate given a certain condition.

Let’s unpack that with an example. Take molecules. Physics and chemistry dictate that molecules can exist only in certain configurations. The next step would be to define a use for molecules, such as medicines. A foundation model can then be trained, using reams of medical data, to understand how different molecules (i.e. drugs) interact with the human body when treating diseases.

This understanding can then be used to ‘fine tune’ the foundation model so it can make suggestions as to which molecule might work in a certain situation. This can speed up medical research significantly, allowing professionals to simply ask the model to come up with molecules that might have certain antibacterial properties, or might work as a drug against a certain virus.
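The two-stage idea (learn general patterns first, then adapt to a narrow task with a small amount of data) can be sketched with a deliberately tiny stand-in. The Python example below is only an analogy: it uses word-pair counts rather than a neural network, and the class `NextWordModel`, the `weight` parameter, and the sample sentences are all assumptions invented for illustration, not how any real foundation model is fine-tuned.

```python
from collections import Counter, defaultdict


class NextWordModel:
    """Toy next-word predictor based on word-pair counts (illustrative only)."""

    def __init__(self):
        # Maps a word to a Counter of the words that followed it.
        self.counts = defaultdict(Counter)

    def train(self, sentences, weight=1):
        """Add word-pair counts from sentences; a larger weight
        lets a small task-specific dataset outweigh the broad one."""
        for sentence in sentences:
            words = sentence.lower().split()
            for prev, nxt in zip(words, words[1:]):
                self.counts[prev][nxt] += weight

    def predict(self, prev):
        """Return the most frequent follower of prev, or None if unseen."""
        followers = self.counts.get(prev.lower())
        if not followers:
            return None
        return followers.most_common(1)[0][0]


if __name__ == "__main__":
    model = NextWordModel()
    # Stage 1: "pretraining" on broad, general text.
    model.train(["the drug is expensive", "the drug is new", "the drug is expensive"])
    print(model.predict("is"))
    # Stage 2: "fine-tuning" on a small, weighted task-specific set.
    model.train(["this drug is antibacterial"], weight=5)
    print(model.predict("is"))
```

The same object is reused in both stages: the second call to `train` adjusts what the first stage learned rather than starting from scratch, which is the part of the analogy that carries over.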

However, as mentioned, this can at times produce unexpected results. Recently, a group of scientists using an AI foundation model to discover cures for rare diseases found that the same model could also be used to discover the most potent chemical weapons known to humankind.

Foundational worries

One small indication of what a sea change these models can bring has been the sprouting of companies offering ‘prompt generators’, which use humans to come up with prompts for models like Midjourney or DALL-E that reliably output interesting or accurate images.

Of course, models like these generate controversy. Lately, a number of artists have spoken out against the use of their artwork for training image-generating models.

There’s also a case to be made about the energy use needed to train a large-scale model. Add to that the fact that the significant computing resources needed to create a foundation model mean that only the world’s largest tech companies can afford to train them.

Then again, as Agrawal explained, increasing efficiency in the training and use of these models means that they’re becoming more accessible to more people at an ever-increasing pace, bringing down both energy consumption and costs.

Another, more foundational (sorry) problem with these models is that any biases or mistakes in the original model can be transferred to tools built with them. So if racist language is used as training data for a language model, it can lead to some offensive outputs and even lawsuits against the company in question.

One way to avoid this is by manually weeding out unwanted training data, but another, more futuristic method is through the use of so-called synthetic data. Synthetic data is essentially fake data that’s generated by an AI model to mimic the real thing, but in a more controlled way. This can be useful for ensuring a foundation model doesn’t take in any offensive or privacy-sensitive data during the learning process.
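The “weeding out” step can be pictured as a simple filter that runs over the training text before the model ever sees it. The Python sketch below is a minimal example under stated assumptions: the blocklist terms are placeholders, the email pattern is a rough stand-in for privacy-sensitive data, and the function names are hypothetical. Real data-cleaning pipelines are considerably more involved.

```python
import re

# Placeholder terms (assumption); a real blocklist would be curated by people.
BLOCKLIST = {"slur1", "slur2"}

# Very rough email pattern, standing in for privacy-sensitive data (assumption).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def is_acceptable(text):
    """Return False if the text contains a blocked term or an email address."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    if words & BLOCKLIST:
        return False
    if EMAIL_PATTERN.search(text):
        return False
    return True


def filter_training_data(samples):
    """Keep only the samples that pass the acceptability check."""
    return [sample for sample in samples if is_acceptable(sample)]


if __name__ == "__main__":
    samples = [
        "The weather is nice today",
        "Contact me at jane@example.com",
        "That is a slur1 remark",
    ]
    print(filter_training_data(samples))
```

A keyword filter like this misses anything not on its list, which is one reason generating controlled synthetic data is attractive as an alternative.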

Will more advanced AI models take our jobs?

Well, yes and no.

Most AI researchers see these models as a tool. The electric screwdriver meant fewer hours were needed to put together a wooden structure, but a person was still needed to wield it.

Take IBM’s foundation model Ansible Wisdom. In a quest to find out whether computers can be taught to program computers, researchers fine-tuned a model to generate Ansible code snippets that previously had to be written manually. With it, developers can use natural language to ask the model to, for example, suggest the Ansible automation needed to deploy a new web server.

Agrawal thinks this will completely revolutionize programmers’ jobs.

“The whole innovation cycle will accelerate thanks to AI. For example, if you look at code, by using foundation models, coding becomes much faster using the first generation of foundation models. I’m sure it will double productivity in just a few years.”

The company is releasing the model as an open source project in collaboration with Red Hat, best known for its enterprise distribution of the open source operating system Linux.

This use is similar to the electric screwdriver. It takes a mundane task and uses a tool to automate parts of it so the task can be performed more efficiently, saving developers time that they can then spend on more creative endeavors.

“It can take over activities that humans are doing today, and humans will just move on to some other activity. I think 80% of the US population used to be in farming. Less than 2% are now (according to the USDA ERS – Ag and Food Sectors and the Economy); humans moved on to other activities and, along with that, our quality of living has improved,” Agrawal said.

Foundation models have the potential to change many processes that are now tedious or repetitive for humans. They also offer the possibility of creating radical and unpredicted solutions to some of the hardest problems we’re facing. In effect, foundation models could mean a complete paradigm shift in how knowledge is created and applied. The key will be ensuring that these models are made accessible to the wider public, with the right safeguards in place.