AI researchers from the Ludwig Maximilian University (LMU) of Munich have developed a bite-sized text generator capable of besting OpenAI’s state of the art GPT-3 using only a tiny fraction of its parameters.
GPT-3 is a monster of an AI system capable of responding to almost any text prompt with unique, original responses that are often surprisingly cogent. It’s an example of what incredibly talented developers can do with cutting-edge algorithms and software when given unfettered access to supercomputers.
But it’s not very efficient. At least not when compared to a new system developed by LMU researchers Timo Schick and Hinrich Schütze.
According to a recent pre-print paper on arXiv, the duo’s system outperforms GPT-3 on the SuperGLUE benchmark test with only 223 million parameters:
In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain some form of task description, combined with gradient-based optimization; additionally exploiting unlabeled data gives further improvements.
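The core idea of converting an input into a cloze question can be sketched in a few lines. This is an illustrative example only: the pattern wording and label words below are hypothetical stand-ins, not the exact patterns or verbalizers from the paper.

```python
# Illustrative sketch of a PET-style cloze reformulation for sentiment
# classification. The pattern and verbalizer here are made up for clarity.

def to_cloze(review: str) -> str:
    """Wrap a raw input in a cloze pattern that embeds a task description."""
    return f"{review} All in all, it was [MASK]."

# Hypothetical verbalizer: maps each class label to a single word that a
# masked language model could predict in the [MASK] slot.
VERBALIZER = {"positive": "great", "negative": "terrible"}

print(to_cloze("The pizza was cold and the service slow."))
# A pre-trained masked language model then scores "great" vs. "terrible"
# at the [MASK] position, and the higher-scoring word decides the label.
```

The point of the reformulation is that a pre-trained masked language model already knows how to fill in blanks, so even a few labeled examples suffice to steer it toward the task.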
Parameters are variables used to tune and tweak AI models. They’re estimated from data – in essence, the more parameters an AI model is trained with, the more robust we expect it to be.
When a system using 99.9% fewer parameters is able to best the reigning state of the art at a benchmark task, it’s a pretty big deal. This isn’t to say that the LMU system is better than GPT-3, nor that it’s capable of beating it in tests other than the SuperGLUE benchmark – which isn’t indicative of GPT-3’s overall capabilities.
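The 99.9% figure checks out as back-of-the-envelope arithmetic, assuming GPT-3’s widely reported 175 billion parameters against the 223 million cited above:

```python
# Parameter-count comparison: GPT-3 (175B, per OpenAI's paper) vs. the
# LMU system's 223M figure cited in this article.
gpt3_params = 175_000_000_000
pet_params = 223_000_000

ratio = pet_params / gpt3_params
print(f"{ratio:.4%}")  # roughly 0.13% of GPT-3's parameter count
```

That ratio is about one part in 800, which the paper and this article round to “0.1% of its parameters.”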
The LMU system’s results come courtesy of a training method called pattern-exploiting training (PET). According to OpenAI policy director Jack Clark, writing in the weekly ImportAI newsletter:
Their approach fuses a training technique called PET (pattern-exploiting training) with a small pre-trained ALBERT model, letting them create a system that “outperform[s] GPT-3 on SuperGLUE with 32 training examples, while requiring only 0.1% of its parameters.”
Clark goes on to point out that, while it won’t outperform GPT-3 in every task, it does open new avenues for researchers looking to push the boundaries of AI with more modest hardware.
For more information check out the duo’s paper here.