AI devs created a lean, mean, GPT-3-beating machine that uses 99.9% fewer parameters

AI researchers from the Ludwig Maximilian University (LMU) of Munich have developed a bite-sized text generator capable of besting OpenAI’s state-of-the-art GPT-3 using only a tiny fraction of its parameters.

GPT-3 is a monster of an AI system capable of responding to almost any text prompt with unique, original responses that are often surprisingly cogent. It’s an example of what incredibly talented developers can do with cutting-edge algorithms and software when given unfettered access to supercomputers.

But it’s not very efficient, at least not when compared to a new system developed by LMU researchers Timo Schick and Hinrich Schütze.

According to a recent pre-print paper on arXiv, the duo’s system outperforms GPT-3 on the SuperGLUE benchmark with only 223 million parameters:

In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain some form of task description, combined with gradient-based optimization; additionally exploiting unlabeled data gives further improvements.
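To make the "cloze question" idea concrete, here is a minimal sketch of what such a conversion could look like for a sentiment task. It is not the authors' code: the pattern wording, the label words, the `[MASK]` placeholder, and the function names `to_cloze` and `label_from_scores` are all illustrative assumptions, and the scores stand in for what a masked language model would produce.

```python
from typing import Dict

# Hypothetical placeholder for the blank the language model must fill in.
MASK = "[MASK]"

# Hypothetical "verbalizer": one ordinary word per task label.
VERBALIZER: Dict[str, str] = {"positive": "great", "negative": "terrible"}


def to_cloze(review: str) -> str:
    """Wrap a raw input in a pattern that hints at the task and leaves one blank."""
    return f"{review.strip()} All in all, it was {MASK}."


def label_from_scores(word_scores: Dict[str, float]) -> str:
    """Return the label whose verbalizer word scored highest at the blank."""
    return max(
        VERBALIZER,
        key=lambda label: word_scores.get(VERBALIZER[label], float("-inf")),
    )


print(to_cloze("The pizza was cold and the service was slow."))
# Made-up scores standing in for a masked language model's predictions.
print(label_from_scores({"great": 0.1, "terrible": 0.9}))
```

The point of the pattern is that the model sees the task phrased as ordinary text, so a handful of labeled examples can be enough to fine-tune it.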

Parameters are variables used to tune and tweak AI models. They’re estimated from data during training – in essence, the more parameters an AI model has, the more robust we expect it to be.

When a system using 99.9% fewer model parameters is able to best the best at a benchmark task, it’s a pretty big deal. This isn’t to say that the LMU system is better than GPT-3, nor that it’s capable of beating it in tests other than the SuperGLUE benchmark – which isn’t indicative of GPT-3’s overall capabilities.
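For a rough sense of where the 99.9% figure comes from, the arithmetic can be checked in a few lines. The 223 million figure is the one quoted above; the 175 billion figure for GPT-3 is OpenAI's published parameter count and is an assumption here, since this article doesn't state it. The helper name `percent_fewer` is made up for illustration.

```python
# GPT-3's published size; an assumption, not stated in this article.
GPT3_PARAMS = 175_000_000_000
# The LMU system's size as quoted in the article.
LMU_PARAMS = 223_000_000


def percent_fewer(small: int, large: int) -> float:
    """Percentage reduction in parameter count going from `large` to `small`."""
    return (1 - small / large) * 100


# Share of GPT-3's parameters that the smaller model uses, in percent.
share = LMU_PARAMS / GPT3_PARAMS * 100

print(f"{share:.2f}% of GPT-3's parameters")  # 0.13%
print(f"{percent_fewer(LMU_PARAMS, GPT3_PARAMS):.2f}% fewer parameters")  # 99.87%
```

Rounded to one decimal place, those work out to the 0.1% and 99.9% figures cited in the coverage.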

The LMU system’s results come courtesy of a training method called pattern-exploiting training (PET). According to OpenAI policy director Jack Clark, writing in the weekly ImportAI newsletter:

Their approach fuses a training technique called PET (pattern-exploiting training) with a small pre-trained Albert model, letting them create a system that “outperform GPT-3 on SuperGLUE with 32 training examples, while requiring only 0.1% of its parameters.”

Clark goes on to point out that, while it won’t outperform GPT-3 in every task, it does open new avenues for researchers looking to push the boundaries of AI with more modest hardware.

For more information, check out the duo’s paper on arXiv.

H/t: Jack Clark and ImportAI

Story by Tristan Greene

Editor, Neural by TNW

Tristan is a futurist covering human-centric artificial intelligence advances, quantum computing, STEM, physics, and space stuff. Pronouns: He/him
