
This article was published on September 21, 2020

AI devs created a lean, mean, GPT-3-beating machine that uses 99.9% fewer parameters



AI researchers from the Ludwig Maximilian University (LMU) of Munich have developed a bite-sized text generator capable of besting OpenAI’s state-of-the-art GPT-3 using only a tiny fraction of its parameters.

GPT-3 is a monster of an AI system capable of responding to almost any text prompt with unique, original responses that are often surprisingly cogent. It’s an example of what incredibly talented developers can do with cutting-edge algorithms and software when given unfettered access to supercomputers.

But it’s not very efficient. At least not when compared to a new system developed by LMU researchers Timo Schick and Hinrich Schütze.


According to a recent pre-print paper on arXiv, the duo’s system outperforms GPT-3 on the SuperGLUE benchmark with only 223 million parameters:

In this work, we show that performance similar to GPT-3 can be obtained with language models whose parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain some form of task description, combined with gradient-based optimization; additionally exploiting unlabeled data gives further improvements.
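To make the "cloze question" idea concrete, here is a minimal sketch in Python. It only illustrates the reformulation step: the pattern wording, the `[MASK]` placeholder, the label names and the helper functions `to_cloze` and `pick_label` are all hypothetical and are not taken from the paper's code. A real system would get the word scores from a masked language model.

```python
from typing import Dict

MASK = "[MASK]"  # placeholder blank; the real token depends on the model's tokenizer

# Hypothetical verbalizer: maps each task label to one word the model could fill in.
VERBALIZER = {"entailment": "Yes", "not_entailment": "No"}


def to_cloze(premise: str, hypothesis: str) -> str:
    """Wrap an input pair in a question-like pattern with a single blank."""
    return f'"{hypothesis}"? {MASK}. "{premise}"'


def pick_label(word_scores: Dict[str, float]) -> str:
    """Return the label whose verbalizer word received the highest score."""
    return max(
        VERBALIZER,
        key=lambda label: word_scores.get(VERBALIZER[label], float("-inf")),
    )


cloze = to_cloze("The cat sat on the mat.", "A cat is sitting")
print(cloze)

# Made-up scores standing in for a language model's prediction at the blank.
print(pick_label({"Yes": 0.8, "No": 0.2}))
```

The point of the sketch is that the task description lives in the text itself, so a model that is good at filling in blanks can be steered toward a classification task with very few labeled examples.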

Parameters are the internal values an AI model adjusts during training. They’re learned from data, and in general the more parameters a model has, the more capable we expect it to be.

When a system using 99.9% fewer parameters is able to best the best at a benchmark task, it’s a pretty big deal. This isn’t to say that the LMU system is better than GPT-3, nor that it’s capable of beating it in tests other than the SuperGLUE benchmark – which isn’t indicative of GPT-3’s overall capabilities.
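The 99.9% figure can be checked with a quick back-of-the-envelope calculation. The sketch below assumes GPT-3's published size of 175 billion parameters, a number that does not appear in this article; the 223 million figure is the one quoted above.

```python
PET_PARAMS = 223_000_000        # figure quoted in the article
GPT3_PARAMS = 175_000_000_000   # assumption: GPT-3's published size, not stated here

ratio = PET_PARAMS / GPT3_PARAMS      # fraction of GPT-3's parameter count
reduction_pct = (1 - ratio) * 100     # percentage fewer parameters

print(f"Share of GPT-3's parameters: {ratio * 100:.2f}%")
print(f"Reduction: {reduction_pct:.2f}%")
```

Rounded to one decimal place, that works out to roughly 0.1% of GPT-3's parameters, which is where the headline's 99.9% comes from.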

The LMU system’s results come courtesy of a training method called pattern-exploiting training (PET). According to OpenAI policy director Jack Clark, writing in the weekly ImportAI newsletter:

Their approach fuses a training technique called PET (pattern-exploiting training) with a small pre-trained Albert model, letting them create a system that “outperform GPT-3 on SuperGLUE with 32 training examples, while requiring only 0.1% of its parameters.”

Clark goes on to point out that, while it won’t outperform GPT-3 in every task, it does open new avenues for researchers looking to push the boundaries of AI with more modest hardware.

For more information check out the duo’s paper here.

H/t: Jack Clark and ImportAI
