Facebook’s chief AI scientist says GPT-3 is ‘not very good’ as a dialog system

The GPT-3 language model has inspired both awe and fear since OpenAI unveiled the system in June. But one person who isn’t overly impressed is Facebook’s Yann LeCun.

In a Facebook post published Tuesday, the social network’s chief AI scientist said the text generator is “not very good” as a question-answering or dialog system, and that other approaches produce better results.

“It’s entertaining, and perhaps mildly useful as a creative help,” LeCun wrote. “But trying to build intelligent machines by scaling up language models is like building high-altitude airplanes to go to the moon. You might beat altitude records, but going to the moon will require a completely different approach.”

To support his claims, LeCun pointed to a new study of the model’s performance in healthcare scenarios by Nabla, a medtech firm cofounded by two of his former colleagues at Facebook.

The researchers note that OpenAI’s GPT-3 guidelines put healthcare “in the high stakes category because people rely on accurate medical information for life-or-death decisions, and mistakes here could result in serious harm.” In addition, diagnosing medical or psychiatric conditions is an unsupported use of the model.

Nonetheless, Nabla tried it out on a range of healthcare use cases.

How did GPT-3 perform?

The researchers found that GPT-3 seemed helpful in finding information in long documents and in basic admin tasks such as appointment booking. But it lacked the memory, logic, and understanding of time for many more specific questions.

Nabla also found that GPT-3 was an unreliable Q&A support tool for doctors, that it dangerously oversimplified medical documentation analysis, and that it struggled to associate causes with consequences.

The model also made some basic errors in diagnosis and provided some reckless mental health advice.

The researchers do see some potential for using language models in medical settings. But they conclude that GPT-3 is “nowhere near” ready to provide significant help in the sector:

Because of the way it was trained, it lacks the scientific and medical expertise that would make it useful for medical documentation, diagnosis support, treatment recommendation or any medical Q&A. Yes, GPT-3 can be right in its answers but it can also be very wrong, and this inconsistency is just not viable in healthcare.

Their findings won’t shock OpenAI, given the firm’s warnings against using GPT-3 in healthcare. But they do show that many expectations for the model are wildly unrealistic.

Story by Thomas Macaulay

Managing editor

Thomas is the managing editor of TNW. He leads our coverage of European tech and oversees our talented team of writers. Away from work, he enjoys playing chess (badly) and the guitar (even worse).
