Why Meta’s latest large language model survived only three days online

Galactica was supposed to help scientists. Instead, it mindlessly spat out biased and incorrect nonsense.

Will Douglas Heavenarchive page

November 18, 2022

Stephanie Arnett/MITTR; Getty, Envato, NASA

On November 15 Meta unveiled a new large language model called Galactica, designed to assist scientists. But instead of landing with the big bang Meta hoped for, Galactica has died with a whimper after three days of intense criticism. Yesterday the company took down the public demo that it had encouraged everyone to try out.

Meta’s misstep—and its hubris—show once again that Big Tech has a blind spot about the severe limitations of large language models. There is a large body of research that highlights the flaws of this technology, including its tendencies to reproduce prejudice and assert falsehoods as facts.

However, Meta and other companies working on large language models, including Google, have failed to take it seriously.

Galactica is a large language model for science, trained on 48 million examples of scientific articles, websites, textbooks, lecture notes, and encyclopedias. Meta promoted its model as a shortcut for researchers and students. In the company’s words, Galactica “can summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.”

But the shiny veneer wore through fast. Like all language models, Galactica is a mindless bot that cannot tell fact from fiction. Within hours, scientists were sharing its biased and incorrect results on social media.

Absolutely.

Galactica is little more than statistical nonsense at scale.

Amusing. Dangerous. And IMHO unethical. https://t.co/15DAFJCzIb
— Grady Booch (@Grady_Booch) November 17, 2022

“I am both astounded and unsurprised by this new effort,” says Chirag Shah at the University of Washington, who studies search technologies. “When it comes to demoing these things, they look so fantastic, magical, and intelligent. But people still don’t seem to grasp that in principle such things can’t work the way we hype them up to.”

Asked for a statement on why it had removed the demo, Meta pointed MIT Technology Review to a tweet that says: “Thank you everyone for trying the Galactica model demo. We appreciate the feedback we have received so far from the community, and have paused the demo for now. Our models are available for researchers who want to learn more about the work and reproduce results in the paper.”

A fundamental problem with Galactica is that it is not able to distinguish truth from falsehood, a basic requirement for a language model designed to generate scientific text. People found that it made up fake papers (sometimes attributing them to real authors), and generated wiki articles about the history of bears in space as readily as ones about protein complexes and the speed of light. It’s easy to spot fiction when it involves space bears, but harder with a subject users may not know much about.

Many scientists pushed back hard. Michael Black, director at the Max Planck Institute for Intelligent Systems in Germany, who works on deep learning, tweeted: “In all cases, it was wrong or biased but sounded right and authoritative. I think it’s dangerous.”

I asked #Galactica about some things I know about and I'm troubled. In all cases, it was wrong or biased but sounded right and authoritative. I think it's dangerous. Here are a few of my experiments and my analysis of my concerns. (1/9)
— Michael Black (@Michael_J_Black) November 17, 2022

Even more positive opinions came with clear caveats: “Excited to see where this is headed!” tweeted Miles Cranmer, an astrophysicist at Princeton. “You should never keep the output verbatim or trust it. Basically, treat it like an advanced Google search of (sketchy) secondary sources!”

Galactica also has problematic gaps in what it can handle. When asked to generate text on certain topics, such as “racism” and “AIDS,” the model responded with: “Sorry, your query didn’t pass our content filters. Try again and keep in mind this is a scientific language model.”

The Meta team behind Galactica argues that language models are better than search engines. “We believe this will be the next interface for how humans access scientific knowledge,” the researchers write.

This is because language models can “potentially store, combine, and reason about” information. But that “potentially” is crucial. It’s a coded admission that language models cannot yet do all these things. And they may never be able to.

“Language models are not really knowledgeable beyond their ability to capture patterns of strings of words and spit them out in a probabilistic manner,” says Shah. “It gives a false sense of intelligence.”

Gary Marcus, a cognitive scientist at New York University and a vocal critic of deep learning, gave his view in a Substack post titled “A Few Words About Bullshit,” saying that the ability of large language models to mimic human-written text is nothing more than “a superlative feat of statistics.”

And yet Meta is not the only company championing the idea that language models could replace search engines. For the last couple of years, Google has been promoting language models, such as LaMDA, as a way to look up information.

It’s a tantalizing idea. But suggesting that the human-like text such models generate will always contain trustworthy information, as Meta appeared to do in its promotion of Galactica, is reckless and irresponsible. It was an unforced error.

My considered opinion of Galactica: it's fun, impressive, and interesting in many ways. Great achievement. It's just unfortunate that it's being touted as a practical research tool, and even more unfortunate that it suggests you use it to write complete articles.
— Julian Togelius (@togelius) November 17, 2022

And it wasn’t just the fault of Meta’s marketing team. Yann LeCun, a Turing Award winner and Meta’s chief scientist, defended Galactica to the end. On the day the model was released, LeCun tweeted: “Type a text and Galactica will generate a paper with relevant references, formulas, and everything.” Three days later, he tweeted: “Galactica demo is off line for now. It’s no longer possible to have some fun by casually misusing it. Happy?”

It's not quite Meta's Tay moment. Recall that in 2016, Microsoft launched a chatbot called Tay on Twitter—then shut it down 16 hours later when Twitter users turned it into a racist, homophobic sexbot. But Meta’s handling of Galactica smacks of the same naivete.

“Big tech companies keep doing this—and mark my words, they will not stop—because they can,” says Shah. “And they feel like they must—otherwise someone else might. They think that this is the future of information access, even if nobody asked for that future.”

Correction: A previous version of this story stated that Google has been promoting the language model PaLM as a way to look up information for a couple of years. The language model we meant to refer to is LaMDA.

Deep Dive

Artificial intelligence

Large language models can do jaw-dropping things. But nobody knows exactly why.

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.

Will Douglas Heavenarchive page

Google DeepMind’s new generative model makes Super Mario–like games from scratch

Genie learns how to control games by watching hours and hours of video. It could help train next-gen robots too.

Will Douglas Heavenarchive page

What’s next for generative video

OpenAI's Sora has raised the bar for AI moviemaking. Here are four things to bear in mind as we wrap our heads around what's coming.

Will Douglas Heavenarchive page

The AI Act is done. Here’s what will (and won’t) change

The hard work starts now.

Melissa Heikkiläarchive page

Stay connected

Illustration by Rose Wong

Get the latest updates from
MIT Technology Review

Discover special offers, top stories, upcoming events, and more.

Why Meta’s latest large language model survived only three days online

Deep Dive

Artificial intelligence

Large language models can do jaw-dropping things. But nobody knows exactly why.

Google DeepMind’s new generative model makes Super Mario–like games from scratch

What’s next for generative video

The AI Act is done. Here’s what will (and won’t) change

Stay connected

Get the latest updates from
MIT Technology Review

The latest iteration of a legacy

Advertise with MIT Technology Review

About

Help

Deep Dive

Artificial intelligence

Large language models can do jaw-dropping things. But nobody knows exactly why.

Google DeepMind’s new generative model makes Super Mario–like games from scratch

What’s next for generative video

The AI Act is done. Here’s what will (and won’t) change

Stay connected

Get the latest updates fromMIT Technology Review

Get the latest updates from
MIT Technology Review