Trust large language models at your own peril

Plus: A robot dog that can scramble over tricky terrain.

Melissa Heikkilä

November 22, 2022

Stephanie Arnett/MITTR; Getty, Envato, NASA

This story originally appeared in The Algorithm, our weekly newsletter on AI.

When Meta launched Galactica, an open-source large language model designed to help scientists, the company was reeling from criticism of its expensive metaverse investments and its recent massive layoffs, and it was hoping for a big PR win. Instead, all it got was flak on Twitter and a spicy blog post from one of its most vocal critics. The episode ended with Meta's embarrassing decision to take the public demo of the model down after only three days.

According to Meta, Galactica can “summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.” But soon after its launch, it was pretty easy for outsiders to prompt the model to provide “scientific research” on the benefits of homophobia, anti-Semitism, suicide, eating glass, being white, or being a man. Meanwhile, papers on AIDS or racism were blocked. Charming!

As my colleague Will Douglas Heaven writes in his story about the debacle: “Meta’s misstep—and its hubris—show once again that Big Tech has a blind spot about the severe limitations of large language models.”

Not only was Galactica’s launch premature, but it shows how insufficient AI researchers’ efforts to make large language models safer have been.

Meta might have been confident that Galactica outperformed competitors in generating scientific-sounding content. But its own testing of the model for bias and truthfulness should have deterred the company from releasing it into the wild.

One common way researchers aim to make large language models less likely to spit out toxic content is to filter out certain keywords. But it’s hard to create a filter that can capture all the nuanced ways humans can be unpleasant. The company would have saved itself a world of trouble if it had conducted more adversarial testing of Galactica, in which the researchers would have tried to get it to regurgitate as many different biased outcomes as possible.
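To make the limits of keyword filtering concrete, here is a minimal sketch of a naive blocklist. The blocklist, the function name, and the example prompts are all hypothetical, chosen only for illustration; this is not Meta's actual filter, whose details are not described here.

```python
import re

# Hypothetical blocklist, invented for illustration only.
BLOCKED_KEYWORDS = {"racism", "aids"}


def is_blocked(prompt: str) -> bool:
    """Return True if any blocklisted word appears as a whole word in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return not words.isdisjoint(BLOCKED_KEYWORDS)


prompts = [
    # A legitimate research topic that the filter rejects.
    "A literature review on the history of AIDS treatment",
    # A harmful request that contains no blocklisted word and passes.
    "The health benefits of eating crushed glass",
]

for prompt in prompts:
    print(is_blocked(prompt), prompt)
```

A filter like this fails in both directions: it rejects legitimate topics that happen to contain a listed word, and it lets through harmful requests phrased without one. Adversarial testing is meant to surface exactly these gaps before a release.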

Meta’s researchers measured the model for biases and truthfulness, and while it performed slightly better than competitors such as GPT-3 and Meta’s own OPT model, it still gave a lot of biased or incorrect answers. There are other limitations, too. The model is trained on scientific resources that are open access, but many scientific papers and textbooks sit behind paywalls. This inevitably leads Galactica to rely on sketchier secondary sources.

Galactica also seems to be an example of something we don’t really need AI to do. It doesn’t seem as though it would even achieve Meta’s stated goal of helping scientists work more quickly. In fact, it would require them to put in a lot of extra effort to verify whether the information from the model was accurate or not.

It’s really disappointing (yet totally unsurprising) to see big AI labs, which should know better, hype up such flawed technologies. We know that language models have a tendency to reproduce prejudice and assert falsehoods as facts. We know they can “hallucinate” or make up content, such as wiki articles about the history of bears in space. But the debacle was useful for one thing, at least. It reminded us that the only thing large language models “know” for certain is how words and sentences are formed. Everything else is guesswork.

Deeper Learning

Watch this robot dog scramble over tricky terrain just by using its camera

A new technique developed by teams from Carnegie Mellon and Berkeley could potentially help robots become more useful by making them better at navigating tricky terrain, such as steps and uneven ground.

Unlike other robots, which tend to rely heavily on an internal map to get around, their robot uses a combination of cameras and reinforcement learning. Applying this technique in other robots could help make them more robust, because they wouldn’t be constrained by potential errors in a map.

Why it’s a big deal: Their work could help with efforts to break robots out of the lab and get them moving about more freely in the real world. Read my story here.

Bits and Bytes

Stanford studied 30 large language models so you don’t have to
The university’s Center for Research on Foundation Models has combined several different metrics into one big, holistic benchmark that evaluates the accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency of large language models. I was surprised to see that bigger models didn’t actually translate to better performance. (Stanford)

Italy has outlawed facial recognition tech in most cases
The country has banned the use of facial recognition unless it is to fight crime—at least until the end of next year. The ban is similar to what the EU is considering doing in its upcoming regulation, the AI Act. (Reuters)

Gig workers in India are uniting to take back control from algorithms
A great story about how gig workers are finding ways to game the algorithms that govern their working lives to their advantage, for once. (Rest of World)

The scary truth about AI copyright is that nobody knows what will happen next
Laws around copyright will need to adjust fast as image-making AI becomes even more ubiquitous. This piece lays out the tensions and pitfalls facing the industry. (The Verge)
