Chatbots could one day replace search engines. Here’s why that’s a terrible idea.
Language models are mindless mimics that do not understand what they are saying—so why do we pretend they’re experts?
Last year I wrote about a proposal from a team of Google researchers who argued that large language models like GPT-3 could one day make good search engines. My reporting caught the eye of Emily M. Bender, who studies computational linguistics and ethical issues in natural-language processing at the University of Washington. She thought it was a terrible idea, and she has now responded with a counterargument.
On March 14, Bender and her University of Washington colleague Chirag Shah, who works on search technologies, published a paper that criticizes what they see as a rush to embrace language models for tasks they are not designed to address. In particular, they fear that using language models for search could lead to more misinformation and more polarized debate.
It’s no longer just that small team of Google researchers who see things this way. The vision is starting to shape how the company as a whole thinks about search. At last year’s Google I/O, an annual showcase of new widgets and work-in-progress tech, CEO Sundar Pichai revealed Google’s “latest breakthrough in natural-language understanding”: a chatbot called LaMDA, designed to converse on any topic.
He then gave a demo in which LaMDA answered questions about Pluto in natural language. The exchange showed off a convenient way to get information, an alternative to search. Instead of listing results in response to a query typed into a search box, LaMDA took part in a conversation in which it role-played the dwarf planet.
Pichai: Tell me what I would see if I visited.
LaMDA: You would get to see a massive canyon, some frozen icebergs, geysers, and some craters.
Pichai: It sounds beautiful.
LaMDA: I assure you it is worth the trip. However, you need to bring your coat, because it gets really cold.
It was cute—even though LaMDA went on to make a few errors. The AI language model that powers it is still in development, Pichai explained. And Google says it has no plans yet to use LaMDA in its products. Even so, the company is using it to explore new ways to interact with computers—and new ways to search for information. “LaMDA already understands quite a lot about Pluto and millions of other topics,” he said.
The vision of a know-it-all AI that dishes out relevant and accurate information in easy-to-understand bite-size chunks is shaping the way tech companies are approaching the future of search. And with the rise of voice assistants like Siri and Alexa, language models are becoming a go-to technology for finding stuff out in general.
But critics are starting to push back, arguing that the approach is wrong-headed. Asking computers a question and getting an answer in natural language can hide complexity behind a veneer of authority that is not deserved. “We got too bogged down by what we could do; we haven’t looked at what we should do,” says Shah.
“The Star Trek fantasy, where you have this all-knowing computer that you can ask questions and it just gives you the answer, is not what we can provide and not what we need,” says Bender. She was a coauthor on the paper highlighting the dangers of large language models that led to Timnit Gebru being forced out of Google.
It isn’t just that today’s technology is not up to the job, she believes. “I think there is something wrong with the vision,” she says. “It is infantilizing to say that the way we get information is to ask an expert and have them just give it to us.”
Google already uses language models to improve its existing search technology, helping it interpret user queries more accurately. But some believe that language models could be used to overhaul how search is done. LaMDA is just one example.
Last year Google researcher Don Metzler and his colleagues proposed that search could be reimagined as a two-way conversation between user and language model, with computers answering questions much as a human expert might. Google is also developing a technology called a multitask unified model, or MUM. Built on top of a language model, it is designed to respond to users’ queries by pulling together information from different sources.
“We’re deeply invested in advancing language understanding because it makes products like Google Search more useful for people,” says Jane Park, a communications manager in Google’s Search team. But she says that Google has no plans to turn this new research into products yet: “We agree there are a number of open challenges in language understanding, which is why we’re taking a very cautious approach overall.”
Large AI models can mimic natural language with remarkable realism. Trained on hundreds of books and much of the internet, they absorb vast amounts of information—so the thinking goes. Why not use them as a kind of search engine, one that can synthesize responses from multiple sources and package up the information into easily understood sentences?
The problem is that language models are mindless mimics. They can become strikingly accurate at predicting the words or phrases most likely to come next in a sentence or conversation. But despite Pichai’s casual claim that his AI “understands” many topics, language models do not know what they are saying and cannot reason about what their words convey.
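To make the idea of "predicting the next word" concrete, here is a deliberately tiny sketch: a bigram model that only counts which word most often follows another in a made-up corpus. This is an illustrative assumption, not how LaMDA or GPT-3 work (they are large neural networks trained on enormous amounts of text), but it shares the basic objective: produce a plausible continuation with no notion of what the words refer to.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word follows each other word (a toy bigram model)."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus; real language models are trained on vastly more text.
corpus = "pluto is cold . pluto is far . it is cold"
model = train_bigrams(corpus)

print(predict_next(model, "pluto"))  # is
print(predict_next(model, "is"))     # cold
```

The model "says" Pluto is cold only because that word pairing was frequent in its input; nothing in it represents Pluto, temperature, or truth.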
This matters because conversational AI can change how we think about exchanges with a machine. Typing a search query into a box and getting a list of responses feels like interacting with a computer, says Bender. But it’s different with language models.
“If I’m instead having a conversation with a machine, then the metaphor is that the machine understands what I’m saying,” she says. “And so I’m going to interpret the machine’s responses in that context.”
We already see users placing uncritical trust in search results, says Shah, and “natural-language interactions make that more pronounced.”
The idea of using AI to synthesize and package up responses to search queries is part of a trend that began with the use of what are known as direct answers or snippets—single answers or short excerpts shown above links to documents in search results. In theory, these can give you the information you’re looking for at a glance, saving you the trouble of reading through longer documents to find it yourself.
Bender is not against using language models for question-answer exchanges in all cases. She has a Google Assistant in her kitchen, which she uses for converting units of measurement in a recipe. “There are times when it is super convenient to be able to use voice to get access to information,” she says.
But Shah and Bender also give a more troubling example that surfaced last year, when Google responded to the query “What is the ugliest language in India?” with the snippet “The answer is Kannada, a language spoken by around 40 million people in south India.”
No easy answers
There’s a dilemma here. Direct answers may be convenient, but they are also often incorrect, irrelevant, or offensive. They can hide the complexity of the real world, says Benno Stein at Bauhaus University in Weimar, Germany.
In 2020, Stein and his colleagues Martin Potthast at Leipzig University and Matthias Hagen at Martin Luther University Halle-Wittenberg, Germany, published a paper highlighting the problems with direct answers. “The answer to most questions is ‘It depends,’” says Hagen. “This is difficult to get through to someone searching.”
Stein and his colleagues see search technologies as having moved from organizing and filtering information, through techniques such as providing a list of documents matching a search query, to making recommendations in the form of a single answer to a question. And they think that is a step too far.
Again, the problem is not the limitations of existing technology. Even with perfect technology, we still wouldn’t get perfect answers, says Stein: “We don’t know what a good answer is because the world is complex, but we stop thinking that when we see these direct answers.”
Shah agrees. Providing people with a single answer can be problematic because the sources of that information, and any disagreement between them, are hidden, he says: “It really hinges on us completely trusting these systems.”
Shah and Bender suggest a number of solutions to the problems they anticipate. In general, search technologies should support the various ways that people use search engines today, many of which are not served by direct answers. People often use search to explore topics that they may not even have specific questions about, says Shah. In this case, simply offering a list of documents would be more useful.
It must be clear where information comes from, especially if an AI is drawing pieces from more than one source. Some voice assistants already do this, prefacing an answer with “Here’s what I found on Wikipedia,” for example. Future search tools should also have the ability to say “That’s a dumb question,” says Shah. This would help the technology avoid parroting offensive or biased premises in a query.
Stein suggests that AI-based search engines could present reasons for their answers, giving pros and cons of different viewpoints.
However, many of these suggestions simply highlight the dilemma that Stein and his colleagues identified. Anything that reduces convenience will be less attractive to the majority of users. “If you don’t click through to the second page of Google results, you won’t want to read different arguments,” says Stein.
Google says it is aware of many of the issues that these researchers raise and works hard to develop technology that people find useful. But Google is the developer of a multibillion-dollar service. Ultimately, it will build the tools that bring in the most people.
Stein hopes that it won’t all hinge on convenience. “Search is so important for us, for society,” he says.
The opening of this article has been changed to highlight the role that our reporting played in the story.