In the world of cancer treatment, early diagnosis can mean the difference between being cured and being handed a death sentence. At the very least, catching a tumor early increases a patient’s chances of living longer.
Researchers at Microsoft think they may know of a tool that could help detect cancers before you even think to go to a doctor: your search engine.
In a study published Tuesday in the Journal of Oncology Practice, the Microsoft team showed that it was able to mine the anonymized search queries of 6.4 million Bing users to find searches that indicated someone had been diagnosed with pancreatic cancer (such as “why did I get cancer in pancreas,” and “I was told I have pancreatic cancer what to expect”). Then, looking at what those users had searched for before their diagnosis, the team identified patterns indicating that the users had been experiencing symptoms before they ever sought medical treatment.
Pancreatic cancer is a particularly deadly form of the disease. It’s the fourth-leading cause of cancer death in the U.S., and three-quarters of people diagnosed with it die within a year. But catching it early still improves the odds of living longer.
By looking for searches for symptoms (which include yellowing, itchy skin, and abdominal pain) and checking each user’s search history for signs of other risk factors like alcoholism and obesity, the team was often able to identify symptom searches up to five months before the user was diagnosed.
In their paper, the team acknowledged the limitations of the work, saying that it is not meant to provide people with a diagnosis. Instead they suggested that it might one day be turned into a tool that warns users whose searches indicate they may have symptoms of cancer.
“The goal is not to perform the diagnosis,” said Ryen White, one of the researchers, in a post on Microsoft’s blog. “The goal is to help those at highest risk to engage with medical professionals who can actually make the true diagnosis.”
White and his colleague Eric Horvitz have performed many similar studies looking at what types of information can be gleaned from search engines, including a study last month on how people’s searches evolve as they cope with breast cancer. In 2013, they showed that people’s searches could be mined for adverse effects of prescription drugs even before the U.S. Food and Drug Administration was aware of any problems. Social media also appears to be rich territory—the city of Chicago has used tweets to look for signs of food-borne illnesses stemming from local restaurants.
In their latest work, the Microsoft researchers acknowledge the drawbacks of their study. For one thing, search queries make for a messy data set. The team originally started with data from 9.2 million users but had to cut it to 6.4 million because, for example, some people search for health-related terms more than 20 percent of the time. That likely means those people are health-care professionals—but that’s a large chunk of users to just leave by the wayside.
All this leads to an interesting question: how much health insight can we pull from the data we generate online? Research like the Microsoft team’s provides a tantalizing glimpse of an answer—but for now, at least, it seems like it will remain just out of reach.