The Rare Disease Search Engine That Outperforms Google

A powerful new search engine designed to help diagnose rare diseases could prove a boon for both medics and the public

 

In the late 1940s, a professor at the University of Maryland School of Medicine coined an unusual phrase to describe unexpected diagnoses. “When you hear hoofbeats behind you, don’t expect to see a zebra,” he said. The phrase stuck and today, medics commonly use the term “zebra” to describe a rare disease, usually defined as one that affects fewer than 1 in 2,000 people.

Rare diseases are inherently hard to diagnose. According to the European Organisation for Rare Diseases, 25 per cent of diagnoses are delayed by between 5 and 30 years.

So it’s no surprise that medics are looking for more effective ways to do the job. An increasingly common aid in this process is the search engine, typically Google. This forms part of an iterative process in which a medic enters symptoms into a search engine, examines lists of potential diseases and then looks for further evidence of symptoms in the patient.

The problem, of course, is that common-or-garden search engines are not optimised for this process. Google, for example, considers pages important if they are linked to by other important pages, the basis of its famous PageRank algorithm. However, rare diseases by definition are unlikely to have a high profile on the web. What’s more, searches are likely to be plagued with returns from all sorts of irrelevant sources.
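To see why link-based ranking works against rarely cited pages, here is a minimal sketch of the textbook PageRank power iteration in Python. It is not Google's production ranking, and the four-page "web" is invented purely for illustration: the page names, the damping factor of 0.85 and the iteration count are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: nothing links to the rare-disease page.
toy_web = {
    "common_cold": ["flu"],
    "flu": ["common_cold"],
    "health_portal": ["common_cold", "flu"],
    "rare_disease": ["health_portal"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

In this toy graph the page nobody links to ends up with only the baseline share of rank, however relevant its content might be to a particular query.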

Today, Radu Dragusin at the Technical University of Denmark and a few pals unveil an alternative. These guys have set up a bespoke search engine called FindZebra, dedicated to the diagnosis of rare diseases and named after the common medical slang for such a disease. After comparing the results from this engine against the same searches on Google, they show that it is significantly better at returning relevant results.

The magic sauce in FindZebra is the index it uses to hunt for results. These guys have created this index by crawling a specially selected set of curated databases on rare diseases. These include the Online Mendelian Inheritance in Man database, the Genetic and Rare Diseases Information Center and Orphanet.

They then use the open source information retrieval tool Indri to search this index via a website with a conventional search engine interface. The result is FindZebra.
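The idea of ranking a small curated collection by how well each entry's text matches the query, rather than by links, can be sketched as follows. This is a toy query-likelihood ranker with Dirichlet smoothing, the family of language-model retrieval that Indri is built around. It does not use Indri's actual API or FindZebra's configuration: the function names, the smoothing value `mu` and the three one-line "documents" are all hypothetical and written only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Return per-document term counts and whole-collection term counts."""
    doc_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    collection = Counter()
    for counts in doc_counts.values():
        collection.update(counts)
    return doc_counts, collection


def score(query, counts, collection, mu=10.0):
    """Log query likelihood of one document with Dirichlet smoothing."""
    doc_len = sum(counts.values())
    total = sum(collection.values())
    log_likelihood = 0.0
    for term in tokenize(query):
        if collection[term] == 0:
            continue  # terms absent from the whole collection are skipped
        p_collection = collection[term] / total
        log_likelihood += math.log(
            (counts[term] + mu * p_collection) / (doc_len + mu)
        )
    return log_likelihood


def search(query, docs, mu=10.0):
    """Rank document names by descending query likelihood."""
    doc_counts, collection = build_index(docs)
    return sorted(
        docs,
        key=lambda name: score(query, doc_counts[name], collection, mu),
        reverse=True,
    )


# Hypothetical one-line entries standing in for curated database pages.
toy_docs = {
    "fibrodysplasia ossificans progressiva":
        "malformed big toes at birth, progressive bone formation "
        "in muscle and soft tissue",
    "osteosarcoma":
        "bone tumor with pain and swelling, usually near the knee",
    "common cold":
        "runny nose, sore throat, cough",
}
results = search("deformity of both big toes, bone formation at biopsy", toy_docs)
print(results[0])
```

Because every entry in the collection is about a disease, a text-matching score of this kind has no popular but irrelevant pages to compete with, which is the point of restricting the index to curated sources.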

Finally, they compared the results of searches on FindZebra against the same searches on Google restricted to the same limited set of sites, a feature available through advanced Google searches. Dragusin and co say that the Google results are significantly worse than their own.

For example, on FindZebra the search query “Boy, normal birth, deformity of both big toes (missing joint), quick development of bone tumor near spine and osteogenesis at biopsy” returns the correct diagnosis “Fibrodysplasia ossificans progressiva” as the first result. However, this diagnosis does not appear at all in the results from any type of Google search.

This indicates that the PageRank algorithm, or at least the way Google has tweaked it, is not suited to this kind of search. “Our finding, that FindZebra outperforms Google overall for this task and especially when restricted to the sites of our collection (Google Restricted), suggests that Google ranking algorithm is suboptimal for the task at hand,” they conclude.
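A comparison of this kind comes down to asking where the correct diagnosis lands in each engine's result list. One common way to summarise that is the reciprocal rank, sketched below. The article does not say which metric the researchers used, so treat this as an illustration rather than their method. The filler result names are hypothetical, and the two lists simply mirror the example above, in which FindZebra puts the right answer first and Google does not return it at all.

```python
def reciprocal_rank(results, correct):
    """Return 1/position of the correct answer, or 0.0 if it is absent."""
    for position, name in enumerate(results, start=1):
        if name.lower() == correct.lower():
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (results, correct_answer) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(results, correct) for results, correct in runs) / len(runs)


correct = "Fibrodysplasia ossificans progressiva"

# Hypothetical result lists shaped like the example in the article.
findzebra_like = [correct, "Hypothetical result B"]
google_like = ["Hypothetical result A", "Hypothetical result B"]

print(reciprocal_rank(findzebra_like, correct))  # 1.0
print(reciprocal_rank(google_like, correct))     # 0.0
```

Averaging this score over a set of test queries with `mean_reciprocal_rank` gives a single number per engine, which makes a comparison across many queries easier to read than individual examples.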

Although it is still a research project, Dragusin and co have made their rare disease search engine publicly available at www.findzebra.com. This could clearly become a valuable tool for the medical community.

What is less clear, however, is how this tool will be used by the general public. The site comes with the forlorn message: “Warning! FindZebra is a research project and it is to be used only by medical professionals.”

FindZebra could obviously be a hypochondriac’s charter. On the other hand, that’s true of any medical dictionary.

The informed public are increasingly visiting their doctors armed with detailed information downloaded from the internet. Any move to improve the quality of this information must surely be of significant value.

Ref: arxiv.org/abs/1303.3229: FindZebra: A Search Engine For Rare Diseases
