
MIT Technology Review

 


In the late 1940s, a professor at the University of Maryland School of Medicine coined an unusual phrase to describe unexpected diagnoses. “When you hear hoofbeats behind you, don’t expect to see a zebra,” he said. The phrase stuck, and today medics commonly use the term “zebra” to describe a rare disease, usually defined as one that affects fewer than 1 in 2,000 people.

Rare diseases are inherently hard to diagnose. According to the European Organisation for Rare Diseases, 25 per cent of diagnoses are delayed by between 5 and 30 years.

So it’s no surprise that medics are looking for more effective ways to do the job. An increasingly common aid in this process is the search engine, typically Google. This forms part of an iterative process in which a medic enters symptoms into a search engine, examines the list of potential diseases it returns and then looks for further evidence of those diseases in the patient.

The problem, of course, is that common-or-garden search engines are not optimised for this process. Google, for example, considers pages important if they are linked to by other important pages, the basis of its famous PageRank algorithm. However, rare diseases by definition are unlikely to have a high profile on the web. What’s more, searches are likely to be plagued with returns from all sorts of irrelevant sources.
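To see why link-based ranking works against obscure pages, here is a minimal sketch of the textbook PageRank iteration in Python. It is not Google's production algorithm; the page names, the damping factor of 0.85 and the toy link graph are all assumptions made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly across its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: nobody links to the rare-disease page.
toy_web = {
    "common_cold": ["flu"],
    "flu": ["common_cold"],
    "health_portal": ["common_cold", "flu"],
    "rare_disease": ["common_cold"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the rare-disease page never rises above the baseline score, however relevant its content might be to a particular query, because no other page links to it.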

Today, Radu Dragusin at the Technical University of Denmark and a few pals unveil an alternative. These guys have set up a bespoke search engine called FindZebra that is dedicated to the diagnosis of rare diseases; the name comes from the common medical slang for a rare disease. After comparing the results from this engine against the same searches on Google, they show that it is significantly better at returning relevant results.

The magic sauce in FindZebra is the index it uses to hunt for results. These guys have created this index by crawling a specially selected set of curated databases on rare diseases. These include the Online Mendelian Inheritance in Man database, the Genetic and Rare Diseases Information Center and Orphanet.

They then use the open source information retrieval tool Indri to search this index via a website with a conventional search engine interface. The result is FindZebra.
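The idea of searching a small curated index can be sketched in a few lines of Python. Indri is built on language-modelling retrieval, so the toy scorer below uses query likelihood with Dirichlet smoothing as a stand-in. It does not call Indri or reproduce its API, and the three one-line "documents", the function names and the smoothing value `mu=10.0` are all invented for illustration rather than taken from the FindZebra index.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace and trim basic punctuation."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def build_index(docs):
    """Return per-document term counts plus counts for the whole collection."""
    index = {name: Counter(tokenize(text)) for name, text in docs.items()}
    collection = Counter()
    for counts in index.values():
        collection.update(counts)
    return index, collection


def score(query, counts, collection, mu=10.0):
    """Log query likelihood of one document, with Dirichlet smoothing."""
    doc_len = sum(counts.values())
    coll_len = sum(collection.values())
    total = 0.0
    for term in tokenize(query):
        p_coll = collection[term] / coll_len
        if p_coll == 0:
            continue  # term appears nowhere in the index, so it cannot discriminate
        total += math.log((counts[term] + mu * p_coll) / (doc_len + mu))
    return total


def search(query, index, collection):
    """Return document names, best match first."""
    return sorted(index, key=lambda name: score(query, index[name], collection),
                  reverse=True)


# Made-up one-line summaries standing in for curated database entries.
docs = {
    "Fibrodysplasia ossificans progressiva":
        "deformity of the big toes and progressive osteogenesis in soft tissue",
    "Marfan syndrome":
        "tall stature, long limbs and problems with the aorta",
    "Cystic fibrosis":
        "thick mucus in the lungs and frequent chest infections",
}
index, collection = build_index(docs)
results = search("boy with deformity of both big toes and osteogenesis at biopsy",
                 index, collection)
print(results[0])
```

Because every document in the index is about a rare disease, there is no popularity signal to compete with; ranking depends only on how well the symptom words match each entry.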

Finally, they compared the results of searches on FindZebra against the same searches on Google, including Google searches restricted to the same limited set of sites, something that is possible with advanced Google searches. Dragusin and co say that the Google results are significantly worse than their own.

For example, on FindZebra the search query “Boy, normal birth, deformity of both big toes (missing joint), quick development of bone tumor near spine and osteogenesis at biopsy” returns the correct diagnosis “Fibrodysplasia ossificans progressiva” as the first result. However, this diagnosis does not appear at all in the results from any type of Google search.

This indicates that the PageRank algorithm, or at least the way Google has tweaked it, is not suited to this kind of search. “Our finding, that FindZebra outperforms Google overall for this task and especially when restricted to the sites of our collection (Google Restricted), suggests that Google ranking algorithm is suboptimal for the task at hand,” they conclude.
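A comparison like this comes down to where each engine places the correct diagnosis in its results list. The article does not say which measure Dragusin and co used, so the Python sketch below picks one common choice, mean reciprocal rank, purely as an assumption. The result lists and placeholder disease names are hypothetical, not data from the paper.

```python
def reciprocal_rank(results, correct):
    """1/position of the correct answer in a results list, or 0 if it is absent."""
    for position, name in enumerate(results, start=1):
        if name == correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (results, correct_answer) pairs."""
    return sum(reciprocal_rank(results, correct) for results, correct in runs) / len(runs)


# Hypothetical runs: three queries whose correct diagnosis is "disease_a".
engine_one = [
    (["disease_a", "disease_b"], "disease_a"),  # correct answer ranked first
    (["disease_b", "disease_a"], "disease_a"),  # correct answer ranked second
    (["disease_b", "disease_c"], "disease_a"),  # correct answer missing
]
print(mean_reciprocal_rank(engine_one))
```

A query whose correct diagnosis never appears, as with the Google searches in the example above, contributes zero to the average.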

Although it is still a research project, Dragusin and co have made their rare disease search engine publicly available at www.findzebra.com. This could clearly become a valuable tool for the medical community.

What is less clear, however, is how this tool will be used by the general public. The site comes with the forlorn message: “Warning! FindZebra is a research project and it is to be used only by medical professionals”.

FindZebra could obviously be a hypochondriac’s charter. On the other hand, that’s true of any medical dictionary.

The informed public are increasingly visiting their doctors armed with detailed information downloaded from the internet. Any move to improve the quality of this information must surely be of significant value.

Ref: arxiv.org/abs/1303.3229: FindZebra: A Search Engine For Rare Diseases

 


Tagged: Biomedicine
