Emerging Technology from the arXiv


The Rare Disease Search Engine That Outperforms Google

A powerful new search engine designed to help diagnose rare diseases could prove a boon for both medics and the public

  • March 18, 2013

 

In the late 1940s, a professor at the University of Maryland School of Medicine coined an unusual phrase to describe unexpected diagnoses. “When you hear hoofbeats behind you, don’t expect to see a zebra,” he said. The phrase stuck and today, medics commonly use the term “zebra” to describe a rare disease, usually defined as one that affects fewer than 1 in 2,000 people.

Rare diseases are inherently hard to diagnose. According to the European Organisation for Rare Diseases, 25 per cent of diagnoses are delayed by between 5 and 30 years.

So it’s no surprise that medics are looking for more effective ways to do the job. An increasingly common aid in this process is the search engine, typically Google. This forms part of an iterative process in which a medic enters symptoms into a search engine, examines lists of potential diseases and then looks for further evidence of symptoms in the patient.

The problem, of course, is that common-or-garden search engines are not optimised for this process. Google, for example, considers pages important if they are linked to by other important pages, the basis of its famous PageRank algorithm. However, rare diseases by definition are unlikely to have a high profile on the web. What’s more, searches are likely to be plagued with returns from all sorts of irrelevant sources.
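To see why link-based ranking works against rarely cited pages, here is a minimal sketch of the textbook PageRank power iteration. It is not Google's production algorithm, and the toy link graph, page names and damping factor of 0.85 are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: nothing links to the rare disease page.
toy_web = {
    "health_portal": ["common_cold", "flu"],
    "news_site": ["common_cold", "flu", "health_portal"],
    "common_cold": ["flu", "health_portal"],
    "flu": ["common_cold", "health_portal"],
    "rare_disease_page": ["health_portal"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:18s} {score:.3f}")
```

In this toy graph the rare disease page receives only the baseline share of rank, however accurate its content may be, because no other page links to it.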

Today, Radu Dragusin at the Technical University of Denmark and a few pals unveil an alternative. These guys have set up a bespoke search engine dedicated to the diagnosis of rare diseases called FindZebra, a name based on the common medical slang for a rare disease. After comparing the results from this engine against the same searches on Google, they show that it is significantly better at returning relevant results.

The magic sauce in FindZebra is the index it uses to hunt for results. These guys have created this index by crawling a specially selected set of curated databases on rare diseases. These include the Online Mendelian Inheritance in Man database, the Genetic and Rare Diseases Information Center and Orphanet.

They then use the open source information retrieval tool Indri to search this index via a website with a conventional search engine interface. The result is FindZebra.
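The idea of ranking documents from a small curated collection can be sketched in a few lines. Indri belongs to the language-modelling family of retrieval tools, so the sketch below uses one common technique from that family: query likelihood with Dirichlet smoothing. It does not use Indri's API, and the article does not say how FindZebra configures Indri. The three one-line disease summaries, the function names and the smoothing value `mu` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Return per-document term counts and collection-wide term counts."""
    doc_terms = {name: Counter(tokenize(text)) for name, text in documents.items()}
    collection = Counter()
    for counts in doc_terms.values():
        collection.update(counts)
    return doc_terms, collection


def search(query, doc_terms, collection, mu=10.0):
    """Rank documents by Dirichlet-smoothed query log-likelihood."""
    total = sum(collection.values())
    scored = []
    for name, counts in doc_terms.items():
        doc_len = sum(counts.values())
        score = 0.0
        for term in tokenize(query):
            if collection[term] == 0:
                continue  # term appears nowhere in the collection: ignore it
            p_collection = collection[term] / total
            score += math.log((counts[term] + mu * p_collection) / (doc_len + mu))
        scored.append((score, name))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical stand-ins for curated database entries (not real database text).
curated = {
    "fibrodysplasia ossificans progressiva":
        "malformed big toes and progressive bone formation in muscle and connective tissue",
    "osteogenesis imperfecta":
        "brittle bones frequent fractures and blue sclerae",
    "marfan syndrome":
        "tall stature long limbs lens dislocation and aortic dilation",
}

doc_terms, collection = build_index(curated)
results = search("deformity of big toes, bone formation", doc_terms, collection)
print(results)
```

Because every document in the collection is about a rare disease, ranking depends only on how well the text matches the symptoms, not on how many other pages link to it.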

Finally, they compared the results of searches on FindZebra with the same searches on Google restricted to the same limited set of sites, which is possible using Google's advanced search options. Dragusin and co say that the Google results are significantly worse than their own.

For example, on FindZebra the search query “Boy, normal birth, deformity of both big toes (missing joint), quick development of bone tumor near spine and osteogenesis at biopsy” returns the correct diagnosis “Fibrodysplasia ossificans progressiva” as the first result. However, this diagnosis does not appear at all in the results from any type of Google search.

This indicates that the PageRank algorithm, or at least the way Google has tweaked it, is not suited to this kind of search. “Our finding, that FindZebra outperforms Google overall for this task and especially when restricted to the sites of our collection (Google Restricted), suggests that Google ranking algorithm is suboptimal for the task at hand,” they conclude.
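A comparison of this kind boils down to asking where the correct diagnosis lands in each engine's ranked list. One standard way to summarise that is mean reciprocal rank, sketched below. The article does not say which metric the authors used, so the choice of metric is an assumption, and the ranked lists are made-up placeholders, not results from the paper.

```python
def reciprocal_rank(ranked, correct):
    """Return 1/position of the correct answer, or 0.0 if it is absent."""
    for position, name in enumerate(ranked, start=1):
        if name == correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_list, correct_answer) pairs."""
    return sum(reciprocal_rank(ranked, correct) for ranked, correct in runs) / len(runs)


# Placeholder data: two queries, each with a ranked list and the correct diagnosis.
engine_a_runs = [
    (["disease_1", "disease_2", "disease_3"], "disease_1"),
    (["disease_4", "disease_5", "disease_6"], "disease_5"),
]
engine_b_runs = [
    (["disease_2", "disease_3", "disease_7"], "disease_1"),  # correct answer missing
    (["disease_5", "disease_4", "disease_6"], "disease_5"),
]

mrr_a = mean_reciprocal_rank(engine_a_runs)
mrr_b = mean_reciprocal_rank(engine_b_runs)
print(f"engine A: {mrr_a:.2f}, engine B: {mrr_b:.2f}")
```

A query whose correct diagnosis never appears, as in the fibrodysplasia ossificans progressiva example, contributes zero to the average.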

Although it is still a research project, Dragusin and co have made their rare disease search engine publicly available at www.findzebra.com. This could clearly become a valuable tool for the medical community.

What is less clear, however, is how this tool will be used by the general public. The site comes with the forlorn message: “Warning! FindZebra is a research project and it is to be used only by medical professionals”.

FindZebra could obviously be a hypochondriac’s charter. On the other hand, that’s true of any medical dictionary.

The informed public are increasingly visiting their doctors armed with detailed information downloaded from the internet. Any move to improve the quality of this information must surely be of significant value.

Ref: arxiv.org/abs/1303.3229: FindZebra: A Search Engine For Rare Diseases
