In the late 1940s, a professor at the University of Maryland School of Medicine coined an unusual phrase to describe unexpected diagnoses. “When you hear hoofbeats behind you, don’t expect to see a zebra,” he said. The phrase stuck, and today medics commonly use the term “zebra” to describe a rare disease, usually defined as one that affects fewer than 1 in 2,000 people.
Rare diseases are inherently hard to diagnose. According to the European Organisation for Rare Diseases, 25 per cent of diagnoses are delayed by between 5 and 30 years.
So it’s no surprise that medics are looking for more effective ways to do the job. An increasingly common aid in this process is the search engine, typically Google. This forms part of an iterative process in which a medic enters symptoms into a search engine, examines lists of potential diseases and then looks for further evidence of symptoms in the patient.
The problem, of course, is that common-or-garden search engines are not optimised for this process. Google, for example, considers pages important if they are linked to by other important pages, the basis of its famous PageRank algorithm. However, rare diseases by definition are unlikely to have a high profile on the web. What’s more, searches are likely to be plagued with returns from all sorts of irrelevant sources.
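To see why link-based ranking works against rare diseases, here is a minimal sketch of the textbook PageRank power iteration. It is not Google's production ranking, and the four-page "web", the page names and the damping factor of 0.85 are all assumptions made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: nothing links to the rare disease page.
toy_web = {
    "common_cold": ["flu"],
    "flu": ["common_cold"],
    "health_portal": ["common_cold", "flu"],
    "rare_disease": ["health_portal"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page:15s} {score:.4f}")
```

In this toy graph the rare disease page ends up with the lowest score, however relevant its content might be to a given query, simply because no other page links to it.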
Today, Radu Dragusin at the Technical University of Denmark and a few pals unveil an alternative. These guys have set up a bespoke search engine called FindZebra that is dedicated to the diagnosis of rare diseases. The name is based on the common medical slang for a rare disease. After comparing the results from this engine against the same searches on Google, they show that it is significantly better at returning relevant results.
The magic sauce in FindZebra is the index it uses to hunt for results. These guys have created this index by crawling a specially selected set of curated databases on rare diseases. These include the Online Mendelian Inheritance in Man database, the Genetic and Rare Diseases Information Center and Orphanet.
They then use the open source information retrieval tool Indri to search this index via a website with a conventional search engine interface. The result is FindZebra.
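The general idea of indexing a small curated collection and ranking its entries against a free-text query can be sketched in a few lines. This is a pure-Python stand-in and not Indri's API or the authors' configuration: the `TinyIndex` class, the query-likelihood scoring with Dirichlet smoothing, the value of `mu` and the three short entries are all assumptions invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TinyIndex:
    """Toy index over a curated set of documents (hypothetical helper)."""

    def __init__(self, documents, mu=10.0):
        self.mu = mu  # smoothing strength; an arbitrary choice for this toy
        self.doc_terms = {name: Counter(tokenize(text)) for name, text in documents.items()}
        self.doc_len = {name: sum(counts.values()) for name, counts in self.doc_terms.items()}
        self.collection = Counter()
        for counts in self.doc_terms.values():
            self.collection.update(counts)
        self.collection_len = sum(self.collection.values())

    def score(self, query_terms, name):
        """Log query likelihood of a document, with Dirichlet smoothing."""
        counts = self.doc_terms[name]
        length = self.doc_len[name]
        total = 0.0
        for term in query_terms:
            collection_count = self.collection.get(term, 0)
            if collection_count == 0:
                continue  # term never seen in the collection: ignore it
            p_collection = collection_count / self.collection_len
            total += math.log((counts.get(term, 0) + self.mu * p_collection) / (length + self.mu))
        return total

    def search(self, query, top_k=3):
        """Return the names of the top_k documents for a free-text query."""
        terms = tokenize(query)
        ranked = sorted(self.doc_terms, key=lambda name: self.score(terms, name), reverse=True)
        return ranked[:top_k]


# Invented miniature "curated collection"; the real index is built from
# databases such as OMIM, GARD and Orphanet.
documents = {
    "Fibrodysplasia ossificans progressiva": (
        "deformity of the big toes with progressive osteogenesis "
        "and bone formation in soft tissue"
    ),
    "Placeholder entry A": "fever cough and sore throat lasting several days",
    "Placeholder entry B": "joint pain and skin rash after sun exposure",
}

index = TinyIndex(documents)
results = index.search(
    "Boy, normal birth, deformity of both big toes (missing joint), quick "
    "development of bone tumor near spine and osteogenesis at biopsy"
)
print(results)
```

Because every document in such an index comes from a vetted source, ranking can rest entirely on how well the text matches the symptoms, with no need for link popularity as a proxy for quality.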
Finally, they compared the results of searches on FindZebra against the same search on Google applied to the same limited dataset, a feature that is possible with advanced Google searches. Dragusin and co say that the Google results are significantly worse than their own.
For example, on FindZebra the search query “Boy, normal birth, deformity of both big toes (missing joint), quick development of bone tumor near spine and osteogenesis at biopsy” returns the correct diagnosis “Fibrodysplasia ossificans progressiva” as the first result. However, this diagnosis does not appear at all in the results from any type of Google search.
This indicates that the PageRank algorithm, or at least the way Google has tweaked it, is not suited to this kind of search. “Our finding, that FindZebra outperforms Google overall for this task and especially when restricted to the sites of our collection (Google Restricted), suggests that Google ranking algorithm is suboptimal for the task at hand,” they conclude.
Although it is still a research project, Dragusin and co have made their rare disease search engine publicly available at www.findzebra.com. This could clearly become a valuable tool for the medical community.
What is less clear, however, is how this tool will be used by the general public. The site comes with the forlorn message: “Warning! FindZebra is a research project and it is to be used only by medical professionals”.
FindZebra could obviously be a hypochondriac’s charter. On the other hand, that’s true of any medical dictionary.
The informed public are increasingly visiting their doctors armed with detailed information downloaded from the internet. Any move to improve the quality of this information must surely be of significant value.
Ref: arxiv.org/abs/1303.3229: FindZebra: A Search Engine For Rare Diseases