Larry Hunter had just moved into his new office when a reporter visited, so the room lacked knickknacks and family snapshots. Hunter had, however, started unpacking his books, and they were already beginning to form an interesting pattern. Roger Schank’s Dynamic Memory, a classic title in artificial intelligence, was shelved next to Georg Schulz’s Principles of Protein Structure. Machine Learning flanked Oncogenes. Artificial Life leaned on Medical Informatics.
Properly interpreted, the pattern on Hunter’s bookshelf reveals the latest trend in biology, a field now so overwhelmed by information that it is increasingly dependent on computer scientists like Hunter to make sense of its findings. An expert in an offshoot of artificial intelligence research known as machine learning, in which computers are taught to recognize subtle patterns, Hunter was recently lured from a solitary theoretical post at the National Library of Medicine to head the molecular statistics and bioinformatics section at the National Cancer Institute (NCI), a group formed in 1997 to use mathematical know-how to sift the slurry of biological findings.
Where is all the data coming from? The simple answer is that it’s washing out of the Human Genome Project. Driven by surprise competition from the commercial sector, the publicly funded effort to catalog the estimated 100,000 human genes is nearing its endgame; several large academic centers aim to finish a rough draft by next spring. By then, they will have dumped tens of billions of bits of data into the online gene sequence repository known as GenBank, maintained by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH) in Bethesda, Md. And DNA sequences aren’t the only type of data on the rise. Using “DNA chips,” scientists can now detect patterns as thousands of genes are being turned on and off in a living cell, adding to the flood of findings.
“New kinds of data are becoming available at a mind-blowing pace,” exults Nat Goodman, director of life sciences informatics at Compaq Computer. Compaq is one of many companies seeking an important commercial opportunity in “bioinformatics.” This congress of computers and biology is a booming business, but it has so far revolved mostly around software for generating and managing the mountain of gene data. Now, pharmaceutical companies need ever-faster ways to mine that mountain for the discoveries that will lead to new treatments for disease.
That’s where entrepreneurial researchers such as Larry Hunter come in. On Hunter’s bookshelf sits a glass bauble reading: “$2,000,000 Series A Preferred. March 5, 1999,” a celebration of venture capital funds raised by Molecular Mining, a company he co-founded. The firm, based in Kingston, Ontario, hopes to use data-mining methods to help pharmaceutical companies speed the development of new drugs by identifying key biological patterns in living cells, such as which genes are turned on in particularly dangerous tumors and which drugs those tumors will respond to. And a dozen other startups (the biotech industry’s best indicator of a hot trend) have been formed to make data-mining tools (see “The Genome Miners”). “Biology,” Hunter predicts, “will increasingly be underpinned by algorithms that can find hidden structure in massive amounts of molecular data.” This kind of data-mining work, which Hunter specializes in, is often known as “pattern recognition,” and it’s one of the fastest-moving areas in bioinformatics. Indeed, if Hunter is right, pattern recognition might turn out to be the pick that brings forth the gold of new therapies.
The Genome Miners
A sampling of companies specializing in pattern-recognition software.
Company | Location | Highlight
Bioreason (private) | Santa Fe, N.M. | Artificial intelligence software makes sense of chemistry data.
Compugen (private) | Tel Aviv, Israel | Ex-Israeli defense contractors are scoring big in genetic data-mining. Customers include U.S. Patent Office.
IBM (public) | Armonk, N.Y. | Advanced pattern-recognition algorithms power a 1997 Monsanto alliance for protein discovery.
Lion Bioscience (private) | Heidelberg, Germany | $100 million pact with drug giant Bayer sets a bioinformatics record.
Molecular Mining (private) | Kingston, Ontario | Raised $2 million in startup funds from venture capitalists in March.
Neomorphic (private) | Berkeley, Calif. | Hidden Markov models are among this 1996 startup’s advanced gene-finding tools.
Partek (private) | St. Peters, Mo. | Neural networks specialists moved into biology market in 1998.
Silicon Genetics (private) | San Carlos, Calif. | Stanford spinoff mines gene data for profit.
Silicon Graphics (public) | Mountain View, Calif. | MineSet visual data-mining tool is popular in the financial, telecom and drug industries.