Jay Knowles is enjoying himself. The biotech executive sits in his San Diego office, where he directs business operations for Structural GenomiX (SGX). SGX, a startup company, has raised nearly $40 million in venture capital since its founding a year ago and is now turning away investors. “Being the next wave of the genomics business, everyone’s flocking to give us as much money as they possibly can,” Knowles boasts. And, he adds, several large pharmaceutical firms are eager to buy SGX’s product: three-dimensional protein structures, those intricate models with loops and whorls that lend a touch of the fanciful to the pages of scientific journals like Science and Nature. “We have lots of deals on the table,” says Knowles.
Suddenly, Knowles’ boss, president Tim Harris, bursts into the room. “Vertex has just cut a structural genomics deal with Incyte,” he says. “Bastards. This is war.”
Knowles is stunned. Before coming to SGX, he was director of R&D planning for Vertex Pharmaceuticals, the visionary Cambridge, Mass., drug company that specializes in structure-based drug design: the construction of medicines atom by atom, fitting drugs like finely cut jewels into protein settings. Out of courtesy to his former employer, Knowles has avoided raiding Vertex for employees. But if Vertex, with the help of Palo Alto, Calif.’s high-flying Incyte Genomics, wants to compete head-to-head with SGX in the emerging field of “structural genomics” (the large-scale discovery of precious protein structures), the truce is history. “Anybody you want to pull out of that place, go ahead,” says Harris. “The gloves are off.”
A few frantic phone calls later, however, and it’s clear that it’s a false alarm. Vertex has merely bought access to Incyte’s stockpile of human genes. But Harris and Knowles remain on red alert, prepared to battle any company that threatens SGX’s early lead in a field that they claim will revolutionize drug discovery. “We’ll see other groups jump on the bandwagon,” says Harris. “You can bet your life.”
Molecules to Medicine
SGX’s ambitious goal is to automate the production of information about protein structures, and sell the results to large drug companies. It’s a business plan with obvious precedent. After all, it was the introduction of high-speed machines for DNA sequencing that allowed quick-moving and well-financed companies such as Incyte to amass private empires of genetic data. Using the same technology, the public-sector international Human Genome Project is expected to publish (sometime this year, ahead of schedule) a working draft of the complete human genetic makeup.
That’s where Knowles’ interrupted pitch picks up: Mass production of protein structure, he believes, is the natural successor to mass production of DNA sequence. After all, those 100,000 or so genes are merely blueprints for making proteins, the versatile molecules that perform nearly every vital function in our bodies. Yet the function of most proteins remains unknown. Thus, the urgent task is to figure out what proteins do and how they work, and there is no better starting point than their three-dimensional shape. Until now, however, “solving” these protein structures has been a notoriously difficult undertaking.
Automating could yield big gains, not only for basic science, but also for what Harris calls the “bloody hard” business of discovering new drugs. According to this 20-year industry veteran, a significant jump in the number of available protein structures could transform how drugs are created.
Today, the vast majority of drugs are still found by hit-and-miss methods, albeit on a massive scale. The world’s top pharmaceutical companies have sunk billions into automated systems that can synthesize and test hundreds of thousands of chemical compounds a week, hoping to turn up a few “hits” against a protein target. (Most drugs on pharmacy shelves work by attaching to proteins, activating or disabling them.)
Structural genomics proposes turning traditional drug discovery on its head, putting protein structures first and using them to design new drugs from the ground up, a process known as “rational drug design” or “structure-based drug design.” Instead of relying on luck, chemists start with a protein’s three-dimensional structure and use the details of its shape to create a chemical compound that fits precisely. Drugs that result should, in theory, be exquisitely specific, avoiding the side effects that often doom otherwise promising compounds to the pharmaceutical dustbin.
Although structure-based design has led to some breakthrough medications, such as HIV protease inhibitors (including Vertex’s Agenerase) and Glaxo’s flu treatment Relenza, the list quickly dwindles after that. Drug firms still largely rely on the mass screening approach, in part because protein structures are so hard to come by.
Mass production of protein structures could change all that. Today, even the biggest pharmaceutical companies only manage about twenty new protein structures a year. Yet, by 2003, SGX proposes to solve that many each week-every week. It’s an incredibly audacious plan, but Harris is confident. “I know that if we don’t do it, some other bugger would,” he says. “Because it’s absolutely there waiting to be done. It’s the next step.”
In fact, Harris already has competition. Less than a mile away as the crow flies, but separated from SGX by a deep ravine, is the Genomics Institute of the Novartis Research Foundation, a non-profit with close ties to Swiss drug giant Novartis (see “The Bell Labs of Biology,” TR March/April 2000). As TR went to press, scientists at the Genomics Institute were about to unveil a new spin-off venture called Syrrx that plans to harness automation and robotics “to solve [protein] structures at what would be considered, in years past, almost an impossible rate,” says director of business development Ned David.
Other contenders include Astex, in Cambridge, England, and Princeton, N.J.-based Structure Function Genomics. More are coming. “It’s going to be very crowded,” says Harren Jhoti, acting CEO of Astex.
Nor is it only the private sector that’s interested. The National Institutes of Health (NIH) recently launched the Protein Structure Initiative, which will spend up to $125 million in its first five years. The governments of Canada, Germany and Japan are also planning major structural genomics initiatives, and overall spending could eventually rival the multibillion-dollar Human Genome Project. The NIH alone hopes to generate 10,000 structures by the end of the decade.
The startup companies are even more ambitious. By 2003, Syrrx projects, it will be solving about 1,000 structures a year. SGX, if it meets targets, will be doing 1,350. The numbers go up from there. These are truly bold figures given the painfully slow pace at which three-dimensional protein structures have been unraveled in the past. For perspective, consider that in almost half a century since the first protein structure was solved in 1957 (the muscle protein myoglobin), a total of only about 2,000 unique protein structures have been deposited in the Protein Data Bank, the international structure repository.
Indeed, the promises being made by SGX and Syrrx are so far out of line with past experience that in the eyes of some experts they amount to fantasy. “Completely unreasonable,” says prominent University of California, Irvine structural biologist Alex McPherson. “The technology is simply not there, and it’s not going to be there for a fairly long time. I just don’t understand where they’re getting numbers like that.”
A Daunting Challenge
McPherson has reason to be skeptical. Mass-producing protein structures is going to be a lot tougher than DNA sequencing. DNA is a simple linear code of four chemical letters, while proteins are composed of twenty different amino acids and fold into complex, largely unpredictable arrangements of sheets and loops. Although scientists have long tried, with the help of computers, to predict protein structure directly from the DNA blueprints, they’re still a long way off, even for the simplest proteins (see sidebar: “Blue Gene vs. Proteins”).
Instead, both Syrrx and SGX will be attempting to automate the most widely used empirical approach, known as X-ray crystallography. With this method, a protein is first purified, then coaxed to form a crystal. At that point, scientists shoot concentrated radiation into the crystal, exploiting the pattern of scattered rays to reconstruct an atom-by-atom model of the protein in its crystalline form.
Although that process may sound straightforward, it’s not. Many proteins, for instance, are extremely difficult to isolate, and the crystallization process itself is anything but cookbook. Temperature, acidity and salts must be fine-tuned to cajole a tiny, delicate crystal out of solution. Finally, converting the X-ray data into a three-dimensional model of a protein’s shape is often an “agonizingly difficult” problem, says University of California, San Diego crystallographer Lynn Ten Eyck. “There’s a lot of human judgment applied to that problem at present,” he says. “And human judgment takes time.”
Filling the Pipeline
The founders of both SGX and Syrrx have the credentials to back their claims. Colleagues credit SGX scientific co-founders, Columbia University biophysicists Barry Honig and Wayne Hendrickson, with coining the term “structural genomics.” Hendrickson is a celebrated crystallographer who invented an ingenious method for tackling otherwise unsolvable proteins. Honig, who has written widely used computer programs for analyzing and predicting protein structures, knew that more solved structures would make his job much easier. “These [prediction] methods depend on data,” he points out.
There’s no single breakthrough technology that’s made structural genomics feasible. Rather, a combination of more DNA sequence data, powerful X-ray “beamlines” and high-speed computers has sped up the entire process and made it more reliable. By the late 1990s, says Honig, he and Hendrickson had concluded “the time was ripe” for a concerted assault on the protein universe.
The idea of industrializing crystallography also fired the imagination of Ray Stevens, a chemist who invented a novel “micro-crystallization” system for making protein crystals from droplets up to a hundred times smaller than usual, a key element in Syrrx’s system. Working feverishly within the walls of the Novartis Institute, Stevens and a team of engineers have already built a prototype system for fully automating X-ray crystallography. Machines sporting intricate mazes of glass flasks, rubber tubing, test tubes and electrical harnesses include a crystallization robot that can perform 139,000 experiments a day and process a million time-lapse photo images of crystal growth. “If you do this one protein at a time, there is no way in the world you can reach those numbers,” Stevens says. “Everything is set up in parallel.”
That doesn’t mean everything will work. “We’re going to have a high failure rate at first,” Stevens readily admits. In 2002, Syrrx’s first year of full-scale production, fewer than two percent of its proteins are expected to yield three-dimensional structures. But since the company plans to test 60,000 proteins, that still means nearly 1,000 successful structures coming out the end of the pipe.
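For readers who want to check the arithmetic behind that projection, here is a minimal sketch in Python. It uses only the two figures quoted above (60,000 proteins tested, roughly 1,000 structures out); the variable and function names are illustrative, and the 2 percent ceiling is simply the "fewer than two percent" bound read as an upper limit, not a number from Syrrx's own planning.

```python
# Back-of-the-envelope check of the pipeline yield quoted in the article.
# Names here are illustrative; only the two input figures come from the text.

proteins_tested = 60_000     # proteins Syrrx plans to test in its first full year
target_structures = 1_000    # "nearly 1,000" solved structures


def expected_structures(n_proteins: int, success_rate: float) -> float:
    """Expected number of solved structures for a given success rate."""
    return n_proteins * success_rate


# Success rate implied by the article's two figures.
implied_success_rate = target_structures / proteins_tested

# Ceiling if the success rate were a full 2 percent (assumed upper bound).
upper_bound = expected_structures(proteins_tested, 0.02)

print(f"Implied success rate: {implied_success_rate:.2%}")
print(f"Structures at a 2% success rate: {upper_bound:.0f}")
```

The implied rate lands a little under 2 percent, which squares with Stevens's "fewer than two percent": a pipeline that fails more than 98 times out of 100 can still deliver structures by the hundreds if it is wide enough.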
A thousand novel structures a year would, in Ned David’s words, “make drug discovery move at a genomics pace.” While SGX is compiling a database of structures to sell to pharmaceutical companies, Syrrx will go a step further by developing lead drug compounds using computer-based rational design techniques. To do this, it will model virtual libraries of chemicals for proper fit to protein structures. “We’ll be able to dock 200,000 compounds [to a protein] in under a day,” says David. “The value here is making drugs, and making them fast.”
That’s the promise. In the near term, however, both startup firms will simply be racing to generate as much structure data as possible. That’s because, as on the Internet, commercial success may go to whoever gets there first with the most. “We have momentum, and we have first-mover advantage,” says Harris, whose company has found about a dozen structures in its first six months of operation. “Believe me, we are going to exploit that, ruthlessly. I like that word, because I mean it.”
Keeping the Peace
While the startups jockey for advantage, the government-backed projects plan to assemble a body of common knowledge that could help all researchers, public and private, gain access to protein structures. In mid-April, the Wellcome Trust, the giant British biomedical research foundation, hosted a meeting of government scientists and academics in Cambridge, England, to work out ground rules for a coordinated worldwide effort to discover protein structures en masse. The NIH hopes that 10,000 proteins, carefully chosen, will be enough to catalogue roughly 1,000 different protein “folds,” the basic types of loops and twists common to all proteins.
Once every fold is in their library, scientists should be able to use computers to predict, with reasonable accuracy, the structure of any of the remaining 90,000 or so human proteins directly from DNA sequence. That will be a critical step in giving meaning to the raw DNA data generated by the genome projects. “In the past, we found the function of a protein and then found the structure,” says the NIH’s John Norvell. “Now we’ll be in the position to find the structure and ask, ‘What does this protein do?’”
With both private and public sectors diving into structural genomics, some worry that a poisonous competition will develop. An obvious precedent: the Human Genome Project versus Celera Genomics, the Rockville, Md., company that appears poised to win the race to complete the human DNA sequence (see “The Gene Factory,” TR March/April 1999). Efforts to work together were dashed by Celera’s refusal to share its private data on the government’s terms.
Could the same scenario unfold in structural genomics? “I suspect there will be some friction,” says Phil Bourne, who co-directs the Protein Data Bank. Open publication of data will likely again be the flash point for conflict. The NIH (and its international partners) agreed at the Cambridge meeting on quick release of information into the public domain.
SGX executives say they will have to protect their structures, through both secrecy and patents. Stevens, however, says Syrrx is moving to avoid conflict by taking the extraordinary step of depositing much of its data into the Protein Data Bank. “There’s a lot of lessons we can learn from Celera and the Human Genome Project,” says Stevens. “We would like not to make those mistakes…. The information should become public.”
Not that Syrrx is giving away the store. The company will keep certain structural details important for drug development under wraps, and it has also filed patents on its robots. But Stevens has promised that Syrrx will let the government use these for a nominal fee. “This is a two-way street with the public effort,” he insists.
Whether or not it’s possible to avoid public-private strife, it’s clear that structural genomics is acquiring tremendous momentum. For crystallographers, that means dramatic changes ahead. Lynn Ten Eyck, who has been a card-carrying member of this insular field for thirty years, sees the writing on the wall. “These automated systems will just steamroll anyone who’s not using them,” he says. “It’s like the Industrial Revolution.”
Ten Eyck does not intend to fight progress. In fact, he’s joined a group of fellow academics seeking a structural genomics grant from the NIH. As Ten Eyck sees it, mass production of protein structures is the inevitable next step in biology’s rapid transformation from a basic science into an “engineering discipline.” The payoffs should include not only structure-based drugs, but also better diagnostics and perhaps even the ability to reverse birth defects. “There’s a vast array of things you can do if you actually understand the biology well enough,” says Ten Eyck. “This is not something that’s going to happen tomorrow, but we’re watching the transition start.”
Structural Genomics Becomes International Big Science

Initiative: Highlight
Protein Structure Initiative: $125 million-plus effort funded by the U.S. National Institutes of Health to solve 10,000 protein structures in 10 years
Protein Structure Factory: The German Ministry for Research and Technology is funding several academic teams to perform high-speed structure analysis of medically important proteins
NMR Park Project: Japan’s Institute of Physical and Chemical Research (RIKEN) is using NMR to determine the structure of mouse proteins
Structural Biology Industrial Platform: Several major European pharmaceutical firms are part of this 16-company consortium in structural genomics
Structural Diversity Pilot Project: Academic collaboration led by Rockefeller University