The Proteomics Payoff

Now that the human genome project is done, proteins are set to displace genes as the new darlings of drug discovery. But are biologists up to the task?

Jon Cohen

October 1, 2001

On June 26, 2000, President Bill Clinton and Prime Minister Tony Blair jointly announced that researchers had completed the first draft of the human genome, a map that spelled out the three billion letters of the genetic code. “Without a doubt, this is the most important, most wondrous map ever produced by humankind,” said Clinton. Blair was equally effusive. “Let us be in no doubt about what we are witnessing today-a revolution in medical science whose implications far surpass even the discovery of antibiotics, the first great technological triumph of the 21st century,” said Blair.

But neither leader uttered a word that would soon take over the allure and promise that “genome” once enjoyed everywhere from the White House to Wall Street: “proteome.”

Just as genomics is the attempt to decipher all of the genes in an organism, proteomics, in its simplest definition, aims to uncover all of the proteins and their functions. Since genes are simply the blueprints for proteins, which in turn are the main players in most of the body’s functions, it’s a logical progression. Indeed, there is no mistaking what proteomics promises: a revolution in medical science with implications that far surpass those of genomics.

Sounding an awful lot like the genomics gurus of yesteryear, proponents of proteomics declare that a “global understanding” of proteins will reveal the underlying mechanisms of disease, leading drugmakers to treatments that ablate causes rather than mask symptoms. Companies will discover a bounty of natural proteins that can serve as injectable drugs, the advocates assure, as well as an abundance of new protein targets for the “small-molecule” pills that are the cornerstone of the pharmaceutical industry. Side effects will plummet as the precision of treatments increases. A finer appreciation of the differences between the proteomes of individuals will allow doctors to tailor treatments to specific populations. And as new technologies emerge-your entire proteome on a chip?-medicine will advance in ways that even the most farsighted visionaries cannot imagine.

All of which has helped proteomics replace genomics as biology’s new new thing. “Genomics is dead,” declares N. Leigh Anderson, who heads Large Scale Biology’s proteomics subsidiary in Germantown, MD.

Anderson and other proteomics enthusiasts argue that genomics provides only rough clues about the workings of the body. And they question the many scientists who tightly tied the deciphering of the human genome to drug discovery. “To some extent, they’ve sold the public a bill of goods in genomics,” says Anderson. Scott Patterson, who leads the proteomics project at Rockville, MD-based Celera Genomics, similarly sees serious shortcomings in some of the most celebrated drug-hunting strategies used by genomics companies, such as studying arrays of genes without considering that the body modifies the proteins they code for in myriad ways. “Maybe everyone forgot their microbiology,” says Patterson, whose company infamously raced (and goaded) the publicly funded Human Genome Project.

Investors, it appears, have not. Since June 26, 2000, more than $700 million has poured into proteomics companies from venture capitalists and IPOs. In addition to the swarm of startups that focus on proteomics, many genomics companies now have proteomics branches, and most every big pharmaceutical company has a proteomics-oriented biotech partner or has started its own proteomics division. And because proteomics makes heavy demands on computing power, deep-pocketed cyberstalwarts like IBM, Hitachi, Oracle, Compaq Computer and Sun Microsystems have joined in, too.

Still, not everyone shares the excitement. Some leading scientists worry that yet another bill of goods is being writ before their eyes. Sydney Brenner, who helped launch the Human Genome Project, says the proteomics craze is not about new knowledge but about amassing data-most of which he predicts will have no impact on drug discovery. “I think there will be a backlash,” says Brenner, now at the Salk Institute for Biological Studies in La Jolla, CA. “I really think people will come to their senses. Science will just walk around them. [Proteomics] will prove to be irrelevant.”

Even some proteomics leaders grimace at its popularity. “Today, many people try to use the word ‘proteomics,’ and I wish people did not like it so much,” says Denis Hochstrasser, who does proteomics research at the University of Geneva in Switzerland and chairs the scientific advisory board for Evanston, IL-based startup GeneProt. “People expect too much from a buzzword, and they don’t realize what’s behind it.”

For all its promise, proteomics remains strapped by serious limitations. The technologies to isolate and characterize proteins are still cumbersome and insensitive. Add to that the sheer vastness of the human proteome, and deciphering it presents an arrestingly complex mission. And while some academic and industry researchers have launched targeted efforts to tackle small bits of the puzzle, none resemble the organized effort to decode the human genome. “It’s difficult to conceive of the idea of a human proteome project,” says Celera’s Patterson. “I just don’t know when you’d ever say you finished. It’s bad enough trying to figure out if you’ve finished the human genome project.”

Yet the completion of the human genome does give proteomics an unambiguous starting point. Many new technologies have sprouted in the past few years that make it easier to find and identify proteins. And the detailed description of the human genome provides researchers working in proteomics a powerful new tool to chart, at least in a broad sense, the many technological and organizational challenges that lie ahead.

The Right Word

In mid-1994, Marc Wilkins, a student at Australia’s Macquarie University, struggled to find the right words while cobbling together a scientific paper to support his PhD thesis on rapidly identifying proteins. Wilkins found himself repeatedly writing, “all proteins expressed by a genome, cell or tissue,” a phrase he didn’t like. “This was cumbersome, inelegant and made for a lot of extra typing,” explains Wilkins, who now works at Sydney’s Proteome Systems. So he started playing with words that would communicate the protein equivalent of the genome. After discarding “proteinome” and “protome,” he settled on proteome, “the one that seemed to work best and roll off the tongue nicely.”

In September 1994, Wilkins referred to the proteome at a scientific conference in Italy, and the word stuck.

Despite the similarities in the words, critical differences separate genomics from proteomics, which give many investigators pause when the two are lumped together. “You can take DNA from anything-yourself, bananas, barnacles-and put it through a machine,” explains Brenner. “That’s because it’s all the same stuff. There are no good techniques to try and handle proteins.”

Proteins are far more complex than DNA on many levels. DNA consists of just four basic building blocks: adenine, guanine, cytosine and thymine. Various combinations of 20 different amino acids make up human proteins. The order in which As, Gs, Cs and Ts string together gives scientists the key to everything there is to know about genes, most of which have the same function: coding for proteins. In contrast, the three-dimensional shapes of proteins determine their functions, which seem endless. Proteins provide the structure of all cells and allow them to move around. They make up the cacophony of messengers that constantly traffic between immune-system cells, ordering some to battle and others to the barracks. They control the firing of neurotransmitters that allows us to think, the contraction of muscles that allows us to move, and the very on/off switches in our genes that allow us to make even more proteins. Proteins blow genes out of the water in sheer numbers, too. The Human Genome Project found between 30,000 and 40,000 genes scattered throughout our chromosomes. Estimates of the number of proteins in humans range from 60,000 into the millions; in other words, no one has a clue.

The relative simplicity and uniformity of DNA allowed scientists to develop powerful, fast, reliable tools to unravel the genome. Genomics owes much of its success to automated DNA sequencing; a state-of-the-art analyzer can sequence one million DNA letters in one day. Scientists also can amplify tiny amounts of DNA for easier study.

Protein scientists, in contrast, have no simple way to amplify, identify, quantify or characterize proteins. Instead, researchers must turn to a series of analytical instruments, few of which have been automated. Most proteomics efforts rely on two-dimensional gel electrophoresis to separate proteins; the technique pulls proteins away from each other based on their charge and mass. Mass spectrometry can then identify the proteins by analyzing their components. A technique called “yeast two-hybrid” tells researchers which proteins may interact with each other, while x-ray crystallography reveals a protein’s three-dimensional structure. In short, no single technology rules the field the way the automated DNA sequencer has genomics, and it can take years to isolate, identify and determine the function of a single protein. There also remains no reliable way to amplify proteins, many of which appear in minute amounts. “And those [low-abundance] proteins are almost certainly the most important ones,” says Brenner.

While the rise of genomics had much to do with the advent of new technologies (see “Under Biology’s Hood,” TR September 2001), the ascent of proteomics has more to do with the limitations of genomics. Genomics companies routinely hunt for drug candidates by comparing which genes are turned on in healthy and diseased tissues or cells. Logically, if a company finds an overactive gene in, say, prostate tissue from a man who had prostate cancer, it might develop a drug that targets the protein that gene codes for. But here’s the rub: a gene’s level of activity can bear little relationship to the amount of the corresponding protein that gets made. “Looking broadly, there’s no correlation,” says Celera’s Patterson. “Some of the correlation is negative, some of it’s positive, and some we don’t understand.” Making such correlations even tougher, one gene can code for multiple proteins. And adding still more complexity, proteins go through elaborate modifications after they are formed, becoming-to name but a few examples-shrouded in sugars, studded with phosphates or cleaved.

For all the hoopla about proteomics, it has a long way to go before it proves itself as a great engine of drug discovery. “There still are very few deliverables on the ground, and people don’t want to admit it,” says Ian Humphery-Smith, a researcher at the Netherlands’ University of Utrecht. And that’s from one of the field’s biggest boosters.

Human Proteome Project?

Humphery-Smith heads Glaucus Proteomics in the Netherlands and cofounded the Human Proteome Organisation, a nascent effort to get a human proteome project underway. “When you look at the start of the Human Genome Project, you had all this rhetoric about why it wouldn’t work,” says Humphery-Smith. “And I’m sure that we’re ahead of where the Human Genome Project was in 1988. For one thing, we don’t have to wait for funds: there are two billion dollars available now in industry and academia.”

In June, the organization named its first president, Sam Hanash, a cancer proteomics researcher at the University of Michigan. Hanash tries to frame the group’s mission in realistic terms. “It’s impossible to conceive of a human proteome project that would be exhaustive, that covers everything that one would want to know about in relationship to the proteome,” says Hanash. “But even if we cannot define a project that has an end, there’s still a need to define some components of an ill-defined project.”

To Hanash, a human proteome project would describe all of the proteins and in what quantities they are expressed in all of the tissues of the body. Another way to look at the problem, then, is that each tissue has its own proteome. “You’re talking about doing the Human Genome Project hundreds of times over,” says Hanash. “It’s very unrealistic to want to have as an objective for any one group or any one body to accomplish that. You’re not going to see anyone say, ‘We plan to complete the human proteome project,’ the way Celera did with the genome project. It’s not going to happen. If someone makes claims of that sort, they’re misleading the world.”

The Human Proteome Organisation’s vision is to accomplish what one group cannot by coordinating what amount to multiple proteome projects that can feed off each other. Its goal is to catalogue every distinct human protein, all protein-protein interactions and levels of proteins in different cells and tissues. The organization would like to see all of this done in both healthy and diseased tissues and cells. “There’s a need to deconvolute this complexity,” says Hanash. “Things are disorganized. If there’s no consensus to emerge, no coordination, it will be too much of a frontier mentality.”

Still, the prospect of a human proteome project is held back by a fundamental problem. Proteomics suffers from a technology gap that does not yet allow for the high-throughput, “massively parallel” analyses that have become the trademark of genomics. “The Human Genome Project was a particularly simple and get-your-hands-around-it definable goal, while proteomics is a far more amorphous and expandable area where we don’t have breakthrough technologies yet,” says biochemist Roger Tsien, whose own lab at the University of California, San Diego, is pushing forward the ability to image proteins as they move about cells (see “Candid Camera,” sidebar).

The Fine Print

No proteomics companies have agendas that come anywhere near the type of comprehensive analysis favored by the Alliance for Cellular Signaling. But the companies’ bold pronouncements about their projects would seem to indicate otherwise. Myriad Genetics of Salt Lake City, for example, announced in April that it had formed an alliance with Hitachi and Oracle “to map the human proteome in less than three years.” Large Scale Biology is compiling a Human Protein Index that it boasts is “the protein equivalent of the Human Genome Project, in which all the proteins expressed by every human cell type are being documented.” Celera’s founder, J. Craig Venter, once declared that his company’s proteomics division would work through “every tissue, organ and cell.”

Many proteomics researchers blanch at these declarations. “Those are exaggerated claims, and you have to read the fine lines,” says Human Proteome Organisation president Hanash.

What the fine lines reveal is that the more than 70 companies involved with proteomics are each exploiting limited technologies and therefore have sliced relatively small pieces of the human-proteome pie. Myriad’s definition of proteomics, says executive vice president of research Sudhir Sahasrabudhe, is cataloguing all of the interactions between proteins that it can discover with yeast two-hybrid and mass spectrometry approaches. So much for a map of the human proteome in less than three years. In fact, Myriad has no intention of cataloguing all of the proteins in a human and their modifications. “The technology for conducting a comprehensive inventory of all proteins in a biological model is not there,” says Sahasrabudhe. “You really just begin to skim the surface.”

Meanwhile, Large Scale Biology has catalogued 115,000 proteins derived from 157 “medically relevant” tissues in its Human Protein Index. Yet no one knows just how many medically relevant tissues there are. “We’re in a quandary about that,” acknowledges Anderson. “It’s going to be an elastic concept,” he says, based on the definition of what a separate tissue is. But whatever the definition, the effort seems far short of the boast of “the protein equivalent of the Human Genome Project.” Anderson stresses that his company’s goal is to develop new diagnostics and drugs rather than to know everything that can be known. “The Human Protein Index is trying to get down to a rational point that means something,” he says.

Celera, too, now has much more circumscribed goals than finding all of the proteins from every tissue, organ and cell. Instead, the company studies tissues and cells from people with specific diseases and hunts for proteins found in the membranes that surround cells; such proteins are often susceptible to drugs. “We’re looking in a very targeted way for potential therapeutics,” says Patterson.

Other proteomics companies bill themselves-at least in the fine print-in relatively restricted terms as well. Some target proteins that work as enzymes, the molecular scissors that can cut other proteins and either cause or prevent disease. Other companies hunt for antibodies, the Y-shaped immune warriors that can glom onto dangerous proteins and render them harmless. Some, like Rockville, MD-based Human Genome Sciences, look for proteins secreted from cells (see “Consulting Biotech’s Oracle,” p. 70). Still others carve out a bioinformatics niche, combing the literature to create novel protein databases, developing software to make sense of huge databanks, or helping companies design experiments to quickly find promising drug candidates.

And a dozen companies have invested heavily in developing the next hot assay in proteomics, protein affinity arrays (see “Protein Chips,” TR May 2001). These protein chips set up microgrids of protein fragments or molecules like antibodies that can trap proteins; one day, they may offer a fast, reliable way to compare hundreds of proteins found in, say, healthy and diseased tissue, giving researchers clues about how diseases develop and how drugs work.

Sydney Brenner, building on the oft-used analogy that the human genome data resemble the names and addresses found in a phone book’s white pages, says all of these proteomics research efforts are attempting to compile yellow pages. “It’s classification: we’re trying to find all the plumbers,” says Brenner. But he emphasizes that a true “global understanding” of the proteome will require much more. “We have to begin thinking about getting beyond the mere lists and the interactions,” says Brenner. “That’s as opaque as the original data. That’s going to be the critical thing for biology. And it’s not going to happen overnight.”

The University of Geneva’s Hochstrasser similarly sees the task as daunting. “It’s funny when you think about it,” Hochstrasser says. “People have sent a man to the moon. We have the entire human genome sequenced. But we do not know how many proteins we have in blood.” And we may never know. “It’s like the DNA world is finite, but maybe the protein world is infinite,” he says.

So proteomics researchers may never enjoy their version of June 26, 2000, a day where they bask in the glory of heads of state exclaiming over the completion of their “map.” But already, proteomics is revolutionizing the way scientists hunt for new medicines, and given the many diseases for which no good treatments exist, the ultimate payoff could be much larger than crossing a finish line.
