The Right Word
In mid-1994, Marc Wilkins, a student at Australia’s Macquarie University, struggled to find the right words while cobbling together a scientific paper to support his PhD thesis on rapidly identifying proteins. Wilkins found himself repeatedly writing, “all proteins expressed by a genome, cell or tissue,” a phrase he didn’t like. “This was cumbersome, inelegant and made for a lot of extra typing,” explains Wilkins, who now works at Sydney’s Proteome Systems. So he started playing with words that would communicate the protein equivalent of the genome. After discarding “proteinome” and “protome,” he settled on proteome, “the one that seemed to work best and roll off the tongue nicely.” In September 1994, Wilkins referred to the proteome at a scientific conference in Italy, and the word stuck.
Despite the similarities in the words, critical differences separate genomics from proteomics, and those differences give many investigators pause when the two are lumped together. “You can take DNA from anything (yourself, bananas, barnacles) and put it through a machine,” explains Brenner. “That’s because it’s all the same stuff. There are no good techniques to try and handle proteins.”
Proteins are far more complex than DNA on many levels. DNA consists of just four basic building blocks: adenine, guanine, cytosine and thymine. Various combinations of 20 different amino acids make up human proteins. The order in which As, Gs, Cs and Ts string together gives scientists the key to everything there is to know about genes, most of which have the same function: coding for proteins. In contrast, the three-dimensional shapes of proteins determine their functions, which seem endless. Proteins provide the structure of all cells and allow them to move around. They make up the cacophony of messengers that constantly traffic between immune-system cells, ordering some to battle and others to the barracks. They control the firing of neurotransmitters that allows us to think, the contraction of muscles that allows us to move, and the very on/off switches in our genes that allow us to make even more proteins. Proteins blow genes out of the water in sheer numbers, too. The Human Genome Project found between 30,000 and 40,000 genes scattered throughout our chromosomes. Estimates of the number of proteins in humans range from 60,000 into the millions; in other words, no one has a clue.
The relative simplicity and uniformity of DNA allowed scientists to develop powerful, fast, reliable tools to unravel the genome. Genomics owes much of its success to automated DNA sequencing; a state-of-the-art analyzer can sequence one million DNA letters in one day. Scientists also can amplify tiny amounts of DNA for easier study.
Protein scientists, in contrast, have no simple way to amplify, identify, quantify or characterize proteins. Instead, researchers must turn to a series of analytical instruments, few of which have been automated. Most proteomics efforts rely on two-dimensional gel electrophoresis to separate proteins; the technique pulls proteins away from each other based on their charge and mass. Mass spectrometry can then identify the proteins by analyzing their components. A technique called “yeast two-hybrid” tells researchers which proteins may interact with each other, while x-ray crystallography reveals a protein’s three-dimensional structure. In short, no single technology rules the field the way the automated DNA sequencer has genomics, and it can take years to isolate, identify and determine the function of a single protein. There also remains no reliable way to amplify proteins, many of which appear in minute amounts. “And those [low-abundance] proteins are almost certainly the most important ones,” says Brenner.
While the rise of genomics had much to do with the advent of new technologies (see “Under Biology’s Hood,” TR September 2001), the ascent of proteomics has more to do with the limitations of genomics. Genomics companies routinely hunt for drug candidates by comparing which genes are turned on in healthy and diseased tissues or cells. Logically, if a company finds an overactive gene in, say, prostate tissue from a man who had prostate cancer, it might develop a drug that targets the protein that gene codes for. But here’s the rub: a gene’s level of activity can bear little relationship to the amount of the corresponding protein that gets made. “Looking broadly, there’s no correlation,” says Celera’s Patterson. “Some of the correlation is negative, some of it’s positive, and some we don’t understand.” Making such correlations even tougher, one gene can code for multiple proteins. And adding still more complexity, proteins go through elaborate modifications after they are formed, becoming, to name but a few examples, shrouded in sugars, studded with phosphates or cleaved.
For all the hoopla about proteomics, it has a long way to go before it proves itself as a great engine of drug discovery. “There still are very few deliverables on the ground, and people don’t want to admit it,” says Ian Humphery-Smith, a researcher at the Netherlands’ University of Utrecht. And that’s from one of the field’s biggest boosters.