In an effort to vault genetics into a new era of big data, six drug companies say they will decode the genes of half a million Brits and then make the data public—all by 2020.
The plan will turn the UK Biobank, the source of the DNA samples, into the world’s single biggest concentration of genetic and health data, giving scientists and drug companies a powerful tool for understanding diseases.
The UK Biobank is already a treasure trove: a public database containing carefully assembled medical records, test results, and even psychological assessments the country has collected from 500,000 volunteers.
Its first big release of data last July—anonymized to protect people’s identities—electrified scientists. With a click of a mouse, they can inspect the genetic basis of everything from diabetes to TV-watching habits.
But the genetic data it contains are limited. Now the sequencing consortium plans to decode all 20,000 or so genes of each volunteer. Such “exome” sequencing falls short of decoding the complete genome, but it captures the parts most important to drug makers—the genetic sequences that code for proteins, the building blocks of life and, when they go awry, the cause of most health problems.
Adding those gene sequences would “increase [the Biobank’s] value 100 times for drug development, and 10 to 100 times for biology,” says Sek Kathiresan, who studies the genetics of heart disease at the Broad Institute in Cambridge, Massachusetts.
The expanded database will, for instance, make it much easier to locate people with rare genetic mutations whose biology suggests ideas for new drugs.
Regeneron Pharmaceuticals, the company leading the sequencing consortium, based an anti-heart-attack drug, Praluent, on the 14-year-old chance discovery that certain unusual people who lack a working version of one gene, PCSK9, have incredibly low cholesterol.
That kind of serendipity is being made commonplace by methods that automatically scour data on hundreds of thousands of people. “It takes me and my browser 30 seconds to rediscover PCSK9,” says Regeneron CSO George D. Yancopoulos. “And I think there are thousands of those in the genome.”
The gene decoding effort will occur at Regeneron’s sequencing facility in Tarrytown, New York. Regeneron will be joined by five other drug firms—AbbVie, Alnylam, AstraZeneca, Biogen, and Pfizer—that will each contribute $10 million for the effort. The companies will have private access to the newly created data for 12 months, but then it will become public.
“There is so much information here, there is no one company that can take advantage of all of it,” says Yancopoulos. “We think of this as adding accelerant to science.”
The joint project is not only big news for biomedical science, however. It also widens an already yawning gap between Britain and the U.S., whose own large population study is far behind. According to the National Institutes of Health, the million-person “All of Us” study has recruited 10,000 people but has not yet sequenced anyone’s DNA.
A spokesperson for the NIH said the $4.3 billion program is “on target” with its goals and that a national launch is planned for this spring.
Yancopoulos calls the slow start by the U.S. “a national embarrassment.” The U.K. data trove is set to dominate “for the foreseeable future, the next five to 10 years,” he says. “It’s going to be the best resource. It’s the first place people will go.”