The long-anticipated release of genetic and health data on 500,000 British people last summer by a public consortium in the U.K. is generating a shock wave of genetic discoveries that could speed the development of new drugs and tests, scientists say.
The data from the UK Biobank hit in July, after scientists in charge of 406 different projects had the chance to download several terabytes of data, including DNA data and information on everything from who has diabetes to whether people like coffee or tea.
What’s the relationship between genes and diet? Or between DNA and schizophrenia? With 2,500 different phenotypes—or traits—measured in the British volunteers, on the basis of weigh-ins, surveys, and national hospital records, this data dump is the largest of its kind and the best chance yet to figure it out.
Just a few years ago, linking a gene to a human disease would have made a whole career, and it might have taken as long. Scientists had to painstakingly collect patients and request their DNA, and they often guarded the data like the gold in Fort Knox.
What’s emerging now is a global data exchange where information about the human condition is shared, stored in the cloud, and studied with fast-evolving computer tools. “It’s a sea change in terms of data availability and the open-science movement,” says Benjamin Neale, a geneticist at the Broad Institute, in Cambridge, Massachusetts. “I do think it puts pressure on others to follow suit. There’s a psychological shift here that matters.”
Many private and public gene banks already exist. There’s Amgen-owned DeCode Genetics, which started mining the genes of Icelanders in 1996. In California, 23andMe holds genetic data on more than two million customers of its direct-to-consumer tests—and has sold access to drug companies for millions.
The UK Biobank is different, in part because it’s basically free and managed as a public resource. In July, scientists were provided keys to unlock the data simultaneously. “This absolutely levels the playing field,” says Matthew Nelson, head of genetics at drug giant GlaxoSmithKline. “Instead of spending tens or hundreds of millions to build a biobank, we’re paying a $2,500 access fee.”
The July data giveaway was the result of a plan conceived nearly two decades ago by British scientific and medical leaders who eventually won $250 million in investment by the U.K. government and large charities. During a recruitment drive from 2006 to 2010, half a million middle-aged Britons answered the call to donate their genetic and health information (see “The U.K.’s Biobank Gets Intimate”).
In all, one of every 125 people in the country signed up. “I know loads of people who are in it—neighbors in my street, colleagues,” says Cathie Sudlow, the neurologist who is the UK Biobank’s chief scientist.
All the volunteers had their genes analyzed on a DNA chip. That’s an inexpensive analysis that costs about $50 and detects some 835,000 places where the genes they inherited differ from another person’s. Think of it as a roughly pixelated copy of a person’s complete genome.
Even a low-resolution genome is plenty useful. In less than two days of computer time, Neale says, the statistics whizzes in his lab were able to determine the extent to which human height, diabetes, even how much booze someone drinks could be explained using the British DNA data. Even whether someone watches TV a lot or a little is “at least a little bit heritable,” he says.
There’s no one gene that makes you like TV, and no single “height gene.” Instead, most common human characteristics are shaped by a blizzard of tiny contributions from all across the six-billion-letter human genome. Those are the signals geneticists are using the U.K. data to look for.
For instance, in October, biologists at King’s College, London, crowed that they’d pulled off “the largest genetic study of anxiety to date.” They had compared the DNA of UK Biobank volunteers with their answers to survey questions about things like how worried and afraid they felt. They found four new “hits” on two chromosomes.
Such hits are recorded as sharp spikes on diagrams scientists call Manhattan plots. Until recently, some of these genetic skylines weren’t very detailed. For schizophrenia, for instance, hits have been hard to come by, and the disease remains mostly unexplained. But with more people’s DNA to study, the plots are starting to bristle. Anyone can look up the results using newly created online services, like the Global Biobank Engine put up by Stanford University.
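For readers curious how a single spike on such a plot is calculated, here is a minimal sketch in Python. It is not the method used by any of the teams quoted here. It assumes a basic allele-count chi-square test at one DNA position, the function names and the counts are hypothetical, and the 5e-8 cutoff is the conventional "genome-wide significance" threshold rather than a figure from the UK Biobank.

```python
import math

# Conventional genome-wide significance threshold (an assumption here,
# not a figure taken from the UK Biobank).
GENOME_WIDE = 5e-8


def allelic_p_value(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of
    allele counts: variant vs. reference allele, cases vs. controls."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    num = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    den = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
           * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    if den == 0:
        return 1.0  # an empty row or column carries no evidence
    chi2 = num / den
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def manhattan_height(p):
    """Height of the point on a Manhattan plot: -log10 of the p-value."""
    return math.inf if p == 0 else -math.log10(p)


# Hypothetical counts: the variant allele is more common among cases.
p = allelic_p_value(case_alt=600, case_ref=400, ctrl_alt=400, ctrl_ref=600)
print(manhattan_height(p), p < GENOME_WIDE)
```

Repeating a test like this at each of the hundreds of thousands of positions on the DNA chip, and plotting the heights in chromosome order, produces the skyline. Real studies typically use regression models that also adjust for factors such as age, sex, and ancestry.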
At GSK, the British drug company, the U.K. data is being used for “reverse” searches. That is, instead of trying to find every gene involved in one disease, Nelson, the company’s genetics chief, says his team wants to trace every health effect of blocking or boosting a single gene, as they might do with a drug.
And it turns out that medical records of people with natural mutations in a given gene provide clues. Did they have diabetes less often? More heart attacks? It’s like seeing what a drug does before anyone even swallows it. “That’s going to be the main use case of the biobank for drug companies,” says Nelson.
Despite selling $30 billion worth of pills a year, GSK “never had the resources” to build a private data bank of its own, says Nelson. But since the summer, GSK says, it is suddenly on par with competitors that did. After DeCode, the Icelandic company, reported an important discovery connected to asthma in March, Nelson says his team was later able to repeat the result “in a matter of hours.”
Public gene data is expected to proliferate even further. In China, the Kadoorie Biobank has been recruiting half a million adults from 10 regions of the country, asking them about tea drinking, exposure to pollution, sleep, and alcohol. In the U.S., there’s the Million Veteran Program run by the Department of Veterans Affairs as well as the Obama-era Precision Medicine Initiative, now called All of Us.
Some startup entrepreneurs say the DNA findings are also going to increase the number, and quality, of direct-to-consumer DNA tests. Such tests, which often try to use DNA to tell people about their diet or fitness, have been criticized by some doctors as medically dubious guesswork.
Yet with half a million new subjects to study, DNA predictions about people’s health risks will get more accurate. “We are furiously trying to get access to that data as quickly as we can,” says Chris Glode, CEO of HumanCode, a company developing DNA “entertainment” tests, including one called BabyGlimpse that tells two people, on the basis of their DNA, what their kids would look like. “The reason it’s so hot right now is it’s the first broadly available data for training models. It’s something we can throw computers, hypotheses, against to see what we can come up with.”
Sudlow, the Biobank science chief, says she doesn’t think the project’s data will be able to tell doctors who will get sick, with what, and when. “The majority of diseases are not predictable from DNA alone. I think we are a very long way from that,” she says. “I think the role is to discover disease mechanisms, rather than working out which individuals will develop which disease.”
The July data release isn’t the culmination of the project. In fact, the data collecting will end only when the last volunteer has succumbed to dementia, cancer, or another cause of death. “I think our kids are going to answer the most interesting questions,” Sudlow says.