What’s the chance police could locate the average person using a public DNA database?
I recently made a bet on what the answer would be. Now, thanks to two math whizzes in California, we have an answer.
And I’m the loser, but not by much.
It all started after the arrest of the alleged Golden State Killer in April. Police had uploaded crime-scene DNA to an open-access genealogy website, GEDmatch, and located some of his relatives. Eventually, they found him.
The case generated huge interest among genealogists, journalists, geneticists, and sleuths of all sorts. How did investigators do it? Is our genetic privacy at risk? How come this never happened before?
But one question emerged as paramount, even for those innocent of anything: what’s the chance they could find you?
I had a guess. We’d recently reported on the explosive growth in DNA genealogy tests, which more than 12 million people have now taken. Figuring everybody’s got dozens of relatives, I posted on Twitter that I’d bet any American by now has at least one relative already in a database.
“How much do you bet?” fired back Henry Greely, a law professor at Stanford University.
It was on. First, the setting of terms. Specifically, I was willing to bet that more than 95 percent of people could find at least one second-cousin match in Ancestry.com, the largest of these relative-finding databases.
The bet would also have a critical caveat. It could only apply to people of European background, because that’s mostly who has taken the tests.
And the stakes? The loser would have to submit a spit sample, allowing millions of strangers to compare the DNA results with their own.
Now, thanks to a couple of academics with a free Friday afternoon, we have an answer of sorts, and it appears I am the loser.
The answer comes from mathematical geneticists Graham Coop and Doc Edge. The duo, based at the University of California, Davis, decided to calculate whether police just got lucky finding their suspect, or whether databases are now so big they couldn’t miss.
In a blog post, they highlight some key concepts that constrained the answer. One is “genealogical blowup.” That’s their term for how immensely the number of possible relatives increases the more distant you allow the connection to be. You have just one or two siblings. But you can have hundreds of third cousins.
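To get a feel for that blowup, here is a rough sketch in Python. It is a toy model of my own, not the calculation from the blog post: it assumes every couple has the same number of children (the hypothetical `children_per_couple`, set to 2.5 purely for illustration), counts only full cousins, and ignores any overlap between family lines.

```python
def expected_cousins(degree: int, children_per_couple: float) -> float:
    """Expected number of full cousins of a given degree in a toy model.

    degree 0 means siblings, 1 means first cousins, 2 second cousins, etc.
    Assumption: every couple has exactly `children_per_couple` children.
    """
    # Couples of ancestors you share with cousins of this degree.
    ancestral_couples = 2 ** degree
    # Each couple's other children are siblings of your own ancestor.
    siblings_of_ancestor = children_per_couple - 1
    # Each of those siblings has descendants `degree` generations down.
    descendants_per_sibling = children_per_couple ** degree
    return ancestral_couples * siblings_of_ancestor * descendants_per_sibling


# 2.5 children per couple is an illustrative assumption, not a sourced figure.
labels = ["siblings", "first cousins", "second cousins", "third cousins"]
for degree, label in enumerate(labels):
    print(f"{label}: about {expected_cousins(degree, 2.5):.1f}")
```

In this model the count multiplies by twice the family size with every step outward, which is how a person with one or two siblings ends up with a couple of hundred third cousins.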
There is an opposing phenomenon that narrows the search space. The reason it’s possible to match relatives is that some of their DNA is literally the same, or “identical by descent.” For instance, you share about half your DNA with your father. You and a first cousin share some DNA from the two grandparents you have in common.
But more distant relations have less identical DNA. A third cousin you’ve probably never met? Less than 1 percent of your DNA is shared, and sometimes none at all. Thus, for more distant relations, DNA can’t make a match.
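The falloff follows from simple halving. As a sketch, assuming full cousins who share exactly two ancestors and no other relationship: each generation passes on half of a person's DNA, and the path between two cousins of a given degree runs through 2 × (degree + 1) such handoffs for each shared ancestor. The function name below is mine, and the result is an average only. The actual amount any two cousins share varies, which is why a third cousin can turn up with nothing in common.

```python
from fractions import Fraction


def expected_shared_fraction(degree: int) -> Fraction:
    """Average fraction of DNA shared by full cousins of the given degree.

    degree 1 means first cousins, 2 second cousins, and so on.
    Assumption: exactly two shared ancestors and no other relatedness.
    """
    if degree < 1:
        raise ValueError("degree must be 1 (first cousins) or higher")
    # Generational handoffs separating the cousins via one shared ancestor.
    handoffs = 2 * (degree + 1)
    # Two shared ancestors, each contributing (1/2) ** handoffs on average.
    return 2 * Fraction(1, 2) ** handoffs


for degree, label in [(1, "first"), (2, "second"), (3, "third")]:
    share = expected_shared_fraction(degree)
    print(f"{label} cousins: {share} of the genome, or {float(share):.2%}")
```

The average works out to one-eighth for first cousins, 1/32 for second cousins, and 1/128 for third cousins, which is where the figure of less than 1 percent comes from.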
Edge and Coop found that California police had good odds of finding the killer’s relatives. The database they used, GEDmatch, has about 950,000 profiles in it. According to the UC Davis scientists, the chance that a random American of European background has a first cousin in GEDmatch is 3.5 percent. It’s 25 percent for a second cousin, and more than 90 percent for a third cousin, of which the police apparently found several.
As you’d imagine, the bigger the database, the bigger the chance some DNA identical with yours is in it. In fact, a second-cousin match is all but unavoidable in Ancestry.com, according to Coop and Edge’s estimates, though not quite as certain as I needed it to be to win the bet.
According to their estimates, the chance of having a second cousin in that database is 94 percent, just shy of my 95 percent guess. Since Ancestry declined to provide the exact figure, I’ll go with the estimate from the UC Davis gang and say I lost my bet by a nose.
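How close is a nose? One way to put the percentages on a common scale is to convert each into an expected number of matches. The sketch below assumes the count of your relatives in a database follows a Poisson distribution, so the chance of at least one match is 1 − e^(−expected matches). That assumption is mine, made for illustration, and may not be how Coop and Edge did their calculation. The function names are hypothetical, and the only inputs are the 25 percent and 94 percent second-cousin figures quoted above.

```python
import math


def expected_matches(p_at_least_one: float) -> float:
    """Expected number of matches implied by a chance of at least one match.

    Assumption: the number of matches is Poisson distributed.
    """
    if not 0.0 <= p_at_least_one < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return -math.log(1.0 - p_at_least_one)


def chance_of_match(expected: float) -> float:
    """Chance of at least one match given the expected number of matches."""
    return 1.0 - math.exp(-expected)


# Second-cousin figures quoted in the article.
gedmatch = expected_matches(0.25)
ancestry = expected_matches(0.94)
needed_to_win = expected_matches(0.95)

print(f"GEDmatch: {gedmatch:.2f} expected second-cousin matches")
print(f"Ancestry: {ancestry:.2f} expected second-cousin matches")
print(f"Ancestry vs. GEDmatch: {ancestry / gedmatch:.1f}x")
print(f"Growth needed to reach 95%: {needed_to_win / ancestry - 1:.1%}")
```

Under this assumption, a 94 percent chance corresponds to roughly ten times as many expected second-cousin matches as a 25 percent chance, and getting from 94 to 95 percent would take only a few percent more.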
Honestly, I’ve never wanted to have my DNA tested. Companies like 23andMe and Helix have sent me free kits, and I never sent them back. What am I going to learn? I know more or less where I am from. And I am not sure I’d want to locate some unacknowledged sibling or learn that the postman is really Daddy.
Even more than that, it’s been obvious that as the databases grow in size, they’ll only get more powerful, and no one can say what uses they could be put to in the future. Once you give up your DNA—like your fingerprints—you can’t get it back.
The reason I’ve decided to do the Ancestry test, which costs $99, isn’t only that I am a good loser. It’s that the choice has already been made for me. According to Coop’s estimates, I may have 200 more third cousins and 1,000 fourth cousins who’ve already gotten tested.
My DNA, like yours, is already out there.