On September 26, 2001, Howard Cash got a phone call that changed his life. On the other end of the line was the Office of the Chief Medical Examiner of the City of New York. Could Cash (the founder of Gene Codes, a bioinformatics company recognized for a DNA-sequencing program called Sequencher) build the software needed to manage and identify the remains of the 2,792 missing victims of the September 11 attacks on the World Trade Center? Existing identification tools were inadequate for the task: the scope of the project was mind-bogglingly large, and the falling towers and burning jet fuel had pulverized and commingled the remains. Ultimately, 19,937 separate remains were found, with some victims recovered in as many as 200 pieces.
Using $2 million in profits to hire 11 new people and double his office space, Cash sprang into action. He started a wholly owned subsidiary called Gene Codes Forensics to develop a new breed of identification software: M-FISys (pronounced “emphasis”), an acronym for Mass-Fatality Identification System. (The subsidiary would protect the parent company against potential lawsuits should any victim not be identified.) On December 13, 2001, the first day the medical examiner’s office turned on the software, it produced 55 new matches that ultimately resulted in identifications. Says Cash, “There was such a mountain of information, there wasn’t a way to sift through it and find the matches until that first day. Suddenly, there were these loved ones’ remains waiting to be sent home.”
Before M-FISys, the New York medical examiner’s office was attempting to make DNA identifications using Charles Brenner’s longstanding DNA-VIEW, Benoit Leclair’s Mass Disaster Kinship Analysis Program, and an FBI program called CODIS (Combined DNA Index System), which is generally used to identify felons from DNA found at crime scenes. But by mid-December, CODIS, which was designed to compare a single sample against a large database of DNA profiles, had identified just 203 remains representing 105 people. Information ranging from DNA analyses of victims’ remains, their personal effects, and relatives’ cheek swabs to dental records and fingerprints was being stored in 22 different databases on platforms as varied as FileMaker Pro and Oracle. As of June 2003, the medical examiner’s office had collected 7,681 personal items, including toothbrushes, razors, and hairbrushes, and 11,641 cheek swabs from some 7,166 relatives.
As of late July, 1,518 victims had been identified, 779 of them (just over half) by DNA alone. M-FISys brings together three types of DNA analysis, some standard and some not generally used for identification, for repeated “all-against-all” comparisons among victim and kinship samples. The program constructs “virtual” DNA profiles where actual ones have literally gone up in smoke and lets users add or subtract sample analyses from the composites as the evidence changes. It can link to other databases, such as those describing family relationships, what the victim wore to work the day of the disaster, or the medical examiner’s postmortem findings. It can present a snapshot not just of every test done on a sample but of the progression of those tests, along with the forensic scientists’ comments. “M-FISys allows us to do quality checks on the software, on the samples, on the analysis,” says Robert C. Shaler, director of the Department of Forensic Biology for New York City. “We can, at a glance, get an idea of what samples we have and what results we have on them so that we can quickly go through and ascertain what else we need to do.”
The ability to coordinate disparate data sources is key to M-FISys’s success, experts believe. “The DNA part of a mass fatality incident is effectively a very small and narrow part of that incident,” says Chris Maguire, a consultant scientist with England’s Forensic Science Service who evaluated the New York identification efforts for the British Consul General. “When you have something on the scale of September 11, with some 20,000 body parts, potentially 10,000 relatives, and close to 3,000 victims, the actual logic in all of the comparisons [between body parts, family samples, and personal effects] can get lost in the mass of information. That’s where a program like M-FISys is so important.”
M-FISys brings together three DNA technologies to increase the likelihood of identifying remains: short tandem repeat analysis, mitochondrial DNA analysis, and examination of DNA markers called single nucleotide polymorphisms. Short tandem repeat analysis is the most common technique used today in paternity testing and forensic matching, as in the O.J. Simpson case. Just four chemical bases (adenine, thymine, guanine, and cytosine, known as A, T, G, and C) make up the DNA organized into 23 pairs of chromosomes in each human cell. That chain of letters is some three billion bases long. A short tandem repeat is a brief stretch of that chain in which a short sequence of letters repeats over and over; the number of repeats at a given location varies from person to person. Because many people may have the same number of repeats at any one location, a conclusive match requires looking at as many locations as possible. For its forensic investigations, the FBI examines 13 distinct regions where those repeats occur, plus a marker that indicates the sex of the person being profiled. M-FISys has upped the number to 15 regions for the Trade Center identifications.
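Why more regions matter can be made concrete with the standard product rule. The sketch below is a hypothetical illustration, not M-FISys code: a profile maps each region (locus) to its pair of repeat counts, two profiles match if they agree at every locus typed in both, and the chance that an unrelated person matches is the product of per-locus genotype frequencies under a Hardy-Weinberg assumption. The loci D3S1358, vWA, and FGA are real forensic STR markers, but the allele frequencies and function names are made up for the example.

```python
from math import prod

# A profile maps a locus name to the pair of repeat counts (alleles) observed there.
Profile = dict[str, tuple[int, int]]

# Hypothetical allele frequencies, for illustration only (not real population data).
ALLELE_FREQS: dict[str, dict[int, float]] = {
    "D3S1358": {15: 0.25, 16: 0.23, 17: 0.21},
    "vWA": {16: 0.20, 17: 0.26, 18: 0.22},
    "FGA": {21: 0.18, 22: 0.19, 24: 0.14},
}

def profiles_match(a: Profile, b: Profile) -> bool:
    """True if every locus typed in both profiles shows the same pair of alleles."""
    shared = a.keys() & b.keys()
    return bool(shared) and all(sorted(a[l]) == sorted(b[l]) for l in shared)

def genotype_frequency(locus: str, alleles: tuple[int, int]) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    p, q = (ALLELE_FREQS[locus][x] for x in alleles)
    return p * p if alleles[0] == alleles[1] else 2 * p * q

def random_match_probability(profile: Profile) -> float:
    """Product rule: multiply genotype frequencies across loci, assuming independence."""
    return prod(genotype_frequency(l, g) for l, g in profile.items())
```

With these made-up frequencies, a three-locus profile such as `{"D3S1358": (15, 16), "vWA": (17, 17), "FGA": (21, 24)}` has a random match probability of roughly 1 in 2,600, which shows why a handful of regions is nowhere near enough and why labs type 13 or more.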
The remains from 9/11, however, were often so compromised that few of even the minimal 13 regions could be profiled. In response, Cash’s team developed what he calls “virtual profiles.” If, after multiple attempts, only partial data could be extracted, the company would take whatever results it had and combine them, as Cash puts it, “to make a sample that never really existed.” For example, one cutting from a recovered bone might yield a specific number of repeats for a given region. An attempt with a second cutting might yield a partial reading on another region. M-FISys would combine the values to produce a profile with a greater chance of matching other test results. “I’ve not seen any other program that will actually put together a virtual profile,” says the British Forensic Science Service’s Maguire.
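The idea of a virtual profile can be sketched as merging partial STR results from several cuttings of the same remains. This is a hypothetical illustration of the concept described above, not Gene Codes’ implementation; the names `build_virtual_profile` and `ConflictError`, and the choice to stop on disagreement, are assumptions. It also records which cutting supplied each region, so an analysis can be subtracted and the composite rebuilt as the evidence changes.

```python
Profile = dict[str, tuple[int, int]]

class ConflictError(ValueError):
    """Raised when two cuttings report different genotypes at the same locus."""

def build_virtual_profile(partials: dict[str, Profile]) -> tuple[Profile, dict[str, str]]:
    """Merge partial STR profiles from several cuttings of the same remains.

    Returns the composite profile plus a map from each locus to the cutting
    that first supplied it, so a cutting can later be dropped and the
    composite rebuilt from the remaining ones.
    """
    composite: Profile = {}
    source: dict[str, str] = {}
    for cutting_id, partial in partials.items():
        for locus, alleles in partial.items():
            genotype = tuple(sorted(alleles))  # allele order carries no meaning
            if locus in composite and composite[locus] != genotype:
                # Disagreement may mean commingled remains, so refuse to guess.
                raise ConflictError(
                    f"{locus}: {cutting_id} gives {genotype}, "
                    f"{source[locus]} gave {composite[locus]}"
                )
            composite.setdefault(locus, genotype)
            source.setdefault(locus, cutting_id)
    return composite, source
```

Removing an analysis is then just a rebuild without it, for example `build_virtual_profile({k: v for k, v in partials.items() if k != "bone-2"})`. Raising on conflict is a deliberately conservative choice for the sketch; a real system might instead flag the remains for review.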
In fact, half the victim samples recovered did not yield enough information for identification by short tandem repeat analysis alone. So the medical examiner’s office turned to analysis of DNA from cellular components called mitochondria as an adjunct. Such analysis is typically used in studies of human evolution. Mitochondrial DNA analysis is not nearly as precise as short tandem repeat analysis (the most common pattern of the mitochondrial sequence is shared by about 7% of the Caucasian population), but it has two advantages. Mitochondrial DNA is 500 times more plentiful than chromosomal DNA, of which each cell has just two copies, and it is much shorter and therefore hardier, which made it more likely to survive three months near the 1,000-plus-degree-Celsius heat of burning jet fuel.
Unlike chromosomal DNA, mitochondrial DNA is inherited from the mother alone. And instead of being three billion bases long, each copy consists of just 16,569 letters. There are two regions in mitochondrial DNA, called hypervariable-one and hypervariable-two, where the sequence of bases varies a great deal between people. Together the regions total only about a thousand bases. Mitochondrial DNA analysis tracks just the letters in the hypervariable regions that differ from a reference sequence known as the Anderson Sequence. The differences are recorded in M-FISys, which matches them against the patterns from other victim samples, personal effects, and DNA samples from maternal relatives. Mitochondrial DNA analysis alone is not enough to identify a victim, but it may help narrow the possibilities.
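Because a mitochondrial result is recorded as a list of differences from the reference sequence, comparing two samples reduces to comparing those lists over the positions both samples actually covered. The sketch below is a hypothetical illustration of that idea, not M-FISys code; the `MtHaplotype` class, the function names, and the positions used in any example are assumptions chosen for clarity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MtHaplotype:
    """A mitochondrial result: differences from the reference plus positions read."""
    differences: frozenset[tuple[int, str]]  # (position, base) where sample differs from reference
    covered: frozenset[int]                  # positions that were sequenced successfully

def differences_within(h: MtHaplotype, positions: frozenset[int]) -> set[tuple[int, str]]:
    """The sample's differences restricted to the given positions."""
    return {d for d in h.differences if d[0] in positions}

def consistent(a: MtHaplotype, b: MtHaplotype) -> bool:
    """True if a and b agree at every position both were sequenced at.

    A position read in both samples but absent from both difference lists
    matches the reference in each, so comparing difference sets on the
    overlap is enough.
    """
    overlap = a.covered & b.covered
    return bool(overlap) and differences_within(a, overlap) == differences_within(b, overlap)

def maternal_candidates(victim: MtHaplotype, relatives: dict[str, MtHaplotype]) -> list[str]:
    """Relatives whose haplotype could be shared with the victim through the maternal line."""
    return sorted(rid for rid, h in relatives.items() if consistent(victim, h))
```

A consistent result only narrows the field: as noted above, the most common pattern is shared by about 7% of Caucasians, so every maternal relative and many unrelated people can pass this check.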
For the most degraded samples, investigators are starting to examine single nucleotide polymorphisms, or SNPs (pronounced “snips”), locations along the genome where just one letter varies. SNP analysis, which was developed to diagnose genetic predispositions to certain diseases, can work with a piece of chromosomal DNA that is very short, perhaps only 60 letters long. The GeneScreen division of Orchid Biosciences, in Dallas, TX, has developed population statistics to project the likelihood that any particular individual inherits certain SNP pairs at specific genetic locations. In M-FISys, those probabilities are combined with the results of the mitochondrial and short tandem repeat analyses to further reduce the number of possible matches.
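One common way to turn SNP population statistics into a match probability is the same product rule used for short tandem repeats: each SNP has two possible letters, the frequency of a genotype follows Hardy-Weinberg proportions, and independent markers multiply. The sketch below assumes exactly that; the SNP names and frequencies are hypothetical placeholders (not GeneScreen’s data), and multiplying by an STR result is shown as an assumption, not as M-FISys’s documented method.

```python
from math import prod

# Hypothetical per-SNP allele frequencies, illustrative only.
SNP_FREQS: dict[str, dict[str, float]] = {
    "snp1": {"A": 0.6, "G": 0.4},
    "snp2": {"C": 0.7, "T": 0.3},
    "snp3": {"A": 0.5, "T": 0.5},
}

def snp_genotype_frequency(snp: str, genotype: str) -> float:
    """Frequency of a two-letter genotype such as "AG" under Hardy-Weinberg."""
    a, b = genotype
    p, q = SNP_FREQS[snp][a], SNP_FREQS[snp][b]
    return p * p if a == b else 2 * p * q

def combined_match_probability(snp_genotypes: dict[str, str], other_rmp: float = 1.0) -> float:
    """Multiply SNP genotype frequencies into another marker system's match probability.

    Assumes all markers are inherited independently of one another.
    """
    return other_rmp * prod(snp_genotype_frequency(s, g) for s, g in snp_genotypes.items())
```

Here three SNPs typed as `{"snp1": "AG", "snp2": "TT", "snp3": "AT"}` give 0.48 × 0.09 × 0.5 ≈ 0.02. A two-letter SNP carries far less information than a multi-allele STR region, which is why many SNPs are needed and why they mainly serve to shrink the list of candidates.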
Gene Codes Forensics began incorporating SNP data into its calculations in November 2002, but the technology has not yet been approved by the State of New York’s Department of Health for identification purposes. “Because no one’s ever tried it before, the Department of Health wants to be absolutely sure,” says Cash. “The worst case isn’t that somebody doesn’t get identified. The worst case is if you identify somebody based on a new technology, and then you have to tell a family you made a mistake.”
In practice, M-FISys makes a Byzantine process appear straightforward. Human remains are found and catalogued, and a biologist places a small cutting from the bone or muscle into a test tube. The tube is sent to one of several labs, where biologists perform the three types of DNA analysis. The biologists then send the resulting patterns of numbers and letters to Gene Codes Forensics, and Cash and his team feed those patterns into M-FISys. The same process is applied to DNA samples from relatives and from personal effects. When an operator clicks on the M-FISys “Match Index” icon, the system compares every one of the thousands of samples with every other sample to see which ones match. The medical examiner’s office won’t declare a match between victim samples and a known DNA sample from, say, a toothbrush unless the probability of an unrelated person in the general population matching by chance is less than one in 10 billion.
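The “Match Index” step can be sketched as an all-against-all loop over every pair of samples, keeping only pairs that agree at every shared region and whose random match probability falls below the one-in-10-billion bar. This is a hypothetical sketch, not M-FISys’s code: `match_index` and the caller-supplied `genotype_freq` function are assumptions, and only direct STR matches are handled (no kinship comparisons).

```python
from itertools import combinations
from math import prod
from typing import Callable

Profile = dict[str, tuple[int, int]]
MATCH_THRESHOLD = 1e-10  # "less than one in 10 billion"

def match_index(
    samples: dict[str, Profile],
    genotype_freq: Callable[[str, tuple[int, int]], float],
    threshold: float = MATCH_THRESHOLD,
) -> list[tuple[str, str, float]]:
    """Compare every sample with every other one: n * (n - 1) / 2 pairs."""
    matches = []
    for (id_a, a), (id_b, b) in combinations(samples.items(), 2):
        shared = a.keys() & b.keys()
        if not shared or any(sorted(a[l]) != sorted(b[l]) for l in shared):
            continue  # no overlap, or a disagreement at some locus
        # Chance that an unrelated person matches on every shared locus.
        rmp = prod(genotype_freq(l, tuple(sorted(a[l]))) for l in shared)
        if rmp < threshold:
            matches.append((id_a, id_b, rmp))
    return matches
```

The number of comparisons grows quadratically: 10,000 samples already mean about 50 million pairs. A partial profile that agrees perfectly can still fail the threshold simply because too few regions were typed, which is exactly the gap that virtual profiles and the additional DNA technologies aim to close.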
Existing DNA identification programs can incorporate both SNP data and short tandem repeat data into their match calculations, but not data from mitochondrial DNA analysis. “No one has ever tried to combine these three technologies before to collaboratively identify individuals,” says Cash. “If I can say, ‘These nine pieces go together; they’re one person,’ then hopefully I can also find a toothbrush that has the same DNA pattern so I can say, ‘OK. Now I know that these are not only the same, but that they came from whoever used this toothbrush.’”
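Cash’s reasoning (“these nine pieces go together... they came from whoever used this toothbrush”) is transitive grouping: if piece A matches B and B matches C, all three belong to one person. A union-find (disjoint-set) structure is one standard way to do this; the sketch below is a hypothetical illustration, not M-FISys code, and the function names and sample IDs are invented. Groups that end up linked to two different people’s personal effects are left out for manual review.

```python
class DisjointSet:
    """Union-find over sample IDs, with path halving."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)

def group_remains(matched_pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Merge pairwise matches into groups that each represent one source."""
    ds = DisjointSet()
    for a, b in matched_pairs:
        ds.union(a, b)
    groups: dict[str, set[str]] = {}
    for sample in list(ds.parent):
        groups.setdefault(ds.find(sample), set()).add(sample)
    return list(groups.values())

def attribute_groups(groups: list[set[str]], owners: dict[str, str]) -> dict[str, list[str]]:
    """Map each person to their remains, for groups tied to exactly one owner."""
    result: dict[str, list[str]] = {}
    for group in groups:
        names = {owners[s] for s in group if s in owners}
        if len(names) == 1:  # groups linked to two different people need review
            result[names.pop()] = sorted(s for s in group if s not in owners)
    return result
```

Because grouping is transitive, one false pairwise match would merge two people into a single group; that is one reason to demand an extremely small random match probability before accepting any link.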
Nearly every week since the program’s launch in December 2001, Cash has flown from his company’s headquarters in Ann Arbor, MI, to Manhattan to deliver the latest release of the program, accommodating the ever-changing requirements of the medical examiner’s office. To date, there have been 68 iterations. Among the improvements has been a quality-control test to confirm that remains anthropologists have judged to belong to a single person really do belong to a single person. (In one case, the bone from one victim was so deeply embedded in the tissue of another that what had looked on visual inspection like a single sample turned out, upon DNA analysis, to be two.) Another improvement made it possible to pinpoint where individual remains were found on a roughly 25-by-25-meter virtual grid of the disaster site, so that unidentified remains could be cross-referenced with identified ones found in the same general area.
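A simple version of the single-person quality check follows from basic genetics: one person carries at most two alleles at any STR region (one from each parent), so three or more distinct alleles at a locus point to more than one contributor. The sketch below is a hypothetical illustration of that rule, not the actual M-FISys test; the function names are invented. Note the limits: two people who happen to share alleles can still look like one, and rare three-allele patterns exist in single individuals.

```python
from math import ceil

def mixture_loci(alleles_by_locus: dict[str, set[int]]) -> list[str]:
    """Loci showing more than two distinct alleles, which one person cannot produce."""
    return sorted(l for l, obs in alleles_by_locus.items() if len(obs) > 2)

def minimum_contributors(alleles_by_locus: dict[str, set[int]]) -> int:
    """Lower bound on the number of people in a sample: ceil(max alleles per locus / 2)."""
    most = max((len(obs) for obs in alleles_by_locus.values()), default=0)
    return ceil(most / 2)

def flag_commingled(cuttings: dict[str, dict[str, set[int]]]) -> list[str]:
    """Cuttings assumed to be single-source that show evidence of a second person."""
    return sorted(cid for cid, calls in cuttings.items() if minimum_contributors(calls) > 1)
```

For instance, a cutting with alleles 16, 17, and 18 at vWA would be flagged, much like the sample described above that turned out to contain two victims.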
Cash’s contract with the City of New York officially ends on September 11, 2004, or sooner if his job is done. But the end of the project likely won’t mean a return to business as usual. Sequencher, the company’s original moneymaker, needs to be upgraded. The program, which has some 16,000 users, got pushed to the back burner when Cash received that fateful call. And there are new applications for M-FISys to consider. The software could be used in missing-persons work in a state forensics lab or for the International Commission on Missing Persons. Components of it, of course, could be applied to more common needs. For example, the kinship-analysis piece could be used in products for genetic counseling and paternity searches, and even to help protect endangered species by indicating restrictions on crossbreeding, or to identify particularly dangerous pathogens.
Cash is also investigating the possibility of building a portable mass-fatality identification system for other disasters, such as a hurricane in the Philippines, an earthquake in Turkey, or another terrorist attack, so that, as he puts it, “someone else doesn’t have to start from scratch.” To date, 12 countries have expressed interest in such a system. They likely share the viewpoint of England’s Maguire. “If I were to be involved in trying to run an incident on the scale of September 11 without a piece of software like this,” he says, “it would be in my view almost impossible to do.”