Breaking the Genome Bottleneck

It can be quicker and easier to sequence a genome than to analyze the resulting data—now one startup thinks it has a solution to this data-crunching bottleneck.

The genomic data generated by next-generation sequencing machines doesn’t amount to much more than alphabet soup unless it is subjected to significant computational processing and statistical analysis. For the data to be useful, the trick is to turn those As, Ts, Gs, and Cs into a manageable description of disease risks and other genetic predispositions. That requires a lot of computational power and time, which is already a significant bottleneck for some genomic analysis companies (see “Bases to Bytes,” May/June 2012).

Several companies are looking to the cloud as a way to help them analyze all the data. The idea is that researchers can send their data to a Web-hosted analysis service that will process raw data into a genetic profile. However, the data files generated by sequencing machines are so massive (see “Bases to Bytes,” May/June 2012) that the mundane task of uploading them to the cloud becomes a problem of its own. The strategy of a Redwood City, California-based startup called Bina Technologies is to divide and conquer: give customers an in-house data-crunching machine that will turn a mountain of raw sequence into easily shared genetic profiles. Those profiles can then be quickly uploaded to Bina Technologies’ cloud-hosted site for data management, sharing, and aggregation.

The group plans to sell its so-called “Bina Box” preloaded with software that can reduce the 300 gigabytes or so of raw data from a human genome into a few hundred megabytes of genetic information. The box will upload the compressed dataset to Bina’s cloud service for storage, sharing, and further analysis. The Bina Box can do the initial heavy lifting and make the data small enough to send to the cloud, says Narges Bani Asadi, founder of Bina.
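To get a feel for why shrinking the data before upload matters, here is a rough back-of-the-envelope sketch. The 300-gigabyte figure comes from the article; reading "a few hundred megabytes" as 300 MB and assuming a sustained 100 Mbps uplink are illustrative guesses, not numbers from Bina.

```python
# Rough estimate of data reduction and upload time.
# Assumptions (not from the article): reduced size of 300 MB,
# and a sustained uplink of 100 megabits per second.

GB = 1_000_000_000  # bytes in a decimal gigabyte
MB = 1_000_000      # bytes in a decimal megabyte

RAW_BYTES = 300 * GB      # raw data for one human genome (from the article)
REDUCED_BYTES = 300 * MB  # assumed value for "a few hundred megabytes"
UPLINK_MBPS = 100         # assumed uplink speed in megabits per second


def reduction_ratio(raw_bytes: float, reduced_bytes: float) -> float:
    """How many times smaller the reduced dataset is than the raw data."""
    return raw_bytes / reduced_bytes


def upload_hours(size_bytes: float, uplink_mbps: float) -> float:
    """Hours needed to upload size_bytes at a sustained uplink speed."""
    seconds = (size_bytes * 8) / (uplink_mbps * 1_000_000)
    return seconds / 3600


if __name__ == "__main__":
    print(f"Reduction: about {reduction_ratio(RAW_BYTES, REDUCED_BYTES):.0f}x")
    print(f"Raw upload: about {upload_hours(RAW_BYTES, UPLINK_MBPS):.1f} hours")
    print(f"Reduced upload: about {upload_hours(REDUCED_BYTES, UPLINK_MBPS) * 3600:.0f} seconds")
```

Under these assumptions the raw data would take several hours per genome to upload, while the reduced profile would take well under a minute. Real transfer times depend on the actual link and protocol overhead.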

Bina Technologies says its system does this initial processing of genomic data at speeds that are orders of magnitude faster than tools made available by the Broad Institute, the MIT-Harvard joint genome center. What takes about a week using the Broad’s genome variation analysis pipeline on a high-end eight-core machine on Amazon’s cloud can be done in about two hours on a Bina Box, says Asadi. The company expects to publish a full description of its comparison to other analysis pipelines in the coming months.
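The speedup implied by Asadi's example can be worked out directly. The sketch below takes "about a week" as 168 hours and "about two hours" as exactly 2; both are approximations of the quoted figures, and the company's own benchmark had not been published at the time of writing.

```python
import math

# Approximate figures from the article's example.
BASELINE_HOURS = 7 * 24  # "about a week" on an eight-core cloud machine
BINA_HOURS = 2           # "about two hours" on a Bina Box


def speedup(baseline_hours: float, new_hours: float) -> float:
    """Ratio of the baseline running time to the new running time."""
    return baseline_hours / new_hours


if __name__ == "__main__":
    factor = speedup(BASELINE_HOURS, BINA_HOURS)
    print(f"Speedup: about {factor:.0f}x")
    print(f"Orders of magnitude: about {math.log10(factor):.1f}")
```

For this particular example the ratio is 84, or just under two orders of magnitude.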

Bina Technologies plans to work with a few genomics groups as part of a pilot test phase for its system. One group in early conversation with Bina Technologies is Foundation Medicine, a Cambridge, Massachusetts, cancer genomics company (see “Foundation Medicine: Personalizing Cancer Drugs,” March/April 2012). While the team responsible for prepping samples and generating raw sequence data has been able to scale up its processes to meet demand, the same is not true for the computational analysis, says Maureen Cronin, senior vice president of research collaborations with Foundation Medicine, and an advisor to Bina Technologies. She says all the data streaming off Foundation Medicine’s sequencing machines has “created quite a computational problem.”

To be certain of the mutations they identify, says Cronin, Foundation Medicine sequences a patient’s genome at an average of 500X coverage; that is, each of the three billion base pairs in the human genome is read about 500 times. This raw data, billions of short blips of the genome, each a few dozen base pairs in length, must then be processed into longer chromosomal sequences. This “assembly” process is followed by a comparison of the individual’s genome to a reference standard, the sequence produced by the Human Genome Project. All this must happen before any clinical interpretation of a tumor or other genome can even begin. “It’s an incredibly computationally intensive process,” says Cronin.
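The scale of the 500X coverage figure becomes clearer with a little arithmetic. The sketch below uses the standard relationship between coverage, read count, read length, and genome size (coverage equals reads times read length divided by genome size). The genome size and coverage come from the article; the 50-base read length is an assumed stand-in for "a few dozen base pairs."

```python
# Coverage arithmetic: coverage = (number of reads * read length) / genome size.
# Assumption (not from the article): each read is 50 bases long.

GENOME_SIZE = 3_000_000_000  # base pairs in the human genome (from the article)
COVERAGE = 500               # average coverage (from the article)
READ_LENGTH = 50             # assumed read length in bases


def total_bases(genome_size: int, coverage: int) -> int:
    """Total number of sequenced bases needed for the given coverage."""
    return genome_size * coverage


def reads_needed(genome_size: int, coverage: int, read_length: int) -> int:
    """Number of reads of a fixed length needed to reach the coverage."""
    # Ceiling division, so a partial read counts as a whole read.
    return -(-total_bases(genome_size, coverage) // read_length)


if __name__ == "__main__":
    print(f"Bases sequenced: {total_bases(GENOME_SIZE, COVERAGE):,}")
    print(f"Reads needed: {reads_needed(GENOME_SIZE, COVERAGE, READ_LENGTH):,}")
```

With these inputs the total comes to 1.5 trillion sequenced bases spread over 30 billion reads, which fits the article's description of "billions of short blips" that must be pieced together.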
