MIT Technology Review



DNA data: Narges Bani Asadi founded Bina Technologies, a genome-analysis company that aims to speed up the processing of DNA sequence data.

The genomic data generated by next-generation sequencing machines doesn’t amount to much more than alphabet soup unless it is subjected to significant computational processing and statistical analysis. For the data to be useful, the trick is to turn those As, Ts, Gs, and Cs into a manageable description of disease risks and other genetic predispositions. That requires a lot of computational power and time, which is already a significant bottleneck for some genomic analysis companies (see “Bases to Bytes,” May/June 2012).

Several companies are looking to the cloud as a way to help them analyze all the data. The idea is that researchers can send their data to a Web-hosted analysis service that will process raw data into a genetic profile. However, the data files generated by sequencing machines are so massive (see “Bases to Bytes,” March/April 2012) that the mundane task of uploading them to the cloud becomes a problem of its own. The strategy of Bina Technologies, a startup based in Redwood City, California, is to divide and conquer: give customers an in-house data-crunching machine that turns a mountain of raw sequence into easily shared genetic profiles. Those profiles can then be quickly uploaded to Bina Technologies’ cloud-hosted site for data management, sharing, and aggregation.

The company plans to sell its so-called “Bina Box” preloaded with software that can reduce the 300 gigabytes or so of raw data from a human genome to a few hundred megabytes of genetic information. The box will then upload the compressed dataset to Bina’s cloud service for storage, sharing, and further analysis. The Bina Box can do the initial heavy lifting and make the data small enough to send to the cloud, says Narges Bani Asadi, founder of Bina.
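To put those sizes in perspective, here is a rough back-of-the-envelope sketch in Python. Only the 300-gigabyte figure comes from the article. The 300-megabyte profile size (standing in for "a few hundred megabytes") and the 100 Mbit/s upload link are assumptions chosen purely for illustration, and the transfer-time formula ignores protocol overhead.

```python
# Back-of-the-envelope sketch; not a measurement of any real system.
RAW_BYTES = 300e9          # ~300 GB of raw data per human genome (from the article)
PROFILE_BYTES = 300e6      # assumed: 300 MB stands in for "a few hundred megabytes"
UPLINK_BITS_PER_S = 100e6  # assumed: hypothetical 100 Mbit/s upload link


def upload_seconds(size_bytes, bits_per_second):
    """Idealized transfer time: size in bits divided by link speed."""
    return size_bytes * 8 / bits_per_second


reduction_factor = RAW_BYTES / PROFILE_BYTES
raw_upload_hours = upload_seconds(RAW_BYTES, UPLINK_BITS_PER_S) / 3600
profile_upload_seconds = upload_seconds(PROFILE_BYTES, UPLINK_BITS_PER_S)

print(f"size reduction: about {reduction_factor:.0f}x")
print(f"raw upload: about {raw_upload_hours:.1f} hours")
print(f"profile upload: about {profile_upload_seconds:.0f} seconds")
```

Under these assumptions the raw data would tie up the link for several hours per genome, while the reduced profile would go through in under a minute. A different link speed changes both numbers but not the roughly thousandfold gap between them.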

Bina Technologies says its system does this initial processing of genomic data at speeds that are orders of magnitude faster than tools made available by the Broad Institute, the MIT-Harvard joint genome center. What takes about a week using the Broad’s genome variation analysis pipeline on a high-end eight-core machine on Amazon’s cloud can be done in about two hours on a Bina Box, says Asadi. The company expects to publish a full description of its comparison to other analysis pipelines in the coming months.
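As a quick check on the "orders of magnitude" claim, the short Python sketch below takes the two run times quoted by Asadi at face value (about a week versus about two hours) and computes the implied speedup. The variable names are illustrative only, and the figures are the company's own, not independent benchmarks.

```python
import math

# Run times as quoted in the article; treated as rough figures.
BROAD_PIPELINE_HOURS = 7 * 24  # "about a week" on an eight-core cloud machine
BINA_BOX_HOURS = 2             # "about two hours" on a Bina Box

speedup = BROAD_PIPELINE_HOURS / BINA_BOX_HOURS
orders_of_magnitude = math.log10(speedup)

print(f"implied speedup: about {speedup:.0f}x")
print(f"orders of magnitude: about {orders_of_magnitude:.1f}")
```

On these numbers the claimed speedup works out to a bit under two orders of magnitude.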

Bina Technologies plans to work with a few genomics groups as part of a pilot test phase for its system. One group in early conversations with Bina Technologies is Foundation Medicine, a Cambridge, Massachusetts, cancer genomics company (see “Foundation Medicine: Personalizing Cancer Drugs,” March/April 2012). While the team responsible for prepping samples and generating raw sequence data has been able to scale up its processes to meet demand, the same is not true for the computational analysis, says Maureen Cronin, senior vice president of research collaborations at Foundation Medicine and an advisor to Bina Technologies. She says all the data streaming off Foundation Medicine’s sequencing machines has “created quite a computational problem.”

To be certain of the mutations they identify, says Cronin, Foundation Medicine sequences a patient’s genome at an average of 500X coverage; that is, each of the three billion base pairs in the human genome is read about 500 times. This raw data, billions of short blips of the genome, each a few dozen base pairs in length, must then be processed into longer chromosomal sequences. This “assembly” process is followed by a comparison of the individual’s genome to a reference standard, the result of the Human Genome Project. All this must happen before any clinical interpretation of a tumor or other genome can even begin. “It’s an incredibly computationally intensive process,” says Cronin.
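The scale implied by those figures can be sketched with the standard coverage relation, coverage = (number of reads × read length) / genome size. The Python sketch below takes the article's numbers at face value. The 50-base read length is an assumption standing in for "a few dozen base pairs," and the function name is illustrative.

```python
# Rough estimate of sequencing volume from the coverage relation:
#   coverage = (number_of_reads * read_length) / genome_size
GENOME_BP = 3_000_000_000  # three billion base pairs (from the article)
COVERAGE = 500             # average 500X coverage (from the article)
READ_LENGTH_BP = 50        # assumed: stands in for "a few dozen base pairs"


def reads_needed(genome_bp, coverage, read_length_bp):
    """Number of reads required to reach the given average coverage."""
    total_bases = genome_bp * coverage
    # Ceiling division using integers, so no floating-point rounding.
    return -(-total_bases // read_length_bp)


total_bases_sequenced = GENOME_BP * COVERAGE
read_count = reads_needed(GENOME_BP, COVERAGE, READ_LENGTH_BP)

print(f"bases sequenced: {total_bases_sequenced:,}")
print(f"reads to process: {read_count:,}")
```

With these inputs the analysis has to handle about 1.5 trillion sequenced bases, spread over tens of billions of short reads, for a single genome.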


Credit: Courtesy of Narges Bani Asadi

Tagged: Computing, Biomedicine, genomic sequencing, genomic data
