MIT Technology Review



Vast quantities of data are freely available on the Web, a potential treasure trove for many businesses, provided they can figure out how to use it effectively.

A company can, for example, comb through data from the U.S. Patent and Trademark Office and court records prior to acquiring another company to see if any of its intellectual property is tied up in legal action. In practice, however, going through so much information takes time and effort to orchestrate.

IBM hopes that a new tool, called BigSheets, will help users analyze Web data more easily. The company has developed a test version of the software for the British Library.

“The ability of any user to do their own types of interesting analytics is coming of age,” says Rod Smith, vice president of emerging Internet technologies for IBM.

BigSheets is built on top of Hadoop, an open-source platform that processes very large amounts of data by splitting the work into tasks and handing them off to a cluster of computers. Hadoop is often used to analyze unstructured Web data.
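The split-and-combine idea behind this kind of processing can be illustrated with a small word count. This is only a sketch in plain Python, not Hadoop's actual API: the function names are hypothetical, and threads on one machine stand in for the separate computers of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(documents, n_chunks):
    """Divide the documents into roughly equal chunks, one per worker."""
    return [documents[i::n_chunks] for i in range(n_chunks)]


def count_words(chunk):
    """Map step: each worker counts the words in its own chunk."""
    counts = Counter()
    for text in chunk:
        counts.update(text.lower().split())
    return counts


def merge_counts(partials):
    """Reduce step: combine the per-worker counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(documents, n_workers=3):
    """Split the job, run the pieces in parallel, then merge the results."""
    chunks = split_into_chunks(documents, n_workers)
    # Assumption: threads stand in for the machines of a real cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return merge_counts(partials)
```

Calling `word_count(["big data", "big sheets", "data data"])` counts each document on a separate worker and merges the partial counts into a single tally. Because no worker needs to see another worker's documents, the same pattern can in principle scale to far more data than one machine could hold.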

BigSheets uses Hadoop to crawl through Web pages, parsing them to extract key terms and other useful data. It organizes this information in a very large spreadsheet, where users can analyze it using the sort of tools and macros found in desktop spreadsheet software. Unlike ordinary spreadsheet software, however, BigSheets places no limit on the size of a spreadsheet.

To use BigSheets, a user would point the tool at a set of URLs or a repository of data. Lists of terms can be used to organize the data into rows and tables, and these can be adjusted later.
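How a list of terms might turn page text into spreadsheet rows can be sketched in a few lines. This is not BigSheets code, and nothing here reflects IBM's implementation: the function names and the one-row-per-page layout are assumptions made for illustration, and the pages are passed in as already-fetched text rather than crawled.

```python
import csv
import io
import re


def build_rows(pages, terms):
    """Return one row per page, with a count column for each term.

    `pages` maps a URL to its already-fetched text (hypothetical input).
    """
    rows = []
    for url, text in pages.items():
        # Lowercase and split on non-alphanumerics so matching ignores case.
        words = re.findall(r"[a-z0-9]+", text.lower())
        row = {"url": url}
        for term in terms:
            row[term] = words.count(term.lower())
        rows.append(row)
    return rows


def to_csv(rows, terms):
    """Serialize the rows as CSV text that any spreadsheet tool can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["url"] + list(terms))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Changing the list of terms and rebuilding the rows is the rough equivalent of adjusting the organization later: the source text stays the same, and only the columns derived from it change.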

Smith says that IBM chose the spreadsheet as the model for organizing data because most users are already familiar with such software. If users want to represent the data in more complex ways, the tool will work with an IBM visualization tool called Many Eyes, as well as other visualization software.


Credits: IBM

Tagged: Computing, Internet, IBM, data analysis, visualization, enterprise computing, Hadoop
