Technology Review - Published By MIT

April 2003

Surveillance Nation

Continued from page 5

By Dan Farmer and Charles C. Mann

Managing the sheer size of these aggregate surveillance databases, surprisingly, will not pose insurmountable technical difficulties. Most personal data are either very compact or easily compressible. Financial, medical, and shopping records can be represented as strings of text that are easily stored and transmitted; as a general rule, the records do not grow substantially over time.

Even biometric records are no strain on computing systems. To identify people, genetic-testing firms typically need stretches of DNA that can be represented in just one kilobyte, the size of a short e-mail message. Fingerprints, iris scans, and other types of biometric data consume little more. Other forms of data can be preprocessed in much the way that the cameras on Route 9 transform multi-megabyte images of cars into short strings of text with license plate numbers and times. (For investigators, having a video of suspects driving down a road usually is not as important as simply knowing that they were there at a given time.) To create a digital dossier for every individual in the United States, as programs like Total Information Awareness would require, only "a couple terabytes of well-defined information" would be needed, says Jeffrey Ullman, a former Stanford University database researcher. "I don't think that's really stressing the capacity of [even today's] databases."
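The arithmetic behind that estimate can be checked with a rough back-of-envelope sketch. The population figure (about 290 million, roughly the U.S. population in 2003), the decimal definition of a terabyte, and the sample plate record are assumptions made for illustration; none of them comes from the researchers quoted here.

```python
# Back-of-envelope storage estimate. All constants are illustrative assumptions.
US_POPULATION = 290_000_000   # assumption: rough U.S. population in 2003
TERABYTE = 10**12             # assumption: decimal terabyte, in bytes


def bytes_per_person(total_bytes, population=US_POPULATION):
    """Storage available per individual if total_bytes is shared evenly."""
    return total_bytes / population


def total_terabytes(bytes_each, population=US_POPULATION):
    """Total storage, in terabytes, if every individual needs bytes_each."""
    return bytes_each * population / TERABYTE


# "A couple terabytes" spread over the whole population.
per_person = bytes_per_person(2 * TERABYTE)

# One kilobyte of DNA data per person, as described in the article.
dna_total_tb = total_terabytes(1000)

# A hypothetical preprocessed camera record: plate number plus timestamp.
plate_record = "ABC123,2003-04-01T08:15:00"
plate_record_bytes = len(plate_record.encode("utf-8"))

if __name__ == "__main__":
    print(f"Bytes available per person in 2 TB: {per_person:.0f}")
    print(f"1 KB of DNA data per person totals: {dna_total_tb:.2f} TB")
    print(f"Plate record size in bytes: {plate_record_bytes}")
```

Under these assumptions, two terabytes leaves several kilobytes of text per person, and a plate sighting stored as text takes a few dozen bytes rather than the megabytes of the original image.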

Instead, argues Rajeev Motwani, another member of Stanford's database group, the real challenge for large surveillance databases will be the seemingly simple task of gathering valid data. Computer scientists use the term GIGO (garbage in, garbage out) to describe situations in which erroneous input creates erroneous output. Whether people are building bombs or buying bagels, governments and corporations try to predict their behavior by integrating data from sources as disparate as electronic toll-collection sensors, library records, restaurant credit-card receipts, and grocery store customer cards, to say nothing of the Internet, surely the world's largest repository of personal information. Unfortunately, all these sources are full of errors, as are financial and medical records. Names are misspelled and digits transposed; address and e-mail records become outdated when people move and switch Internet service providers; and formatting differences among databases cause information loss and distortion when they are merged. "It is routine to find in large customer databases defective records, records with at least one major error or omission, at rates of at least 20 to 35 percent," says Larry English of Information Impact, a database consulting company in Brentwood, TN.
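Two of the error types mentioned above, transposed digits and misspelled names, can be illustrated with a short sketch. The function names and the sample values are hypothetical, and the approach (a character-by-character comparison plus the standard library's difflib similarity ratio) is only one simple way to flag suspect records, not a method attributed to anyone quoted in this article.

```python
from difflib import SequenceMatcher


def is_adjacent_transposition(a: str, b: str) -> bool:
    """Return True if b is a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


if __name__ == "__main__":
    # Hypothetical values: a ZIP code with two digits swapped.
    print(is_adjacent_transposition("02139", "02319"))
    # Hypothetical values: a likely misspelling versus an unrelated name.
    print(name_similarity("Jonathan Smith", "Jonathon Smith"))
    print(name_similarity("Jonathan Smith", "Maria Lopez"))
```

A high similarity score only suggests that two records may describe the same person. Deciding where to set the threshold, and what to do with near matches, is part of the data-cleaning problem described here.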

Unfortunately, says Motwani, "data cleaning is a major open problem in the research community. We are still struggling to get a formal technical definition of the problem." Even when the original data are correct, he argues, merging them can introduce errors where none had existed before. Worse, none of these worries about the garbage going into the system even begin to address the still larger problems with the garbage going out.
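How a merge can create an error from correct inputs can be shown with a minimal sketch. The records, field names, and the merge_on helper below are all hypothetical: two databases each hold an accurate record for a different person who happens to share a name, and joining on the name alone fuses them into one false dossier.

```python
# Hypothetical, individually correct records about two different people.
toll_records = [
    {"name": "John Smith", "city": "Boston", "plate": "ABC123"},
]
library_records = [
    {"name": "John Smith", "city": "Denver", "book": "Chemistry Basics"},
]


def merge_on(key_fields, left, right):
    """Naive join: combine every pair of records that agree on key_fields.

    Where both records carry the same field, the left record's value wins,
    silently discarding the right record's value.
    """
    merged = []
    for l_rec in left:
        for r_rec in right:
            if all(l_rec[k] == r_rec[k] for k in key_fields):
                merged.append({**r_rec, **l_rec})
    return merged


# Joining on name alone attributes the Denver borrower's book to the Boston driver.
by_name = merge_on(["name"], toll_records, library_records)

# Requiring the city to agree as well avoids this particular false match.
by_name_and_city = merge_on(["name", "city"], toll_records, library_records)

if __name__ == "__main__":
    print(by_name)
    print(by_name_and_city)
```

Adding the city to the key avoids the false match in this toy case, but a stricter key also misses true matches whenever someone has moved or an address is out of date, so neither choice is free of error.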

This article is from the April 2003 issue of Technology Review.
