Technology Review - Published By MIT

May 2005

The Infinite Library


By Wade Roush


Breaching the Walls
Even for authorized users, access to the Bodleian Library's seven million volumes is anything but instant. If you are an Oxford undergraduate in need of a book, you first send an electronic request to a worker in the library's underground stacks. (Before 2000 or so, you would have handed a written request slip to a librarian, who would have relayed it to the stacks via a 1940s-era network of pneumatic tubes.) The worker locates the book in a warren of movable shelves (a space-saving innovation conceived in 1898 by former British prime minister William Gladstone) and places it in a plastic bin. An ingenious system of conveyor belts and elevators, also built in the 1940s, carries the bin to one of seven reading rooms, where the book is unpacked and handed over to you.

The process can take anywhere from 30 minutes to several hours. But once you finally have the book, don't even think about taking it back to your dorm room for further study. The Bodleian is a noncirculating legal deposit library: it is entitled to a free copy of every book published in the United Kingdom and the Republic of Ireland, and it guards those copies jealously. The library takes in tens of thousands of books every year, but the legend is that no book has ever left its walls.

But a digital book needn't be loaned out to be shared. And Oxford's various libraries have already created digital images of many of their greatest treasures, from ninth-century illuminated Latin manuscripts to 19th-century children's alphabet books. Most of these images can be examined at high resolution on the Web. The only catch is that scholars have to know what they're looking for in advance, since very few of the digital pages are searchable. Optical character recognition (OCR) technology cannot yet interpret handwritten script, so exposing the content of these books to today's search engines requires typing their texts into separate files linked to the original images. A three-person team at Oxford, in collaboration with librarians at the University of Michigan and 70 other universities, is doing just that for a large collection of early English books, but the entire effort produces searchable text for only 200 books per month. At that rate, making a million books searchable would take more than 400 years.
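The 400-year estimate follows directly from the transcription rate. A minimal sketch of the arithmetic, assuming the rate stays fixed at 200 books per month with no added staff or tooling (the function name is purely illustrative):

```python
# Throughput arithmetic using the rate quoted above:
# 200 hand-transcribed, searchable books per month.
BOOKS_PER_MONTH = 200
TARGET_BOOKS = 1_000_000


def years_to_transcribe(target_books: int, books_per_month: int) -> float:
    """Years needed to reach target_books at a constant monthly rate."""
    months = target_books / books_per_month  # 5,000 months for a million books
    return months / 12


years = years_to_transcribe(TARGET_BOOKS, BOOKS_PER_MONTH)
print(f"{years:.0f} years")  # 5,000 months is a little under 417 years
```

The result, roughly 417 years, is why the article says "more than 400."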

That's where Google's resources will make a difference. Susan Wojcicki, a product manager at Google's Mountain View, CA, campus and leader of the Google Print project, puts it bluntly: "At Google we're good at doing things at scale."

Google has already copied and indexed some eight billion Web pages, which lends credibility to its claim that it can digitize a big chunk of the 60 million volumes (counting duplicates) held by Harvard, Oxford, Stanford, the University of Michigan, and the New York Public Library in a matter of years. It will be a complex task, but one that is in some ways familiar for the company. "It's not just feeding the books into some kind of digitization machine, but then actually taking the digital files, moving those files around, storing them, compressing them, OCR-ing them, indexing them, and serving them up," points out Wojcicki. "At that point it becomes similar to all of Google's other businesses, where we're managing large amounts of data." But the entire project, Wojcicki admits, hinges on those digitization machines: a fleet of proprietary robotic cameras, still under development, that will turn the digitization of printed books into a true assembly-line process and, in theory, lower the cost to about $10 per book, compared to a minimum of $30 per book today.
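To put the per-book figures in perspective, the sketch below multiplies them out. It is illustrative only: it assumes a flat per-book price and treats all 60 million volumes, duplicates included, as scanned, which neither Google nor the libraries have said will happen.

```python
# Per-book digitization costs quoted above, in US dollars.
COST_TODAY_MIN = 30      # "a minimum of $30 per book today"
COST_GOOGLE_TARGET = 10  # the hoped-for assembly-line cost


def scanning_cost(volumes: int, cost_per_book: int) -> int:
    """Total cost in dollars, assuming one flat price per book."""
    return volumes * cost_per_book


# Illustrative upper bound: every volume held by the five partners,
# duplicates included (an assumption, not a stated plan).
VOLUMES = 60_000_000
today = scanning_cost(VOLUMES, COST_TODAY_MIN)
target = scanning_cost(VOLUMES, COST_GOOGLE_TARGET)
print(f"today: ${today:,}  target: ${target:,}  saved: ${today - target:,}")
```

Whatever the real number of books scanned, the target price is one third of today's minimum, so the savings scale linearly with the size of the job.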

Neither Google nor its partner libraries have announced exactly how the process will work. But John Wilkin, associate university librarian at the University of Michigan, says it will go something like this: "We put a whole shelfful of books onto a cart, keeping the order intact. We check them out by waving them under a bar code reader. Overnight, software takes all the bar codes, extracts machine-readable records from the university's electronic catalogue, and sends the records to Google, so they can match them with the books. Then we move the cart into Google's operations room."

This room will contain multiple workstations so that several books can be digitized in parallel. Google is designing the machines to minimize the impact on books, according to Wilkin. "They scan the books in order and return the cart to us," he continues. "We check them back in and mark the records to show they've been scanned. Finally, the digital files are shipped in a raw format to a Google data center and processed to produce something you could use."
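Wilkin's description amounts to a batch hand-off keyed on bar codes. The sketch below models only the library's side of it, under loudly stated assumptions: the record layout, the functions `check_out_cart` and `check_in_cart`, and the barcodes are all hypothetical, not Michigan's or Google's actual software. It shows why keeping the cart's order intact matters: the records go out in the same sequence as the books on the shelf, so each record can be matched to its volume.

```python
from dataclasses import dataclass


@dataclass
class CatalogRecord:
    """Hypothetical, minimal catalogue record keyed by barcode."""
    barcode: str
    title: str
    scanned: bool = False


def check_out_cart(cart: list[str], catalog: dict[str, CatalogRecord]) -> list[CatalogRecord]:
    """Pull one record per barcode, in shelf order, to send along with the cart."""
    missing = [code for code in cart if code not in catalog]
    if missing:
        raise KeyError(f"no catalogue record for: {missing}")
    return [catalog[code] for code in cart]


def check_in_cart(cart: list[str], catalog: dict[str, CatalogRecord]) -> None:
    """After the cart comes back, flag each book's record as scanned."""
    for code in cart:
        catalog[code].scanned = True


# Usage with made-up barcodes and titles.
catalog = {
    "BC-0001": CatalogRecord("BC-0001", "Title A"),
    "BC-0002": CatalogRecord("BC-0002", "Title B"),
    "BC-0003": CatalogRecord("BC-0003", "Title C"),
}
cart = ["BC-0002", "BC-0001", "BC-0003"]  # shelf order, kept intact
batch = check_out_cart(cart, catalog)      # the "overnight" record pull
print([r.title for r in batch])            # ['Title B', 'Title A', 'Title C']
check_in_cart(cart, catalog)               # books returned and checked back in
```

Failing loudly on an unknown barcode is a design choice for the sketch: a book without a catalogue record could not be matched on the scanning side anyway.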

This article is from the May 2005 issue of Technology Review.

Comments

  • respond to the Infinite Library
    Guest (Jocelyn Stevenson) on 01/28/2006 at 12:00 AM
    I agree with the first part of the article. It was good to have the libraries join Google.
  • Infinite Library
    Guest (Sylvia Reyna) on 06/03/2006 at 12:00 AM
    Interesting article. Libraries play a major role in the dissemination of "free" information, and now it comes with a covert tag. Privatization of information: company lawyers will have to mud-wrestle this one with the publishers and authors.