Ultimately, Clancy says, Google would like Book Search to replicate the experience of going to a library, browsing its stacks, and serendipitously finding a book that’s interesting or useful. One way to do this would be to link books to each other by categories and themes, he suggests. The task becomes more complicated, though, when linking works by Virginia Woolf, for instance, to criticisms of her work, works that inspired her, or authors who wrote during the same era. Designing algorithms that can effectively organize all of this new information, Clancy says, is “one of the grand challenges and will take many years.”
Reddy says CMU researchers are trying to tackle this challenge by using a “statistical approach” to organizing the information. In this approach, Virginia Woolf’s stream-of-consciousness sentences, for example, would be analyzed by an algorithm that would find patterns based on sentence length, structure, and punctuation. This technique might turn up a work by James Joyce, one of Woolf’s influences, or one by an obscure author whose writings might otherwise never have been found.
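To make the idea concrete, here is a minimal sketch of how a passage might be reduced to a few style measurements and compared with another. It is not the CMU researchers’ method, which the article does not detail; the function names `style_features` and `cosine`, and the choice of three features (average sentence length, commas per 100 words, semicolons per 100 words), are assumptions made purely for illustration.

```python
import math
import re

WORD = re.compile(r"[A-Za-z']+")

def style_features(text):
    """Return a small, hypothetical style vector for a passage:
    [mean words per sentence, commas per 100 words, semicolons per 100 words]."""
    # Naive sentence split on runs of ., ! or ? (an assumption; real text needs more care).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(WORD.findall(text)), 1)
    lengths = [len(WORD.findall(s)) for s in sentences] or [0]
    return [
        sum(lengths) / len(lengths),
        text.count(",") / n_words * 100,
        text.count(";") / n_words * 100,
    ]

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Two passages whose vectors score close to 1.0 under `cosine` share a similar rhythm of sentence length and punctuation. A real system would presumably use far more features and far more text than this toy does.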
In the meantime, researchers are seeking shortcuts for searching among authors, books, and genres, Reddy says. Much as “collaborative filtering” at Amazon uses people’s past purchases to help others find potential purchases, Book Search users could help each other find books. The community-based approach is an idea that Google has not announced, Clancy says, but it could add another layer to searching through books and create grassroots excitement about the project.
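The general idea behind collaborative filtering can be sketched in a few lines. This is a toy illustration, not Amazon’s system or anything Google has described: the `recommend` function and the `histories` mapping of readers to the books they have viewed are hypothetical. Each unread book is scored by how much its other readers overlap with the reader in question.

```python
from collections import Counter

def recommend(histories, user, top_n=3):
    """Suggest books for `user` based on readers with overlapping histories.

    `histories` maps a reader name to the set of titles that reader has viewed
    (a hypothetical data layout chosen for this sketch)."""
    seen = histories[user]
    scores = Counter()
    for other, books in histories.items():
        if other == user:
            continue
        overlap = len(seen & books)  # how similar this reader is to `user`
        if overlap == 0:
            continue
        for book in books - seen:    # only score titles `user` has not seen
            scores[book] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked[:top_n]]
```

A reader who has viewed two of the same titles as someone else counts twice as much as a reader who shares only one, so suggestions lean toward the tastes of the most similar readers.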
Certainly, holding its cards close is not new for Google. “They are secretive about almost everything they do…this is very common with Silicon Valley companies,” Reddy at CMU says. In the case of Book Search, he says, Google wants “to have a captive solution for all the libraries.” Even so, Reddy is excited about Google’s project and believes it will eventually complement his research. “I’m sure at some point they will have a pointer to our books,” he says.
Meanwhile, Google has to contend with the nontechnical issue of disgruntled copyright holders dragging it into court. The Authors Guild and a number of publishers have sued the company, claiming that Google’s project violates copyright law. (Stanford professor Lawrence Lessig has made a 30-minute video about the legal controversy.)
But if the legal and technical challenges can be overcome, digitized physical books could greatly surpass the billions of existing Web pages in breadth and depth of information. Indeed, a single comprehensive online card catalog for millions of the world’s books has the potential to create a whole new chapter in the information age.