Ultimately, Clancy says, Google would like Book Search to give the same result as someone going to a library, looking in its stacks, and serendipitously finding a book that's interesting or useful. One way to do this would be to link books to each other by categories and themes, he suggests. The task becomes more complicated, though, when linking works by Virginia Woolf, for instance, to criticisms of her work, works that inspired her, or authors who wrote during the same era. Designing algorithms that can effectively organize all of this new information, Clancy says, is "one of the grand challenges and will take many years."

Reddy says CMU researchers are trying to tackle this challenge by using a "statistical approach" to organizing the information. In this approach, Virginia Woolf's stream-of-consciousness sentences, for example, would be analyzed by an algorithm that would find patterns based on sentence length, structure, and punctuation. This technique might find a work by James Joyce, one of Woolf's influences, or one by an obscure author whose writings might otherwise never have been found.

In the meantime, researchers are seeking shortcuts for searching among authors, books, and genres, Reddy says. Similar to the way "collaborative filtering" at Amazon uses people's past purchases to help others find potential purchases, Book Search users could help each other. The community-based approach is an idea that Google has not announced, Clancy says, but it could add another layer to searching through books and create grassroots excitement about the project.

Certainly, holding its cards close is not new for Google. "They are secretive about almost everything they do... this is very common with Silicon Valley companies," Reddy at CMU says. In the case of Book Search, he says, Google wants "to have a captive solution for all the libraries." Even so, Reddy is excited about Google's project and believes it will eventually complement his research. "I'm sure at some point they will have a pointer to our books," he says.

Meanwhile, Google has to contend with the nontechnical issue of disgruntled copyright holders dragging it into court. The Authors Guild and a number of publishers have sued the company, claiming that Google's project violates copyright law. (Stanford professor Lawrence Lessig has made a 30-minute video about the legal controversy.)

But if the legal and technical challenges can be overcome, digitized physical books could greatly surpass the billions of existing Web pages in breadth and depth of information. Indeed, a single comprehensive online card catalog for millions of the world's books has the potential to create a whole new chapter in the information age.

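The "statistical approach" Reddy describes, comparing texts by sentence length, structure, and punctuation, can be illustrated with a minimal sketch. This is not CMU's or Google's actual method, which the article does not detail. The feature set, the function names `style_features` and `style_distance`, and the choice of plain Euclidean distance are all assumptions made for illustration. A real system would use far more features and would scale them so that no single one dominates.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: punctuation marks whose per-word rate we track.
PUNCTUATION = ",;:-()"

def style_features(text):
    """Return a small style vector: sentence-length statistics plus
    punctuation marks per word. Illustrative only."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    # Number of words in each sentence.
    lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    mean_len = sum(lengths) / len(lengths)
    variance = sum((n - mean_len) ** 2 for n in lengths) / len(lengths)
    counts = Counter(ch for ch in text if ch in PUNCTUATION)
    rates = [counts[ch] / len(words) for ch in PUNCTUATION]
    return [mean_len, math.sqrt(variance)] + rates

def style_distance(text_a, text_b):
    """Euclidean distance between two style vectors (smaller = more alike).
    Features are unscaled here, so sentence length dominates."""
    return math.dist(style_features(text_a), style_features(text_b))
```

Under these assumptions, two passages built from long, comma-heavy sentences would score as closer to each other than either would to a passage of short, clipped sentences, which is the kind of signal that could surface a stylistically similar but obscure author.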
Comments
02/28/2006
Posts:1
I hope my Chinese character indexing system will tie in with this kind of online book database in the future.
03/02/2006
Posts:1
All of this material can be stored: magazines, books, music, and other forms of communication.
03/05/2006
Posts:1
To be more specific, I can tell you that they use specialized book scanners from Kirtas Technologies. Those scanners turn the pages automatically and have a decent scanning speed. A single operator can run two or three scanners working in parallel. Very efficient. Kirtas is not alone in this field, but I suppose that is another discussion.
If you are wondering which OCR software they use, it is very simple, and you can use it too: ABBYY FineReader.
Hope the info is useful.
03/08/2006
Posts:1
We have already digitized millions of pages. Our mission is to preserve books and literature for hundreds of generations to come. We also OCR them to make them searchable.
In this article, Clancy even says that there are problems "in particular, with page numbering. For instance, full pages can be missing or dog-eared corners could reveal an incorrect page number. And if pagination is wrong in one part of the book, the error propagates throughout the work."
We have overcome that problem with the technology we are using.
Rutul Kamdar
Director
DigiSys Info Service Pvt. Ltd.
www.digisysglobe.com
digisysglobe@gmail.com
digisysglobe