Ultimately, Clancy says, Google would like Book Search to give the same result as someone going to a library, looking in its stacks, and serendipitously finding a book that's interesting or useful. One way to do this would be to link books to each other by categories and themes, he suggests. The task becomes more complicated, though, when linking works by Virginia Woolf, for instance, to criticisms of her work, works that inspired her, or authors who wrote during the same era. Designing algorithms that can effectively organize all of this new information, Clancy says, is "one of the grand challenges and will take many years."
Reddy says CMU researchers are trying to tackle this challenge by using a "statistical approach" to organizing the information. In this approach, Virginia Woolf's stream-of-consciousness sentences, for example, would be analyzed by an algorithm that would find patterns based on sentence length, structure, and punctuation. This technique might find a work by James Joyce, one of Woolf's influences -- or that of an obscure author whose writings might otherwise never have been found.
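The article does not describe CMU's actual algorithm. As a rough sketch only, the idea of comparing texts by sentence length and punctuation could look like the following. The function names (`style_features`, `style_distance`, `closest_in_style`) and the choice of three features are assumptions made for illustration, and the features are left unscaled, so sentence length dominates the distance.

```python
import math
import re


def style_features(text):
    """Return (mean words per sentence, commas per word, semicolons per word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)  # avoid division by zero on empty text
    mean_len = len(words) / max(len(sentences), 1)
    return (mean_len, text.count(",") / n_words, text.count(";") / n_words)


def style_distance(text_a, text_b):
    """Euclidean distance between the two texts' (unscaled) feature tuples."""
    return math.dist(style_features(text_a), style_features(text_b))


def closest_in_style(query_text, candidates):
    """Return the title in `candidates` (title -> text) nearest to `query_text`."""
    return min(candidates, key=lambda title: style_distance(query_text, candidates[title]))
```

A real system would presumably use many more features and normalize them; this only shows how surface statistics can rank one text as stylistically closer than another without any knowledge of author or subject.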
In the meantime, researchers are seeking shortcuts for searching among authors, books, and genres, Reddy says. Similar to the way "collaborative filtering" at Amazon uses people's past purchases to help others find potential purchases, Book Search users could help each other. The community-based approach is an idea that Google has not announced, Clancy says, but it could add another layer to searching through books and create grassroots excitement about the project.
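Neither Amazon's nor Google's method is spelled out here, so the following is only a minimal sketch of the general collaborative-filtering idea: suggest books read by people whose reading overlaps with yours. The `recommend` function and the `histories` structure (reader name mapped to a set of titles) are hypothetical.

```python
from collections import Counter


def recommend(histories, user, top_n=3):
    """Suggest unread books, weighted by how much each other reader overlaps with `user`."""
    seen = histories[user]
    scores = Counter()
    for other, books in histories.items():
        if other == user:
            continue
        overlap = len(seen & books)
        if overlap == 0:
            continue  # readers with nothing in common contribute nothing
        for book in books - seen:
            scores[book] += overlap
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [book for book, _ in ranked[:top_n]]
```

With this weighting, a reader who shares two books with you counts twice as much as one who shares a single book, which is one simple way to favor like-minded readers.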
Certainly, holding its cards close is not new for Google. "They are secretive about almost everything they do...this is very common with Silicon Valley companies," Reddy at CMU says. In the case of Book Search, he says, Google wants "to have a captive solution for all the libraries." Even so, Reddy is excited about Google's project and believes it will eventually complement his research. "I'm sure at some point they will have a pointer to our books," he says.
Meanwhile, Google has to contend with the nontechnical issue of disgruntled copyright holders dragging it into court. The Authors Guild and a number of publishers have sued the company, claiming that Google's project violates copyright law. (Stanford professor Lawrence Lessig has made a 30-minute video about the legal controversy.)
But if the legal and technical challenges can be overcome, digitized physical books could greatly surpass the billions of existing Web pages in breadth and depth of information. Indeed, a single comprehensive online card catalog for millions of the world's books has the potential to create a whole new chapter in the information age.
Guest (w wong)
This article is extremely interesting.
I hope my Chinese character indexing system will tie in with this kind of online book database in the future.
Guest (Doc_Reader)
U.S. Government Documents available online
LexisNexis is currently digitizing millions of pages of declassified documents from all three branches of the U.S. government going back as far as 200 years. These documents are being made available in searchable PDF documents that have been fully abstracted and indexed. It's amazing the quantity and quality of information they are publishing online.
Guest (Chandra Sekhar S.)
The article "How to Digitize a Million Books" discusses searching through the scanned books, but it does not touch on the currently popular approach of indexing books by their entire text, so that any word or string of characters can be used to search them, rather than keywords only.
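As a minimal sketch of what full-text indexing means, assuming a tiny in-memory collection: an inverted index maps every word to the books that contain it. The names `build_index` and `search` are illustrative only, and this version matches whole words, not arbitrary character strings.

```python
import re
from collections import defaultdict


def build_index(books):
    """Map each lowercased word to the set of titles whose text contains it."""
    index = defaultdict(set)
    for title, text in books.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(title)
    return index


def search(index, query):
    """Return the titles that contain every word in `query`."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```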
Guest (lawrephord24@hotmail.com)
wanted a topological transformation compressor
With the right ratio and a 40-gig dll, everything can be stored: magazines, books, music, and other forms of communication.
Guest (Mpaunescu)
Google Book Search's approach is not so special
You can find an article about it at the following address: http://students.haverford.edu/jhuttner/Essays/Computers/GoogleBookSearch.htm
To be more specific, I can tell you that they use specialized book scanners from Kirtas Technologies. Those scanners turn the pages automatically and have a decent scanning speed. A single operator can run two or three scanners working in parallel. Very efficient. Kirtas is not alone in this field, but I suppose that is another discussion.
If you are wondering which OCR software they use, it is very simple, and you can use it too: ABBYY FineReader.
Hope the info is useful.
Mass digitization with revolutionary technology, without breaking the spine or damaging the book
We are pioneers in the mass digitization of books, journals, and periodicals in India, using a special revolutionary technology on a large scale. Best of all, your originals will not be damaged, and we do not even need to break the spine of the book.
We have already digitized millions of pages. Our mission is to preserve books and literature for hundreds of generations to come. We also run OCR on them to make them searchable.
In this article, Clancy himself notes problems "in particular, with page numbering. For instance, full pages can be missing or dog-eared corners could reveal an incorrect page number. And if pagination is wrong in one part of the book, the error propagates throughout the work."
We have overcome that problem with the technology we are using.
Rutul Kamdar
Director
DigiSys Info Service Pvt. Ltd.
www.digisysglobe.com
digisysglobe@gmail.com
Guest (Rdmoore6)
Digitization errors
Google Scholar's OCR of old journals has a problem: it often converts "modern" to "modem". Google is not using any content processing to detect silly uses of "modem".
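One simple form such content processing could take, sketched here under a stated assumption: any "modem" in a document published before some cutoff year is treated as a misread "modern". The cutoff of 1950 and the function name `fix_modem` are assumptions for illustration, not anything Google is known to do.

```python
import re

# Assumed cutoff: the word "modem" is treated as implausible before this year.
MODEM_CUTOFF_YEAR = 1950


def _restore(match):
    """Replace a matched 'modem' with 'modern', keeping the original capitalization."""
    word = match.group(0)
    if word.isupper():
        return "MODERN"
    if word[0].isupper():
        return "Modern"
    return "modern"


def fix_modem(text, publication_year, cutoff=MODEM_CUTOFF_YEAR):
    """Rewrite whole-word 'modem' as 'modern' in texts published before `cutoff`."""
    if publication_year >= cutoff:
        return text  # the word is plausible in newer documents, so leave it alone
    return re.sub(r"\bmodem\b", _restore, text, flags=re.IGNORECASE)
```

A more careful check would look at neighboring words instead of the date alone, since a recent article can also contain a misread "modern".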