Google shook up the worlds of publishing and library science last year when it announced it would digitize millions of books from several of the world’s greatest libraries – including Oxford’s Bodleian Library and the New York Public Library – and make their contents searchable on the Web (see “The Infinite Library”).
Many librarians applauded Google’s move and predicted it would jump-start a broader effort to ensure universal electronic access to human knowledge. But publishers weren’t as pleased – particularly because Google said it would not seek permission to scan and index books still covered by copyright.
Now a group led by one of Google’s main rivals, Yahoo, is trying a more collective approach to digitization. On October 4, Yahoo and ten partner organizations announced the formation of the Open Content Alliance, which plans to build a free, permanent online repository for a wide range of print and multimedia content, including both copyrighted works and those that have passed into the public domain.
Yahoo’s partners in the alliance are Adobe Systems, the European Archive, Hewlett-Packard Labs, the Internet Archive, the National Archives of the United Kingdom, O’Reilly Media, the Prelinger Archives, the University of California, and the University of Toronto.
In contrast to Google’s approach, which requires publishers to “opt out” if they don’t want their works to be included, the alliance will only disseminate copyrighted works after their publishers have explicitly opted into the program, according to David Mandelbrot, Yahoo’s vice president for search technology.
Mandelbrot says the alliance will encourage other entities, including Google, to contribute to the repository, and will create a set of standards for digitization intended to make it easier to pool the products of various digitization efforts and to make them searchable from any search engine.
Technology Review’s executive Web editor, Wade Roush, recently interviewed Mandelbrot about Yahoo’s approach to digitizing the world’s literature.
Wade Roush: How did the Open Content Alliance come about?
David Mandelbrot: In March of last year we launched our effort to partner with content rights holders. We wanted to move beyond what we could provide just by crawling the Web and improve the quality of Yahoo search. Soon after, we connected with the folks at the Internet Archive, who are doing great work with digitizing works. They were hosting a lot of great content and we wanted to integrate that into our search engine.
As we started that discussion, Brewster [Kahle, the founder of the Internet Archive] became focused on what we can do together to digitize content. They’ve developed a great scanning technology and a really good way to digitize works of literature, but they were looking for partners to help them get their message out there and get funding flowing. From those discussions, we decided to form this Open Content Alliance.