A printer that automatically reformats pages to reduce clutter could save on ink and make printouts easier to read. But would you accept a few additional ads on your document in return? Researchers at HP’s labs think so.
“By some statistics, almost half of the printouts on HP’s printers come from the Web, but the experience is really terrible compared to office or PDF documents,” says Parag Joshi, a member of HP’s Multimedia Interaction and Understanding Lab in Palo Alto, California. Joshi and his colleague Sam Liu led the development of software that extracts the pertinent text and images from an online article and discards the rest of the page, making for a much cleaner printout. The system removes clutter such as navigation elements or online ads, so less ink and paper are needed.
At the same time, the system can insert advertisements chosen to match the content. The software “is a way HP could generate additional revenue,” says Liu, “but it can also provide a better experience to the user and save them money.”
The ads that do appear on the printed version are chosen by algorithms that scan an article's content, and they can be designed to better suit the printed page, says Joshi. For example, they may be more image-centric, or include coupons that can be taken to a store for discounts, both tactics that are more common in print advertising than online.
To determine which parts of a Web page to keep and which to discard, the HP software first renders the page in the same way as a Web browser. It then analyzes how text and images are spread across different sections of the page to extract the core text and images. Several clues make it possible to accurately exclude everything but the content of an article, says Liu. For example, the fact that advertisements are often labeled as such and lack captions makes them easy to spot.
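The article doesn't spell out HP's heuristics beyond these clues. As a rough illustration only, the minimal sketch below classifies blocks of an already-rendered page using the two clues Liu mentions (ad labels and missing captions), plus link density, a common boilerplate-detection heuristic that is an assumption here, not something HP has described. The `Block` structure, its fields, the label wording, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical representation of one region of a rendered page.
@dataclass
class Block:
    kind: str                  # "text" or "image"
    text: str = ""             # visible text, or the caption for an image
    link_chars: int = 0        # how many characters of `text` sit inside links
    has_caption: bool = False  # images only: does a caption accompany it?
    label: str = ""            # any label attached to the block, e.g. "Advertisement"


AD_LABELS = ("advertisement", "sponsored")  # assumed label wording


def is_ad(block: Block) -> bool:
    """Clue 1: the block is labeled as an ad. Clue 2: it is an image with no caption."""
    if any(word in block.label.lower() for word in AD_LABELS):
        return True
    return block.kind == "image" and not block.has_caption


def link_density(block: Block) -> float:
    """Fraction of a block's text that is link text (high for navigation menus)."""
    return block.link_chars / max(len(block.text), 1)


def extract_core(blocks: List[Block], min_words: int = 20,
                 max_link_density: float = 0.3) -> List[Block]:
    """Keep captioned, unlabeled images and long, link-poor text; drop the rest."""
    kept = []
    for block in blocks:
        if is_ad(block):
            continue
        if block.kind == "image":
            kept.append(block)  # treated as an article figure
        elif (len(block.text.split()) >= min_words
              and link_density(block) <= max_link_density):
            kept.append(block)
    return kept


page = [
    Block("text", "Home News Sports Weather Contact", link_chars=32),  # nav bar
    Block("text", " ".join(["word"] * 40)),                           # article body
    Block("image", "Chart of printer sales", has_caption=True),       # figure
    Block("image", label="Advertisement"),                            # labeled ad
    Block("image"),                                                    # uncaptioned image
]
core = extract_core(page)
print([b.kind for b in core])  # ['text', 'image']
```

Only the article body and the captioned chart survive; the navigation bar fails both the word-count and link-density tests, and both ads are caught by the label and caption clues.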
This aspect of the software is similar to the workings of browser plug-ins such as Readability (now built into Apple's Safari browser), which strip away everything but the body text of a page and present it in a clean, easy-to-read layout. But HP's system also preserves relevant images, and it has to do the extra work of formatting the printed page and inserting new advertisements.
Selecting the right ads for printouts involves extracting meaning from the text. “Once we identify the main content, we use machine learning to find matching semantic categories,” says Liu. Adverts relevant to those categories are then selected for insertion into the document.
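HP hasn't said which machine-learning model it uses. The minimal sketch below swaps in a much simpler stand-in, scoring an article against hand-written keyword profiles with cosine similarity over word counts, just to show the shape of the pipeline: article text in, semantic categories out, matching ads selected. The categories, keywords, and ad inventory are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical category profiles; a trained classifier would replace these.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "travel": ["flight", "hotel", "beach", "trip", "airport"],
    "cooking": ["recipe", "oven", "flour", "bake", "kitchen"],
    "technology": ["printer", "software", "cloud", "browser", "smartphone"],
}

# Hypothetical ad inventory, each ad tagged with one category.
ADS = [
    {"id": "ad-101", "category": "travel", "text": "20% off weekend hotel stays"},
    {"id": "ad-202", "category": "cooking", "text": "Coupon: $2 off stand mixers"},
    {"id": "ad-303", "category": "technology", "text": "Save on ink cartridges"},
]


def tokenize(text: str) -> Counter:
    """Lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_categories(article: str, k: int = 1) -> List[str]:
    """Return up to k categories with a nonzero similarity, best first."""
    doc = tokenize(article)
    scores = {cat: cosine(doc, Counter(words))
              for cat, words in CATEGORY_KEYWORDS.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [cat for cat in ranked[:k] if scores[cat] > 0]


def select_ads(article: str, k: int = 1) -> List[dict]:
    """Pick ads whose category matches one of the article's top categories."""
    cats = top_categories(article, k)
    return [ad for ad in ADS if ad["category"] in cats]


article = ("The new printer software runs in the cloud and works with any "
           "browser, so a smartphone can print.")
print([ad["id"] for ad in select_ads(article)])  # ['ad-303']
```

An article that matches no profile gets no categories and therefore no ads, which seems a safer default for a printout than inserting an unrelated one.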
The final layout is currently chosen from a small set of broadly similar templates that arrange an article into columns. "They produce documents that look like a news magazine," says Liu. One feature currently in the works would automatically combine several articles to save paper, instead of printing each one individually. "You could subscribe to an RSS feed and have a small magazine generated automatically, or combine articles from different sites," says Joshi.
Richard Ziade, a founding partner of New York design consultancy Arc90, which developed the Readability plug-in, says many people are waking up to the fact that Web page designs often leave a lot to be desired when it comes to reading the content on them.
“When we put [Readability] out there, the huge response made it clear we had really hit a nerve,” he says. “There’s now frustration around clutter and accessibility on the Web, while before there was a kind of agreement that ‘I get it for free, and so I’ll tolerate your bad design.’”
The erosion of that agreement has brought about tools like Readability, Instapaper, and the iPad app Flipboard, which creates a magazine-like reading experience from articles shared by a user’s friends. “Now it seems HP is exploring how to better use Web content too,” says Ziade.
Ziade is less sure about the size of the market for printing out online content. “There’s probably not a huge percentage of people that print all of their reading from portable devices, and apps for them, like Flipboard, are really improving fast.”
For now, the HP technology is being tested as plug-ins for the Firefox and Internet Explorer Web browsers. The plug-ins add a button to the browser that, when clicked, sends the page to a service running on a cloud-computing platform; the service reformats the page, inserts ads, and sends the final result back to the user's computer for printing.
Next year, Liu says, a version will be made for the app store that accompanies HP's new cloud-based printing service, ePrint. ePrint allows any Internet-connected device, including smartphones that lack printer drivers, to print by sending the document to an email address associated with a physical printer. The app store, dubbed ePrintCenter, gives users access to smart applications such as a subscription service for a daily newspaper, or the one being developed by Liu and Joshi.