As mobile phones become more sophisticated, they are bidding to replace laptops as the business traveler’s tool of choice. But trying to view and navigate documents on a phone’s small screen remains frustrating. A new research project at Fuji Xerox Palo Alto Laboratory (FXPAL) aims to solve that problem, while making it easier to transfer scanned documents to a phone in the first place.
In a recent demonstration, researchers showed how the technology, called Seamless Documents, could store a scanned document in a database and analyze its structure and content. The analysis identifies sections and paragraphs, and automatically extracts key phrases that summarize the sections. So when a person pulls up the document on a phone, she can jump to a section labeled with a keyword, or just skip to the last paragraph on a page. In addition, as the user scrolls through the document, software on the phone automatically resizes images, section headers, and plain text, as different elements of the layout come into view.
Thanks to Moore’s Law, mobile phones have been gaining exponentially in computational power, but their screen sizes have remained comparatively static. Researchers such as Patrick Baudisch at Microsoft have been trying to find the best way to modify the mobile user interface for viewing Web pages and maps. And Apple has made great strides in making content viewable on its iPhone. But no one has solved the complex problem of building software that lets a user skim through scanned documents, and zoom in and out of them, without losing her orientation.
Solving this problem could be a boon to business travelers who now lug around heavy folders of papers. “Executives take enormous stacks of documents with them when they travel,” says Scott Carter, a researcher on the FXPAL project. “We wanted to find a better way for them to get through these documents quickly.”
While the FXPAL project is only six months old and still very much a research project, it aims to solve several problems at once, from extracting information from analog documents and synchronizing it across devices to making this information easy to navigate on a cell-phone screen.
The first part of the Seamless Documents project focuses on converting analog documents into digital information and storing it in a database accessible over both the Internet and the cell-phone network. Once a scanner or some type of camera captures the documents, they are sent to a database where specialized software, developed at FXPAL, analyzes their structure to determine where paragraph breaks, pictures, and section titles occur. In addition, the document is run through optical character recognition software that converts the images of words on paper into digital information that can be read by a computer. Then, software automatically summarizes the text, picking out key words and concepts from each section that are then highlighted for the user.
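The article does not say how FXPAL's software chooses its key words, so the following is only a minimal sketch of one common approach, TF-IDF scoring, applied to text that has already been split into sections and run through OCR. The function names, the tiny stopword list, and the scoring formula are all assumptions made for illustration, not the project's actual method.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on",
             "that", "with", "as", "are", "by", "it", "this", "was", "be"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def key_terms(sections, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each section."""
    token_lists = [tokenize(s) for s in sections]

    # Document frequency: in how many sections does each term appear?
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    n = len(sections)
    result = []
    for tokens in token_lists:
        term_freq = Counter(tokens)
        # Smoothed IDF so a term found in every section still scores above zero.
        scores = {
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        }
        # Sort by descending score, then alphabetically so ties are deterministic.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result.append(ranked[:top_n])
    return result


if __name__ == "__main__":
    sections = [
        "Revenue grew. Revenue targets for revenue were met in the quarter.",
        "Hiring plan: hiring engineers and hiring designers in the quarter.",
    ]
    print(key_terms(sections, top_n=1))
```

Terms that recur within one section but are rare elsewhere score highest, which is roughly what a reader wants from a label that distinguishes one section from another.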
The second part of the research project involves the software that runs on mobile phones. This software opens the document and displays all the extracted information. A user sees a view of the document with key phrases, in large font, overlaid on the paragraphs or segments, which makes it easier to pick the paragraph of interest. When a user taps on a paragraph, the text is enlarged and repositioned on the screen so that it’s readable. In a navigation window to the side of the screen, the user can see the location on the page of the section that has been enlarged. While reading the enlarged text, a user can also pull up a list of the key phrases that were previously superimposed on the document, which lets her quickly jump to a different section. As she scrolls through the text, if she encounters a picture that doesn’t fit well into the resized view of the document, the software automatically zooms out so that the picture is visible. As the user scrolls away from the picture, the text automatically resizes.
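The zooming behavior described above can be approximated with simple scale arithmetic. This sketch assumes, hypothetically, that the document analysis yields regions tagged as text or picture with sizes in page pixels; the `Region` class and `zoom_for` function are illustrative names, and the rule of fitting text to the screen width is an assumption rather than FXPAL's documented behavior.

```python
from dataclasses import dataclass


@dataclass
class Region:
    kind: str      # "text" or "picture" (hypothetical labels)
    width: float   # region width in page pixels
    height: float  # region height in page pixels


def zoom_for(region, screen_w, screen_h):
    """Return a scale factor for showing the region on the screen.

    Text is scaled to fill the screen width, since the reader scrolls
    vertically. A picture is scaled so that it fits entirely on screen,
    which can mean zooming out relative to the surrounding text.
    """
    if region.kind == "picture":
        return min(screen_w / region.width, screen_h / region.height)
    return screen_w / region.width


if __name__ == "__main__":
    paragraph = Region("text", width=400, height=900)
    photo = Region("picture", width=800, height=600)
    print(zoom_for(paragraph, 320, 480))  # scale while reading the paragraph
    print(zoom_for(photo, 320, 480))      # smaller scale when the photo scrolls in
```

Recomputing the scale as each region scrolls into view gives the effect the article describes: the view zooms out for a wide picture and returns to the text scale afterward.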
The act of resizing text and adjusting it so that it fits neatly on a screen is an important feature of the FXPAL project, says Ben Bederson, a professor of computer science at the University of Maryland. “I think that’s a crucial solution that has to be available on handheld devices,” he says. “If you want [a product] to work broadly, that’s important.”
But, Bederson adds, it’s still unclear how important keyword navigation is on mobile devices. Often, he says, people just do simple document browsing on cell phones, and offering extra features may be overkill. In addition, Bederson says, the FXPAL design is somewhat wasteful in that it uses a quarter of the viewable screen to display a navigation bar. “I think that’s a mistake,” he says. “Apple got it right: if you’re reading content, you want to use 100 percent of the screen.”
This summer, Carter and his team plan to run user tests to better understand how people want to access scanned documents on their phones. The results of these tests will shape the look and feel of the software in the future. “It’s important to make sure all the features we’re providing for scanned documents meet what you’d get if you were using a digital document,” says Carter. “It’s also important to get a broad understanding of the context around how people would like to read documents and interact with them on mobile devices.”