A Search Service that Can Peer into the Future
A Yahoo Research tool mines news archives for meaning, illuminating past, present, and even future events.
Showing news stories on a timeline has been tried before. But Time Explorer, a prototype news search engine created in Yahoo’s Barcelona research lab, generates timelines that stretch into the future as well as the past.
Time Explorer’s results page is dominated by an interactive timeline illustrating how the volume of articles for a particular search term has changed over time. The most relevant articles are plotted on the timeline at their publication dates. If the user moves the timeline into the future, articles appear at whatever future dates their text refers to.
This provides a new way to discover articles, and also a way to check up on past predictions. The timeline for 2010 becomes a way to discover a 2004 Op-Ed suggesting that by now, North Korea would have constructed some 200 nuclear warheads, or a 2007 article accurately predicting difficult policy decisions for Democrats over the expiration of George Bush’s tax cuts.
News organizations are increasingly turning to new ways of presenting their content, including through enhanced forms of search. A Pew research study in 2008 found that 83 percent of people looking for news online use a search engine to find it.
Time Explorer can spot both absolute references to future times, such as “November 2010,” and work forward from an article’s publication date to figure out relative timings like “an election next month.” It also extracts names, locations, and organizations mentioned in articles. These are shown in a box to the right of the results; they can be used to add a person or other entity to the timeline, and to fine-tune results to home in on combinations of particular people or places.
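The two kinds of time reference described above can be illustrated with a toy sketch. The article does not say how Yahoo's system works internally, so everything here is an assumption made for illustration: the regular expressions, the helper names `add_months` and `future_references`, and the choice to resolve each phrase only to the first day of a month or year.

```python
import re
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# Absolute references such as "November 2010" (hypothetical pattern).
ABSOLUTE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+([12]\d{3})\b", re.IGNORECASE)

# Relative references such as "next month" or "next year" (hypothetical pattern).
RELATIVE = re.compile(r"\bnext\s+(month|year)\b", re.IGNORECASE)


def add_months(d, n):
    """Return the first day of the month that falls n months after d."""
    index = d.year * 12 + (d.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


def future_references(text, published):
    """Return (phrase, resolved date) pairs that point past the publication month.

    Absolute matches are listed first, then relative ones.
    """
    found = []
    for m in ABSOLUTE.finditer(text):
        month = MONTHS.index(m.group(1).lower()) + 1
        found.append((m.group(0), date(int(m.group(2)), month, 1)))
    for m in RELATIVE.finditer(text):
        if m.group(1).lower() == "month":
            # Work forward from the publication date by one month.
            resolved = add_months(published, 1)
        else:
            # "next year" is resolved to January 1 of the following year.
            resolved = date(published.year + 1, 1, 1)
        found.append((m.group(0), resolved))
    start_of_month = date(published.year, published.month, 1)
    return [(phrase, d) for phrase, d in found if d > start_of_month]
```

Calling `future_references` on a sentence that mentions both "an election next month" and "November 2010", with a publication date in March 2007, would place that article at April 2007 and at November 2010 on a timeline. A production system would need far broader coverage of date expressions than these two patterns.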
“You can see for wars or any other event not only the people that are important, but when they became important,” says Michael Matthews, a member of the Yahoo research team. “The evolution of news over time is not something you can do very easily with tools that are out there today.”
Time Explorer was built using a collection of 1.8 million New York Times articles, stretching from 1987 to 2007, that the newspaper released to stimulate research into new ways of exploring news coverage. Time Explorer was presented, along with other ideas for using the same dataset, at a session of the Human Computer Interaction and Information Retrieval (HCIR) workshop in New Brunswick, NJ, over the weekend. Time Explorer won the most votes from attendees for best use of the Times articles.
Other tools presented at HCIR attempted to assess the authority of people mentioned in an article, determine phrases related to a search term, and rapidly pull together a page summarizing the latest news on a particular topic, for example a celebrity or country.
“For most news search engines, recency is a significant factor for relevance,” says Daniel Tunkelang, a tech lead at Google’s New York office who chaired the challenge session. “Time Explorer brings an exploratory perspective to the time dimension, letting users see the evolution of a topic over time.”
“The slick visualization allows users to discover unexpected relationships between entities at particular points in time, for example, between Slobodan Milosevic and Saddam Hussein,” says Tunkelang. Refining a search for the term “Yugoslavia” with the two leaders reveals how Hussein at first appears as a point of comparison in coverage of the Serbian leader, while later stories link the two directly, reporting arms deals between them.
Although Time Explorer currently works only with old news, it could also be used to explore new coverage and put it in context, says Matthews. “It would be tough to update in real time, but it could certainly be done daily, and I think that would be useful for sure.”
He says the service would be best deployed as a tool that works off the topics in a breaking story. A person reading a news report about, say, Medicaid would find it useful to see the history of coverage on the topic, as well as the predictions made about its future, says Matthews. “It’s like a related-articles feature, but focused in the future.” He and colleagues are working on adding more up-to-date news sources, as well as content from blogs and other sites, to Time Explorer’s scope.
The Times has digitized and made searchable its content going back to 1851, yet today’s search technologies and interfaces are not up to the task of making such large collections explorable, says Evan Sandhaus, a member of the New York Times Research and Development Labs who oversaw the release of the article archive in late 2008.
“We can say, ‘show me all the articles about Barack Obama,’ but we don’t have a database that can tell us when he was born, or how many books he wrote,” says Sandhaus, who adds that tools developed to process the meaning of news articles could have wider uses. “That resource will not only help the research community move the needle for our company but for any company with a large-scale data-management problem.”
With most organizations harboring millions of text documents, from e-mails to reports, smarter tools to handle them would likely be popular, Matthews says. “In theory, the underlying algorithms should work on anything, perhaps with a little tweaking.”