Intelligent Machines

Software Predicts Tomorrow’s News by Analyzing Today’s and Yesterday’s

Prototype software can give early warnings of disease or violence outbreaks by spotting clues in news reports.

A method of using online information to accurately predict the future could transform many industries.

Researchers have created software that predicts when and where disease outbreaks might occur based on two decades of New York Times articles and other online data. The research comes from Microsoft and the Technion-Israel Institute of Technology.

The system could someday help aid organizations and others be more proactive in tackling disease outbreaks or other problems, says Eric Horvitz, distinguished scientist and codirector at Microsoft Research. “I truly view this as a foreshadowing of what’s to come,” he says. “Eventually this kind of work will start to have an influence on how things go for people.” Horvitz did the research in collaboration with Kira Radinsky, a PhD researcher at the Technion-Israel Institute.

The system provides striking results when tested on historical data. For example, reports of droughts in Angola in 2006 triggered a warning about possible cholera outbreaks in the country, because previous events had taught the system that cholera outbreaks were more likely in years following droughts. A second warning about cholera in Angola was triggered by news reports of large storms in Africa in early 2007; less than a week later, reports appeared that cholera had become established. In similar tests involving forecasts of disease, violence, and significant numbers of deaths, the system’s warnings were correct between 70 and 90 percent of the time.

Horvitz says the performance is good enough to suggest that a more refined version could be used in real settings, to assist experts at, for example, government aid agencies involved in planning humanitarian response and readiness. “We’ve done some reaching out and plan to do some follow-up work with such people,” says Horvitz.

The system was built using 22 years of New York Times archives, from 1986 to 2007, but it also draws on data from the Web to learn about what leads up to major news events.

“One source we found useful was DBpedia, which is a structured form of the information inside Wikipedia constructed using crowdsourcing,” says Radinsky. “We can understand, or see, the location of the places in the news articles, how much money people earn there, and even information about politics.” Other sources included WordNet, which helps software understand the meaning of words, and OpenCyc, a database of common knowledge.

All this information provides valuable context that’s not available in news articles, and which is necessary to figure out general rules for what events precede others. For example, the system could infer connections between events in Rwandan and Angolan cities based on the fact that both countries are in Africa and have similar GDPs, among other factors. That approach led the software to conclude that, in predicting cholera outbreaks, it should consider a country or city’s location, proportion of land covered by water, population density, GDP, and whether there had been a drought the year before.
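The basic idea of learning that one kind of event tends to precede another can be illustrated with a simple count. The Python sketch below is not the researchers' method, which the article does not describe in detail; it is a minimal illustration, assuming a toy list of (place, year, event) records. The records, place names, and the `follow_rate` function are all invented for this example.

```python
# Hypothetical toy records: (place, year, event). Invented for illustration only;
# these are not data from the study.
EVENTS = [
    ("A", 2000, "drought"),
    ("A", 2001, "cholera"),
    ("B", 2000, "drought"),
    ("B", 2001, "cholera"),
    ("C", 2000, "drought"),
    ("D", 2001, "cholera"),
    ("E", 2000, "flood"),
]


def follow_rate(events, cause, effect, lag_years=1):
    """Return the fraction of `cause` events that were followed by `effect`
    in the same place exactly `lag_years` later (0.0 if `cause` never occurs)."""
    seen = set(events)
    causes = [(place, year) for place, year, kind in events if kind == cause]
    if not causes:
        return 0.0
    hits = sum((place, year + lag_years, effect) in seen for place, year in causes)
    return hits / len(causes)


# In this toy data, two of the three droughts are followed by cholera a year later.
rate = follow_rate(EVENTS, "drought", "cholera")
```

A real system would have to go well beyond such counts, for example by generalizing across places using attributes like location, population density, and GDP, as the article describes.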

Horvitz and Radinsky are not the first to consider using online news and other data to forecast future events, but they say they make use of more data sources—over 90 in total—which allows their system to be more general-purpose.

There’s already a small market for predictive tools. For example, a startup called Recorded Future makes predictions about future events harvested from forward-looking statements online and other sources, and it includes government intelligence agencies among its customers (see “See the Future With a Search”). Christopher Ahlberg, the company’s CEO and cofounder, says that the new research is “good work” that shows how predictions can be made using hard data, but also notes that turning the prototype system into a product would require further development.

Microsoft doesn’t have plans to commercialize Horvitz and Radinsky’s research as yet, but the project will continue, says Horvitz, who wants to mine more newspaper archives as well as digitized books.

Many things about the world have changed in recent decades, but human nature and many aspects of the environment have stayed the same, Horvitz says, so software may be able to learn patterns from even very old data that can suggest what’s ahead. “I’m personally interested in getting data further back in time,” he says.
