A startup called Recorded Future has developed a tool that scrapes real-time data from the Internet to find hints of what will happen in the future. The company’s search tool spits out results on a time line that stretches into the future as well as the past.
The 18-month-old company gained attention earlier this year after receiving money from the venture capital arms of both Google and the CIA. Now the company has offered a glimpse of how its technology works.
Conventional search engines like Google use links to rank and connect different Web pages. Recorded Future’s software goes a level deeper by analyzing the content of pages to track the “invisible” connections between people, places, and events described online.
“That makes it possible for me to look for specific patterns, like product releases expected from Apple in the near future, or to identify when a company plans to invest or expand into India,” says Christopher Ahlberg, founder of the Boston-based firm.
A search for information about the drug company Merck, for example, generates a time line showing not only recent news on earnings but also when various drug trials registered with the website clinicaltrials.gov will end in coming years. Another search revealed when various news outlets predict Facebook will make its initial public offering.
The results are compiled using a constantly updated index of what Ahlberg calls “streaming data,” including news articles, filings with government regulators, Twitter updates, and transcripts from earnings calls or political and economic speeches. Recorded Future uses linguistic algorithms to identify specific types of events, such as product releases, mergers, and natural disasters; the dates when those events will happen; and related entities such as people, companies, and countries. The tool can also track the overall tone that news coverage and blog entries take toward companies, classifying it as either good or bad.
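The article does not say how Recorded Future's linguistic algorithms work, so the following is only a toy sketch of the general idea: pulling an event type, a date, and related entities out of a sentence. The keyword lists, the entity list, the `Event` class, and the `extract_event` function are all hypothetical, and the sample sentence in the usage note is invented for illustration. A production system would rely on far richer language analysis than keyword matching.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical keyword lists; a real system would use much richer linguistics.
EVENT_KEYWORDS = {
    "product_release": ("launch", "release", "unveil"),
    "merger": ("merge", "acquire", "acquisition"),
    "natural_disaster": ("earthquake", "hurricane", "flood"),
}

# Hypothetical list of entities the extractor knows about.
KNOWN_ENTITIES = ("Apple", "Merck", "Facebook", "India")

MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Matches simple "Month YYYY" expressions such as "June 2011".
DATE_PATTERN = re.compile(r"\b(" + "|".join(MONTH_NAMES) + r")\s+(\d{4})\b")


@dataclass
class Event:
    kind: str
    when: Optional[date]
    entities: List[str]


def extract_event(sentence: str) -> Optional[Event]:
    """Return the first event type found in the sentence, with date and entities."""
    lowered = sentence.lower()
    kind = next(
        (name for name, words in EVENT_KEYWORDS.items()
         if any(word in lowered for word in words)),
        None,
    )
    if kind is None:
        return None
    match = DATE_PATTERN.search(sentence)
    # Only month and year are parsed, so the day defaults to the 1st.
    when = date(int(match.group(2)), MONTHS[match.group(1)], 1) if match else None
    entities = [name for name in KNOWN_ENTITIES if name in sentence]
    return Event(kind, when, entities)
```

Calling `extract_event("Apple is expected to launch a new iPhone in June 2011.")` on that invented sentence yields a `product_release` event dated June 1, 2011, with `Apple` as the only related entity.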
Recorded Future’s customer base is currently “sub-100,” says Ahlberg. It includes a mix of financial firms, government analysts, and media analysts, who pay a monthly fee to access the online tools. “Government analysts are interested in tracking people and places, while financial services may want to reveal events coming up around particular companies,” says Ahlberg.
In addition to a slick online interface for running searches and viewing the results as time lines, Recorded Future offers free e-mail newsletters that tip users off to predictions in specific areas. Customers can also write software that draws on the tool’s data and analysis through application programming interfaces, or APIs.
In time, this may lead to the development of apps targeted at consumers, says Ahlberg. “If I’m about to buy an iPhone, I might want to know if I am going to look stupid because they’ll launch a new one next week, or how long it usually takes for competitors to launch competing products after a new Apple launch.” Financial analysts are already using the company’s APIs to overlay Recorded Future’s data onto their own models or even to integrate the two, he says.
“We have proven out that our data can make strong predictions,” says Ahlberg, citing studies that compared Recorded Future’s output with changes in the volume of activity around particular financial stocks. “We found that our momentum metric, which indicates the strength of activity around an event or entity, and our future events correlate with the volume of market activity,” he says.
His company’s tools can also be used to work out which sources of information give the best clues as to future events. A recent analysis showed that the posts on one of the Financial Times blogs were better than other news sources at predicting the performance of companies on the S&P 500 share index. Negative posts about a company correlated with below-market performance a week later, while positive ones correlated with above-market performance.
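One simple way to test a relationship like this is a lagged correlation: pair each week's tone score with the stock's market-relative return one week later, then compute the Pearson correlation of the pairs. The sketch below assumes weekly data and uses hypothetical function names; it is not the analysis Recorded Future ran, whose details the article does not give, and the numbers in the usage note are synthetic.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)


def lagged_correlation(
    sentiment: Sequence[float], excess_returns: Sequence[float], lag: int
) -> float:
    """Correlate sentiment in period t with excess return in period t + lag."""
    if len(sentiment) != len(excess_returns):
        raise ValueError("series must cover the same periods")
    if lag < 0 or lag >= len(sentiment):
        raise ValueError("lag must be between 0 and the series length minus 1")
    # Drop the last `lag` sentiment values and the first `lag` returns so that
    # each remaining sentiment score lines up with the return `lag` periods later.
    return pearson(sentiment[: len(sentiment) - lag], excess_returns[lag:])
```

With a synthetic weekly tone series `[1, -1, 1, -1, 1, -1]` and market-relative returns `[0.0, 0.5, -0.5, 0.5, -0.5, 0.5]` built to follow the tone by one week, `lagged_correlation(..., lag=1)` is very close to 1. A correlation of this kind shows only that the two series move together; it does not show that the posts cause the price moves.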
“What they’re really doing here is identifying and collating statements that have been made about the future,” says Steven Skiena, a computer scientist at Stony Brook University in New York. Skiena developed similar technology used by another startup, General Sentiment, to mine material from news and blogs. “An analyst can use those to inform their own predictions, less risky than Recorded Future actually making predictions themselves,” he says.
Various tools are capable of extracting events, people, and companies from text, but aligning that information in time is a trickier task, says Panagiotis Ipeirotis, an associate professor of information, operations, and management sciences at New York University’s Leonard Stern School of Business. Ipeirotis researches how economically important data can be mined from online news sources and social media. “Analysis of sequences of events is very interesting, and underexploited in the research literature,” he says. “Even getting decently timed data of news articles in order to properly generate event sequences is a hard problem.”
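Part of the difficulty is that every mention carries two dates: when it was published and when the event it describes is supposed to happen. A minimal sketch of a time line built on that distinction might look like the following. The `Mention` record and `build_timeline` function are hypothetical names, and the sketch assumes the event date has already been extracted, which is the hard part Ipeirotis describes.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Mention(NamedTuple):
    published: date   # when the source text appeared
    event_date: date  # when the described event happened or is expected
    summary: str


def build_timeline(
    mentions: List[Mention], today: date
) -> Tuple[List[Mention], List[Mention]]:
    """Order mentions by event date and split them into past and future."""
    # Ties on event date are broken by publication date, earliest first.
    ordered = sorted(mentions, key=lambda m: (m.event_date, m.published))
    past = [m for m in ordered if m.event_date <= today]
    future = [m for m in ordered if m.event_date > today]
    return past, future
```

Sorting on the event date rather than the publication date is what lets a single time line stretch into the future: an article published last month about a drug trial ending next year lands on the future side.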
This focus on the time line sets Recorded Future apart from other firms trying to gain insights by mining news and other data, says Ipeirotis. “I’m curious to see when other text analytics firms will jump into the trend.”
Recorded Future is about to expand its service to cover Arabic and Chinese sources. Making its indexes bigger is a major priority. “I’d like to be able to get in front of every piece of streaming data on the planet,” says Ahlberg.
As the databases covered by Recorded Future, General Sentiment, and others grow, more powerful types of analysis will become possible, says Skiena. “I’m currently working with social scientists on models to predict what the probability is that a person that gets few mentions today suddenly becomes very famous in the future, by looking back at years of past data,” he says.