Learning machine: Sean Colbath, a senior scientist at Raytheon BBN, helped create BBN’s broadcast TV monitoring system.
They look a bit like communally written Wikipedia pages. But these articles—concise profiles of people and organizations, complete with lists of connected organizations, people, and events—were in fact written by computers, in a new bid by the Pentagon to build machines that can follow global news events and provide intelligence analysts with useful summaries in close to real time.
The prototype system is part of a nonpublic site built for intelligence agencies by Raytheon BBN in Cambridge, Massachusetts, and scheduled for delivery to the government later this year. It gathers information from 40 news websites written in English, Chinese, and Arabic, and eventually it will cover hundreds of news sites in all major languages. Ultimately the system will be linked with an existing TV broadcast monitoring network.
On the new site, if you search for information on the Nigerian jihadist movement Boko Haram, you get this entirely computer-generated summary: “Founded by Mohammed Yusuf in 2002, Boko Haram is led by Ibrahim Abubakar Shekau. (Former leaders include Mohammed Yusuf.) It has headquarters in Maiduguri. It has been described as ‘a new radical fundamentalist sect,’ ‘the main anchor for mayhem in the state,’ ‘a fractured sect with no clear structure,’ and ‘the misguided extremist sect.’”
To be sure, Wikipedia’s Boko Haram entry is clearer. But the BBN system captures everything that appears on news sites—not just on topics people chose to write Wikipedia pages about—and constantly and automatically adds information, says Sean Colbath, a senior scientist at BBN Technologies who demonstrated the technology. “I could go and read 200 articles to learn about Bashar Al-Assad (the Syrian dictator). But I’d like to have a machine tell me about it,” says Colbath. (The system, by the way, picks up the fact that the brutal Al-Assad is also a licensed ophthalmologist.)
The system starts by detecting an “entity,” meaning a named person or organization such as Boko Haram, while accounting for variant spellings of the name. Then it identifies other entities (events and people) connected to it, along with statements made by and about the subject. “It’s automatically extracting relationships between entities,” Colbath says. “Here the machine has learned, by being given examples, how to put these relationships together and fill in those slots for you.”
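The two steps Colbath describes, recognizing one entity under several spellings and then filling slots with facts about it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BBN’s code: the `canonical_key` spelling normalizer, the `SLOT_PATTERNS` table, and the `build_profiles` helper are all hypothetical, and the patterns are written by hand, whereas the article says BBN’s system learns its extraction rules from examples.

```python
from __future__ import annotations

import re
import unicodedata

# A run of capitalized words, e.g. "Boko Haram", "Boko-Haram", "Mohammed Yusuf".
NAME = r"[A-Z][\w-]*(?: [A-Z][\w-]*)*"

# Hypothetical, hand-written slot patterns. The article says BBN's system
# learns extraction rules from examples; these are illustrative stand-ins.
SLOT_PATTERNS = {
    "founded_by": re.compile(rf"(?P<subj>{NAME}) was founded by (?P<val>{NAME})"),
    "leader": re.compile(rf"(?P<subj>{NAME}) is led by (?P<val>{NAME})"),
    "headquarters": re.compile(rf"(?P<subj>{NAME}) has headquarters in (?P<val>{NAME})"),
}


def canonical_key(name: str) -> str:
    """Fold accents, case, hyphens, and spaces so spelling variants collide."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]", "", no_accents.lower())


def build_profiles(sentences: list[str]) -> dict[str, dict[str, set[str]]]:
    """Merge slot values from many sentences into one profile per entity."""
    profiles: dict[str, dict[str, set[str]]] = {}
    for sentence in sentences:
        for slot, pattern in SLOT_PATTERNS.items():
            for match in pattern.finditer(sentence):
                # Variant spellings of the subject share one canonical key.
                key = canonical_key(match.group("subj"))
                slots = profiles.setdefault(key, {})
                slots.setdefault(slot, set()).add(match.group("val"))
    return profiles


# Three differently spelled mentions, as they might appear in separate articles.
articles = [
    "Boko Haram was founded by Mohammed Yusuf in 2002.",
    "Officials said Boko-Haram is led by Ibrahim Abubakar Shekau.",
    "BOKO HARAM has headquarters in Maiduguri, police said.",
]
for key, slots in build_profiles(articles).items():
    print(key, slots)
```

Because all three spellings fold to the same key, the facts land in a single profile, which is the kind of merged record the Boko Haram summary above is built from. Learning the patterns rather than listing them is what lets a real system scale beyond a handful of phrasings.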
The Boko Haram page goes on to list associated organizations and statements by and about the group. Clicking on any of them takes you back to original news sources, many of them translations of articles originally published in Arabic by sites such as Al Sharq in Qatar and Al Balad in Lebanon.
The BBN project is the fruit of the Defense Advanced Research Projects Agency’s latest effort to build machines that read as humans do. Machine reading is a decades-old problem that has been the focus of increasing research in recent years. Under DARPA’s research program, prototypes have been built by SRI International and IBM as well as Raytheon BBN.
Bonnie Dorr, DARPA’s program manager for the project, says the technology incorporates recent improvements in machine reading, enabling it to do a better job of recognizing when the same underlying fact is described in different ways, such as “Joe is married to Sue” and “Sue is Joe’s spouse,” and of determining the sentiment implied in phrases like “really awesome.”
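A toy version of Dorr’s marriage example shows what “the same underlying fact” means in practice: several phrasings are mapped onto one normalized relation. The `SPOUSE_PATTERNS` list and the `extract_spouse_fact` helper are hypothetical and hand-written for illustration; the article describes systems that learn such mappings from examples, and this sketch does not attempt the sentiment analysis Dorr also mentions.

```python
from __future__ import annotations

import re

# Hypothetical surface patterns that all express one symmetric "spouse"
# relation. The article says such systems learn from examples; this
# hand-written list only illustrates the normalization step.
SPOUSE_PATTERNS = [
    re.compile(r"^(?P<a>\w+) is married to (?P<b>\w+)$"),
    re.compile(r"^(?P<a>\w+) is (?P<b>\w+)['’]s (?:spouse|wife|husband)$"),
    re.compile(r"^(?P<a>\w+) and (?P<b>\w+) are married$"),
]


def extract_spouse_fact(sentence: str) -> tuple[str, frozenset[str]] | None:
    """Map any matching phrasing to ("spouse", {person, person}), else None."""
    text = sentence.strip().rstrip(".")
    for pattern in SPOUSE_PATTERNS:
        match = pattern.match(text)
        if match:
            # A frozenset makes the relation order-independent:
            # (Joe, Sue) and (Sue, Joe) become the same fact.
            return ("spouse", frozenset({match.group("a"), match.group("b")}))
    return None


first = extract_spouse_fact("Joe is married to Sue.")
second = extract_spouse_fact("Sue is Joe’s spouse.")
print(first == second)  # prints True
```

Storing the pair as an unordered set is the design choice that makes the relation symmetric; a directional relation such as “leader of” would keep an ordered tuple instead.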
Automatically summarizing text is notoriously tricky. Software must detect humor, sarcasm, obviously incorrect information, idioms, and variant spellings and syntax, and it must also interpret and translate sources written in different languages.