An Online Encyclopedia that Writes Itself
They look a bit like communally written Wikipedia pages. But these articles—concise profiles of people and organizations, complete with lists of connected organizations, people, and events—were in fact written by computers, in a new bid by the Pentagon to build machines that can follow global news events and provide intelligence analysts with useful summaries in close to real time.
The prototype system is part of a nonpublic site built for intelligence agencies by Raytheon BBN in Cambridge, Massachusetts, and scheduled for delivery to the government later this year. It gathers information from 40 news websites written in English, Chinese, and Arabic, and eventually it will cover hundreds of news sites in all major languages. Ultimately the system will be linked with an existing TV broadcast monitoring network.
On the new site, if you search for information on the Nigerian jihadist movement Boko Haram, you get this entirely computer-generated summary: “Founded by Mohammed Yusuf in 2002, Boko Haram is led by Ibrahim Abubakar Shekau. (Former leaders include Mohammed Yusuf.) It has headquarters in Maiduguri. It has been described as ‘a new radical fundamentalist sect,’ ‘the main anchor for mayhem in the state,’ ‘a fractured sect with no clear structure,’ and ‘the misguided extremist sect.’”
To be sure, Wikipedia’s Boko Haram entry is clearer. But the BBN system captures everything that appears on news sites—not just on topics people chose to write Wikipedia pages about—and constantly and automatically adds information, says Sean Colbath, a senior scientist at BBN Technologies who demonstrated the technology. “I could go and read 200 articles to learn about Bashar Al-Assad (the Syrian dictator). But I’d like to have a machine tell me about it,” says Colbath. (The system, by the way, picks up the fact that the brutal Al-Assad is also a licensed ophthalmologist.)
The system starts by detecting an “entity,” the name of a person or an organization such as Boko Haram, while accounting for variant spellings. Then it identifies other entities (events and people) that are connected to it, along with statements made by and about the subject. “It’s automatically extracting relationships between entities,” Colbath says. “Here the machine has learned, by being given examples, how to put these relationships together and fill in those slots for you.”
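To make the slot-filling idea concrete, here is a minimal sketch in Python. It is not BBN's code: the alias table, the relation names (`founded_by`, `led_by`, `headquarters`), and the `Profile` class are all hypothetical, and a real system would learn spelling variants and relations from examples rather than read them from a hand-written table. The sketch only shows how mentions with different spellings can be folded into one entity whose slots are filled from extracted facts.

```python
from dataclasses import dataclass, field

# Hypothetical alias table; a real system would learn spelling variants.
ALIASES = {
    "boko haram": "Boko Haram",
    "boko-haram": "Boko Haram",
    "bokoharam": "Boko Haram",
}


@dataclass
class Profile:
    """One entity page: a name plus relation slots filled from the news."""
    name: str
    slots: dict = field(default_factory=dict)

    def add(self, relation, value):
        # Keep each value once, in the order it was first seen.
        values = self.slots.setdefault(relation, [])
        if value not in values:
            values.append(value)


def canonical(mention):
    """Map a raw mention to a canonical entity name (assumed lookup)."""
    key = " ".join(mention.lower().split())
    return ALIASES.get(key, mention.strip())


def build_profiles(facts):
    """Group (subject, relation, value) facts into one profile per entity."""
    profiles = {}
    for subject, relation, value in facts:
        name = canonical(subject)
        profiles.setdefault(name, Profile(name)).add(relation, value)
    return profiles


# Illustrative facts drawn from the Boko Haram summary quoted above.
facts = [
    ("Boko Haram", "founded_by", "Mohammed Yusuf"),
    ("boko-haram", "headquarters", "Maiduguri"),
    ("BOKO  HARAM", "founded_by", "Mohammed Yusuf"),
    ("Boko Haram", "led_by", "Ibrahim Abubakar Shekau"),
]
profiles = build_profiles(facts)
```

All four facts land on a single “Boko Haram” profile, and the repeated founder is stored only once.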
The Boko Haram page goes on to list associated organizations and statements by and about the group. Clicking on any of them takes you back to original news sources, many of them translations of articles originally published in Arabic by sites such as Al Sharq in Qatar and Al Balad in Lebanon.
The BBN project is the fruit of the Defense Advanced Research Projects Agency’s latest effort to build machines that read as humans do, a decades-old problem that has been the focus of increasing research in recent years. Under DARPA’s research program, prototypes have been built by SRI International and IBM as well as Raytheon BBN.
Bonnie Dorr, DARPA’s program manager for the project, says the technology incorporates recent improvements in machine reading, enabling it to do a better job of understanding when the same underlying event is described in multiple ways—such as “Joe is married to Sue” and “Sue is Joe’s spouse”—and to determine the sentiment implied in phrases like “really awesome.”
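Dorr's marriage example can be illustrated with a toy normalizer. The sketch below is an assumption-laden illustration, not the program's method: it uses two hand-written regular expressions (hypothetical patterns) where the DARPA prototypes rely on learned models, and it treats marriage as a symmetric relation so that both sentences reduce to the same record.

```python
import re

# Hypothetical surface patterns that all express the same "spouse" relation.
SPOUSE_PATTERNS = [
    re.compile(r"^(\w+) is married to (\w+)$"),
    re.compile(r"^(\w+) is (\w+)'s spouse$"),
]


def normalize(sentence):
    """Return a canonical ("spouse", {a, b}) record, or None if no match."""
    # Strip a trailing period and straighten curly apostrophes.
    text = sentence.strip().rstrip(".").replace("\u2019", "'")
    for pattern in SPOUSE_PATTERNS:
        match = pattern.match(text)
        if match:
            # Marriage is symmetric, so the pair is stored without order.
            return ("spouse", frozenset(match.groups()))
    return None


first = normalize("Joe is married to Sue")
second = normalize("Sue is Joe's spouse")
```

Both calls return the same record, which is what lets a profile count the two sentences as one fact rather than two.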
Automatically summarizing text is notoriously tricky given the difficulty of detecting humor, sarcasm, obviously incorrect information, idioms, and variant spellings and syntax, not to mention the problems involved in interpreting and translating information sources in different languages.
Accordingly, many of the system’s results come across as a bit wooden or off-key. The profile of Barack Obama, for example, correctly identifies him as the president of the United States, but then summarizes him this way: “Obama has been described as ‘Nobel Peace Prize winner,’ ‘the only reasonable guy in the room,’ ‘an anti-apartheid campus divestment activist,’ and ‘the most trusted politician in the CR-poll.’”
At another point it notes, “Obama is married to Michelle LaVaughn Robinson Obama; other family members include Henry Healy, Malia Obama, and Ann Dunham.” (Healy is a distant Obama cousin from Moneygall, Ireland. Obama’s younger daughter, Sasha, isn’t mentioned.)
The system lacks real-world knowledge that would help a human analyst recognize something as false, humorous, or plainly irrelevant. Indeed, some of the outputs can be a little comical. I looked up Abraham Lincoln and found that the statements attributed to him include a number of accurate ones (though nothing from his most famous speech, the Gettysburg Address). Then I stumbled across this quote, which seems to have been produced when the system got itself snagged in some published list of famous sayings and did its best to synthesize them. “Abraham Lincoln says that the point of honey one fishing of flies more than fish barrels of a bitter pill, as well as the case for humans,” the profile reports.
Humans aren’t going to be completely replaced anytime soon.