MIT Technology Review

An Online Encyclopedia that Writes Itself

Machine reading effort builds dossiers on people and organizations from translated news sources.

They look a bit like communally written Wikipedia pages. But these articles—concise profiles of people and organizations, complete with lists of connected organizations, people, and events—were in fact written by computers, in a new bid by the Pentagon to build machines that can follow global news events and provide intelligence analysts with useful summaries in close to real time.

The prototype system is part of a nonpublic site built for intelligence agencies by Raytheon BBN in Cambridge, Massachusetts, and scheduled for delivery to the government later this year. It gathers information from 40 news websites written in English, Chinese, and Arabic, and eventually it will cover hundreds of news sites in all major languages. Ultimately the system will be linked with an existing TV broadcast monitoring network.

On the new site, if you search for information on the Nigerian jihadist movement Boko Haram, you get this entirely computer-generated summary: “Founded by Mohammed Yusuf in 2002, Boko Haram is led by Ibrahim Abubakar Shekau. (Former leaders include Mohammed Yusuf.) It has headquarters in Maiduguri. It has been described as ‘a new radical fundamentalist sect,’ ‘the main anchor for mayhem in the state,’ ‘a fractured sect with no clear structure,’ and ‘the misguided extremist sect.’”

To be sure, Wikipedia’s Boko Haram entry is clearer. But the BBN system captures everything that appears on news sites—not just on topics people chose to write Wikipedia pages about—and constantly and automatically adds information, says Sean Colbath, a senior scientist at BBN Technologies who demonstrated the technology. “I could go and read 200 articles to learn about Bashar Al-Assad (the Syrian dictator). But I’d like to have a machine tell me about it,” says Colbath. (The system, by the way, picks up the fact that the brutal Al-Assad is also a licensed ophthalmologist.)

It starts by detecting an “entity”—a name or an organization, such as Boko Haram, accounting for a variety of spellings. Then it identifies other entities (events and people) that are connected to it, along with statements made by and about the subject. “It’s automatically extracting relationships between entities,” Colbath says. “Here the machine has learned, by being given examples, how to put these relationships together and fill in those slots for you.”
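
The steps Colbath describes can be illustrated with a toy sketch. It is not BBN's implementation: the alias table, the regular-expression patterns, and the slot names `founded_by` and `headquarters` are all hypothetical stand-ins for extractors that the real system learns from examples. The sketch folds variant spellings into one canonical entity name and then fills profile slots from sentences that match a pattern.

```python
import re
from collections import defaultdict

# Hypothetical alias table: variant spellings mapped to one canonical name.
ALIASES = {
    "boko haram": "Boko Haram",
    "boko-haram": "Boko Haram",
    "mohammed yusuf": "Mohammed Yusuf",
    "mohamed yusuf": "Mohammed Yusuf",
}

# Hypothetical hand-written patterns standing in for learned relation extractors.
PATTERNS = [
    ("founded_by",
     re.compile(r"(?P<subj>[\w\- ]+?) was founded by (?P<obj>[\w\- ]+)", re.I)),
    ("headquarters",
     re.compile(r"(?P<subj>[\w\- ]+?) has headquarters in (?P<obj>[\w\- ]+)", re.I)),
]


def canonical(name):
    """Return the canonical spelling of a name, or the name itself if unknown."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, name.strip())


def build_profiles(sentences):
    """Fill relation slots for each entity mentioned in the sentences."""
    profiles = defaultdict(lambda: defaultdict(set))
    for sentence in sentences:
        text = sentence.rstrip(".")
        for slot, pattern in PATTERNS:
            match = pattern.search(text)
            if match:
                subject = canonical(match["subj"])
                profiles[subject][slot].add(canonical(match["obj"]))
    return profiles


profiles = build_profiles([
    "Boko-Haram was founded by Mohamed Yusuf.",
    "Boko Haram has headquarters in Maiduguri.",
])
```

Both sentences end up in a single profile for "Boko Haram" even though the spellings differ, which is the behavior the article attributes to the system in far more general form.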

The Boko Haram page goes on to list associated organizations and statements by and about the group. Clicking on any of them takes you back to original news sources, many of them translations of articles originally published in Arabic by sites such as Al Sharq in Qatar and Al Balad in Lebanon.

The BBN project is the fruit of the Defense Advanced Research Projects Agency’s latest effort to build machines that read as humans do, a decades-old problem that has been the focus of increasing research in recent years. Under DARPA’s research program, prototypes have been built by SRI International and IBM as well as Raytheon BBN. 

Bonnie Dorr, DARPA’s program manager for the project, says the technology incorporates recent improvements in machine reading, enabling it to do a better job of understanding when the same underlying event is described in multiple ways—such as “Joe is married to Sue” and “Sue is Joe’s spouse”—and to determine the sentiment implied in phrases like “really awesome.”
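
As a rough illustration of what it means to treat two phrasings as the same underlying fact, the sketch below maps both of Dorr's example sentences to one canonical record. The patterns and the `spouse_fact` helper are hypothetical and cover only these two phrasings; the systems described here learn such equivalences rather than relying on hand-written rules, and the sketch does not attempt sentiment detection.

```python
import re

# Hypothetical surface patterns; a real system would learn these from examples.
MARRIED_TO = re.compile(r"^(\w+) is married to (\w+)$")
SPOUSE_OF = re.compile(r"^(\w+) is (\w+)['’]s spouse$")


def spouse_fact(sentence):
    """Return a canonical ("spouse", {a, b}) record, or None if no pattern matches."""
    text = sentence.strip().rstrip(".")
    match = MARRIED_TO.match(text) or SPOUSE_OF.match(text)
    if match is None:
        return None
    # Marriage is symmetric, so an unordered pair makes both phrasings compare equal.
    return ("spouse", frozenset(match.groups()))


same = spouse_fact("Joe is married to Sue.") == spouse_fact("Sue is Joe's spouse.")
```

Here `same` evaluates to `True` because both sentences reduce to the same unordered pair of names under the same relation label.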

Automatically summarizing text is notoriously tricky given the difficulty of detecting humor, sarcasm, obviously incorrect information, idioms, and variant spellings and syntax, not to mention the problems involved in interpreting and translating information sources in different languages.

Page views: This entry on the Muslim Brotherhood was composed by computers using information gathered from online news sources.

Accordingly, many of the system’s results come across as a bit wooden or off-key. The profile of Barack Obama, for example, correctly identifies him as the president of the United States, but then summarizes him this way: “Obama has been described as ‘Nobel Peace Prize winner,’ ‘the only reasonable guy in the room,’ ‘an anti-apartheid campus divestment activist,’ and ‘the most trusted politician in the CR-poll.’”

At another point it notes, “Obama is married to Michelle LaVaughn Robinson Obama; other family members include Henry Healy, Malia Obama, and Ann Dunham.” (Healy is a distant Obama cousin from Moneygall, Ireland. Obama’s younger daughter, Sasha, isn’t mentioned.)

The system lacks real-world knowledge that would help a human analyst recognize something as false, humorous, or plainly irrelevant. Indeed, some of the outputs can be a little comical. I looked up Abraham Lincoln and found that the statements attributed to him include a number of accurate ones (though nothing from his most famous speech, the Gettysburg Address). Then I stumbled across this quote, which seems to have been produced when the system got itself snagged in some published list of famous sayings and did its best to synthesize them. “Abraham Lincoln says that the point of honey one fishing of flies more than fish barrels of a bitter pill, as well as the case for humans,” the profile reports.

Humans aren’t going to be completely replaced anytime soon.
