Building a Picture of the Bomb Suspects through Social Network Analysis

Within hours after the Boston marathon bombing suspects were identified, police probably obtained warrants to search extensive digital records from mobile phone networks and social media and e-mail providers.

**Data mine:** This Twitter account is thought to have been used by Dzhokhar Tsarnaev.

The rapid growth of such data sets, and of the network analysis tools used to make sense of them, could be a boon to the investigation. It might reveal the existence of other evidence, further plots, or the identity of accomplices. But sorting the real information from the junk will be a challenge.

“The general number of law enforcement requests of e-mail and social network data has gone up by a wide margin,” says Hanni Fakhoury, a staff attorney at the Electronic Frontier Foundation in San Francisco. The result is that police agencies hold huge databases of stored information.

In the case of Google, such requests are rising rapidly. In the last six months of 2012, for example, Google reported receiving about 8,400 requests for user data, up from about 6,300 in the last six months of 2011. The company publishes these figures in its Transparency Report.

Security camera footage and public appeals seem to have quickly led to the identification of the suspects: brothers Tamerlan and Dzhokhar Tsarnaev, immigrants of Chechen heritage. Now “it will be interesting to see how the prevalence of electronic evidence and data all around us on the Web makes it possible for the police to solve all aspects of this crime,” Fakhoury says.

Searches of e-mail are governed by a 1986 law called the Stored Communications Act, which technically allows police access to e-mails older than 180 days without a formal search warrant. Getting a search warrant would require a judicial finding of probable cause that the defendant committed a crime. To access messages more than 180 days old, investigators need only a court-issued document, similar to a subpoena, that says the information is relevant to a criminal investigation.

“Congress assumed if you left things on the server for six months, you’d abandoned it,” Fakhoury says. However, major providers of Web communication services, like Google and Facebook, won’t release data without a search warrant, he says. Some data, such as the user’s IP address and the date the e-mail account was created, will be released without such a warrant. And of course, other data, such as tweets and some Facebook posts, are publicly available.

As soon as the names surfaced, police would have gone to court to obtain search warrants. The authorities would turn to any number of software tools to sort through huge quantities of data, and visualize links between suspects, locations, and other points of reference.

One such tool is the Sentinel Visualizer from a Virginia company called FMS Advanced Systems Group. “A lot of police departments have thousands, if not tens or hundreds of thousands, of telephone call records,” says Dan Wasser, the director of business development at the company, which he says is not involved in the current investigation. “It’s impossible to look at that data and see who is calling whom.”

“Let’s say the police have been gathering data for weeks, months, years,” Wasser says. “Now they have the name of these Chechen brothers. Those names may pop up in the records from past databases of phone calls, transactions, and other data they may have collected over the years.”
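To make the idea concrete, the kind of link analysis Wasser describes can be sketched in a few lines of Python. This is only an illustration of the general technique, not how Sentinel Visualizer or any police system works. The call records, the phone labels, and the function names `build_graph` and `contacts_within` are all invented for the example.

```python
from collections import defaultdict, deque

# Hypothetical call records as (caller, callee) pairs.
# The labels are invented for illustration; they are not real data.
CALLS = [
    ("A", "B"),
    ("B", "C"),
    ("C", "D"),
    ("A", "B"),
    ("E", "F"),
]


def build_graph(calls):
    """Build an undirected contact graph; each link stores a call count."""
    graph = defaultdict(lambda: defaultdict(int))
    for caller, callee in calls:
        graph[caller][callee] += 1
        graph[callee][caller] += 1
    return graph


def contacts_within(graph, start, max_hops):
    """Breadth-first search from `start`.

    Returns a dict mapping each number reachable in at most
    `max_hops` calls to its hop distance from `start`.
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, {}):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the starting number is not its own contact
    return distances


graph = build_graph(CALLS)
print(contacts_within(graph, "A", 2))
```

Starting from one known number, the search lists direct contacts and contacts of contacts, while the call counts on each link hint at which relationships are strongest. Real tools add timestamps, locations, and visualization on top of this basic graph structure.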

Some traces of social networking activity have already surfaced. A YouTube profile created last year in the name of the elder suspect, Tamerlan Tsarnaev, features videos about terrorism, but verifying that he created the profile might pose a challenge.
