
Business Impact

The Immortal Life of the Enron E-mails

A decade after the Enron scandal, the company’s internal messages are still helping to advance data science and many other fields.

E-mail is still the most popular online communication method in companies.

Former Enron executive Vincent Kaminski is a modest, semi-retired business school professor from Houston who recently wrote a 960-page book explaining the fundamentals of energy markets. His most lasting legacy, however, may involve thousands of e-mails he wrote more than a decade ago at the energy-services company.

Corporate corpus: Volumes of e-mails that were sent and received in Enron’s headquarters in Houston, seen here in 2002, are still parsed and dissected by computer scientists and other researchers.

Kaminski, a former managing director for research who warned repeatedly about concerning practices he saw at Enron, is among more than 150 senior executives whose e-mail boxes were dumped onto the Internet by the Federal Energy Regulatory Commission (FERC) on March 26, 2003. In the name of serving the public’s interest during its investigation of Enron, the federal agency made the controversial decision to post online more than 1.6 million e-mails that Enron executives sent and received from 2000 through 2002. After receiving complaints, FERC eventually culled the trove to remove the most sensitive and personal data. Even so, the “Enron e-mail corpus,” as the cleaned-up version is now known, remains the largest public domain database of real e-mails in the world—by far.

This corpus is valuable to computer scientists and social-network theorists in ways that the e-mails’ authors and recipients never could have intended. Because it is a rich example of how real people in a real organization use e-mail—full of mundane lunch plans, boring meeting notes, embarrassing flirtations that revealed at least one extramarital affair, and the damning missives that spelled out corruption—it has become the foundation of hundreds of research studies in fields as diverse as machine learning and workplace gender studies.

This research has had widespread applications: computer scientists have used the corpus to train systems that automatically prioritize certain messages in an in-box and alert users that they may have forgotten about an important message. Other researchers use the Enron corpus to develop systems that automatically organize or summarize messages. Much of today’s software for fraud detection, counterterrorism operations, and mining workplace behavioral patterns over e-mail has been somehow touched by the data set.

“It’s like we are studying yeast,” says William Cohen, a Carnegie Mellon University computer scientist who helped put the corpus in a database that could be mined by researchers. “It’s studied and experimented on because it is a very well understood model organism. [The e-mail generated by] Enron is similar. People are going to keep using it for a long time.”

The Enron e-mails were given their extended life by scientists at MIT, Carnegie Mellon University, and the nonprofit research institute SRI International. Ten years ago, researchers at these institutions were collaborating on the DARPA-funded CALO project, which stands for “Cognitive Assistant that Learns and Organizes,” and whose biggest claim to fame is giving rise to Apple’s Siri software. For CALO, the researchers were cobbling together much smaller e-mail data sets to analyze.

When the Enron e-mails were posted in 2003, the researchers realized that they could be extremely useful for testing algorithms that process written language and that could form the basis of intelligent workplace tools. Because FERC had posted the e-mails in an unusable format, MIT’s Leslie Kaelbling purchased the raw files from a government contractor for $10,000, and others spent time cleaning up the data—weeding out duplicates, organizing folders, taking out the remaining private attachments and e-mails, and mapping the senders and recipients to Enron’s organizational structure. The corpus, initially 517,431 e-mails, was whittled down to 200,000 by 2004.

A research ecosystem still blooms around the corpus because there is nothing else like it in the public domain. If it didn’t exist, research into business e-mails could be done only by people with access to big corporate or government servers. That probably would exclude social science, organizational, and linguistics researchers—many of whom have used the corpus to glean valuable insights into corporate culture, says Owen Rambow, a Columbia University professor involved in a research project that used the Enron corpus and received a $510,000 grant from the National Science Foundation.

Since 2010, about 30 papers a year have cited the original paper that presented the Enron corpus, Carnegie Mellon’s Cohen estimates. This year, for instance, researchers at HP Labs turned to the corpus to demonstrate an artificial intelligence program that automatically identifies the commitments people make over e-mail. Jafar Adibi, who worked on an early map of the Enron social network, says he still gets a handful of inquiries every month, more and more of them from researchers outside the United States. There is still an active listserv devoted to discussing the corpus.

Researchers who have worked with the corpus know there won’t be another Enron. FERC released the e-mails back when the world still had a lot to learn about online privacy. The harms to the people mentioned, most of whom were innocent of any wrongdoing at Enron, were quickly apparent: Social Security numbers and even bank records were in there. Though much private data has been removed, browsing hundreds of e-mails in Kaminski’s “sent” folder, I found a home phone number, his wife’s name, and an unflattering opinion he held of a former colleague. I also got the sense that he had been long, long overdue for the promotion he received in 2000. When the e-mails were first released, Kaminski, who managed about 50 employees at Enron, said he was most disturbed to see his back-and-forth communications about HR complaints and job candidate evaluations become public. A job candidate he had once interviewed got upset after their release.

Today, many people who work in highly regulated industries like finance avoid putting sensitive information in their e-mails. Kaminski, who later served as a managing director at Citigroup, notes that the acronym “LTOL” became popular e-mail lingo in the years following Enron. It stands for “Let’s take this offline.”
