
Archiving E-mail Effectively

The White House’s recent problems archiving e-mail could be solved by emerging technologies.

Losing e-mail can be a serious problem for both the public and private sectors. Recently, the White House came under fire for failing to keep archives of its e-mail, even though the Clinton Administration had instituted an archiving system in response to a similar scandal. The Bush Administration set aside the existing archiving system as part of a move from Lotus Notes to Microsoft Exchange, and its current system reportedly relies on a combination of backup tapes and hand sorting. Earlier this week, according to the Washington Post, court documents revealed that the White House has not been able to find e-mails sent during a period in 2003 that encompasses the Iraq invasion.

E-mail can be hard to archive effectively because it is sent in such vast quantities, which makes it difficult to store in a form that is both easy to search and demonstrably tamper-proof. Both requirements matter most for e-mail that may be needed in court cases and other legal proceedings. Fortunately, experts say that there are many good technologies for proper e-mail archiving and that their capabilities have only improved in recent years.

“Backup tapes make a lousy method for archiving e-mail,” says Mark Diamond of Contoural, a data and storage consulting firm. Tape systems take snapshots of data at set daily or weekly intervals, leaving open the possibility that data could be both created and deleted before a copy can be made. Consequently, tape backups are really only good as insurance against system failure. A decent archiving system, Diamond says, “includes the ability to know what you have and the ability to … find e-mails and retrieve them easily, which is very difficult on backup tapes.”
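To make the snapshot gap concrete, here is a toy model (not drawn from any vendor's product) comparing nightly backups with an archive that copies each message the moment it reaches the server, the approach Diamond describes below. The `Message` class, the hour-based timeline, and the midnight backup schedule are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    msg_id: str
    sent_hour: int                 # hour the message reached the server
    deleted_hour: Optional[int]    # hour the user deleted it; None if never deleted

def on_server_at(msg: Message, hour: int) -> bool:
    """True if the message is still in the mailbox at the given hour."""
    return msg.sent_hour <= hour and (msg.deleted_hour is None or hour < msg.deleted_hour)

def captured_by_snapshots(messages, snapshot_hours):
    """IDs of messages present in at least one periodic backup snapshot."""
    return {m.msg_id for m in messages if any(on_server_at(m, h) for h in snapshot_hours)}

def captured_by_journal(messages):
    """A capture-on-arrival archive copies every message as soon as it arrives."""
    return {m.msg_id for m in messages}

messages = [
    Message("a", sent_hour=2, deleted_hour=None),
    Message("b", sent_hour=9, deleted_hour=17),   # created and deleted between backups
    Message("c", sent_hour=30, deleted_hour=50),
]
nightly = [0, 24, 48]  # hypothetical backup runs at midnight on three days

missing = captured_by_journal(messages) - captured_by_snapshots(messages, nightly)
print(missing)  # {'b'}
```

Message "b" lives for only eight hours between two backups, so no tape ever sees it, while the capture-on-arrival archive keeps it.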

There’s also a growing recognition that archiving systems need to be automated in order to work properly, in contrast to the laborious and error-prone manual sorting system currently in place at the White House. Even many digital archiving schemes scatter files across multiple locations, so a complete search for all files related to a particular lawsuit might require searching many machines at different sites. Diamond says that a good archiving system makes a record of each e-mail as soon as it hits the server. Other automated tools aim to protect e-mail that relates to pending litigation and to make it easy to sort through messages already in storage.

Diamond says that newer archiving systems go beyond simply storing e-mail, making it possible to recover in minutes data that might otherwise take weeks or months to retrieve. Such improvements are important, he says, because “every e-mail somebody sends, whether it’s from the White House e-mail server or the company e-mail server, is a business document.” Courts and regulators have been very clear, he says, that this means e-mail needs to be properly preserved. The costs can be high when good systems aren’t in place. Beyond worries about the White House, Diamond points to a case between Intel and AMD in which Intel reported spending more than $25 million recovering lost e-mail. The typical Fortune 500 company, Diamond says, has more than 150 legal actions pending at any given time. At least 50 percent of the cost of that litigation, he adds, lies in recovering needed documents, most of which are electronic.

Although at first glance it may seem important to sort out which e-mails need to be kept, some experts say that it’s best just to keep them all. Robin Bingeman, product manager for Forensic and Compliance Systems, a U.K.-based company that makes an archiving tool called Cryoserver, has a favorite example to illustrate the point. Imagine an e-mail that says the following: “Hi Bill, good to see you last week. Sorry to hear about Eileen’s kidney infection. I hope she comes out of hospital soon. Well, here’s that document I promised regarding that customer steering column failure. I think we need to have the technical team have a look at it before it becomes a bit of a customer concern.” Depending on your perspective, he points out, the e-mail could be thought of as personal. But it also contains information about product liability that, at least under European law, needs to be saved for 10 years. A human sorter could easily focus on only one aspect of the e-mail, and a natural-language processing algorithm is likely to become confused. Bingeman thinks that businesses should keep everything for the maximum time period and focus on developing good ways to search through the data and prove that it hasn’t been tampered with, so that it can be admitted as evidence in the event of a court case. He adds that this type of policy is also good because it gives a company a chance to prove that an alleged e-mail was never sent, something that’s impossible to do with backup tapes, which don’t record everything.
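The article doesn’t say how Cryoserver or any other product proves that archived mail is untouched. One widely used technique is a hash chain, sketched below as an assumption rather than as any vendor’s method. Each entry’s SHA-256 hash covers both the message and the previous entry’s hash, so editing any stored message breaks every later link. `ChainedArchive` and `entry_hash` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, message: dict) -> str:
    # Canonical JSON (sorted keys) so the same message always hashes identically.
    payload = json.dumps(message, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()

class ChainedArchive:
    """Append-only log; each entry's hash also covers the previous entry's hash."""

    def __init__(self):
        self.entries = []  # list of (message, hash) pairs

    def append(self, message: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        stored = dict(message)  # keep a private copy of the message
        h = entry_hash(prev, stored)
        self.entries.append((stored, h))
        return h

    def verify(self) -> bool:
        """Recompute every link; any edited message makes this return False."""
        prev = GENESIS
        for message, stored_hash in self.entries:
            if entry_hash(prev, message) != stored_hash:
                return False
            prev = stored_hash
        return True

archive = ChainedArchive()
archive.append({"from": "a@example.com", "to": "bill@example.com", "body": "steering column report"})
archive.append({"from": "bill@example.com", "to": "a@example.com", "body": "forwarded to technical team"})
print(archive.verify())  # True

archive.entries[0][0]["body"] = "edited after the fact"
print(archive.verify())  # False
```

A chain like this only detects tampering if the latest hash is also recorded somewhere the archive’s operator cannot rewrite, for example with a third party. Otherwise someone could recompute the whole chain after an edit.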

Peter terSteeg, technical director of Quest Software’s unified communications business unit, says that companies must do more than just save e-mail, however. “Companies need one pane of glass to find all of the data that they need for litigation,” he says. “They need to find vendors that are going to make a significant investment in that single pane of glass for searching across all the data types in that organization.”

Files or data produced by Web-based collaboration, such as through Microsoft’s SharePoint, need to be archived along with e-mail, terSteeg notes. A coming challenge for vendors like Quest, he says, is to develop a system that archives all these types of data, makes them easy to search, and produces them as reliable evidence in court, even as the quantity of data grows substantially over an organization’s lifetime.

Mimosa Systems CEO T.M. Ravi agrees about the direction in which the field is heading. In fact, earlier this week, Mimosa announced new features for the company’s existing e-mail archiving system that will allow companies to archive and search for documents beyond e-mail and instant messages. Using a technique called global single instancing, the software searches the archive for identical data stored in multiple locations (for example, a Word file that a user saves and then e-mails to 40 people within the company) and keeps only one copy. In the future, Ravi says, the company plans to add support for SharePoint files as well.
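The article doesn’t describe how Mimosa’s software detects identical data. A common way to implement single instancing is content addressing: hash each file and store the bytes only once per distinct hash. The sketch below assumes that approach. `SingleInstanceStore`, the mailbox names, and the use of SHA-256 are illustrative assumptions, not Mimosa’s design.

```python
import hashlib

class SingleInstanceStore:
    """Keeps one copy of each distinct file, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}       # digest -> file bytes (each distinct file stored once)
        self.references = {}  # (location, item_id) -> digest

    def store(self, location: str, item_id: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)           # only the first copy is kept
        self.references[(location, item_id)] = digest    # every location gets a pointer
        return digest

    def fetch(self, location: str, item_id: str) -> bytes:
        return self.blobs[self.references[(location, item_id)]]

store = SingleInstanceStore()
report = b"bytes standing in for a Word document"

store.store("author", "saved-copy", report)              # the user saves the file
for i in range(40):                                      # then e-mails it to 40 colleagues
    store.store(f"user{i:02d}", "msg-001", report)

print(len(store.blobs), len(store.references))  # 1 41
```

All 41 locations can still retrieve the file, but the archive holds its bytes only once. Hashing whole files means that a copy differing by even one byte is stored separately.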

“At the end of the day,” says Contoural’s Diamond, “e-mail archiving isn’t about saving e-mail. It’s about control.” Considering the maturation of tools on the market in recent years, he says, most problems arise when organizations, including the White House, don’t come up with good policies for using the available technologies.
