In Google We Trust

Internet users should think carefully before relying on Gmail.

Simson Garfinkel ’87, PhD ’05

February 15, 2006

This story, by a veteran TR correspondent, first appeared in the Dec. 2005/Jan. 2006 issue of Technology Review. It explores the complex issues of privacy and data security as they relate to Gmail, the increasingly popular, free web-mail service from Google.

Google’s Gmail raises important questions about the security and privacy of our personal information – questions that should matter not just to users of the free Web-based e-mail system but to everyone who exchanges e-mail with Gmail users.

And since the technical underpinnings of Gmail might very well be the prototype for the next generation of desktop-computer applications, the answers to these questions potentially affect everyone.

But wait – this is not another diatribe against the targeted advertisements Gmail shows while you read your mail. All of the worry surrounding that single issue has obscured a far more important one: data integrity and security. Gmail is so powerful, fast, and convenient that there’s a huge incentive for you to keep all of your e-mail there. But there’s a catch: Gmail makes no promise that a mail message you save today will still be there tomorrow – nor that e-mail you delete today will be gone tomorrow. Using Gmail means placing a lot of trust in Google.

When Gmail was launched in April 2004, it boasted three strengths: scale, search, and sales. Scale was the most obvious; Google promised each user the ability to store a gigabyte of e-mail when competitors like Hotmail were offering a measly two megabytes. Google could make this offer because, at the time, its 100,000-plus computers had more than 20 petabytes of combined storage. Since then, Google has shown it can buy new hard drives faster than its users can fill the old ones up.

Search was Gmail’s second strength. Instead of asking users to create “folders” and archive their e-mail like obedient file clerks, Gmail allowed them to simply click “archive” and banish e-mail messages from their in-boxes to an unseen holding area. Gmail users retrieve their archived mail by searching for it – a process that is so fast and thorough that it’s actually liberating.

Sales was Gmail’s third strength – one that was surprisingly controversial. When Google announced Gmail, it proudly proclaimed that it would analyze e-mail messages for common keywords and use them to customize advertisements. For example, an undergraduate reading a message about an upcoming assignment might simultaneously see an advertisement for a site that sells term papers.

Despite this apparent convenience, many privacy activists – me among them – called upon Google to describe how its targeted-advertising technology worked. The company responded this past October by dramatically expanding and clarifying its privacy policy. Google now explains that the advertisements are based on your computer’s IP address, the content of the message you’re reading, and your previous use of Gmail. But don’t worry, Google says: your e-mail is scanned only by computers and never by human beings.

In addition, Google now makes it clear that you can delete individual e-mail messages or your entire Gmail account at any time. If you do, however, your old e-mail might remain on Google’s servers for up to 60 days and on its “offline backup systems” for even longer. Although this may sound like an unacceptably long time, Google has in fact done a far better job in addressing the concerns of privacy activists than its competitors ever did.

It’s important for Google to get its privacy and security policy right with Gmail, because Gmail is the standard-bearer for an increasingly important approach to Web programming called Ajax, for asynchronous JavaScript and XML. Simply put, Ajax applications have user interfaces that run inside a Web browser, but the heavy computation and data storage are done remotely – in the case of Gmail, on Google’s supercomputer cluster. When you start up Gmail, large parts of your in-box are downloaded into your computer’s memory and displayed in your browser as needed. This makes Gmail dramatically faster and more efficient than existing Web-based mail systems, where messages and mailbox lists have to be downloaded again and again every time you display a new Web page.

In recent months, Gmail has introduced a message editor that lets users bold and italicize text or change fonts within a message – much the way you can in a PC-based e-mail program like Microsoft Outlook. There’s even an “autosave” feature, so that if your browser crashes you don’t lose the message that you were composing. And Gmail can now be integrated with Google Desktop; for example, you can download your e-mails to your Windows-based computer and search and read them when you are not online. All of this is made possible by Gmail’s Ajax architecture.

So if Google is applying Ajax with such skill, why am I still concerned about privacy and security?

When most people think about privacy, they think about the threat of accidental disclosure of personal information. When they think about online security, they tend to think about worms, viruses, and phishing attacks – active attacks by bad people or bad software.

But privacy and security are more complex. Privacy, for instance, includes not just the right to keep personal matters out of the public eye but also the right to be free from intrusion – the right to be “let alone,” as Samuel Warren and Louis Brandeis put it in their famous 1890 Harvard Law Review article “The Right to Privacy.” Gmail’s advertisements may be less intrusive than those of Hotmail and Yahoo, but they are intrusive nevertheless.

Google argues in its updated privacy policy that users should have the right to choose to read their e-mail through a free, advertiser-supported service. But of course, Google does not in fact offer a choice: there is no fee-based, advertising-free version of Gmail. I note this not to be obnoxious – clearly, Google can argue for choice in the market without itself having to offer more than one option – but to call attention to the most important characteristic of Google’s business model.

That characteristic is this: fee-based consumer services are not part of Google’s business model at all. Although Google is often called a search company or an e-mail provider, it earns its billions by selling clicks on targeted advertisements. Everything else is merely the honey designed to attract enough attention that some of it will spill onto those ads. Gmail’s users are not Google’s customers; they are its product. I personally find advertisements highly distasteful and have shied away from Gmail for that reason.

Far more troubling for me, however, is Gmail’s data security story.

Like privacy, security is a much deeper concept than most Internet users realize. Being free from spyware and viruses is important, certainly. But so is data integrity – retaining data whole, without additions, deletions, or other modifications. While Google provides a ton of storage and great availability, there is no obvious way to back up your e-mail once it has been delivered, read, and archived. This means that you have no choice but to trust Google totally for your data integrity.
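To make "data integrity" concrete, here is a minimal sketch of how a user might detect additions, deletions, or modifications, assuming the raw bytes of each message can be obtained in the first place (which Gmail gives no obvious way to do). The function names and the idea of keying messages by an ID string are illustrative assumptions, not anything Gmail provides.

```python
import hashlib
from typing import Dict, List


def fingerprint(raw_message: bytes) -> str:
    """Return the SHA-256 digest of a raw message as a hex string."""
    return hashlib.sha256(raw_message).hexdigest()


def build_manifest(messages: Dict[str, bytes]) -> Dict[str, str]:
    """Record one digest per message ID (hypothetical IDs chosen by the user)."""
    return {msg_id: fingerprint(raw) for msg_id, raw in messages.items()}


def audit(manifest: Dict[str, str], messages: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Compare the current mailbox against a manifest saved earlier."""
    missing = sorted(m for m in manifest if m not in messages)
    modified = sorted(
        m for m in manifest
        if m in messages and fingerprint(messages[m]) != manifest[m]
    )
    added = sorted(m for m in messages if m not in manifest)
    return {"missing": missing, "modified": modified, "added": added}
```

A manifest kept on your own disk is tiny compared with the mail itself, but it only tells you that something changed; it cannot bring a lost message back. For that you need an actual copy.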

But nowhere in Gmail’s “Terms of Use” does the company promise that it won’t delete some or all of your mail – now, or in the future. In fact, the termination clause of Gmail’s policy gives the company the right to delete any account, for any reason, at any time, with no user recourse.

Gmail could provide a backup system, of course. Google Desktop already downloads mail in the background for offline access, and it would be trivial to let users save that e-mail in archive files on their hard drives, for subsequent burning onto CD-ROMs or DVDs. Perhaps Gmail will do this in the future. But it doesn’t do it now.
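As a rough illustration of how little is involved, the sketch below writes a batch of messages into a single mbox archive file using Python's standard `mailbox` module. It assumes the messages have already been downloaded by some other means; the function name `archive_messages` and the file name in the usage note are hypothetical, and this is not a description of how Google Desktop stores mail.

```python
import mailbox
from email.message import EmailMessage
from typing import Iterable


def archive_messages(messages: Iterable[EmailMessage], path: str) -> int:
    """Append messages to an mbox file at `path`, creating it if needed.

    Returns the number of messages written in this call.
    """
    box = mailbox.mbox(path)  # creates the file when it does not exist
    written = 0
    try:
        for message in messages:
            box.add(message)
            written += 1
    finally:
        box.close()  # flushes pending changes to disk
    return written
```

Calling `archive_messages(downloaded, "gmail-backup.mbox")` would leave one ordinary file on the hard drive, which could then be copied to a CD-ROM or DVD like any other.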

The mere existence of that huge archive of personal e-mail – an archive that can neither be backed up nor deleted on demand – should give users pause. For example, such an archive could become a one-stop-shopping destination for subpoenas in civil litigation and criminal investigations. Gmail’s early adopters now have nearly two years’ worth of mail archived in the system – an attractive body of evidence in, say, a nasty divorce proceeding.

The preservation of old messages wasn’t previously a concern because earlier online e-mail providers like Hotmail didn’t offer their users enough storage. Also, folder-based archives give users a strong incentive to throw most messages away rather than keeping them all. And of course, if you download your e-mail with POP (the post office protocol) and keep it on a hard drive in your living room, you are responsible for the security of your mail – and you have the option of fighting a subpoena in court rather than turning over your files.

Many of my concerns could be addressed through the clever use of encryption. Mail could be encrypted while stored on Google’s servers and only decrypted when it is displayed to Gmail users. This would dramatically reduce the risk of a subpoena: now an attorney fishing for incriminating documents would have to demand not just e-mail but also the user’s decryption key. This would give users more opportunities to fight subpoenas – or perhaps to “lose” their keys.

Whether or not these risks actually matter to you depends on what uses, if any, you make of the Gmail service. But how Google responds to persistent concerns about privacy and data security should matter to everyone who uses the Web. For better or worse, Google remains the hottest Internet company on the planet – and the example it sets with Gmail will shape the products and policies of hundreds of other companies using Ajax technology to build new Web-based services.

Home page image courtesy of Jason Schneider.

Simson Garfinkel is a postgraduate fellow at Harvard University’s Center for Research on Computation and Society.
