Now things are different. We’re seeing more and more problems arise not because data isn’t protected, but because it’s improperly protected.
Until recently, few people outside of academia paid much attention to a concept called “information flow security,” which involves checking data as it interacts with software. Then along came “This Is Almost Certainly James Comey's Twitter Account,” a Gizmodo article that captures everything it means to be a modern information leak.
The story behind the article: a journalist named Ashley Feinberg wanted to find FBI director James Comey’s secret Twitter account. She started digging around the Internet and was able to uncover the account in just four hours, in large part by exploiting a key information flow bug in Instagram.
Feinberg describes how she used facts about Comey and his family to find a public tweet, which led her to a public Instagram comment, which in turn led her to the protected Instagram page of Comey’s 22-year-old son, Brien.
After Feinberg sent requests to follow Brien, Instagram recommended that she also follow “reinholdniebuhr,” another protected Instagram account that matched what Feinberg already knew about James Comey’s Instagram account. Even better, the Twitter handle with the same name matched what Feinberg knew about Comey’s Twitter account.
A lot is happening here, much of it outside the control of a software developer at Instagram, but at the core of this leak is an information flow bug. Feinberg relied on several key public facts about Comey, but she would not have been able to find his Twitter account had Instagram not inadvertently provided the vital clues. And Instagram would not have revealed this information had the code been properly enforcing information flow security.
There’s an inconsistency between how Instagram protects the information on account profiles when users try to access it and how it protects this information when it’s used in various algorithms. When you try to view the Instagram page of a protected user, you cannot access that person’s photos or see who that user is following. It turns out, however, that the protected account information is visible to the algorithms that suggest other users to follow, and those suggestions become visible, incorrectly, to any viewer once a follow is requested.
In this case the policy violation is particularly insidious because what is actually leaked, reinholdniebuhr’s profile photo and name, is public across Instagram. What should be private is the relationship between Brien Comey and the reinholdniebuhr account. While it’s possible that Instagram randomly picked reinholdniebuhr out of its 600 million monthly active users as an account to recommend, it is far more likely, especially given that the other recommended users had the last name Comey, that Instagram’s recommendation algorithm used secret “follow” information to compute which accounts to recommend. In information flow nomenclature, the leak of secret information through displaying public information is called an “implicit flow.”
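To make the inconsistency concrete, here is a minimal Python sketch of the kind of bug described above. It is not Instagram’s code: the `Account` class, the account names, and the functions `can_see_following`, `recommend_leaky`, and `recommend_checked` are all hypothetical, chosen only to show a profile policy that a recommendation routine forgets to consult.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical account record; all names here are illustrative."""
    name: str
    protected: bool = False
    following: set = field(default_factory=set)           # accounts this user follows
    approved_followers: set = field(default_factory=set)  # who may see `following`


def can_see_following(viewer: str, account: Account) -> bool:
    """The policy the profile page enforces for an account's follow list."""
    return (
        not account.protected
        or viewer == account.name
        or viewer in account.approved_followers
    )


def recommend_leaky(viewer: str, target: Account) -> list:
    # Bug: the follow list is used without consulting the policy, so the
    # public names it returns reveal a private relationship.
    return sorted(target.following)


def recommend_checked(viewer: str, target: Account) -> list:
    # The same policy as the profile page is applied before the data is used.
    if not can_see_following(viewer, target):
        return []
    return sorted(target.following)


son = Account("son", protected=True, following={"parent_secret_account"})

print(recommend_leaky("reporter", son))    # ['parent_secret_account']
print(recommend_checked("reporter", son))  # []
```

Every item in the leaky result is public on its own; the secret is the fact that it appears in this particular list.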
Just as proper encryption would have prevented the Target and Sony hacks, there are solutions for preventing information leaks like this one. Decades of research have produced information flow security techniques: some check software before it runs, others monitor software as it’s running. This work is much more than theoretical: people have built operating systems and Web frameworks based on these ideas. Such systems would have detected that a recommendation algorithm was leaking secret follow information and prevented the leak. But even with these approaches, the programmer still needs to reason about the complex and subtle interactions of policies with each other and with the code to produce software that doesn’t leak information.
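As a rough illustration of the “monitor software as it’s running” approach, the Python sketch below attaches a label to each value, carries the label through computations, and checks it at the point of output. It is a toy under stated assumptions, not any particular research system: `Labeled`, `join`, `derive`, `show`, and `FlowViolation` are hypothetical names, and the sketch only tracks data passed through `derive`, not information that leaks through branching on secrets.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Labeled:
    """A value plus the set of users allowed to learn anything derived from it."""
    value: object
    readers: Optional[FrozenSet[str]] = None  # None means anyone may read


class FlowViolation(Exception):
    pass


def join(a, b):
    """Combine two labels: only readers allowed by both may see the result."""
    if a is None:
        return b
    if b is None:
        return a
    return a & b


def derive(fn, *inputs):
    """Apply fn to labeled inputs; the result inherits the joined label."""
    label = None
    for item in inputs:
        label = join(label, item.readers)
    return Labeled(fn(*(item.value for item in inputs)), label)


def show(viewer: str, item: Labeled):
    """Output sink: release the value only to an allowed reader."""
    if item.readers is not None and viewer not in item.readers:
        raise FlowViolation(f"{viewer} may not see this value")
    return item.value


# Hypothetical data: a private follow list and a public list of popular accounts.
follows = Labeled({"parent_secret_account"}, frozenset({"son"}))
popular = Labeled({"celebrity"})

suggestions = derive(lambda a, b: sorted(a | b), follows, popular)

print(show("son", suggestions))  # ['celebrity', 'parent_secret_account']
try:
    show("reporter", suggestions)
except FlowViolation as err:
    print("blocked:", err)  # blocked: reporter may not see this value
```

The point of the design is that the recommendation code never has to remember the policy; the label travels with the data and the check happens once, at the output.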
In my lab, we’re attempting to make it easier for developers to implement information flow policies. We let the machine take responsibility for managing how policies interact with each other and with the program, so that code such as a recommendation algorithm doesn’t leak information. The policies also specify what values the machine can use when the actual values must be kept secret. For example, if a search algorithm isn’t allowed to use someone’s exact location, maybe it can use the corresponding city.
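The location example can be sketched in a few lines of Python. This is only an illustration of the idea of pairing a secret value with a policy and a fallback, not the system built in the lab: the `Guarded` class, the `view_for` method, the `search_nearby` function, and the sample coordinates and user names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Guarded:
    """A secret value, a public stand-in, and the policy choosing between them."""
    secret: object
    fallback: object
    allowed: Callable[[str], bool]

    def view_for(self, viewer: str):
        # The policy, not the calling code, decides which value is used.
        return self.secret if self.allowed(viewer) else self.fallback


# Hypothetical data: exact coordinates are secret, the city is the fallback.
location = Guarded(
    secret=(42.3601, -71.0589),
    fallback="Boston",
    allowed=lambda viewer: viewer in {"alice"},
)


def search_nearby(viewer: str, loc: Guarded) -> str:
    # The search code is written once; it never checks permissions itself.
    return f"results near {loc.view_for(viewer)}"


print(search_nearby("alice", location))     # results near (42.3601, -71.0589)
print(search_nearby("stranger", location))  # results near Boston
```

Because the fallback is attached to the data, a new algorithm that uses the location gets the coarse value by default instead of leaking the exact one.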
Even though digital security should be one of the main concerns of the FBI, Comey couldn’t avoid the problems that arise from the mess of policy spaghetti that is modern code. Though this leak affected far fewer people than the large data breaches did, it marks an important shift in information security.
Until now, most people have thought about security in terms of protecting individual data items, rather than in terms of complex and subtle interactions with the programs that use them. But we now live in a world in which the director of the FBI trusts our Internet infrastructure, and Instagram, enough to put 3,227 private photographs online. On the one hand, this means that we’ve reached a certain level of security. On the other, it means that we can now focus on more advanced security problems. And when anyone with good deductive skills and access to the Internet can find out all sorts of information, our work on information security is far from over.