Blindfolding Big Brother, Sort of

Jeff Jonas is an IBM engineer who specializes in software that infuses powerful search technology with anonymity.

Kate Greenearchive page

January 30, 2006

In 1983, entrepreneur Jeff Jonas founded Systems Research and Development (SRD), a firm that provided software to identify people and determine who was in their circle of friends. In the early 1990s, the company moved to Las Vegas, where it worked on security software for casinos. Then, in January 2005, IBM acquired SRD and Jonas became chief scientist in the company’s Entity Analytic Solutions group.

His newest technology, which allows entities such as government agencies to match an individual found in one database to that same person in another database, is getting a lot of attention from governments, banks, health-care providers, and, of course, privacy advocates. Jonas claims that his technology is as good at protecting privacy as it as at finding important information.

Technology Review: Your most recent project at IBM, Anonymous Resolution [formerly known as ANNA], is software that can match a given individual across different databases, but in the process safeguards personal identifiers – for example, name and social security number – in those databases. Who would use this software? What problem does it solve?

Jeff Jonas: The software is used by organizations that have data, have access to data, or have relationships with other entities with whom they want to exchange data. For example, a bank will take data about its customers and encrypt it. Then they’ll send the data to a database marketing company. That company will decrypt it, and match the bank’s customers to various records that the marketing company would have. For example, records that show what kind of magazines you subscribe to, how big your house is, the number of children you have, and so on. And then the marketing company will send back to the bank what’s called a “database marketing append,” so the bank will understand better who its customers are.

That’s very commonplace. But the risk is that even though the data is encrypted while being transported, it is decrypted by the other party. If the people who are managing that data happen to be corrupt or they have a breech of their system’s security, that data’s at risk for an unintended disclosure event.

TR: How does your software solve this problem?

JJ: The technique that we have created allows the bank to anonymize its customer data. When I say “anonymize,” I mean it changes the name and address and date of birth, or whatever data they have about an identity, into a numeric value that is nonhuman readable and nonreversible. You can’t run the math backwards and compute from the anonymized value what the original input value was.

When I went to invent this software, I could have done this with encryption, where the data could be decrypted; but I felt like it would be a stronger privacy product if we didn’t invent it that way. So the unique thing about the technique is that instead of me encrypting data and sending it to you, and you decrypting it to use it, the technique allows me to encrypt my data, you to encrypt your data, and this new technology is capable of performing robust matching of identities using only encrypted data.

To put [data] on the highest possible privacy grounds, instead of making it encrypted, we actually used one of the components of encryption called one-way hashing that is not reversible.

TR: Who currently uses your software?

JJ: The current customers are governments that are interested in using this to share data with themselves. This is an interesting notion that I think would be a shock to most citizens of any country: You can walk into any government organization and you’ll have one group working on, say, money laundering, and ten doors down you have another group working on drug cartels. The only way they have today to figure out whether they’re working on the same person is to play the game that I refer to as Go Fish. That means one of them has to pick up the phone and call the other and say “Majed Moqed? Khalid al Mihdhar? Threes? Tens? Jacks? Twos?” They’re not going to read the whole list.

This technique allows an entity, whether it’s corporate or government, to compare data that’s trapped in silos, sensitive data that you wouldn’t want to escape. The identity data flows into a central index, and in that index, it figures out when people are the same or related. But it can’t tell you the name or the address or the phone number of the people who are the same because it doesn’t know. When there’s a match, each of the records that match has its pedigree or attribution on it that tells you which system and which record. So it creates a pointer and tells you which record to ask the other group about.

TR: And this is obviously useful for counterterrorism.

JJ: Here’s the scenario: The government has a list of people we should never let into the country. It’s a secret. They don’t want people in other countries to know. And the government tends to not share this list with corporate America. Now, if you have a cruise line, you want to make sure you don’t have people getting on your boat who shouldn’t even be in the United States in the first place. Prior to the U.S. Patriot Act, the government couldn’t go and subpoena 100,000 records every day from every company. Usually, the government would have to go to a cruise line and have a subpoena for a record. Section 215 [of the Patriot Act] allows the government to go to a business entity and say, “We want all your records.” Now, the Fourth Amendment, which is “search and seizure,” has a legal test called “reasonable and particular.” Some might argue that if a government goes to a cruise line and says, “Give us all your data,” it is hard to envision that this would be reasonable and particular.

But what other solution do they have? There was no other solution. Our Anonymous Resolution technology would allow a government to take its secret list and anonymize it, allow a cruise line to anonymize their passenger list, and then when there’s a match it would tell the government: “record 123.” So they’d look it up and say, “My goodness, it’s Majed Moqed.” And it would tell them which record to subpoena from which organization. Now it’s back to reasonable and particular.

TR: What were the challenges with developing this software?

JJ: One of the challenges is when you one-way hash the data, it becomes “infinitely sensitive.” What I mean by that is that the word robert, if you one-way hash it, and take Robert, where the r is capital and not lowercase, the one-way hash generated by this subtle difference is completely different.

One of the reasons people didn’t try to do this before, or it was believed that maybe it wasn’t useful, is that people’s identity data is always quite different – sometimes with a middle initial, sometimes without. Identities just don’t show up the same. That was the trick we had to solve: allowing it to match data that’s fuzzy while only using one-way hashed values.

The trick is in how we prepare the data. Here’s a simple example. One list says Bob and one says Rob. Well, we know that both Bob and Rob belong to the same root name, in this case, Robert. So before we anonymize each side, we throw in the most rooted form, which is Robert. So we’ve added Robert to both lists, and we then one-way hash both lists so it turns out the Robert matches.

TR: How is this is based on earlier work you did for Las Vegas casinos?

JJ: The ability to figure out if two people are the same despite all the natural variability of how people express their identity is something we really got a good understanding of assisting the gaming industry. We also learned how people try to fabricate fake identities and how they try to evade systems. It was learning how to do that at high speed that opened the door to make this next thing possible. Had we not solved that in the 1990s, we would not have been able to conjure up a method to do anonymous resolution.

TR: You’ve said that 40 percent of your time is spent on privacy and civil liberties issues and that a privacy strategist works with you. Could you give me an example of the sort of things you and your privacy strategist discuss?

JJ: When the government has a watch list –- this, by the way, doesn’t have to do with our tech, this is about responsible usage of tech and improved processes – when you have a watch list, the questions come up: Who’s on the list? How can people find out if they’re on the list? How can they get off the list if they’re not supposed be on it? If a government has a list and they’re sharing it, making copies of it, and somebody’s removed from the list because they’ve made a mistake, how can you be sure that they’re removed from everywhere else they shared it?

Another thing that my privacy strategist and I have been talking about is called an “immutable audit log.”

TR: What’s that?

JJ: You want to make sure that someone who is using a secret government system isn’t putting their ex-wife in a watch list or searching for their ex-wife or their neighbor just because they’re curious. That would be a misuse. An immutable audit log is the notion that every time a user queries for a record, this new kind of audit log records it in an indelible way that’s like etching it into stone. In other words, even if a database administrator was in cahoots with them, or the database administrator was a corrupt entity, they couldn’t erase their own footprints.

TR: Is there anything a person can do, other than living off the grid, to keep their digital trail to a minimum?

JJ: Oh, boy – that’s a great, great question. As consumers, we often trade our information, creating a bigger footprint, because of some opportunity being extended to us. And the biggest privacy problem I have with that is when it is a surprise to the consumer. My advice to companies and governments is to avoid consumer surprise. That’s one of the most offensive things: when you find out somebody’s doing something with your data about which you had no clue.

So my advice is to avoid consumer surprise, and that means having some degree of transparency. I believe consumers should be offered the opportunity to opt out. So the organizations that you transact with, the ones that allow the consumer to say “Hey, please don’t sell my data” and those organizations that make it easy for the consumer to opt in or out – I think consumers may eventually flock to those places where they feel the risk of consumer surprise is less.

TR: The Department of Justice has subpoenaed some of Google’s data, and the company is refusing to cooperate. What is your opinion on this?

JJ: I haven’t been following this very closely. But let’s talk about consumer surprise. I think it would be a surprise to consumers [to find] that they would be identified to the government at individual levels. I think consumers would be less surprised if Google provided just statistics.

TR: As an engineer concerned with privacy issues, what is your opinion on the NSA domestic wiretapping program?

JJ: I have not read up much on that. I don’t know whether it’s legal or not legal. I would say if it turns out to be legal and it’s going to continue, then I would say, “Could you do it with anonymous data?”

TR: As an entrepreneur, you’ve successfully looked around, found a problem, and solved it with software. In your mind, what is the most important problem to be solved today?

JJ: Picture this: We’re in a canyon, and on the left there’s this wall, and behind it is this back pressure, and that back pressure is “ill-will” that wants to do harm to democracy or the United States. And behind the other wall it is a police surveillance state. And the number of technology options that you have that don’t turn you into a police surveillance state and that prevent the ill-will intent on the left are in the middle. There are a very narrow number of solutions between these canyon walls. But the problem is, should ill-will continue to grow, the pressure behind the wall on the left becomes such that, as we march forward through time, the canyon gets narrower and narrower, and eventually you have bad things happening and you have to be a police surveillance state to protect yourself.

But the real thing that has nothing to do with technology is, if we don’t figure out how to lower ill-will, our future is darker.

Keep Reading

Large language models can do jaw-dropping things. But nobody knows exactly why.

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.