TR: What were the challenges with developing this software?
JJ: One of the challenges is that when you one-way hash the data, it becomes “infinitely sensitive.” What I mean by that is that if you one-way hash the word “robert,” and then hash “Robert,” where the R is capital and not lowercase, the two hashes generated by this subtle difference are completely different.
One of the reasons people didn’t try to do this before, or it was believed that maybe it wasn’t useful, is that people’s identity data is always quite different – sometimes with a middle initial, sometimes without. Identities just don’t show up the same. That was the trick we had to solve: allowing it to match data that’s fuzzy while only using one-way hashed values.
The trick is in how we prepare the data. Here’s a simple example. One list says Bob and one says Rob. Well, we know that both Bob and Rob belong to the same root name, in this case, Robert. So before we anonymize each side, we throw in the most rooted form, which is Robert. So we’ve added Robert to both lists, and we then one-way hash both lists, so the two Robert entries match.
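[Editor’s sketch: The preparation step described above can be illustrated in a few lines of Python. This is only a minimal sketch under stated assumptions, not the product’s actual method: the interview does not name a hash function, so SHA-256 is assumed, and the nickname table ROOT_NAMES and the helpers one_way_hash, prepare and may_match are hypothetical names invented for illustration.]

```python
import hashlib

# Hypothetical nickname-to-root table (assumption); a real system
# would need a far larger, curated list.
ROOT_NAMES = {"bob": "robert", "rob": "robert", "bobby": "robert"}


def one_way_hash(value: str) -> str:
    """Hash a string with SHA-256 (assumed; the interview names no algorithm)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def prepare(name: str) -> set[str]:
    """Return the hashes of a name and, if known, its root form.

    Normalizing case first sidesteps the "robert" vs. "Robert" problem;
    adding the root form lets Bob and Rob share one hashed value.
    """
    normalized = name.strip().lower()
    variants = {normalized}
    root = ROOT_NAMES.get(normalized)
    if root is not None:
        variants.add(root)
    return {one_way_hash(v) for v in variants}


def may_match(name_a: str, name_b: str) -> bool:
    """Two names are candidates for a match if any hashed variant is shared."""
    return not prepare(name_a).isdisjoint(prepare(name_b))


# The raw hashes differ even though only one letter's case changed.
print(one_way_hash("robert") == one_way_hash("Robert"))
# After preparation, Bob and Rob share the hashed root "robert".
print(may_match("Bob", "Rob"))
```

[Only hashed values are compared, so neither side sees the other’s names. A plain unkeyed hash of common names can be guessed by hashing a dictionary of names, so a real deployment would presumably use a keyed or salted scheme; that detail is outside what the interview covers.]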
TR: How is this based on earlier work you did for Las Vegas casinos?
JJ: The ability to figure out if two people are the same despite all the natural variability of how people express their identity is something we really got a good understanding of assisting the gaming industry. We also learned how people try to fabricate fake identities and how they try to evade systems. It was learning how to do that at high speed that opened the door to make this next thing possible. Had we not solved that in the 1990s, we would not have been able to conjure up a method to do anonymous resolution.
TR: You’ve said that 40 percent of your time is spent on privacy and civil liberties issues and that a privacy strategist works with you. Could you give me an example of the sort of things you and your privacy strategist discuss?
JJ: When the government has a watch list – this, by the way, doesn’t have to do with our tech, this is about responsible usage of tech and improved processes – when you have a watch list, the questions come up: Who’s on the list? How can people find out if they’re on the list? How can they get off the list if they’re not supposed to be on it? If a government has a list and they’re sharing it, making copies of it, and somebody’s removed from the list because they’ve made a mistake, how can you be sure that they’re removed from everywhere else they shared it?
Another thing that my privacy strategist and I have been talking about is called an “immutable audit log.”
TR: What’s that?
JJ: You want to make sure that someone who is using a secret government system isn’t putting their ex-wife on a watch list or searching for their ex-wife or their neighbor just because they’re curious. That would be a misuse. An immutable audit log is the notion that every time a user queries for a record, this new kind of audit log records it in an indelible way that’s like etching it into stone. In other words, even if a database administrator was in cahoots with them, or the database administrator was a corrupt entity, they couldn’t erase their own footprints.
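[Editor’s sketch: The interview does not say how such a log would be built. One common way to approximate the idea is a hash chain, in which each entry’s digest covers the previous entry’s digest, so altering or deleting an earlier record breaks every digest after it. The Python below is a minimal sketch under that assumption; GENESIS, entry_digest, append and verify are hypothetical names, and SHA-256 is an assumed choice.]

```python
import hashlib
import json

# Fixed starting value for the chain (assumption for this sketch).
GENESIS = "0" * 64


def entry_digest(prev_digest: str, user: str, query: str) -> str:
    """Digest of one entry, bound to the digest of the entry before it."""
    payload = json.dumps(
        {"prev": prev_digest, "user": user, "query": query}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, user: str, query: str) -> None:
    """Record a query, chaining it to the last entry in the log."""
    prev = log[-1]["digest"] if log else GENESIS
    log.append(
        {"user": user, "query": query, "digest": entry_digest(prev, user, query)}
    )


def verify(log: list) -> bool:
    """Recompute every digest in order; any edit or removal breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["digest"] != entry_digest(prev, entry["user"], entry["query"]):
            return False
        prev = entry["digest"]
    return True


log = []
append(log, "analyst1", "lookup record 1001")
append(log, "analyst1", "lookup record 2002")
print(verify(log))  # chain is intact

log[0]["query"] = "lookup record 9999"  # someone rewrites their footprint
print(verify(log))  # tampering is detected
```

[A chain like this makes tampering detectable rather than physically impossible, and cutting entries off the end goes unnoticed unless the latest digest is also kept somewhere the administrator cannot reach.]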