The recent emergence of a powerful machine-learning technique known as deep learning has made computing giants such as Google, Facebook, and Microsoft even hungrier for data. Deep learning is what lets software learn to do things like recognize images or understand language.
Yet many problems where deep learning could be most valuable involve data that is hard to come by or is held by organizations that are unwilling to share it. And as Apple CEO Tim Cook puts it, some consumers are already concerned about companies “gobbling up” their personal information.
“A lot of people who hold sensitive data sets like medical images are just not going to share them for legal and regulatory concerns,” says Vitaly Shmatikov, a professor at Cornell Tech who studies privacy. “In some sense we’re depriving these people from the benefits of deep learning.”
Shmatikov and researchers at Microsoft and Google are all working on ways around that privacy problem. By finding ways to train and use the artificial neural networks behind deep learning without having to gobble up all the data, they hope to build smarter software and convince the guardians of sensitive data to make use of such systems.
Shmatikov and colleague Reza Shokri are testing what they call “privacy-preserving deep learning.” It lets multiple organizations, such as different hospitals, get the benefit of combining their data to train deep-learning software without taking the risk of actually sharing it.
Each organization trains deep-learning algorithms on its own data, and then shares only key parameters from the trained software. Those can be combined into a system that performs almost as well as if it were trained on all the data at once.
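The parameter-sharing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Cornell system: that work shares key parameters of neural networks, whereas this sketch swaps in plain averaging of a two-parameter linear model (w, b) across sites, roughly in the spirit of what is often called federated averaging. The hospitals, their data, the learning rate, and the function names `local_train` and `federated_average` are all hypothetical.

```python
# Toy sketch (hypothetical): several sites fit y ≈ w*x + b on their own data,
# and only the parameters (w, b) are shared and averaged. Raw records never leave a site.


def local_train(w, b, data, lr=0.01, epochs=20):
    """Gradient descent on mean squared error, starting from the shared model (w, b)."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(sites, rounds=200):
    """Each round, every site trains locally from the current global parameters;
    the new global parameters are the sample-size-weighted mean of the results."""
    w, b = 0.0, 0.0
    total = sum(len(data) for data in sites)
    for _ in range(rounds):
        # Only these (w, b) pairs would be transmitted to the coordinator.
        results = [(local_train(w, b, data), len(data)) for data in sites]
        w = sum(params[0] * size for params, size in results) / total
        b = sum(params[1] * size for params, size in results) / total
    return w, b


# Two hypothetical hospitals whose records follow y = 2x + 1 over different ranges.
hospital_a = [(x, 2 * x + 1) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
hospital_b = [(x, 2 * x + 1) for x in (3.0, 3.5, 4.0, 4.5, 5.0)]
w, b = federated_average([hospital_a, hospital_b])
print(round(w, 2), round(b, 2))  # 2.0 1.0
```

Because both hospitals' records here follow the same noise-free relationship, the averaged model converges to it; with real, noisy, unevenly distributed data, a combined model of this kind would only approximate one trained on the pooled data.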
The Cornell research was partly funded by Google, which has published a paper on similar experiments and is talking with Shmatikov about his ideas. Its researchers devised a way to train the company’s deep-learning algorithms on data such as images stored on smartphones, without transferring that data into Google’s cloud.
That could make it easier for the company to leverage the very personal information held on our mobile devices, they wrote. Google declined to make someone available to discuss that research, but Shmatikov believes the company is still working on it.
Microsoft’s cryptography research group has developed its own solution to machine learning’s privacy problem. It invented a way to use trained deep-learning software on encrypted data and spit out encrypted answers. The idea is that a hospital, for example, could ask Microsoft to use one of these “CryptoNets” to flag medical scans containing possible problems, avoiding the usual need to expose the images to the company.
The Microsoft researchers pulled off that trick using a technique called homomorphic encryption, which makes it possible to perform mathematical operations on encrypted data and produce an encrypted result (see “10 Breakthrough Technologies 2011: Homomorphic Encryption”). They have tested the idea using deep-learning software that recognizes handwriting, and a system that estimates a patient’s risk of pneumonia from vital signs.
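Computing on encrypted data can be illustrated with a toy example. The sketch below is not Microsoft’s CryptoNets: it swaps in the Paillier cryptosystem, which lets a server add encrypted numbers and multiply them by plaintext constants. That is enough to evaluate one weighted sum, the core operation of a neural-network layer, without seeing the inputs; CryptoNets needed a scheme that can also multiply encrypted values together. The tiny primes make this completely insecure, and the inputs, weights, and helper names (`encrypted_weighted_sum` and the rest) are hypothetical.

```python
# Toy Paillier cryptosystem: additively homomorphic. The tiny primes make it INSECURE;
# this only illustrates computing on encrypted data.
import math
import random


def keygen(p=1009, q=1013):
    """Return (public key n, private key (n, lam, mu)) using generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid for g = n + 1 (Python 3.8+)
    return n, (n, lam, mu)


def encrypt(n, m):
    """Enc(m) = (n+1)^m * r^n mod n^2 with random r coprime to n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    n, lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Enc(a) * Enc(b) mod n^2 decrypts to a + b."""
    return c1 * c2 % (n * n)


def scale_encrypted(n, c, k):
    """Enc(a)^k mod n^2 decrypts to k * a, for a plaintext integer k >= 0."""
    return pow(c, k, n * n)


def encrypted_weighted_sum(n, ciphertexts, weights, bias):
    """Server side: Enc(sum(w_i * x_i) + bias) computed without decrypting any x_i."""
    acc = encrypt(n, bias)
    for c, w in zip(ciphertexts, weights):
        acc = add_encrypted(n, acc, scale_encrypted(n, c, w))
    return acc


# Client encrypts its private inputs; server holds plaintext weights (all hypothetical).
pub, priv = keygen()
inputs = [3, 0, 7, 2]
encrypted_inputs = [encrypt(pub, x) for x in inputs]
encrypted_score = encrypted_weighted_sum(pub, encrypted_inputs, [2, 5, 1, 4], 10)
print(decrypt(priv, encrypted_score))  # 2*3 + 5*0 + 1*7 + 4*2 + 10 = 31
```

Only the holder of the private key can read the final score; the server handles nothing but ciphertexts from start to finish.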
A CryptoNet requires more computing power than conventional deep-learning software to do the same work. But Kristin Lauter, who leads Microsoft’s cryptography research, says the gap is small enough that CryptoNets could become practical for real-world use. “The health, financial, and pharmaceutical industries are where I think this is most likely to be used first,” she says.