Microsoft and Google Want to Let Artificial Intelligence Loose on Our Most Private Data

New ways to use machine learning without risking sensitive data could unlock new ideas in industries like health care and finance.
April 19, 2016

The recent emergence of a powerful machine-learning technique known as deep learning has made computing giants such as Google, Facebook, and Microsoft even hungrier for data. It’s what lets software learn to do things like recognize images or understand language.

Yet many problems where deep learning could be most valuable involve data that is hard to come by or is held by organizations that are unwilling to share it. And as Apple CEO Tim Cook puts it, some consumers are already concerned about companies “gobbling up” their personal information.

“A lot of people who hold sensitive data sets like medical images are just not going to share them because of legal and regulatory concerns,” says Vitaly Shmatikov, a professor at Cornell Tech who studies privacy. “In some sense we’re depriving these people of the benefits of deep learning.”

Shmatikov and researchers at Microsoft and Google are all working on ways to get around that privacy problem. By devising methods to train and use the artificial neural networks behind deep learning without gobbling up all the underlying data, they hope to build smarter software and to convince the guardians of sensitive data to make use of such systems.

Shmatikov and colleague Reza Shokri are testing what they call “privacy-preserving deep learning.” It provides a way to get the benefit of multiple organizations—say, different hospitals—combining their data to train deep-learning software without having to take the risk of actually sharing it.

Each organization trains deep-learning algorithms on its own data, and then shares only key parameters from the trained software. Those can be combined into a system that performs almost as well as if it were trained on all the data at once.
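
As a rough sketch of that “train locally, share only parameters” idea, the following toy Python example has three hypothetical hospitals each fit a small logistic-regression model on private synthetic data and share nothing but their parameter vectors, which a coordinator then averages. The averaging step and the tiny model are illustrative assumptions, not the researchers’ actual protocol, which selectively shares parameters during training.

    import numpy as np

    rng = np.random.default_rng(0)

    def local_train(X, y, w, lr=0.1, epochs=50):
        # Plain gradient descent on the logistic loss; the raw data never leaves this site.
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
            w = w - lr * X.T @ (p - y) / len(y)    # gradient step
        return w

    # Three "hospitals", each holding a private slice of the same underlying task.
    true_w = rng.normal(size=5)
    hospitals = []
    for _ in range(3):
        X = rng.normal(size=(200, 5))
        y = (X @ true_w > 0).astype(float)
        hospitals.append((X, y))

    # Each site trains on its own data; only the resulting parameters are shared and combined.
    global_w = np.zeros(5)
    for _ in range(10):
        local_ws = [local_train(X, y, global_w.copy()) for X, y in hospitals]
        global_w = np.mean(local_ws, axis=0)       # combine the shared parameters

    print("combined model parameters:", np.round(global_w, 2))

The combined model ends up close to one trained on the pooled data, even though no hospital ever reveals a single record.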

The Cornell research was partly funded by Google, which has published a paper on similar experiments and is talking with Shmatikov about his ideas. Google’s researchers devised a way to train the company’s deep-learning algorithms using data such as images from smartphones without transferring that data into its cloud.

That could make it easier for the company to leverage the very personal information held on our mobile devices, they wrote. Google declined to make someone available to discuss that research, but Shmatikov believes the company is still working on it.

Microsoft’s cryptography research group has developed its own solution to machine learning’s privacy problem: a way for trained deep-learning software to take in encrypted data and spit out encrypted answers. The idea is that a hospital, for example, could ask Microsoft to use one of these “CryptoNets” to flag medical scans containing possible problems, without the usual need to expose the images to the company.

The Microsoft researchers pulled off that trick using a technique called homomorphic encryption, which makes it possible to perform mathematical operations on encrypted data and produce an encrypted result (see “10 Breakthrough Technologies 2011: Homomorphic Encryption”). They have tested the idea using deep-learning software that recognizes handwriting, and a system that estimates a patient’s risk of pneumonia from his vital signs.
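
To make that property concrete, here is a toy Python sketch of a classic partially homomorphic scheme, Paillier, which lets you add encrypted values and multiply them by plaintext constants, exactly the kind of weighted sum at the heart of a neural-network layer. It is not the scheme CryptoNets relies on, and the tiny key below is hopelessly insecure; it only illustrates computing on ciphertexts without ever decrypting them.

    import random
    from math import gcd

    p, q = 293, 433                               # toy primes; real keys use thousands of bits
    n, n2 = p * q, (p * q) ** 2
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    mu = pow(lam, -1, n)                          # modular inverse of lambda mod n (Python 3.8+)

    def encrypt(m):
        r = random.randrange(1, n)
        while gcd(r, n) != 1:
            r = random.randrange(1, n)
        return (pow(g, m, n2) * pow(r, n, n2)) % n2

    def decrypt(c):
        return ((pow(c, lam, n2) - 1) // n) * mu % n

    # Multiplying ciphertexts adds the hidden values; raising to a power scales them.
    a, b = encrypt(17), encrypt(25)
    assert decrypt(a * b % n2) == 42              # Enc(17) * Enc(25)  ->  17 + 25
    assert decrypt(pow(a, 3, n2)) == 51           # Enc(17) ** 3       ->  3 * 17

    # That is enough for a weighted sum of encrypted inputs, the core of a linear layer.
    weights, inputs = [2, 3], [encrypt(5), encrypt(7)]
    total = 1
    for w, c in zip(weights, inputs):
        total = total * pow(c, w, n2) % n2
    print(decrypt(total))                         # 31 = 2*5 + 3*7

A full CryptoNet also has to push the network’s nonlinear steps through encrypted data, which is one reason the encrypted version needs more computing power than the plain one.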

A CryptoNet requires more computing power than conventional deep-learning software to do the same work. But Kristin Lauter, who leads Microsoft’s cryptography research, says the gap is small enough that CryptoNets could become practical for real-world use. “The health, financial, and pharmaceutical industries are where I think this is most likely to be used first,” she says.
