
Microsoft and IBM Researchers Develop a Lie Detector for the Cloud

A way to check whether calculations have been tampered with could make cloud computing more reliable, and boost privacy.

It is now common for all kinds of data, from personal photos to business documents, to be stored on third-party servers. But despite the increased use of outside commodity “cloud computing” equipment, confidence that a third-party service is using your data appropriately still rests more on old-fashioned trust than on technology. As digital break-ins at Twitter and LinkedIn in recent months show, even the biggest services aren’t immune to attack, which is a significant obstacle for companies looking to outsource calculations involving sensitive data.

Software called Pinocchio, created by researchers at IBM and Microsoft, shows one possible solution. It serves as a kind of lie detector that can be used to check whether a cloud service did the work it was supposed to, or whether it may have been compromised and forced to do something else. The software could also be used to improve privacy, by providing a dependable way for companies to process personal data remotely rather than bringing it all back to their servers.

Pinocchio takes a program written in the C programming language and converts it into a version with a verification system built into the code. The converted program is then provided to the cloud service being asked to do the work. The conversion step also produces a verification key that can be used to check that the results the cloud service sends back really were produced by performing the requested operations.

“The verification key behaves like a digital signature, in that you can provide it to any third party to check a result,” says Bryan Parno, one of the Microsoft researchers working on Pinocchio. Parno developed Pinocchio with Microsoft colleague Jon Howell, as well as Craig Gentry and Mariana Raykova of IBM. Gentry is known within the field for proving it is possible for cloud services to work on encrypted data without having to decrypt it, thereby keeping that data secure (see “Computing with Secrets, but Keeping Them Safe”).

Currently, the only way to know for sure that a cloud provider did the work it was asked to do is to perform the work again, which defeats the purpose of outsourcing it in the first place, says Parno. Companies can guard against cheating or errors by spot-checking a random handful of results, or by asking multiple providers to do the same work, but neither approach ensures the integrity of every calculation. “That doesn’t give you a strong guarantee,” he says.
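
Pinocchio’s proofs rely on cryptography and apply to arbitrary C programs, but the underlying asymmetry, that checking an answer can be far cheaper than producing it, shows up even in classical algorithms. The C sketch below is an illustration of that asymmetry, not Pinocchio’s actual protocol: it uses Freivalds’ randomized check to confirm a claimed matrix product in time proportional to n squared, rather than the roughly n cubed steps needed to redo the multiplication. The matrix size and trial count are arbitrary choices for the example.

/*
 * Illustration only: Freivalds' randomized check, NOT Pinocchio's
 * cryptographic protocol. It verifies a claimed matrix product
 * C = A*B in O(n^2) time per trial, rather than the O(n^3) needed
 * to redo the multiplication.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 4        /* matrix dimension (illustrative) */
#define TRIALS 20  /* each trial at least halves the odds of accepting a bad C */

/* Returns 1 if C is accepted as A*B; error probability <= 2^-TRIALS. */
static int freivalds_check(const double A[N][N], const double B[N][N],
                           const double C[N][N])
{
    for (int t = 0; t < TRIALS; t++) {
        double r[N], Br[N], ABr[N], Cr[N];

        /* Pick a random 0/1 vector r. */
        for (int i = 0; i < N; i++)
            r[i] = rand() & 1;

        /* Compute B*r and C*r: two O(n^2) matrix-vector products. */
        for (int i = 0; i < N; i++) {
            Br[i] = Cr[i] = 0.0;
            for (int j = 0; j < N; j++) {
                Br[i] += B[i][j] * r[j];
                Cr[i] += C[i][j] * r[j];
            }
        }
        /* Then A*(B*r): a third O(n^2) product. */
        for (int i = 0; i < N; i++) {
            ABr[i] = 0.0;
            for (int j = 0; j < N; j++)
                ABr[i] += A[i][j] * Br[j];
        }

        /* If A*B != C, a random r exposes the mismatch with prob >= 1/2.
           Entries here are small exact integers, so == on doubles is safe. */
        for (int i = 0; i < N; i++)
            if (ABr[i] != Cr[i])
                return 0; /* caught a wrong answer */
    }
    return 1; /* accept: almost certainly A*B == C */
}

int main(void)
{
    srand((unsigned)time(NULL));

    double A[N][N], B[N][N], C[N][N];

    /* Fill A and B, then compute the honest product C = A*B,
       playing the role of the cloud worker. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = (i == j); /* identity matrix */
        }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }

    printf("honest result accepted: %d\n", freivalds_check(A, B, C));

    C[0][0] += 1.0; /* tamper with one entry, as a dishonest worker might */
    printf("tampered result accepted: %d\n", freivalds_check(A, B, C));
}

Freivalds’ check still requires the verifier to hold the original inputs and works only for this one problem; Pinocchio’s contribution is to make cheap checking of this kind work for general C programs.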

The Pinocchio approach could also improve the privacy of systems that gather fine-grained personal data and send it back to a central server, says Parno. Smart electric meters, for example, collect data detailed enough to reveal which appliances are in a given home and how many people are there at any time (see “TR35: Shwetak Patel”). A household’s bill is calculated by sending all that data to the provider, but it could be calculated locally if there were a way for the provider to check that someone hadn’t reprogrammed the device to get a discount.

“The provider could send the billing function to the meter, which could calculate the bill in a way that could be verified without having to send any reading back,” says Parno.
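
As a concrete sketch of what such a billing function might look like, here is a small hypothetical C program; the tariff structure, rates, and reading format are all invented for illustration. Pinocchio’s role would be to compile a function like this into a form whose output comes with a proof the provider can check against its verification key, so the raw hourly readings never leave the home.

/*
 * Hypothetical billing function a provider might ship to a meter.
 * Tariff, rates, and reading format are invented for illustration;
 * under Pinocchio, the compiled version of bill_for_day() would also
 * return a proof checkable with the provider's verification key.
 */
#include <stdio.h>

#define READINGS 24  /* one consumption reading (kWh) per hour */

/* Invented time-of-use tariff: peak evening hours cost more. */
static double bill_for_day(const double kwh[READINGS])
{
    double total = 0.0;
    for (int h = 0; h < READINGS; h++) {
        double rate = (h >= 17 && h < 21) ? 0.30 : 0.12; /* $/kWh */
        total += kwh[h] * rate;
    }
    return total;
}

int main(void)
{
    double day[READINGS] = { 0 };
    day[8]  = 1.5; /* morning usage */
    day[18] = 2.0; /* peak-hour usage */

    /* The meter reports only this total (plus, under Pinocchio,
       a proof that it was computed by the agreed function). */
    printf("bill: $%.2f\n", bill_for_day(day));
}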

The idea of a system like Pinocchio has been proposed before, but in previous implementations, checking a result took longer than doing the work itself. Tests with several example programs show that, for certain mathematical operations, including those at the core of some recommendation systems, verifying a result with Pinocchio takes less effort than recomputing it, says Parno.

However, for many tasks, using Pinocchio is still more work than simply repeating the original task, and although it performs at least 100,000 times better than earlier prototypes, it is still not ready for real-world use. Pinocchio could today be called “nearly practical,” says Parno. “We’ve certainly come a long way, but we need another iteration or two to be truly practical.”
