Google launches an open-source version of its differential privacy library
Google today released an open-source version of the differential privacy library it uses to power some of its own core products. Developers will be able to take this library and build their own tools that can work with aggregate data without revealing personally identifiable information either inside or outside their companies.
"Whether you're a city planner, a small business owner, or a software developer, gaining useful insights from data can help make services work better and answer important questions," writes Miguel Guevara, a product manager in the company's Privacy and Data Protection Office. "But, without strong privacy protections, you risk losing the trust of your citizens, customers, and users. Differentially-private data analysis is a principled approach that enables organizations to learn from the majority of their data while simultaneously ensuring that those results do not allow any individual's data to be distinguished or re-identified."
As Google notes, the current version of the Apache-licensed C++ library focuses on features that are typically hard to build from scratch and includes many of the standard statistical functions that developers would need (think count, sum, mean, variance, etc.). The company also stresses that the library includes an additional library for "rigorous testing" (because getting differential privacy right is hard), as well as a PostgreSQL extension and a number of recipes to help developers get started.
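To make the idea concrete, here is a minimal C++ sketch of the Laplace mechanism that underpins this kind of library: each person's contribution is clamped to a known range so no single record can shift the result by more than a bounded amount, and calibrated noise is then added to the aggregate. This is a conceptual illustration only, not the API of Google's library; the function names and parameters are my own.

```cpp
// Conceptual sketch of differentially private aggregation (Laplace mechanism).
// NOT the API of Google's differential privacy library; names are illustrative.
#include <algorithm>
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

// Draw Laplace noise with the given scale (scale = sensitivity / epsilon)
// using inverse-CDF sampling.
double LaplaceNoise(double scale, std::mt19937& rng) {
  std::uniform_real_distribution<double> uniform(-0.5, 0.5);
  double u = uniform(rng);
  return (u < 0 ? -1.0 : 1.0) * -scale * std::log(1.0 - 2.0 * std::fabs(u));
}

// Differentially private sum: clamp each value to [lower, upper] so one
// person's influence (the sensitivity) is bounded, then add Laplace noise.
double PrivateBoundedSum(const std::vector<double>& values, double lower,
                         double upper, double epsilon, std::mt19937& rng) {
  double sum = 0.0;
  for (double v : values) sum += std::clamp(v, lower, upper);
  const double sensitivity = std::max(std::fabs(lower), std::fabs(upper));
  return sum + LaplaceNoise(sensitivity / epsilon, rng);
}

int main() {
  std::mt19937 rng(std::random_device{}());
  std::vector<double> ages = {34, 29, 41, 57, 23, 38, 46};
  // A smaller epsilon means more noise and a stronger privacy guarantee.
  std::cout << "Noisy sum: " << PrivateBoundedSum(ages, 0, 100, 1.0, rng)
            << std::endl;
}
```

The point of the library is that developers get vetted building blocks like this (plus the harder parts, such as testing and database integration) instead of having to derive sensitivity bounds and noise calibration themselves.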
These days, people often roll their eyes when they see 'Google' and 'privacy' in the same sentence. That's understandable (though I think there is considerable tension inside the company about this, too). In this case, however, this is unquestionably a useful tool that will allow developers to build services that analyze personal data without compromising the privacy of the people whose data they are working with. Typically, building such tools takes considerable expertise, to the point where developers may either not build them at all or simply not bother to include these privacy features. With a library like this, they have no excuse not to implement differential privacy.