Deep Dive#

What is Qrlew?#

Qrlew is an open source library that parses SQL queries into Relations — an intermediate representation that keeps track of rich data types, value ranges, and row ownership — so that they can easily be rewritten into differentially private equivalents and turned back into SQL queries for execution on a variety of standard data stores.

With Qrlew, a data practitioner can express their data queries in standard SQL; the data owner can run the rewritten query without any technical integration and with strong privacy guarantees on the output; and the query rewriting can be operated by a privacy expert who must be trusted by the owner, but may belong to a separate organization.

_images/qrlew_process.svg

Fig. 2 The rewriting process occurs in three stages: the data practitioner’s query is parsed into a Relation, which is rewritten into a DP equivalent and finally executed by the data owner, who returns the privacy-safe result.#

Main Features#

Qrlew provides the following features:

  • Qrlew provides automatic output privacy guarantees: With Qrlew a data owner can let an analyst (data practitioner) with no expertise in privacy protection run arbitrary SQL queries with strong privacy guarantees on the output.

  • Qrlew leverages existing infrastructures: Qrlew rewrites a SQL query into a differentially private SQL query that can be run on any data store with a SQL interface, from lightweight databases to big-data stores. This removes the need for a custom execution engine and enables differentially private analytics with virtually no technical integration.

  • Qrlew leverages synthetic data: Synthetic data are an increasingly popular way of privatizing a dataset. Jointly using differentially private mechanisms and differentially private synthetic data can be a simple, yet powerful, way of managing a privacy budget and reaching better utility-privacy tradeoffs.

Design Goals#

Qrlew assumes the central model of differential privacy [AGKZ18], where a trusted central organization (hospital, insurance company, utility provider), called the data owner, collects and stores personal data in a secure database and wishes to let untrusted data practitioners run SQL queries on its data.

At a high level we pursued the following requirements:

  • Ease of use for the data practitioners. The data practitioners are assumed to be data experts but not privacy experts. They should be able to express their queries in a standard way. We chose SQL as the query language, as it is very commonly used for analytics tasks.

  • Ease of integration for the data owner. As SQL is a common language to express data analysis tasks, many data stores support it, from small embedded databases to big-data stores.

  • Simplicity for the data owner to set up privacy protection. Differential privacy is about capping the sensitivity of a result to the addition or removal of an individual, which we call the privacy unit. Qrlew assumes that the data owner can tell if a table is public and, if it is not, that they can assign exactly one privacy unit to each row of data. When there are multiple related tables, Qrlew makes it easy to define the privacy unit of each table transitively.

  • Simple integration with other privacy enhancing technologies such as synthetic data. To avoid repeated privacy losses, or to give a result when a DP rewriting is not easily available (e.g. when the query is SELECT * FROM table), Qrlew can blend synthetic data into the computation.

These requirements dictated the overall query rewriting architecture and many features, the most important of which are detailed below.

How does Qrlew work?#

The Qrlew library solves the problem of running a SQL query with DP guarantees in three steps. First, the SQL query submitted by the data practitioner is parsed and converted into a Relation: an intermediate representation designed to ease the tracking of data types, ranges and possible values; the tracking of the privacy unit; and the rewriting into a DP Relation. Then, the rewriting into DP happens. Once the Relation is rewritten into a DP one, it can be rendered as an SQL query string and submitted to the data store of the data owner. The output can then safely be shared with the data practitioner. This process is illustrated in Fig. 2 and sketched in code below.
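Below is a minimal sketch of this flow through the Python bindings. It is illustrative only: the names relation_from_query, rewrite_with_differential_privacy and to_query are assumptions made for the example and may differ from the released API.

def run_with_dp(dataset, query: str, epsilon: float, delta: float) -> str:
    # 1. Parse the practitioner's SQL into a Relation (hypothetical method name).
    relation = dataset.relation_from_query(query)
    # 2. Rewrite the Relation into a differentially private equivalent
    #    (hypothetical method name), spending the (epsilon, delta) budget.
    dp_relation = relation.rewrite_with_differential_privacy(epsilon=epsilon, delta=delta)
    # 3. Render the DP Relation back to an SQL string; the data owner runs it
    #    on their own data store and the output can be shared.
    return dp_relation.to_query()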

Qrlew Intermediate Representation#

As the SQL language is very rich and complex, simply parsing a query into an abstract syntax tree does not produce a convenient representation for our needs. Therefore, the query is converted into a simpler, normalized representation whose properties align well with the requirements of differential privacy: the Relation. A Relation is a collection of rows adhering to a given schema. It is a recursively defined structure composed of:

  • Tables: This is simply a data source from a database.

  • Maps: A Map takes an input Relation, filters the rows and transforms them one by one. The filtering conditions and row transforms are expressed with expressions similar to those of SQL. It acts as a SELECT exprs FROM input WHERE expr LIMIT value and therefore preserves the privacy unit ownership structure.

  • Reduces: A Reduce takes an input Relation and aggregates some columns, possibly group by group. It acts as a SELECT aggregates FROM input GROUP BY expr. This is where the rewriting into DP will happen, as described below.

  • Joins: This Relation combines two input Relations as a SELECT * FROM left JOIN right ON expr would do. The privacy properties are more complex to propagate in this case.

It may also be a static list of values or a set operation between two Relations, but those are less important for our uses.
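To make this recursive structure concrete, here is a minimal illustrative sketch of the four main variants as Python dataclasses. Qrlew itself is written in Rust; these are not its actual types.

from dataclasses import dataclass
from typing import List, Optional, Union

Relation = Union["Table", "Map", "Reduce", "Join"]

@dataclass
class Table:      # a data source from the database
    name: str
    schema: List[str]

@dataclass
class Map:        # SELECT exprs FROM input WHERE filter LIMIT limit
    exprs: List[str]
    input: Relation
    filter: Optional[str] = None
    limit: Optional[int] = None

@dataclass
class Reduce:     # SELECT aggregates FROM input GROUP BY group_by
    aggregates: List[str]
    group_by: List[str]
    input: Relation

@dataclass
class Join:       # SELECT * FROM left JOIN right ON on
    on: str
    left: Relation
    right: Relation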

_images/relation.svg

Fig. 3 The Relation (Map) associated with the query: SELECT a, count(abs(10*a+b)) AS x FROM table_1 WHERE b>-0.1 AND a IN (1,2,3) GROUP BY a. The arrows point to the inputs of each Relation. Note the propagation of the data type ranges.#

This representation is central to Qrlew; all the features described below are built upon it. A Relation, along with all the sub-Relations it depends on, will be called the computation graph or the graph of a Relation.

Range Propagation#

Most DP mechanisms aggregating numbers require the knowledge of some bounds on the values (see [DR+14]). Even if some bounds are known for some Relations like source Tables, it is not trivial to propagate these bounds through the steps of the computation.

To help with range propagation, Qrlew introduces two useful concepts:

  • The concept of \(k\)-Intervals, which are finite unions of at most \(k\) closed intervals. A \(k\)-Interval can be written:

    \[I = \bigcup_{i=1}^{j\leq k}\left[a_i, b_i\right]\]
    Note that the union of \(k\)-Intervals may not be a \(k\)-Interval, as it may be the union of more than \(k\) intervals. Unions of many intervals can be simplified into their convex envelope interval, which is often a sufficient bound approximation for our use cases:
    \[J = \bigcup_{i=1}^{j> k}\left[a_i, b_i\right] \subseteq \left[\min_i a_i, \max_i b_i\right]\]

  • And the concept of piecewise-monotonic-functions[1], which are functions \(f: \mathbb{R}^n \rightarrow \mathbb{R}\) whose domain can be partitioned into cartesian products of intervals \(P_j\) on which they are coordinatewise-monotonic. The image of a cartesian product of \(n\) \(k\)-Intervals by a piecewise-monotonic-function can be easily computed as a \(k\)-Interval. Indeed, let \(I\) be:

    \[\begin{split}I = I_1\times I_2\times \ldots \times I_n = \bigcup_{\substack{1\leq i_1\leq k\\\ldots\\1\leq i_n\leq k}}\left[a_{i_1}, b_{i_1}\right]\times \ldots \times \left[a_{i_n}, b_{i_n}\right]\end{split}\]
    If \(f\) is piecewise-monotonic, then one can show that on each partition \(P_j\) where it is coordinatewise-monotonic, if we note:
    \[\begin{split}I_j = I \cap P_j = \bigcup_{\substack{1\leq i_1\leq k\\\ldots\\1\leq i_n\leq k}}\left[a_{i_1}, b_{i_1}\right]\times \ldots \times \left[a_{i_n}, b_{i_n}\right]\end{split}\]
    \[\begin{split}f(I_j) = \bigcup_{\substack{1\leq i_1\leq k\\\ldots\\1\leq i_n\leq k}}\text{Conv}\left(f\left( \left\{a_{i_1}, b_{i_1}\right\}\times \ldots \times \left\{a_{i_n}, b_{i_n}\right\}\right)\right)\end{split}\]
    where \(\text{Conv}\left(f\left( \left\{a_{i_1}, b_{i_1}\right\}\times \ldots \times \left\{a_{i_n}, b_{i_n}\right\}\right)\right)\) can be efficiently computed in \(n\) steps, without testing all the \(2^n\) combinations, thanks to the coordinatewise monotony of \(f\) on \(P_j\). Then \(f(I) = \bigcup_j f(I_j)\), from which we derive the bound \(f(I) \subseteq \text{Conv}\left(\bigcup_j f(I_j)\right)\) when the number of terms in the union exceeds \(k\).

The notion of \(k\)-Interval is convenient for tracking value bounds as it can express natural patterns in SQL such as:

  • WHERE x>0 AND x<=1, which translates into the implied \(x\in \left[0, 1\right]\) ;

  • WHERE x IN (1,2,3), which is also easily expressed as a k-Interval: \(x \in \left[1, 1\right] \cup \left[2, 2\right] \cup \left[3, 3\right]\)

The idea of piecewise-monotonic-functions is also very useful, as many standard SQL arithmetic operators (+, -, *, /, <, >, =, !=, …) and functions (EXP, LOG, ABS, SIN, COS, LEAST, GREATEST, …) are trivially piecewise-monotonic (in one, two or many variables).

Most of the range propagation in Qrlew is based on these concepts. It enables a rather simple and efficient range propagation mechanism, leading to better utility / privacy tradeoffs.
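As an illustration of these two concepts, here is a minimal Python sketch of range propagation through a monotonic univariate function. The representation and names are ours, not Qrlew's API, and a full implementation would also split piecewise-monotonic functions on the pieces where they are monotonic.

K = 4  # the maximum number of pieces kept in a k-Interval (arbitrary here)

def convex_envelope(intervals):
    # Collapse a union of intervals into its convex envelope [min a_i, max b_i].
    return [(min(a for a, _ in intervals), max(b for _, b in intervals))]

def simplify(intervals, k=K):
    # Keep at most k pieces, falling back to the convex envelope beyond k.
    return intervals if len(intervals) <= k else convex_envelope(intervals)

def propagate(f, intervals):
    # On each piece where f is monotonic, the image is bounded by the images
    # of the endpoints, so evaluating f at the bounds is enough.
    return simplify([tuple(sorted((f(a), f(b)))) for a, b in intervals])

# WHERE x IN (1, 2, 3) gives x in [1,1] u [2,2] u [3,3]; its image under
# f(x) = 10*x is [10,10] u [20,20] u [30,30].
print(propagate(lambda x: 10 * x, [(1, 1), (2, 2), (3, 3)]))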

Privacy Unit Definition#

Tables in a database rarely come properly formatted for privacy-preserving applications. Many rows in many tables may refer to the same individual; hence, adding or removing an individual means adding or removing many rows. To help define the privacy unit, Qrlew introduces a small Privacy Unit (PU) description language. As exemplified in Listing 1, a PU definition associates to each private table in a database a path defining the PID of each row. For a table containing the PU itself, like a users table for example, the PU definition will look like ("users",[],"id"), where id is the name of a column identifying the user, like its name. If the database defines tables related to this table, the way the tables are related should be specified following this scheme: \((\mathtt{tab}_1, path, \mathtt{pid})\), where \(\mathtt{tab}_1\) is the name of the table for which the PID is defined, \(\mathtt{pid}\) is the name of the column defining the PID in the table referred to by \(path\), and \(path\) is a list of elements of the form \([(\mathtt{ref}_1, \mathtt{tab}_2, \mathtt{id}_2),\ldots, (\mathtt{ref}_{m-1}, \mathtt{tab}_m, \mathtt{id}_m)]\), where \(\mathtt{ref}_{i-1}\) is a column in \(\mathtt{tab}_{i-1}\) — usually a foreign key — referring to \(\mathtt{tab}_i\) with a column of referred id \(\mathtt{id}_i\) — usually a primary key. Following the path of tables referring to one another, we end up with the table defining the PID (e.g. users).

This small PU description language allows for a variety of useful PID scenarios, beyond the simple but restrictive privacy per row.

Listing 1 Example of privacy unit definition for a database with three tables holding users, orders and items records. Each user is protected individually by designating their id as the PID. Orders are attached to a user through the foreign key user_id. Items’ ownership is defined the same way by specifying the lineage: item -> order -> user.#
privacy_unit = [
    ("users", [], "id"),
    ("orders", [
        ("user_id", "users", "id")
    ], "id"),
    ("items", [
        ("order_id", "orders", "id"),
        ("user_id", "users", "id")
    ], "id")
]
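To illustrate what such a definition encodes, the hypothetical sketch below (not Qrlew's actual implementation) unrolls a PU path into the chain of joins that attaches a PID column to each row of a table:

def pid_join_chain(table, path, pid_column):
    # Walk the path of (referring column, referred table, referred id) triples
    # from `table` to the table holding the PID, e.g. items -> orders -> users.
    sql, current = table, table
    for ref, referred_table, referred_id in path:
        sql += f" JOIN {referred_table} ON {current}.{ref} = {referred_table}.{referred_id}"
        current = referred_table
    return f"SELECT {current}.{pid_column} AS pid, {table}.* FROM {sql}"

# For the items table of Listing 1:
print(pid_join_chain("items",
                     [("order_id", "orders", "id"), ("user_id", "users", "id")],
                     "id"))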

Rewriting#

Rewriting in Qrlew refers to the process of altering the computation graph by substituting computation sub-graphs for Relations (see Fig. 4), in order to alter the properties of the result. This substitution aims to achieve specific objectives, such as ensuring privacy through the incorporation of differentially private mechanisms. The rewriting process (see Fig. 4) happens in two phases:

  • a rewriting rule allocation phase, where each Relation in the computation graph gets allocated a rewriting rule (RR) compatible with its input and with the desired output property;

  • a rule application phase, where each Relation is rewritten to a small computation graph implementing the logic of the rewriting and stitched together with the other rewritten Relations.

_images/rewriting.svg

Fig. 4 The rewriting process happens in two phases: a rewriting rule allocation phase, where each node in the computation graph gets allocated a rewriting rule (RR) compatible with its input and with the desired output property; and a rule application phase, where each Relation is rewritten according to its allocated RR.#

Before we describe these phases in more detail, let’s define the various properties we may want to guarantee on each Relation and the ones we need for the output.

Privacy Properties and Rewriting Rules#

Each Relation can have one of the following properties:

  • Privacy Unit Preserving (PUP): A Relation is PUP if each row is associated with a PU. In practice it will have a column containing the PID identifying the PU.

  • Differentially Private (DP): A Relation is DP if it implements a DP mechanism. A DP Relation can be safely executed on private data and its result published. Note that the privacy loss associated with the DP mechanism has to be accurately accounted for (see section Privacy Analysis).

  • Synthetic Data (SD): In some contexts a synthetic data version of source tables is available. Any Relation derived from other SD or Public Relations is itself SD.

  • Public (Pub): A Relation derived from public tables is labeled as such and does not require any further protection to be disclosed.

  • Published (Pubd): A Relation is considered Published if its input Relations are either Public, DP, in some cases SD, or Published themselves. Its result can be disclosed, but with some more care, such as the need to account for the privacy loss incurred by its DP ancestors.

These properties usually require some rewriting of the computation graph to be achieved. The requirements for a specific Relation to meet some property are embodied in what we call rewriting rules. A rewriting rule has input requirements and an achievable output property, which tells what property can be achieved by rewriting provided the input property requirements are fulfilled. Each Relation can be assigned different rewriting rules depending on its nature (Map, Reduce, etc.) and the way it is parametrized.

Rewriting Rules can be — for instance — PU propagation rules of the form:

  • \(\varnothing \rightarrow PUP\) for private Tables, with a simple rewriting consisting of taking the definition of the privacy unit and computing the PID column.

  • \(PUP \rightarrow PUP\) for Maps (or for Reduces when the PID is in the GROUP BY part), with a rewriting consisting of propagating the PID column from the input to the output.

  • \((PUP, PUP) \rightarrow PUP\) (or its variants with one published input) for Joins, with a rewriting consisting of adding the PID to the ON clause.

Another key rewriting rule is \(PUP \rightarrow DP\) for Reduces; it simply means that if the parent of the Relation can be rewritten as PUP, then we can rewrite the Relation to be DP by substituting DP aggregations for the original aggregations of the Reduce.

One can easily see that by simply applying the \(PUP \rightarrow PUP\) and \(PUP \rightarrow DP\) rules, one can propagate the privacy unit across the computation graph of a Relation and compute some DP aggregate such as a noisy sum or average, as illustrated below.
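For instance (an illustration of the rules above, not Qrlew's actual output), on the Table -> Map -> Reduce graph of a query like SELECT sum(x) FROM table WHERE x > 0 GROUP BY g, a valid allocation reads:

# Illustrative allocation of one rewriting rule per Relation:
allocation = [
    ("Table(table)",         "{} -> PUP"),   # join in the PU definition, add the pid column
    ("Map(x WHERE x > 0)",   "PUP -> PUP"),  # propagate the pid column through the Map
    ("Reduce(sum(x) BY g)",  "PUP -> DP"),   # substitute a DP sum for sum(x)
]
for relation, rule in allocation:
    print(f"{relation:24}{rule}")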

Rewriting Rule Allocation#

The first phase of the rewriting process consists of allocating one and only one rule to each Relation. This is done in three steps, illustrated in Fig. 5:

  • Rule Setting: We assign the set of potential rewriting rules to each Relation in a computation graph.

  • Rule Elimination: Only feasible rewriting rules are preserved. A rewriting rule that would require a PUP input is only feasible if its input Relation has a feasible rule outputting a PUP Relation.

  • Rule Selection: All feasible allocations of one rewriting rule per Relation are listed, a score depending on the desired ultimate output property is assigned to each allocation, and the highest scoring allocation is selected. Then a simple split \(\left(\frac{\varepsilon}{n}, \frac{\delta}{n}\right)\) of the overall privacy budget \(\left(\varepsilon, \delta\right)\) is chosen, where \(n\) is the number of \(PUP \rightarrow DP\) rules.

While each node’s multiple rewriting rules might suggest a combinatorial explosion in the number of possible feasible allocations, this is mitigated in practice. The pruning of infeasible rules, dictated by the requirement for most Relations to have a PUP input for a DP or PUP outcome, significantly reduces the complexity. Hence, despite the theoretical breadth of possibilities, the actual number of feasible allocations remains manageable, avoiding substantial computational problems in practice. A sketch of the three steps follows Fig. 5.

_images/set_eliminate_select.svg

Fig. 5 The rewriting happens in three steps: Rule Setting when we assign the set of potential rewriting rules to each Relation in a computation graph; Rule Elimination, when only feasible rewriting rules are preserved; and Rule Selection, when an actual allocation is selected.#
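The following minimal sketch mimics these three steps on a linear Table -> Map -> Reduce graph. It is illustrative only: elimination is folded into a feasibility filter, the scoring is a toy one, and Qrlew's real allocator works on arbitrary computation graphs.

from itertools import product

# Rule Setting: candidate rules per node, as (required input properties, output).
candidates = {
    "table":  [((), "PUP"), ((), "SD")],
    "map":    [(("PUP",), "PUP"), (("SD",), "SD")],
    "reduce": [(("PUP",), "DP"), (("SD",), "SD")],
}
order = ["table", "map", "reduce"]  # parents come before children

def feasible(allocation):
    # Rule Elimination: a rule requiring some input property is only feasible
    # if the parent's allocated rule outputs one of the required properties.
    outputs = {}
    for index, node in enumerate(order):
        required, produced = allocation[node]
        if required and outputs[order[index - 1]] not in required:
            return False
        outputs[node] = produced
    return True

def score(allocation):
    # Toy score: prefer a DP output over a purely synthetic-data one.
    return {"DP": 2, "SD": 1}.get(allocation["reduce"][1], 0)

# Rule Selection: enumerate feasible allocations and keep the best scoring one.
allocations = [dict(zip(order, choice))
               for choice in product(*(candidates[n] for n in order))]
best = max((a for a in allocations if feasible(a)), key=score)
print({node: rule[1] for node, rule in best.items()})  # {'table': 'PUP', 'map': 'PUP', 'reduce': 'DP'}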

Rule Application#

Once the first phase of rule allocation is achieved, the second phase starts: rule application, as illustrated in Fig. 4. In the allocation phase, a global rewriting scheme was set in the form of an allocation satisfying a system of requirements; in the rewriting phase, each rewriting rule is applied independently for each Relation. This is possible because once a rewriting rule is applied to a Relation, the Relation is transformed into a computation graph of Relations whose ultimate inputs are compatible with the inputs of the original Relation (same schema, i.e. same columns with same types, plus the new columns provided by the property achieved) and whose ultimate output is also compatible with the output of the original Relation. Rewritten Relations can therefore be stitched together in a larger graph the same way the original Relations were connected: see Fig. 4.

Privacy Analysis#

When rewriting, a user can require the output Relation to have the Published property. All rewriting rules with Published outputs require their inputs to be either Public, DP, SD or Published themselves. We assume synthetic data provided to the system are differentially private, so the privacy of the result depends on the way Qrlew rewrites Reduces into DP equivalent Relations.

All rewriting rules with DP outputs require the input of the Reduce to be PUP, so we can assume a PID column clearly assigns one and only one PU to each row of the rewritten input. The Reduce is made DP by:

  • Making sure the aggregate columns of the Reduce are computed with differentially private mechanisms.

  • Making sure the grouping keys of the GROUP BY clause are either public or released through a differentially private mechanism.

Protecting aggregation results#

The protection of aggregation functions is carried out in two steps. Given that all currently supported aggregations (COUNT, SUM, AVG, VARIANCE, STDDEV) can be reduced to sums, our focus will be on SUM aggregations, i.e. the computation of partial sums of a column for different groups \(j\in\{1,\ldots,m\}\) of rows.

Let the column be a vector of \(N\) real numbers: \(x = \left(x_1,\ldots, x_N\right)\in\mathbb{R}^N\). We write \(\pi_k = i \in \{1,\ldots,n\}\) for the PID and \(g_k = j \in \{1,\ldots,m\}\) for the grouping key associated with \(x_k\). We want to compute all the sums:

\[S_j = \sum_{g_k = j} x_k\]
with some DP guarantees. To this end we:

  • Limit the contribution of each privacy unit to the sum: We represent the contribution of each PU \(i\) by a vector \(s_i\) whose components are the partial sums within each of the \(m\) groups: \(s_i = \left(s_{i,1},\ldots, s_{i,m}\right)\), where:

    \[\begin{split}s_{i,j} = \sum_{\substack{\pi_k = i\\g_k = j}}x_k\end{split}\]
    The \(s_i\)’s \(\ell^2\) norms are then clipped to \(c\):
    \[\overline{s_i} = \left(\overline{s_{i,j}}\right)_j = \left(\frac{s_{i,j}}{\max\left(1, \frac{\|s_i\|_2}{c}\right)}\right)_j\]
    See the whitepaper for more details.

  • Add Gaussian noise to each group: The clipped contributions are summed and perturbed with Gaussian noise \(\nu = \left(\nu_1,\ldots \nu_m\right) \sim \mathcal{N}\left(0, \sigma^2I_m\right)\):

    \[\widetilde{S_j} = \sum_{i=1}^n \overline{s_{i,j}} + \nu_j\]
    with \(\sigma^2={\frac {2\ln(1.25/\delta )\cdot c^{2}}{\varepsilon ^{2}}}\). Note that the vector of sums has an \(\ell^2\) global sensitivity of \(c\), so this is an application of the Gaussian mechanism (see theorem A.1. in [DR+14]) and the mechanism is \((\varepsilon, \delta)\)-differentially private.
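The two steps above can be sketched in a few lines of numpy. This is a minimal illustration of the mechanism, not Qrlew's Rust implementation, which compiles this logic into SQL run by the data store.

import math
from collections import defaultdict
import numpy as np

def dp_sums(rows, c, epsilon, delta, m):
    # rows: iterable of (pid, group, x) triples, with group in {0, ..., m-1}.
    # Per-PU vector of partial sums s_i = (s_{i,1}, ..., s_{i,m}).
    contributions = defaultdict(lambda: np.zeros(m))
    for pid, group, x in rows:
        contributions[pid][group] += x
    # Clip each PU's contribution to l2 norm c, then sum over PUs.
    total = np.zeros(m)
    for s in contributions.values():
        total += s / max(1.0, np.linalg.norm(s) / c)
    # Gaussian mechanism: the clipped sum vector has l2 sensitivity c.
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * c / epsilon
    return total + np.random.normal(0.0, sigma, size=m)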

Protecting grouping keys#

When the grouping keys are derived from the data, they are not safe for publication. Following [KKMN09, WZL+19], we use a mechanism called \(\tau\)-thresholding to safely release these grouping keys, as sketched below. Note that, thanks to range propagation (see section Range Propagation), some groups are already public and need no differentially private mechanism to be published. Ultimately, the rewriting of SELECT sum(x) FROM table WHERE g IN (1, 2, 3) GROUP BY g as a DP equivalent will not use \(\tau\)-thresholding, while SELECT sum(x) FROM table GROUP BY g most certainly will if nothing more is known about g beforehand.
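Here is a minimal sketch of \(\tau\)-thresholding: count the distinct privacy units in each group, noise the counts, and release only the groups whose noisy count exceeds the threshold \(\tau\). Calibrating the noise scale and \(\tau\) to a given \((\varepsilon, \delta)\) is omitted here.

from collections import defaultdict
import numpy as np

def released_groups(rows, sigma, tau):
    # rows: iterable of (pid, group) pairs.
    units = defaultdict(set)
    for pid, group in rows:
        units[group].add(pid)            # distinct PUs contributing to each group
    return [group for group, pus in units.items()
            if len(pus) + np.random.normal(0.0, sigma) > tau]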

To summarize the various mechanisms used in Qrlew to date: the rewriting of Reduces with \(PUP \rightarrow DP\) rules requires the use of Gaussian mechanisms and \(\tau\)-thresholding mechanisms; the DP mechanisms used in all the rewritings are then aggregated by the Qrlew rewriter into a composed mechanism. The overall privacy loss is tracked with an RDP accountant [Mir17].

Comparison to other systems#

There are a few existing open-source libraries for differential privacy.

Some libraries focus on deep learning and DP-SGD [ACG+16], such as Opacus [YSS+21], Tensorflow Privacy [Google23c] or Optax’s DP-SGD [DBB+20]. Qrlew has a very different goal: analytics and SQL.

GoogleDP [Google23a] is a library implementing many differentially private mechanisms in various languages (C++, Go and Java). IBM’s diffprivlib [HBMAL19] is also a rich library implementing a wide variety of DP primitives in Python, in particular many DP versions of classical machine learning algorithms. These libraries provide the bricks for experts to build DP algorithms. Qrlew takes a very different approach: it is a high-level tool designed to take queries written in SQL by a data practitioner with no expertise in privacy and to rewrite them into DP equivalents able to run on any SQL-enabled data store. Qrlew implements very few DP mechanisms to date, but automates the whole process of rewriting a query, while these libraries offer a rich variety of DP mechanisms and give the user full control to use them as they wish.

Google built several higher-level tools on top of [Google23a]. PrivacyOnBeam [Google23b] is a framework to run DP jobs written in Apache Beam with its Go SDK. PipelineDP [Google22] is a framework that lets analysts write Beam-like or Spark-like programs and have them run on Apache Spark or Apache Beam as a back-end. It focuses on the Beam and Spark ecosystems, while Qrlew tries to provide an SQL interface to the analyst and runs on SQL-enabled back-ends (including Spark, a variety of data warehouses, and more traditional databases). ZetaSQL [OpenMinedaGoogle23] gives the user a way to write SQL-like queries and have them executed on tables using GoogleDP custom code, so it is not compatible with arbitrary SQL data stores and supports relatively simple queries only.

OpenDP [TheOTeam23] is a powerful Rust library with Python bindings. It offers many possibilities for building complex DP computations by composing basic elements. Nonetheless, it requires both expertise in privacy and learning a new API to describe a query. Also, the computations are handled by the Rust core, so it does not integrate easily with existing data stores and may not scale well either.

Tumult Analytics [BBD+22] shares much of the nice composable design of OpenDP, but runs on Apache Spark, making it a scalable alternative to OpenDP. Still, it requires learning a specific API (close to that of Spark) and cannot leverage any SQL back-end.

SmartNoise SQL is a library that shares some of the design choices of Qrlew. An analyst can write SQL queries, but the scope of possible queries is relatively limited: no JOINs, no sub-queries, no CTEs (WITH), all of which Qrlew supports. Also, it does not run the full computation in the DB, so the integration with existing systems may not be straightforward.

Other systems such as PINQ [McS09] and Chorus [JNHS20] are prototypes that do not seem to be actively maintained. Chorus shares many of the design goals of Qrlew, but requires post-processing outside of the DB, which can make the integration more complex on the data-owner side (as the computation happens in two distinct places).

Beyond that, Qrlew brings unique functionalities, such as:

  • advanced automated range propagation;

  • the possibility to automatically blend in synthetic data;

  • advanced privacy unit definition capabilities across many related tables;

  • the possibility for the non-expert to simply write standard SQL, and for the DP-aware analyst to improve utility by adding WHERE x < b or WHERE x IN (1,2,3) to give hints to Qrlew;

  • all the compute happens in the DB.

This last point comes with some limitations (see section Known Limitations), but opens new possibilities like the delegation of the rewriting to a trusted third party. The data practitioner could simply write their desired query in SQL and send it to the rewriter, which would keep track of the privacy losses, use Qrlew to rewrite the query, sign it, and send it back to the data practitioner, who can then send it to the data owner, who will check the signature certifying the DP properties of the rewritten query[2].

Known Limitations#

Qrlew still implements a limited number of DP mechanisms; it is still lacking basic functionalities such as quantile estimation and exponential mechanisms.

Qrlew relies on the random number generator of the SQL engine used, which is usually not a cryptographically secure random number generator.

Qrlew uses the floating-point numbers of the host SQL engine; it is therefore subject to the vulnerabilities described in [CSVW22].

Qrlew white paper#

Read the Qrlew white paper.

And cite us:

@misc{grislain2024qrlew,
  title={Qrlew: Rewriting SQL into Differentially Private SQL}, 
  author={Nicolas Grislain and Paul Roussel and Victoria de Sainte Agathe},
  year={2024},
  eprint={2401.06273},
  archivePrefix={arXiv},
  primaryClass={cs.DB},
  url={https://arxiv.org/pdf/2401.06273.pdf}
}

Definitions#

Useful definitions can be found there.

Bibliography#

[ACG+16]

Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, 308–318. 2016.

[AGKZ18]

Maryam Archie, Sophie Gershon, Abigail Katcoff, and Aaron Zeng. Who’s watching? de-anonymization of netflix reviews using amazon reviews. 2018.

[BBD+22]

Skye Berghel, Philip Bohannon, Damien Desfontaines, Charles Estes, Sam Haney, Luke Hartman, Michael Hay, Ashwin Machanavajjhala, Tom Magerlein, Gerome Miklau, and others. Tumult analytics: a robust, easy-to-use, scalable, and expressive framework for differential privacy. arXiv preprint arXiv:2212.04133, 2022. URL: https://arxiv.org/pdf/2212.04133.pdf.

[CSVW22]

Sílvia Casacuberta, Michael Shoemate, Salil Vadhan, and Connor Wagaman. Widespread underestimation of sensitivity in differentially private libraries and how to fix it. 2022. arXiv:2207.10635.

[DBB+20]

DeepMind, Igor Babuschkin, Kate Baumli, Alison Bell, Surya Bhupatiraju, Jake Bruce, Peter Buchlovsky, David Budden, Trevor Cai, Aidan Clark, Ivo Danihelka, Antoine Dedieu, Claudio Fantacci, Jonathan Godwin, Chris Jones, Ross Hemsley, Tom Hennigan, Matteo Hessel, Shaobo Hou, Steven Kapturowski, Thomas Keck, Iurii Kemaev, Michael King, Markus Kunesch, Lena Martens, Hamza Merzic, Vladimir Mikulik, Tamara Norman, George Papamakarios, John Quan, Roman Ring, Francisco Ruiz, Alvaro Sanchez, Laurent Sartran, Rosalia Schneider, Eren Sezener, Stephen Spencer, Srivatsan Srinivasan, Miloš Stanojević, Wojciech Stokowiec, Luyu Wang, Guangyao Zhou, and Fabio Viola. The DeepMind JAX Ecosystem. 2020. URL: deepmind.

[DR+14] (1,2)

Cynthia Dwork, Aaron Roth, and others. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.

[HBMAL19]

Naoise Holohan, Stefano Braghin, Pól Mac Aonghusa, and Killian Levacher. Diffprivlib: the IBM differential privacy library. ArXiv e-prints, July 2019.

[JNHS20]

Noah Johnson, Joseph P Near, Joseph M Hellerstein, and Dawn Song. Chorus: a programming framework for building scalable differential privacy mechanisms. In 2020 IEEE European Symposium on Security and Privacy (EuroS&P), 535–551. IEEE, 2020. URL: https://arxiv.org/pdf/1809.07750.pdf.

[KKMN09]

Aleksandra Korolova, Krishnaram Kenthapadi, Nina Mishra, and Alexandros Ntoulas. Releasing search queries and clicks privately. In Proceedings of the 18th international conference on World wide web, 171–180. 2009.

[LSY+20]

Yuhan Liu, Ananda Theertha Suresh, Felix Xinnan X Yu, Sanjiv Kumar, and Michael Riley. Learning discrete distributions: user vs item-level privacy. Advances in Neural Information Processing Systems, 33:20965–20976, 2020.

[McS09]

Frank D McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, 19–30. 2009.

[Mir17]

Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th computer security foundations symposium (CSF), 263–275. IEEE, 2017.

[PHK+23]

Natalia Ponomareva, Hussein Hazimeh, Alex Kurakin, Zheng Xu, Carson Denison, H Brendan McMahan, Sergei Vassilvitskii, Steve Chien, and Abhradeep Guha Thakurta. How to dp-fy ml: a practical guide to machine learning with differential privacy. Journal of Artificial Intelligence Research, 77:1113–1201, 2023.

[WZL+19]

Royce J Wilson, Celia Yuxin Zhang, William Lam, Damien Desfontaines, Daniel Simmons-Marengo, and Bryant Gipson. Differentially private sql with bounded user contribution. arXiv preprint arXiv:1909.01917, 2019. URL: https://arxiv.org/pdf/1909.01917.pdf.

[YSS+21]

Ashkan Yousefpour, Igor Shilov, Alexandre Sablayrolles, Davide Testuggine, Karthik Prasad, Mani Malek, John Nguyen, Sayan Ghosh, Akash Bharadwaj, Jessica Zhao, and others. Opacus: user-friendly differential privacy library in pytorch. arXiv preprint arXiv:2109.12298, 2021. URL: https://arxiv.org/pdf/2109.12298.pdf.

[Google22]

Google. PipelineDP Library. 2022. URL: OpenMined/PipelineDP.

[Google23a] (1,2)

Google. Google's differential privacy libraries. 2023. URL: google/differential-privacy.

[Google23b]

Google. Privacy On Beam. 2023. URL: google/differential-privacy.

[Google23c]

Google. TensorFlow Privacy. 2023. URL: tensorflow/privacy.

[OpenMinedaGoogle23]

OpenMined and Google. ZetaSQL Library. 2023. URL: google/differential-privacy.

[TheOTeam23]

The OpenDP Team. OpenDP Library. 2023. URL: opendp/opendp.