# Deep Dive#

## What is Qrlew?#

Qrlew is an open source library that can parse SQL queries into Relations — an intermediate representation — that keeps track of rich data types, value ranges, and row ownership; so that they can easily be rewritten into differentially-private equivalent and turned back into SQL queries for execution in a variety of standard data stores.

With Qrlew, a data practitioner can express their data queries in standard SQL; the data owner can run the rewritten query without any technical integration and with strong privacy guarantees on the output; and the query rewriting can be operated by a privacy-expert who must be trusted by the owner, but may belong to a separate organization.

## Main Features#

Qrlew provides the following features:

**Qrlew provides automatic output privacy guarantees:**With Qrlew a data owner can let an analyst (data practitioner) with no expertise in privacy protection run arbitrary SQL queries with strong privacy guarantees on the output.**Qrlew leverages existing infrastructures:**Qrlew rewrites a SQL query into a differentially private SQL query that can be run on any data-store with a SQL interface: from lightweight DB to big-data stores. This removes the need for a custom execution engine and enables differentially private analytics with virtually no technical integration.**Qrlew leverages synthetic data:**Synthetic data are an increasingly popular way of privatizing a dataset. Using jointly: differentially private mechanisms and differentially private synthetic data can be a simple, yet powerful, way of managing a privacy budget and reaching better utility-privacy tradeoffs.

## Design Goals#

Qrlew assumes the *central model of differential privacy* [AGKZ18], where a trusted central organization: hospital, insurance company, utility provider, called the data owner, collects and stores personal data in a secure database and whishes to let untrusted data practitioners run SQL queries on its data.

At a high level we pursued the following requirements:

Ease of use for the data practitioners. The data practitioners are assumed to be a data experts but no privacy experts. They should be able to express their queries in a standard way. We chose SQL as the query language as it is very commonly used for analytics tasks.

Ease of integration for the data owner. As SQL is a common language to express data analysis tasks, many data-stores support it from small embedded databases to big data stores.

Simplicity for the data owner to setup privacy protection. Differential privacy is about capping the sensitivity of a result to the addition or removal of an individual that we call privacy unit. Qrlew assumes that the data owner can tell if a table is public and, if it is not, that it can assign exactly one privacy unit to each row of data. In the case there are multiple related tables, Qrlew enables to define easily the privacy units for each tables transitively.

Simple integration with other privacy enhancing technologies such as synthetic data. To avoid repeated privacy losses or give result when a DP rewriting is not easily available (e.g. when the query is:

`SELECT * FROM table`

) Qrlew can use synthetic data to blend in the computation.

These requirements dictated the overall *query rewriting* architecture and many features, the most important of which, are detailed below.

## How does Qrlew work?#

The Qrlew library, solves the problem of running a SQL query with DP guarantees in three steps.
First the SQL query submitted by the data practitioner is parsed and converted into a Relation, this Relation is an intermediate representation that is designed to ease the tracking of data types ranges or possible values, to ease the tracking of the privacy unit and to ease the rewriting into a DP *Relation*. Then, the rewriting into DP happens. Once the relation is rewritten into a DP one, it can be rendered as an SQL query string and submitted to the data store of the *data owner*. The output can then safely be shared with the *data practitioner*. This process is illustrated in Fig. 2.

### Qrlew Intermediate Representation#

As the SQL language is very rich and complex, simply parsing a query into an abstract syntax tree does not produce a convenient representation for our needs. Therefore, it is converted into a simpler normalized representation with properties well aligned with the requirements of Differential Privacy: the *Relation*. A *Relation* is a collection of rows adhering to a given *schema*. It is a recursively defined structure composed of:

**Tables:**This is simply a data source from a database.**Maps:**A Map takes an input*Relation*, filters the rows and transform them one by one. The filtering conditions and row transforms are expressed with expressions similar to those of SQL. It acts as a`SELECT exprs FROM input WHERE expr LIMIT value`

and therefore preserves the privacy unit ownership structure.**Reduces:**A*Reduce*takes an input*Relation*and aggregates some columns, possibly group by group. It acts as a`SELECT aggregates FROM input GROUP BY expr`

. This is where the rewriting into DP will happen as described bellow.**Joins:**This*Relation*combines two input*Relations*as a`SELECT * FROM left JOIN right ON expr`

would do it. The privacy properties are more complex to propagate in this case.

It may also be a static list of values or a set operation between two *Relations*, but those are less important for our uses.

This representation is central to Qrlew; all the features described below are built upon it. A *Relation*, along with all the sub-*Relations* it depends on, will be called the *computation graph* or the *graph* of a *Relation*.

### Range Propagation#

Most DP mechanisms aggregating numbers require the knowledge of some bounds on the values (see [DR+14]).
Even if some bounds are known for some *Relations* like source **Tables**, it is not trivial to propagate these bounds through the steps of the computation.

To help with range propagation, Qrlew introduces two useful concepts:

The concept of

*\(k\)-Interval*, which are finite unions of at most \(k\) closed intervals. A*\(k\)-Interval*can be noted:\[I = \bigcup_{i=1}^{j\leq k}\left[a_i, b_i\right]\]Note that the union of*\(k\)-Intervals*may not be a*\(k\)-Interval*as it may be the union of more than \(k\) intervals. Unions of many intervals can be simplified into their convex envelope interval, which are often sufficient bounds approximations for our use cases:\[J = \bigcup_{i=1}^{j> k}\left[a_i, b_i\right] \subseteq \left[\min_i a_i, \max_i b_i\right]\]And the concept of

*piecewise-monotonic-functions*[1], which are functions \(f: \mathbb{R}^n \rightarrow \mathbb{R}\) whose domain can be partitioned in cartesian products of intervals: \(P_j\) on which they are*coordinatewise-monotonic*. The image of a cartesian product of \(n\)*\(k\)-Intervals*by a*piecewise-monotonic-function*can be easily computed as a*\(k\)-Interval*. Indeed, let \(I\) be:\[\begin{split}I = I_1\times I_2\times \ldots \times I_n = \bigcup_{\substack{1\leq i_1\leq k\\\ldots\\1\leq i_n\leq k}}\left[a_{i_1}, b_{i_1}\right]\times \ldots \times \left[a_{i_n}, b_{i_n}\right]\end{split}\]If \(f\) is*piecewise-monotonic*, then one can show that on each partition \(P_j\) where it is*coordinatewise-monotonic*, if we note:\[\begin{split}I_j = I \cap P_j = \bigcup_{\substack{1\leq j_1\leq k\\\ldots\\1\leq j_n\leq k}}\left[a_{j_1}, b_{j_1}\right]\times \ldots \times \left[a_{j_n}, b_{j_n}\right]\end{split}\]\[\begin{split}f(I_j) = \bigcup_{\substack{1\leq i_1\leq k\\\ldots\\1\leq i_n\leq k}}\text{Conv}\left(f\left( \left\{a_{i_1}, b_{i_1}\right\}\times \ldots \times \left\{a_{i_n}, b_{i_n}\right\}\right)\right)\end{split}\]where \(\text{Conv}\left(f\left( \left\{a_{i_1}, b_{i_1}\right\}\times \ldots \times \left\{a_{i_n}, b_{i_n}\right\}\right)\right)\) can be efficiently computed in \(n\) steps, without testing all the \(2^n\) combinations, thanks to the coordinatewise monotony of \(f\) on \(P_j\). Then \(f(I) = \bigcup_j f(I_j)\), of which we can derive the bounding: \(f(I) \subseteq \text{Conv}\left(\bigcup_j f(I_j)\right)\) when the number of terms in the union exceeds \(k\).

The notion of *\(k\)-Interval* is convenient for tracking value bounds as it can express natural patterns in SQL such as:

`WHERE x>0 AND x<=1`

, which translates into the implied \(x\in \left[0, 1\right]\) ;`WHERE x IN (1,2,3)`

, which is also easily expressed as a*k-Interval*: \(x \in \left[1, 1\right] \cup \left[2, 2\right] \cup \left[3, 3\right]\)

The idea of *piecewise-monotonic-function* is also very useful as in SQL many standard arithmetic operators (`+`

, `-`

, `\*`

, `/`

, `<`

, `>`

, `=`

, `!=`

, …) and functions (`EXP`

, `LOG`

, `ABS`

, `SIN`

, `COS`

, `LEAST`

, `GREATEST`

, …) are trivially *piecewise-monotonic-function* (in one, two or many variables).

Most of the range propagation in Qrlew is based on these concepts. It enables a rather simple and efficient range propagation mechanism, leading to better utility / privacy tradeoffs.

### Privacy Unit Definition#

Tables in a database rarely come properly formatted for privacy-preserving applications. Many rows in many tables may refer to the same individual, hence, *adding or removing an individual* means *adding or removing many rows*. To help the definition of the privacy unit Qrlew introduces a small Privacy Unit (PU) description language.
As exemplified in Listing 1, PU definition associates to each private table in a database a path defining the PID of each row. For a table containing the PU itself, like a `users`

table for example, the PU definition will look like `("users",[],"id"),`

where `id`

is the name of a column identifying the user, like its name. If the database defines tables related to this tables, the way the tables are related should be specified following this scheme: \((\mathtt{tab}_1, path, \mathtt{pid})\) where \(\mathtt{tab}_1\) is the name of the table for which the PID is defined, \(\mathtt{pid}\) is the name of the column defining the PID in the table referred by \(path\) and \(path\) is a list of elements of the form \([(\mathtt{ref}_1, \mathtt{tab}_2, \mathtt{id}_2),\ldots, (\mathtt{ref}_{m-1}, \mathtt{tab}_m, \mathtt{id}_m)]\)
where \(\mathtt{ref}_{i-1}\) is a column in \(\mathtt{tab}_{i-1}\) — usually a foreign key — referring to \(\mathtt{tab}_i\) with a column of referred id \(\mathtt{id}_i\) — usually a primary key. Following the path of tables referring to one another, we end up with the table defining the PID (e.g. `users`

).

This small PU description language allows for a variety of useful PID scenarii, beyond the simple, but restrictive *privacy per row*.

```
privacy_unit = [
("users",[],"id"),
("orders",[
("user_id","users","id")
],"id"),
("items",[
("order_id","orders","id"),
("user_id", "users", "id")
],"id")
]
```

### Rewriting#

Rewriting in Qrlew, refers to the process of altering the *computation graph* by substituting computation *sub-graphs* to *Relations* (see Fig. 4) to alter the properties of the result. This substitution aims to achieve specific objectives, such as ensuring privacy through the incorporation of differentially private mechanisms. The rewriting process (see Fig. 4) happens in two phases:

a

*rewriting rule allocation*phase, where each*Relation*in the*computation graph*gets allocated a*rewriting rule*(RR) compatible with its input and with the desired output property;a

*rule application*phase, where each*Relation*is rewritten to a small*computation graph*implementing the logic of the rewriting and stitched together with the other rewritten*Relations*.

Before we describe these phases into more details, let’s define the various properties we may want to guarantee on each *Relation* and the ones we need for the output.

#### Privacy Properties and Rewriting Rules#

Each *Relation* can have one of the following properties:

**Privacy Unit Preserving (PUP):**A*Relation*is PUP if each row is associated with a PU. In practice it will have a column containing the PID identifying the PU.**Differentially Private (DP):**A*Relation*will be DP if it implements a DP mechanism. A DP*Relation*can be safely executed on private data and the result be published. Note that the*privacy loss*associated with the DP mechanism has to be accurately accounted for (see section Privacy Analysis).**Synthetic Data (SD):**In some contexts a*synthetic data*version of source tables is available. Any*Relation*derived from other SD or Public*Relations*is itself SD.**Public (Pub)**A relation derived from public tables is labeled as such and does not require any further protection to be disclosed.**Published (Pubd):**A relation is considered Published if its input relations are either Public, DP, in some cases SD, or Published themselves. It can be considered as Published but with some more care like the need to account for the privacy loss incurred by its DP ancestors.

These properties usually require some rewriting of the computation graph to be achieved. The requirements for a specific *Relation* to meet some property are embodied in what we call: *rewriting rules*.
A *rewriting rule* has input requirements, and an achievable output *property* that tells what *property* can be achieved by rewriting provided the input *property* requirements are fulfilled.
Each *Relation* can be assigned different *rewriting rules* depending on their nature: *Map*, *Reduce*, etc. and the way they are parametrized.

*Rewriting Rules* can be — for instance — PU propagation rules of the form:

\(\varnothing \rightarrow PUP\) for private

*Tables*with a simple rewriting consisting in taking the definition of the privacy unit and computing the PID column.\(PUP \rightarrow PUP\) for

*Maps*(or for*Reduce*when the PID is in the`GROUP BY`

part) with a rewriting consisting in propagating the PID column from the input to the output.\((PUP, PUP) \rightarrow PUP\) (or its variants with one published input) for

*Join*and a rewriting consisting in adding the PID in the`ON`

clause.

Another key *Rewriting Rules* is \(PUP \rightarrow DP\) for *Reduces*, it simply means that if the parent of the *Relation* can be rewritten as PUP, then we can rewrite the relation to be DP by substituting DP aggregations to the original aggregations of the *Reduce*.

One easily see that by simply applying \(PUP \rightarrow PUP\) and \(PUP \rightarrow DP\) rules, one can propagate the privacy unit across the computation graph of a *Relation* and compute some DP aggregate such as a noisy sum or average.

#### Rewriting Rule Allocation#

The first phase of the rewriting process consists in allocating one and only one rule to each *Relation*.
This is done in three steps illustrated in Fig. 5:

**Rule Setting:**We assign the set of potential rewriting rules to each*Relation*in a computation graph.**Rule Elimination:**Only feasible rewriting rules are preserved. A rewriting rule that would require a PUP input is only feasible if its input Relation has a feasible rule outputting a PUP*Relation*.**Rule Selection:**All feasible allocations of one rewriting rule per*Relation*are listed, a score depending on the desired ultimate output property is assigned to each allocation and the highest scoring allocation is selected. Then, a simple split \(\left(\frac{\varepsilon}{n}, \frac{\delta}{n}\right)\) of the overall privacy budget \(\left(\varepsilon, \delta\right)\) depending on the number of \(PUP \rightarrow DP\) rules: \(n\) is chosen.

In the computation graph, while each node’s multiple rewriting rules might suggest a combinatorial explosion in the number of possible feasible allocations, this is mitigated in practice. The pruning of infeasible rules, dictated by the requirement for most relations to have a PUP input for a DP or PUP outcome, significantly reduces the complexity. Hence, despite the theoretical breadth of possibilities, the actual number of feasible paths remains manageable, avoiding substantial computational problems in practice.

#### Rule Application#

Once the first phase of rule allocation is achieved, starts the second phase: *rule application*, as illustrated in Fig. 4.
In the allocation phase, a *global rewriting scheme* was set in the form of an allocation satisfying a system of requirements; in the rewriting phase, each rewriting rule is applied *independently* for each *Relation*. This is possible because once a rewriting rule is applied to a *Relation*, the *Relation* is transformed into a computation graph of *Relations* whose ultimate inputs are compatible (same schema, i.e. same columns with same types, plus the new columns provided by the property achieved) with the inputs of the original *Relation* and the ultimate output is also compatible with the output of the original *Relation* so that rewritten *Relations* can be stitched together in a larger graph the same way the original *Relations* were connected: see Fig. 4.

## Privacy Analysis#

When rewriting, a user can require the output *Relation* to have the *Published* property. All *rewriting rules* with *Published* outputs require their inputs to be either *Public*, *DP*, *SD* or *Published* themselves. We assume synthetic data provided to the system are differentially private, so the privacy of the result depends on the way Qrlew rewrites *Reduces* into *DP* equivalent *Relations*.

All *rewriting rules* with *DP* outputs require the input of the *Reduce* to be *PUP* so we can assume a PID column clearly assign one and only one PU to each rows of the rewritten input. The *Reduce* is made DP by:

Making sure the aggregate columns of the

*Reduce*are computed with differentially private mechanisms.Making sure the grouping keys of the

`GROUP BY`

clause are either public or released through a differentially private mechanism.

### Protecting aggregation results#

The protection of aggregation functions is carried out in two steps. Given that all currently supported aggregations (`COUNT`

, `SUM`

, `AVG`

, `VARIANCE`

`STDDEV`

) can be reduced to sums, our focus will be on `SUM`

aggregations, i.e. the computation of partial sums of a column for different groups: \(j\in\{1,\ldots,m\}\), of rows.

Let the column be a vector of \(N\) real numbers: \(x = \left(x_1,\ldots, x_N\right)\in\mathbb{R}^N\). We note: \(\pi_k = i \in \{1,\ldots,n\}\) the PID and \(g_k = j \in \{1,\ldots,m\}\) the grouping key associated to \(x_k\). We want to compute all the sums:

**Limit the contribution of each**: We represent the contribution of each PU: \(i\), by a vector: \(s_i\) whose components are the partial sums within each of the \(m\) groups: \(s_i = \left(s_{i,1},\ldots, s_{i,m}\right)\), where:*privacy unit*to the sum\[\begin{split}s_{i,j} = \sum_{\substack{\pi_k = i\\g_k = j}}x_k\end{split}\]The \(s_i\)’s \(\ell^2\) norms are then clipped to \(c\):\[\overline{s_i} = \left(\overline{s_{i,j}}\right)_j = \left(\frac{s_{i,j}}{\max\left(1, \frac{\|s_i\|_2}{c}\right)}\right)_j\]See the whitepaper for more details.**Add gaussian noise to each group**: The clipped contributions are summed and perturbed with gaussian noise \(\nu = \left(\nu_1,\ldots \nu_m\right) \sim \mathcal{N}\left(0, \sigma^2I_m\right)\):\[\widetilde{S_j} = \sum_{i=1}^n \overline{s_{i,j}} + \nu_j\]With \(\sigma^2={\frac {2\ln(1.25/\delta )\cdot c^{2}}{\varepsilon ^{2}}}\). Note that the vector of sums has \(\ell^2\)*Global Sensitivity*of \(c\), so this is an application of the*Gaussian Mechanism*(see: theorem A.1. in [DR+14]) and the mechanism is \(\varepsilon, \delta\)-differentially private.

### Protecting grouping keys#

When the grouping keys from a are derived from the data, they are not safe for publication.
Following [KKMN09, WZL+19], we use a mechanism called *\(\tau\)-thresholding* to safely release these grouping keys.
Note that, thanks to *range propagation* (see section Range Propagation), some groups are already public and need no differentially private mechanism to be published.
Ultimately, the rewriting of: `SELECT sum(x) FROM table GROUP BY g WHERE g IN (1, 2, 3)`

as a DP equivalent will not use *\(\tau\)-thresholding*, while `SELECT sum(x) FROM table GROUP BY g`

will most certainly do if nothing more is known about `g`

beforehand.

To summarize the various mechanisms used in Qrlew to date:
the rewriting of *Reduces* with \(PUP \rightarrow DP\) rules requires the use of *gaussian mechanisms* and *\(\tau\)-thresholding* mechanisms;
then the DP mechanisms used in all the rewritings are aggregated by the Qrlew rewriter as a composed mechanism.
The overall privacy loss is aggregated in a RDP accountant [Mir17].

## Comparison to other systems#

There are a few existing open-source libraries for differential privacy.

Some libraries focus on deep learning and *DP-SGD* [ACG+16], such as: *Opacus* [YSS+21], *Tensorflow Privacy* [Google23c] or *Optax’s DP-SGD* [DBB+20]. Qrlew has a very different goal: analytics and SQL.

*GoogleDP* [Google23a] is a library implementing many differentially private mechanisms in various languages (C++, Go and Java).
*IBM’s diffprivlib* [HBMAL19] is also a rich library implementing a wide variety of DP primitives in python and in particular many DP versions of classical machine learning algorithms.
These libraries provide the bricks for experts to build DP algorithms. Qrlew has a very different approach, it is a high level tool designed to take queries written in SQL by a data practitioner with no expertise in privacy and to rewrite them into DP equivalent able to run on any SQL-enabled data store. Qrlew implemented very few DP mechanisms to date, but automated the whole process of rewriting a query, while these library offer a rich variety of DP mechanism, and give full control to the user to use them as they wish.

Google built several higher-level tools on top of [Google23a].
*PrivacyOnBeam* [Google23b] is a framework to run DP jobs written in Apache Beam with its Go SDK.
*PipelineDP* [Google22] is a framework that let analysts write Beam-like or Spark-like programs and have them run on Apache Spark or Apache Beam as back-end. It focuses on the Beam and Spark ecosystem, while Qrlew tries to provide an SQL interface to the analyst and runs on SQL-enabled back-ends (including Spark, a variety of data warehouses, and more traditional databases).
[OpenMinedaGoogle23], gives the user a way to write SQL-like queries and have them executed on tables using GoogleDB custom code, so it is not compatible with any SQL data store and support relatively simple queries only.

*OpenDP* [TheOTeam23] is a powerful Rust library with a python bindings. It offers many possibilities of building complex DP computations by composing basic elements. Nonetheless, it require both expertise in privacy and to learn a new API to describe a query. Also, the computations are handled by the Rust core, so it does not integrate easily with existing data stores and may not scale well either.

*Tumult Analytics* [BBD+22] shares many of the nice composable design of OpenDP, but runs on Apache Spark, making it a scalable alternative to OpenDP. Still, it require the learning of a specific API (close to that of Spark) and cannot leverage any SQL back-end.

*SmartNoise SQL* is a library that share some of the design choices of Qrlew. An analyst can write SQL queries, but the scope of possible queries is relatively limited: no `JOIN`

s, no sub-queries, no CTEs (`WITH`

) that Qrlew supports. Also, it does not run the full computation in the DB so the integration with existing systems may not be straightforward.

Other systems such as *PINQ* [McS09] and *Chorus* [JNHS20] are prototypes that do not seem to be actively maintained. *Chorus* shares many of the design goals of Qrlew, but requires post-processing outside of the DB, which can make the integration more complex on the data-owner side (as the computation happens in two distinct places).

Beyond that, Qrlew brings unique functionalities, such as:

advanced automated range propagation;

the possibility to automatically blend in synthetic data;

advanced privacy unit definition capabilities across many related tables;

the possibility for the non-expert to simply write standard SQL, but for the DP aware analyst to improve its utility by adding

`WHERE x < b`

or`WHERE x IN (1,2,3)`

to give hints to the Qrlew;all the compute happens in the DB.

This last point comes with some limitations (see section Known Limitations), but opens new possibilities like the delegation of the rewriting to a trusted third party. The data practitioner could simply write his desired query in SQL, send it to the rewriter that would keep track of the privacy losses and use Qrlew to rewrite the query, sign it, and send it back to the data practitioner that can then send the data-owner, who will check the signature certifying the DP properties of the rewritten query[2].

## Known Limitations#

Qrlew still implements a limited number of DP mechanisms, it is still lacking basic functionalities such as: quantile estimation, exponential mechanisms.

Qrlew relies on the random number generator of the SQL engine used. It is usually not a cryptographic secure random number generator.

Qrlew uses the floating-point numbers of the host SQL engine, therefore it is liable to the vulnerabilities described in [CSVW22].

# Qrlew white paper#

Read Qrlew white paper.

And cite us:

```
@misc{grislain2024qrlew,
title={Qrlew: Rewriting SQL into Differentially Private SQL},
author={Nicolas Grislain and Paul Roussel and Victoria de Sainte Agathe},
year={2024},
eprint={2401.06273},
archivePrefix={arXiv},
primaryClass={cs.DB},
url={https://arxiv.org/pdf/2401.06273.pdf}
}
```

# Definitions#

Useful *definitions* can be found there.

# Bibliography#

Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In *Proceedings of the 2016 ACM SIGSAC conference on computer and communications security*, 308–318. 2016.

Maryam Archie, Sophie Gershon, Abigail Katcoff, and Aaron Zeng. Who’s watching? de-anonymization of netflix reviews using amazon reviews. 2018.

Skye Berghel, Philip Bohannon, Damien Desfontaines, Charles Estes, Sam Haney, Luke Hartman, Michael Hay, Ashwin Machanavajjhala, Tom Magerlein, Gerome Miklau, and others. Tumult analytics: a robust, easy-to-use, scalable, and expressive framework for differential privacy. *arXiv preprint arXiv:2212.04133*, 2022. URL: https://arxiv.org/pdf/2212.04133.pdf.

Sílvia Casacuberta, Michael Shoemate, Salil Vadhan, and Connor Wagaman. Widespread underestimation of sensitivity in differentially private libraries and how to fix it. 2022. arXiv:2207.10635.

DeepMind, Igor Babuschkin, Kate Baumli, Alison Bell, Surya Bhupatiraju, Jake Bruce, Peter Buchlovsky, David Budden, Trevor Cai, Aidan Clark, Ivo Danihelka, Antoine Dedieu, Claudio Fantacci, Jonathan Godwin, Chris Jones, Ross Hemsley, Tom Hennigan, Matteo Hessel, Shaobo Hou, Steven Kapturowski, Thomas Keck, Iurii Kemaev, Michael King, Markus Kunesch, Lena Martens, Hamza Merzic, Vladimir Mikulik, Tamara Norman, George Papamakarios, John Quan, Roman Ring, Francisco Ruiz, Alvaro Sanchez, Laurent Sartran, Rosalia Schneider, Eren Sezener, Stephen Spencer, Srivatsan Srinivasan, Miloš Stanojević, Wojciech Stokowiec, Luyu Wang, Guangyao Zhou, and Fabio Viola. The DeepMind JAX Ecosystem. 2020. URL: deepmind.

Cynthia Dwork, Aaron Roth, and others. The algorithmic foundations of differential privacy. *Foundations and Trends® in Theoretical Computer Science*, 9(3–4):211–407, 2014.

Naoise Holohan, Stefano Braghin, Pól Mac Aonghusa, and Killian Levacher. Diffprivlib: the IBM differential privacy library. *ArXiv e-prints*, July 2019.

Noah Johnson, Joseph P Near, Joseph M Hellerstein, and Dawn Song. Chorus: a programming framework for building scalable differential privacy mechanisms. In *2020 IEEE European Symposium on Security and Privacy (EuroS&P)*, 535–551. IEEE, 2020. URL: https://arxiv.org/pdf/1809.07750.pdf.

Aleksandra Korolova, Krishnaram Kenthapadi, Nina Mishra, and Alexandros Ntoulas. Releasing search queries and clicks privately. In *Proceedings of the 18th international conference on World wide web*, 171–180. 2009.

Yuhan Liu, Ananda Theertha Suresh, Felix Xinnan X Yu, Sanjiv Kumar, and Michael Riley. Learning discrete distributions: user vs item-level privacy. *Advances in Neural Information Processing Systems*, 33:20965–20976, 2020.

Frank D McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In *Proceedings of the 2009 ACM SIGMOD International Conference on Management of data*, 19–30. 2009.

Ilya Mironov. Rényi differential privacy. In *2017 IEEE 30th computer security foundations symposium (CSF)*, 263–275. IEEE, 2017.

Natalia Ponomareva, Hussein Hazimeh, Alex Kurakin, Zheng Xu, Carson Denison, H Brendan McMahan, Sergei Vassilvitskii, Steve Chien, and Abhradeep Guha Thakurta. How to dp-fy ml: a practical guide to machine learning with differential privacy. *Journal of Artificial Intelligence Research*, 77:1113–1201, 2023.

Royce J Wilson, Celia Yuxin Zhang, William Lam, Damien Desfontaines, Daniel Simmons-Marengo, and Bryant Gipson. Differentially private sql with bounded user contribution. *arXiv preprint arXiv:1909.01917*, 2019. URL: https://arxiv.org/pdf/1909.01917.pdf.

Ashkan Yousefpour, Igor Shilov, Alexandre Sablayrolles, Davide Testuggine, Karthik Prasad, Mani Malek, John Nguyen, Sayan Ghosh, Akash Bharadwaj, Jessica Zhao, and others. Opacus: user-friendly differential privacy library in pytorch. *arXiv preprint arXiv:2109.12298*, 2021. URL: https://arxiv.org/pdf/2109.12298.pdf.

Google. PipelineDP Library. 2022. URL: OpenMined/PipelineDP.

Google. Google's differential privacy libraries. 2023. URL: google/differential-privacy.

Google. Privacy On Beam. 2023. URL: google/differential-privacy.

Google. TensorFlow Privacy. 2023. URL: tensorflow/privacy.

OpenMined and Google. ZetaSQL Library. 2023. URL: google/differential-privacy.

The OpenDP Team. OpenDP Library. 2023. URL: opendp/opendp.