Rewriting rules#
Relation#
A Relation is an intermediate representation of a query that allow to easily handle DP rewriting by making it easy:
track the privacy unit,
and estimate the sensitivity of the query.
Any SQL query can be transformed into a Relation.
Relations can be have different properties and be rewritten into other Relations with specified properties.
This page lists the different properties and the different rewriting rules.
Properties#
Each Relation can have one of the following properties:
Public
: A relation derived from public tables is labeled as such and does not require any further protection to be disclosed.Published
: A relation is considered Published if its input relations are either Public, DP, in some cases SD, or Published themselves. It can be considered as Published but with some more care like the need to account for the privacy loss incurred by its DP ancestors.Private
: Cannot be disclosed.PrivacyUnitPreserving
(PUP): A Relation is PUP if each row is associated with a PU. In practice it will have a column containing the PID identifying the PUDifferentiallyPrivate
: A Relation will be DP if it implements a DP mechanism. A DP Relation can be safely executed on private data and the result be published. Note that the privacy loss associated with the DP mechanism has to be accurately accounted for (see section~\ref{sec:privacy_analysis}).SyntheticData
: In some contexts a synthetic data version of source tables is available. Any Relation derived from other SD or Public Relations is itself SD.
Table#
A Relation::Table
may be:
Private
PrivacyUnitPreserving
Public
SyntheticData
Rewriting rules#
The Rewriting rules are defined there.
They can be summarized as follow.
Map#
if its input is |
the Map can be rewritten into |
---|---|
|
|
|
|
|
|
|
|
|
|
Reduce#
if its input is |
and if |
the Reduce can be rewritten into |
---|---|---|
|
|
|
|
|
|
|
|
|
|
all the aggregations are DP-rewritable |
|
Join#
if its left input is |
if its right input is |
the Join can be rewritten into |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Set#
if its left input is |
if its right input is |
the Set can be rewritten into |
---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
Values#
A Relation::Values
may be:
Public
SyntheticData
Rewriting#
The rewriting process consists of three sequential steps:
Initial Assignment: Each Relation is assigned a set of rules based on its type. For instance, if it contains only supported aggregations, a
Reduce
may be categorized asPublic
,Published
,SyntheticData
, orDifferentiallyPrivate
.Bottom-Up Rule Selection: Starting from the bottom of the tree, rules are selected based on the possible rules of the input(s). For example, if the input of a
Reduce
isPublic
, then theReduce
can only be categorized asPublic
. The other properties are eliminated during this step.Scoring: Each property is assigned a score:
Public
: 10Published
: 1Private
: 0PrivacyUnitPreserving
: 2DifferentiallyPrivate
: 5SyntheticData
: 1 The higher the score, the more desirable the property. The score of a graph of relations with properties is the sum of the scores of all the relations. The combination of rules that maximizes the total score is selected.
This process is explained in more details in the deep dive section.