{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "L4E6fRhRSqcW" }, "source": [ "# Getting Started\n", "[![View On GitHub](https://img.shields.io/badge/View_in_Github-grey?logo=github)](https://github.com/Qrlew/docs/blob/main/tutorials/getting_started.ipynb)\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Qrlew/docs/blob/main/tutorials/getting_started.ipynb)\n", "\n", "[Qrlew](https://qrlew.github.io/) is an [open source library](https://github.com/Qrlew) that can turn SQL queries into their [differentially private (DP)](https://en.wikipedia.org/wiki/Differential_privacy) equivalent.\n", "* It takes SQL – the universal language of small and big data analytics – as input, so there is no new language or API to learn.\n", "* It returns DP SQL queries that can be executed at scale on any SQL datastore.\n", "* The only required extra inputs are the *privacy unit* specification and privacy budget $(\\varepsilon, \\delta)$.\n", "\n", "With [Qrlew](https://qrlew.github.io/) a *data owner* can receive SQL queries from an untrusted *data analyst*, turn them into DP equivalent, execute them and safely return the result to the *data analyst*.\n", "\n", "In this quick intro, we will setup a database, load a few demo tables.\n", "Then we will define the *privacy unit* and run simple DP SQL queries to show how [Qrlew](https://qrlew.github.io/) works on various kind of queries with Joins, Subqueries or Common Table Expressions." ] }, { "cell_type": "markdown", "metadata": { "id": "QNGEv8kYT3xD" }, "source": [ "## Get access to a database\n", "\n", "Let's start by installing a local PostgreSQL database.\n", "\n", "Qrlew currently supports the following SQL dialects:\n", "- [PostgreSQL](https://www.postgresql.org/),\n", "- [BigQuery](https://cloud.google.com/bigquery?hl=en),\n", "- [Synapse/MSSQL](https://azure.microsoft.com/en-us/products/synapse-analytics/)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "iRCHrt3FSnfu" }, "outputs": [], "source": [ "%%capture\n", "!sudo apt-get -y -qq update\n", "!sudo apt-get -y -qq install postgresql\n", "!sudo service postgresql start\n", "!sudo -u postgres psql -U postgres -c \"ALTER USER postgres PASSWORD 'password'\"\n", "!pip install psycopg2" ] }, { "cell_type": "markdown", "metadata": { "id": "cF5AVwDI5foi" }, "source": [ "## Install [Qrlew](https://qrlew.github.io/) python bindings: `pyqrlew`\n", "\n", "`pyqrlew` is distributed on [PyPI](https://pypi.org/project/pyqrlew/) and simply installed with `pip install`.\n", "\n", "If no binary is available for your platform, you will need a [Rust compiler](https://rustup.rs/) to install the library from sources.\n", "\n", "Binaries are [available](https://github.com/Qrlew/pyqrlew/blob/main/.github/workflows/ci.yml) for Linux, MacOS, Windows and major architectures including x86, arm or Apple chips." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "7WLg9GGM5ech" }, "outputs": [], "source": [ "%%capture\n", "!pip install pyqrlew" ] }, { "cell_type": "markdown", "metadata": { "id": "47NGQELH6b65" }, "source": [ "## Load a few tables\n", "\n", "Let's load a `pyqrlew` demo dataset: [IMDb](https://relational.fit.cvut.cz/dataset/IMDb), composed of several related tables." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "S6RaM-BfWjMl" }, "outputs": [], "source": [ "from pyqrlew.io import PostgreSQL\n", "\n", "database = PostgreSQL('postgres', 'postgres', 'password', 5432)\n", "database.load_imdb()\n", "engine = database.engine()" ] }, { "cell_type": "markdown", "metadata": { "id": "Rx56sYHHmVYi" }, "source": [ "We can display a short extract of each table." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "uPwRA5Fh64-z", "outputId": "f0c9f6c5-67cb-4e11-ac14-4a85f68af7d9" }, "outputs": [ { "data": { "text/markdown": [ "## actors" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idfirst_namelast_namegender
02Michael'babeepower' VieraM
13Eloy'Chincheta'M
24Dieguito'El Cigala'M
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ " id first_name last_name gender\n", "0 2 Michael 'babeepower' Viera M\n", "1 3 Eloy 'Chincheta' M\n", "2 4 Dieguito 'El Cigala' M" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "## directors" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idfirst_namelast_name
01Todd1
12Les12 Poissons
23Lejarena'Hiller
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ " id first_name last_name\n", "0 1 Todd 1\n", "1 2 Les 12 Poissons\n", "2 3 Lejaren a'Hiller" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "## directors_genres" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
director_idgenreprob
02Short1.0
13Drama1.0
25Documentary1.0
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ " director_id genre prob\n", "0 2 Short 1.0\n", "1 3 Drama 1.0\n", "2 5 Documentary 1.0" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "## movies" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idnameyearrank
00#282002NaN
11#7 Train: An Immigrant Journey, The2000NaN
22$19716.4
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ " id name year rank\n", "0 0 #28 2002 NaN\n", "1 1 #7 Train: An Immigrant Journey, The 2000 NaN\n", "2 2 $ 1971 6.4" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "## movies_directors" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
director_idmovie_id
01378879
12281325
2330621
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ " director_id movie_id\n", "0 1 378879\n", "1 2 281325\n", "2 3 30621" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "## movies_genres" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
movie_idgenre
01Documentary
11Short
22Comedy
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ " movie_id genre\n", "0 1 Documentary\n", "1 1 Short\n", "2 2 Comedy" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "## roles" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
actor_idmovie_idrole
02280088Stevie
12396232Various/lyricist
23376687Gitano 1
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ " actor_id movie_id role\n", "0 2 280088 Stevie\n", "1 2 396232 Various/lyricist\n", "2 3 376687 Gitano 1" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display, Markdown\n", "import pandas as pd\n", "\n", "for table in ['actors', 'directors', 'directors_genres', 'movies', 'movies_directors', 'movies_genres', 'roles']:\n", " display(Markdown(f'## {table}'))\n", " display(pd.read_sql(f'SELECT * FROM imdb_ijs.{table} LIMIT 3', engine))" ] }, { "cell_type": "markdown", "metadata": { "id": "PUK-TkEZIIr-" }, "source": [ "## Connect to the database and create a `pyqrlew.Dataset`\n", "\n", "To use [Qrlew](https://qrlew.github.io/) the SQL tables have to be collected into a `Dataset` object that will enrich the tables with metadata such as privacy unit or value ranges definitions.\n", "\n", "[Qrlew](https://qrlew.github.io/) basic constructs are [relations](https://en.wikipedia.org/wiki/Relation_(database)): abstract representation of SQL queries.\n", "\n", "Tables are the most basic [relations](https://en.wikipedia.org/wiki/Relation_(database)) in qrlew." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "ZVtS_LbbV11_" }, "outputs": [], "source": [ "from pyqrlew import Dataset\n", "\n", "source_dataset = Dataset.from_database('imdb', engine, 'imdb_ijs')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 791 }, "id": "T2eWtcBJ6qlj", "outputId": "6e6ac026-77cd-4744-dccf-ad76f48c6ad2" }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_ma77\n", "\n", "\n", "\n", "graph_ma77\n", "\n", "IMDB_IJS_ACTORS size ∈ int{817718}\n", "id = id ∈ int\n", "first_name = first_name ∈ str\n", "last_name = last_name ∈ str\n", "gender = gender ∈ str\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_8nf1\n", "\n", "\n", "\n", "graph_8nf1\n", "\n", "IMDB_IJS_DIRECTORS size ∈ int{86880}\n", "id = id ∈ int\n", "first_name = first_name ∈ str\n", "last_name = last_name ∈ str\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_yn_2\n", "\n", "\n", "\n", "graph_yn_2\n", "\n", "IMDB_IJS_DIRECTORS_GENRES size ∈ int{156562}\n", "director_id = director_id ∈ int\n", "genre = genre ∈ str\n", "prob = prob ∈ float\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_3yxb\n", "\n", "\n", "\n", "graph_3yxb\n", "\n", "IMDB_IJS_MOVIES size ∈ int{388269}\n", "id = id ∈ int\n", "name = name ∈ str\n", "year = year ∈ int\n", "rank = rank ∈ float\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_mmzc\n", "\n", "\n", "\n", "graph_mmzc\n", "\n", "IMDB_IJS_MOVIES_DIRECTORS size ∈ int{371180}\n", "director_id = director_id ∈ int\n", "movie_id = movie_id ∈ int\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_qpgf\n", "\n", "\n", "\n", "graph_qpgf\n", "\n", "IMDB_IJS_MOVIES_GENRES size ∈ int{395119}\n", "movie_id = movie_id ∈ int\n", "genre = genre ∈ str\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_6mn0\n", "\n", "\n", "\n", "graph_6mn0\n", "\n", "IMDB_IJS_ROLES size ∈ int{3431966}\n", "actor_id = actor_id ∈ int\n", "movie_id = movie_id ∈ int\n", "role = role ∈ str\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from graphviz import Source\n", "\n", "for path, relation in source_dataset.relations():\n", " display(Source(relation.dot()))" ] }, { "cell_type": "markdown", "metadata": { "id": "ficjC22yKhuy" }, "source": [ "## Optionally declare value ranges and unique constraints\n", "\n", "If ranges are known for some numeric columns, it is possible to define them with the following syntax `new_dataset = dataset....with_range(min, max)`.\n", "\n", "Note that inferring those ranges from the data may reveal private information if not done right." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "sDv6Fs7aKpu-" }, "outputs": [], "source": [ "bounded_dataset = source_dataset.imdb_ijs.movies.rank.with_range(0, 20)" ] }, { "cell_type": "markdown", "metadata": { "id": "nt0JYN8PpuND" }, "source": [ "If some columns have known possible values, they can also be declared.\n", "\n", "Here again, simple inference from the data would be unsafe." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "cvejyKdXpt0U" }, "outputs": [], "source": [ "bounded_dataset = bounded_dataset.imdb_ijs.actors.gender.with_possible_values(['M', 'F'])" ] }, { "cell_type": "markdown", "metadata": { "id": "by6bHp4Hqpaj" }, "source": [ "Some column are known to have a distincts values (although not specified in the DB schema)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "id": "JzBlbYUAqpEc" }, "outputs": [], "source": [ "dataset = (bounded_dataset\n", " .imdb_ijs.actors.id.with_unique_constraint()\n", " .imdb_ijs.directors.id.with_unique_constraint()\n", " .imdb_ijs.movies.id.with_unique_constraint())" ] }, { "cell_type": "markdown", "metadata": { "id": "TbFAhvpQrT8N" }, "source": [ "The new dataset enriched with new metadata can be visualized." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 791 }, "id": "KhDkEQ6xLphI", "outputId": "785b9ce6-3fe3-4f23-a08f-4750ab0d0ceb" }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_hq3c\n", "\n", "\n", "\n", "graph_hq3c\n", "\n", "IMDB_IJS_ACTORS size ∈ int{817718}\n", "id = id ∈ int UNIQUE\n", "first_name = first_name ∈ str\n", "last_name = last_name ∈ str\n", "gender = gender ∈ str{F, M}\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_crfp\n", "\n", "\n", "\n", "graph_crfp\n", "\n", "IMDB_IJS_DIRECTORS size ∈ int{86880}\n", "id = id ∈ int UNIQUE\n", "first_name = first_name ∈ str\n", "last_name = last_name ∈ str\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_yn_2\n", "\n", "\n", "\n", "graph_yn_2\n", "\n", "IMDB_IJS_DIRECTORS_GENRES size ∈ int{156562}\n", "director_id = director_id ∈ int\n", "genre = genre ∈ str\n", "prob = prob ∈ float\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_773q\n", "\n", "\n", "\n", "graph_773q\n", "\n", "IMDB_IJS_MOVIES size ∈ int{388269}\n", "id = id ∈ int UNIQUE\n", "name = name ∈ str\n", "year = year ∈ int\n", "rank = rank ∈ float[0 20]\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_mmzc\n", "\n", "\n", "\n", "graph_mmzc\n", "\n", "IMDB_IJS_MOVIES_DIRECTORS size ∈ int{371180}\n", "director_id = director_id ∈ int\n", "movie_id = movie_id ∈ int\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_qpgf\n", "\n", "\n", "\n", "graph_qpgf\n", "\n", "IMDB_IJS_MOVIES_GENRES size ∈ int{395119}\n", "movie_id = movie_id ∈ int\n", "genre = genre ∈ str\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_6mn0\n", "\n", "\n", "\n", "graph_6mn0\n", "\n", "IMDB_IJS_ROLES size ∈ int{3431966}\n", "actor_id = actor_id ∈ int\n", "movie_id = movie_id ∈ int\n", "role = role ∈ str\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for path, relation in dataset.relations():\n", " display(Source(relation.dot()))" ] }, { "cell_type": "markdown", "metadata": { "id": "KGfyBJYnOHT8" }, "source": [ "## Define the privacy unit and budget\n", "\n", "### Privacy unit definition\n", "\n", "To protect the privacy of individuals in a dataset, we need to define a *privacy unit*.\n", "\n", "The *privacy unit* is the *\"element\"* we are referring to when we say two datasets *\"differ on a single element\"* in [the DP definition](https://en.wikipedia.org/wiki/Differential_privacy).\n", "\n", "It is therefore what is to be protected in the dataset.\n", "\n", "Let's say we want to protect the privacy of actors and directors.\n", "For the tables `actors` and `directors` the column `id` will directly define the *privacy unit* (i.e. entity to protect).\n", "\n", "But the tables `directors_genres`, `movies_directors` and `roles` also define directors and actors, because they refer to them through `director_id` and `actor_id`.\n", "\n", "### Budget definition\n", "\n", "Explaining what $\\varepsilon$ and $\\delta$ precisely mean is beyond the scope of this document, but in a word:\n", "* $\\varepsilon$ is the amount of information about any individual that is allowed to leak. It should remain small (<1) so that little information is leaked.\n", "* $\\delta$ is some tolerance that should remain very small, typically lower than one over the number of elements in the dataset." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "TPx0_WLQKAg8" }, "outputs": [], "source": [ "# Privacy unit definition\n", "privacy_unit = [\n", " # The column `id` directly defines the privacy unit\n", " (\"actors\", [], \"id\"),\n", " (\"directors\", [], \"id\"),\n", " # The column `director_id` refers to the column `id` of table `directors`, the `id` of which defines the privacy unit\n", " (\"directors_genres\", [(\"director_id\", \"directors\", \"id\")], \"id\"),\n", " (\"movies_directors\", [(\"director_id\", \"directors\", \"id\")], \"id\"),\n", " # The column `actor_id` refers to the column `id` of table `actors`, the `id` of which defines the privacy unit\n", " (\"roles\", [(\"actor_id\", \"actors\", \"id\")], \"id\"),\n", "]\n", "# Budget\n", "budget = {\"epsilon\": 1.0, \"delta\": 1e-3}" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 321 }, "id": "5ct-q8I8NMJV", "outputId": "4a2316c9-4fbf-47e2-c4e9-a26e5efea566" }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_jcyp\n", "\n", "\n", "\n", "graph_jcyp\n", "\n", "MAP_O7OD size ∈ int[0 817718]\n", "_PRIVACY_UNIT_ = md5(cast_as_text(id)) ∈ str\n", "_PRIVACY_UNIT_WEIGHT_ = 1 ∈ int{1}\n", "id = id ∈ int UNIQUE\n", "first_name = first_name ∈ str\n", "last_name = last_name ∈ str\n", "gender = gender ∈ str{F, M}\n", "\n", "\n", "\n", "graph_hq3c\n", "\n", "IMDB_IJS_ACTORS size ∈ int{817718}\n", "id = id ∈ int UNIQUE\n", "first_name = first_name ∈ str\n", "last_name = last_name ∈ str\n", "gender = gender ∈ str{F, M}\n", "\n", "\n", "\n", "graph_jcyp->graph_hq3c\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Privacy unit is propagted accross the relation\n", "pup_actors = dataset.imdb_ijs.actors.relation().rewrite_as_privacy_unit_preserving(\n", " dataset,\n", " privacy_unit,\n", " budget,\n", ")\n", "display(Source(pup_actors.relation().dot()))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 514 }, "id": "2FgBljDGNm5k", "outputId": "f723cd76-1672-4494-faf8-e9a0db68c877" }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_621r\n", "\n", "\n", "\n", "graph_621r\n", "\n", "MAP_AZCR size ∈ int[0 3431966]\n", "_PRIVACY_UNIT_ = md5(cast_as_text(field_6w80)) ∈ str\n", "_PRIVACY_UNIT_WEIGHT_ = 1 ∈ int{1}\n", "actor_id = field_rvp5 ∈ int\n", "movie_id = field_kwgs ∈ int\n", "role = field_cenr ∈ str\n", "\n", "\n", "\n", "graph_mgxe\n", "\n", "JOIN_5PXQ size ∈ int[0 3431966]\n", "field_6w80 = _LEFT_.id ∈ int\n", "field_adyv = _LEFT_.first_name ∈ str\n", "field_mg8l = _LEFT_.last_name ∈ str\n", "field_ey3i = _LEFT_.gender ∈ str{F, M}\n", "field_rvp5 = _RIGHT_.actor_id ∈ int\n", "field_kwgs = _RIGHT_.movie_id ∈ int\n", "field_cenr = _RIGHT_.role ∈ str\n", "INNER ON (_RIGHT_.actor_id = _LEFT_.id)\n", "\n", "\n", "\n", "graph_621r->graph_mgxe\n", "\n", "\n", "\n", "\n", "\n", "graph_hq3c\n", "\n", "IMDB_IJS_ACTORS size ∈ int{817718}\n", "id = id ∈ int UNIQUE\n", "first_name = first_name ∈ str\n", "last_name = last_name ∈ str\n", "gender = gender ∈ str{F, M}\n", "\n", "\n", "\n", "graph_mgxe->graph_hq3c\n", "\n", "\n", "\n", "\n", "\n", "graph_6mn0\n", "\n", "IMDB_IJS_ROLES size ∈ int{3431966}\n", "actor_id = actor_id ∈ int\n", "movie_id = movie_id ∈ int\n", "role = role ∈ str\n", "\n", "\n", "\n", "graph_mgxe->graph_6mn0\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Privacy unit is propagted accross the relation\n", "pup_roles = dataset.imdb_ijs.roles.relation().rewrite_as_privacy_unit_preserving(\n", " dataset,\n", " privacy_unit,\n", " budget,\n", ")\n", "display(Source(pup_roles.relation().dot()))" ] }, { "cell_type": "markdown", "metadata": { "id": "qk3eioXXrgji" }, "source": [ "## Convert a simple aggregation query into a DP equivalent" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ddhnyutdPSZM", "outputId": "07dc4af5-e2ee-419a-90a9-e950f0e80ee9" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DpEvent(Gaussian { noise_multiplier: 3.776479532659047 })\n" ] } ], "source": [ "# We create a basic aggregation relation\n", "query = \"SELECT gender, COUNT(*) FROM imdb_ijs.actors GROUP BY gender\"\n", "relation = dataset.relation(query)\n", "# The relation is turned into a DP equivalent\n", "relation_with_dp_event = relation.rewrite_with_differential_privacy(\n", " dataset,\n", " privacy_unit,\n", " budget,\n", ")\n", "# Rewritten relation\n", "dp_relation = relation_with_dp_event.relation()\n", "# Privacy loss\n", "dpe = relation_with_dp_event.dp_event()\n", "print(dpe)" ] }, { "cell_type": "markdown", "metadata": { "id": "NcrsgziRusvy" }, "source": [ "`dp_events` are compatible with Google's `dp_accounting`" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "slRgVu4nFE34" }, "outputs": [], "source": [ "%%capture\n", "!pip install dp_accounting==0.4.1" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "79jowwk0PrO7", "outputId": "5782d8b6-6ff8-4e53-b534-a927916515e4" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.10/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5\n", " warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "GaussianDpEvent(noise_multiplier=3.776479532659047)\n" ] } ], "source": [ "# Google's dp_accounting\n", "from dp_accounting import DpEvent\n", "print(DpEvent.from_named_tuple(dpe.to_named_tuple()))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "NUGUTOOEP8BQ", "outputId": "0adfd82c-95a0-401a-c0ec-a807c2174bce" }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "graph_993t\n", "\n", "\n", "\n", "graph_993t\n", "\n", "MAP__I7V size ∈ int[0 1635436]\n", "gender = field_d4_o ∈ str{F, M} UNIQUE\n", "field_34j2 = field_i4xy ∈ int[0 1635436]\n", "\n", "\n", "\n", "graph_2hsm\n", "\n", "MAP_SU7N size ∈ int[0 1635436]\n", "field_d4_o = field_d4_o ∈ str{F, M} UNIQUE\n", "field_i4xy = field_i4xy ∈ int[0 1635436]\n", "\n", "\n", "\n", "graph_993t->graph_2hsm\n", "\n", "\n", "\n", "\n", "\n", "graph_kcx7\n", "\n", "MAP_WRHO size ∈ int[0 1635436]\n", "field_i4xy = cast_as_integer(_COUNT_field_n9r3) ∈ int[0 1635436]\n", "field_d4_o = field_d4_o ∈ str{F, M} UNIQUE\n", "\n", "\n", "\n", "graph_2hsm->graph_kcx7\n", "\n", "\n", "\n", "\n", "\n", "graph_sgro\n", "\n", "MAP_UIRA size ∈ int[0 1635436]\n", "field_d4_o = field_d4_o ∈ str{F, M} UNIQUE\n", "_COUNT_field_n9r3 = least(1635436, greatest(0, (coalesce(_COUNT_field_n9r3, 0) + (3.776479532659047 * (sqrt((-2 * ln(random()...\n", "\n", "\n", "\n", "graph_kcx7->graph_sgro\n", "\n", "\n", "\n", "\n", "\n", "graph_7mg8\n", "\n", "REDUCE_14X4 size ∈ int[0 1635436]\n", "field_d4_o = first(field_zyse) ∈ str{F, M} UNIQUE\n", "_COUNT_field_n9r3 = sum(field_8d51) ∈ option(float[0 1635436])\n", "GROUP BY (field_zyse)\n", "\n", "\n", "\n", "graph_sgro->graph_7mg8\n", "\n", "\n", "\n", "\n", "\n", "graph_2d74\n", "\n", "MAP_KLRF size ∈ int[0 1635436]\n", "field_8d51 = _CLIPPED__ONE_field_n9r3 ∈ option(float[0.0000000004781340467849906 1])\n", "field_zyse = field_d4_o ∈ str{F, M}\n", "\n", "\n", "\n", "graph_7mg8->graph_2d74\n", "\n", "\n", "\n", "\n", "\n", "graph_g5pd\n", "\n", "MAP_7DPD size ∈ int[0 1635436]\n", "_PRIVACY_UNIT_ = _PRIVACY_UNIT_ ∈ str\n", "_PRIVACY_UNIT_WEIGHT_ = _PRIVACY_UNIT_WEIGHT_ ∈ float{0, 1}\n", "field_d4_o = field_d4_o ∈ str{F, M}\n", "_ONE_field_n9r3 = _ONE_field_n9r3 ∈ option(float{1})\n", "_SCALE_FACTOR__PRIVACY_UNIT_ = _SCALE_FACTOR__PRIVACY_UNIT_ ∈ str\n", "_SCALE_FACTOR__ONE_field_n9r3 = _SCALE_FACTOR__ONE_field_n9r3 ∈ option(float[0.0000000004781340467849906 1])\n", "_CLIPPED__ONE_field_n9r3 = (_ONE_field_n9r3 * _SCALE_FACTOR__ONE_field_n9r3) ∈ option(float[0.0000000004781340467849906 1])\n", "\n", "\n", "\n", "graph_2d74->graph_g5pd\n", "\n", "\n", "\n", "\n", "\n", "graph_qu25\n", "\n", "JOIN_RCG8 size ∈ int[0 1635436]\n", "_PRIVACY_UNIT_ = _LEFT_._PRIVACY_UNIT_ ∈ str\n", "_PRIVACY_UNIT_WEIGHT_ = _LEFT_._PRIVACY_UNIT_WEIGHT_ ∈ float{0, 1}\n", "field_d4_o = _LEFT_.field_d4_o ∈ str{F, M}\n", "_ONE_field_n9r3 = _LEFT_._ONE_field_n9r3 ∈ option(float{1})\n", "_SCALE_FACTOR__PRIVACY_UNIT_ = _RIGHT_._PRIVACY_UNIT_ ∈ str\n", "_SCALE_FACTOR__ONE_field_n9r3 = _RIGHT_._ONE_field_n9r3 ∈ option(float[0.0000000004781340467849906 1])\n", "INNER ON (_LEFT_._PRIVACY_UNIT_ = _RIGHT_._PRIVACY_UNIT_)\n", "\n", "\n", "\n", "graph_g5pd->graph_qu25\n", "\n", "\n", "\n", "\n", "\n", "graph_uw1x\n", "\n", "MAP_Z1NZ size ∈ int[0 1635436]\n", "_PRIVACY_UNIT_ = _PRIVACY_UNIT_ ∈ str UNIQUE\n", "_ONE_field_n9r3 = (1 / greatest(1, (sqrt(_ONE_field_n9r3) / 1))) ∈ option(float[0.0000000004781340467849906 1])\n", "\n", "\n", "\n", "graph_qu25->graph_uw1x\n", "\n", "\n", "\n", "\n", "\n", "graph_ub3w\n", "\n", "MAP___WZ size ∈ int[0 1635436]\n", "_PRIVACY_UNIT_ = coalesce(cast_as_text(_PRIVACY_UNIT_), _PRIVACY_UNIT_DEFAULT_) ∈ str\n", "_PRIVACY_UNIT_WEIGHT_ = coalesce(_PRIVACY_UNIT_WEIGHT_, 0) ∈ float{0, 1}\n", "field_d4_o = field_d4_o ∈ str{F, M}\n", "_ONE_field_n9r3 = case(is_null(field_n9r3), 0, 1) ∈ option(float{1})\n", "\n", "\n", "\n", "graph_qu25->graph_ub3w\n", "\n", "\n", "\n", "\n", "\n", "graph_kf0d\n", "\n", "REDUCE_SA4I size ∈ int[0 1635436]\n", "_PRIVACY_UNIT_ = first(field_grep) ∈ str UNIQUE\n", "_ONE_field_n9r3 = sum(field_7ra5) ∈ option(float[0 4374220385803761700])\n", "GROUP BY (field_grep)\n", "\n", "\n", "\n", "graph_uw1x->graph_kf0d\n", "\n", "\n", "\n", "\n", "\n", "graph_h79n\n", "\n", "MAP_TXEX size ∈ int[0 1635436]\n", "field_7ra5 = _NORM__ONE_field_n9r3 ∈ option(float[0 2674650910096])\n", "field_grep = _PRIVACY_UNIT_ ∈ str\n", "\n", "\n", "\n", "graph_kf0d->graph_h79n\n", "\n", "\n", "\n", "\n", "\n", "graph_t24c\n", "\n", "MAP_5820 size ∈ int[0 1635436]\n", "_PRIVACY_UNIT_ = _PRIVACY_UNIT_ ∈ str\n", "field_d4_o = field_d4_o ∈ str{F, M}\n", "_NORM__ONE_field_n9r3 = abs((_NORM__ONE_field_n9r3 * _NORM__ONE_field_n9r3)) ∈ option(float[0 2674650910096])\n", "\n", "\n", "\n", "graph_h79n->graph_t24c\n", "\n", "\n", "\n", "\n", "\n", "graph_2to3\n", "\n", "REDUCE_MQ2I size ∈ int[0 1635436]\n", "_PRIVACY_UNIT_ = first(field_grep) ∈ str\n", "field_d4_o = first(field_zyse) ∈ str{F, M}\n", "_NORM__ONE_field_n9r3 = sum(field_1iro) ∈ option(int[0 1635436])\n", "GROUP BY (field_grep, field_zyse)\n", "\n", "\n", "\n", "graph_t24c->graph_2to3\n", "\n", "\n", "\n", "\n", "\n", "graph_7d7_\n", "\n", "MAP_MVF9 size ∈ int[0 1635436]\n", "field_1iro = _ONE_field_n9r3 ∈ option(float{1})\n", "field_grep = _PRIVACY_UNIT_ ∈ str\n", "field_zyse = field_d4_o ∈ str{F, M}\n", "\n", "\n", "\n", "graph_2to3->graph_7d7_\n", "\n", "\n", "\n", "\n", "\n", "graph_7d7_->graph_ub3w\n", "\n", "\n", "\n", "\n", "\n", "graph_p3ws\n", "\n", "MAP_1V6W size ∈ int[0 1635436]\n", "field_d4_o = field_d4_o ∈ str{F, M}\n", "_PRIVACY_UNIT_ = _PRIVACY_UNIT_ ∈ option(str)\n", "_PRIVACY_UNIT_WEIGHT_ = _PRIVACY_UNIT_WEIGHT_ ∈ option(int{1})\n", "field_n9r3 = field_n9r3 ∈ option(int{1})\n", "\n", "\n", "\n", "graph_ub3w->graph_p3ws\n", "\n", "\n", "\n", "\n", "\n", "graph__zq1\n", "\n", "JOIN_XCZI size ∈ int[0 1635436]\n", "field_d4_o = _LEFT_.field_d4_o ∈ str{F, M}\n", "_PRIVACY_UNIT_ = _RIGHT_._PRIVACY_UNIT_ ∈ option(str)\n", "_PRIVACY_UNIT_WEIGHT_ = _RIGHT_._PRIVACY_UNIT_WEIGHT_ ∈ option(int{1})\n", "field_n9r3 = _RIGHT_.field_n9r3 ∈ option(int{1})\n", "left__q8p1 = _RIGHT_.field_d4_o ∈ option(str{F, M})\n", "LEFT ON (_LEFT_.field_d4_o = _RIGHT_.field_d4_o)\n", "\n", "\n", "\n", "graph_p3ws->graph__zq1\n", "\n", "\n", "\n", "\n", "\n", "graph_6pjn\n", "\n", "FIELD_D4_O size ∈ int{2}\n", "[F, M]\n", "\n", "\n", "\n", "graph__zq1->graph_6pjn\n", "\n", "\n", "\n", "\n", "\n", "graph_y8n0\n", "\n", "MAP_Q1HJ size ∈ int[0 817718]\n", "_PRIVACY_UNIT_ = _PRIVACY_UNIT_ ∈ str\n", "_PRIVACY_UNIT_WEIGHT_ = _PRIVACY_UNIT_WEIGHT_ ∈ int{1}\n", "field_n9r3 = 1 ∈ int{1}\n", "field_d4_o = gender ∈ str{F, M}\n", "\n", "\n", "\n", "graph__zq1->graph_y8n0\n", "\n", "\n", "\n", "\n", "\n", "graph_jcyp\n", "\n", "MAP_O7OD size ∈ int[0 817718]\n", "_PRIVACY_UNIT_ = md5(cast_as_text(id)) ∈ str\n", "_PRIVACY_UNIT_WEIGHT_ = 1 ∈ int{1}\n", "id = id ∈ int UNIQUE\n", "first_name = first_name ∈ str\n", "last_name = last_name ∈ str\n", "gender = gender ∈ str{F, M}\n", "\n", "\n", "\n", "graph_y8n0->graph_jcyp\n", "\n", "\n", "\n", "\n", "\n", "graph_hq3c\n", "\n", "IMDB_IJS_ACTORS size ∈ int{817718}\n", "id = id ∈ int UNIQUE\n", "first_name = first_name ∈ str\n", "last_name = last_name ∈ str\n", "gender = gender ∈ str{F, M}\n", "\n", "\n", "\n", "graph_jcyp->graph_hq3c\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(Source(dp_relation.dot()))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 125 }, "id": "-jdXAb-xQE9W", "outputId": "b7c6ebc9-56f4-4974-c36f-57e695d62319" }, "outputs": [ { "data": { "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" }, "text/plain": [ "\"WITH field_d4_o (field_d4_o) AS (SELECT * FROM (VALUES ('F'), ('M')) AS field_d4_o (field_d4_o)), map_o7od (_PRIVACY_UNIT_, _PRIVACY_UNIT_WEIGHT_, id, first_name, last_name, gender) AS (SELECT MD5(CAST(id AS TEXT)) AS _PRIVACY_UNIT_, 1 AS _PRIVACY_UNIT_WEIGHT_, id AS id, first_name AS first_name, last_name AS last_name, gender AS gender FROM imdb_ijs.actors), map_q1hj (_PRIVACY_UNIT_, _PRIVACY_UNIT_WEIGHT_, field_n9r3, field_d4_o) AS (SELECT _PRIVACY_UNIT_ AS _PRIVACY_UNIT_, _PRIVACY_UNIT_WEIGHT_ AS _PRIVACY_UNIT_WEIGHT_, 1 AS field_n9r3, gender AS field_d4_o FROM map_o7od), join_xczi (field_d4_o, _PRIVACY_UNIT_, _PRIVACY_UNIT_WEIGHT_, field_n9r3, left__q8p1) AS (SELECT * FROM field_d4_o AS _LEFT_ LEFT JOIN map_q1hj AS _RIGHT_ ON (_LEFT_.field_d4_o) = (_RIGHT_.field_d4_o)), map_1v6w (field_d4_o, _PRIVACY_UNIT_, _PRIVACY_UNIT_WEIGHT_, field_n9r3) AS (SELECT field_d4_o AS field_d4_o, _PRIVACY_UNIT_ AS _PRIVACY_UNIT_, _PRIVACY_UNIT_WEIGHT_ AS _PRIVACY_UNIT_WEIGHT_, field_n9r3 AS field_n9r3 FROM join_xczi), map___wz (_PRIVACY_UNIT_, _PRIVACY_UNIT_WEIGHT_, field_d4_o, _ONE_field_n9r3) AS (SELECT COALESCE(CAST(_PRIVACY_UNIT_ AS TEXT), '_PRIVACY_UNIT_DEFAULT_') AS _PRIVACY_UNIT_, COALESCE(_PRIVACY_UNIT_WEIGHT_, 0) AS _PRIVACY_UNIT_WEIGHT_, field_d4_o AS field_d4_o, CASE WHEN field_n9r3 IS NULL THEN 0 ELSE 1 END AS _ONE_field_n9r3 FROM map_1v6w), map_mvf9 (field_1iro, field_grep, field_zyse) AS (SELECT _ONE_field_n9r3 AS field_1iro, _PRIVACY_UNIT_ AS field_grep, field_d4_o AS field_zyse FROM map___wz), reduce_mq2i (_PRIVACY_UNIT_, field_d4_o, _NORM__ONE_field_n9r3) AS (SELECT field_grep AS _PRIVACY_UNIT_, field_zyse AS field_d4_o, SUM(field_1iro) AS _NORM__ONE_field_n9r3 FROM map_mvf9 GROUP BY field_grep, field_zyse), map_5820 (_PRIVACY_UNIT_, field_d4_o, _NORM__ONE_field_n9r3) AS (SELECT _PRIVACY_UNIT_ AS _PRIVACY_UNIT_, field_d4_o AS field_d4_o, ABS((_NORM__ONE_field_n9r3) * (_NORM__ONE_field_n9r3)) AS _NORM__ONE_field_n9r3 FROM reduce_mq2i), map_txex (field_7ra5, field_grep) AS (SELECT _NORM__ONE_field_n9r3 AS field_7ra5, _PRIVACY_UNIT_ AS field_grep FROM map_5820), reduce_sa4i (_PRIVACY_UNIT_, _ONE_field_n9r3) AS (SELECT field_grep AS _PRIVACY_UNIT_, SUM(field_7ra5) AS _ONE_field_n9r3 FROM map_txex GROUP BY field_grep), map_z1nz (_PRIVACY_UNIT_, _ONE_field_n9r3) AS (SELECT _PRIVACY_UNIT_ AS _PRIVACY_UNIT_, (1) / (GREATEST(1, (SQRT(_ONE_field_n9r3)) / (1))) AS _ONE_field_n9r3 FROM reduce_sa4i), join_rcg8 (_PRIVACY_UNIT_, _PRIVACY_UNIT_WEIGHT_, field_d4_o, _ONE_field_n9r3, _SCALE_FACTOR__PRIVACY_UNIT_, _SCALE_FACTOR__ONE_field_n9r3) AS (SELECT * FROM map___wz AS _LEFT_ JOIN map_z1nz AS _RIGHT_ ON (_LEFT_._PRIVACY_UNIT_) = (_RIGHT_._PRIVACY_UNIT_)), map_7dpd (_PRIVACY_UNIT_, _PRIVACY_UNIT_WEIGHT_, field_d4_o, _ONE_field_n9r3, _SCALE_FACTOR__PRIVACY_UNIT_, _SCALE_FACTOR__ONE_field_n9r3, _CLIPPED__ONE_field_n9r3) AS (SELECT _PRIVACY_UNIT_ AS _PRIVACY_UNIT_, _PRIVACY_UNIT_WEIGHT_ AS _PRIVACY_UNIT_WEIGHT_, field_d4_o AS field_d4_o, _ONE_field_n9r3 AS _ONE_field_n9r3, _SCALE_FACTOR__PRIVACY_UNIT_ AS _SCALE_FACTOR__PRIVACY_UNIT_, _SCALE_FACTOR__ONE_field_n9r3 AS _SCALE_FACTOR__ONE_field_n9r3, (_ONE_field_n9r3) * (_SCALE_FACTOR__ONE_field_n9r3) AS _CLIPPED__ONE_field_n9r3 FROM join_rcg8), map_klrf (field_8d51, field_zyse) AS (SELECT _CLIPPED__ONE_field_n9r3 AS field_8d51, field_d4_o AS field_zyse FROM map_7dpd), reduce_14x4 (field_d4_o, _COUNT_field_n9r3) AS (SELECT field_zyse AS field_d4_o, SUM(field_8d51) AS _COUNT_field_n9r3 FROM map_klrf GROUP BY field_zyse), map_uira (field_d4_o, _COUNT_field_n9r3) AS (SELECT field_d4_o AS field_d4_o, LEAST(1635436, GREATEST(0, (COALESCE(_COUNT_field_n9r3, 0)) + ((3.776479532659047) * ((SQRT((-2) * (LN(RANDOM())))) * (COS((6.283185307179586) * (RANDOM()))))))) AS _COUNT_field_n9r3 FROM reduce_14x4), map_wrho (field_i4xy, field_d4_o) AS (SELECT CAST(_COUNT_field_n9r3 AS INTEGER) AS field_i4xy, field_d4_o AS field_d4_o FROM map_uira), map_su7n (field_d4_o, field_i4xy) AS (SELECT field_d4_o AS field_d4_o, field_i4xy AS field_i4xy FROM map_wrho), map__i7v (gender, field_34j2) AS (SELECT field_d4_o AS gender, field_i4xy AS field_34j2 FROM map_su7n) SELECT * FROM map__i7v\"" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dp_query = dp_relation.to_query()\n", "display(dp_query)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 302 }, "id": "DY3fazGJQcOr", "outputId": "3817c556-1650-47c8-f8bd-42b03ff852e8" }, "outputs": [ { "data": { "text/markdown": [ "## Result" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
gendercount
0F304412
1M513306
\n", "\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "
\n", " \n" ], "text/plain": [ " gender count\n", "0 F 304412\n", "1 M 513306" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/markdown": [ "## DP Result" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
genderfield_34j2
0M513300
1F304413
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ " gender field_34j2\n", "0 M 513300\n", "1 F 304413" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "\n", "true_res = pd.read_sql(query, database.engine())\n", "dp_res = pd.read_sql(dp_query, database.engine())\n", "\n", "display(Markdown(f'## Result'))\n", "display(true_res)\n", "\n", "display(Markdown(f'## DP Result'))\n", "display(dp_res)" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 0 }