{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "XYIE2WkNm1UE" }, "source": [ "# Range propagation\n", "[View on GitHub](https://github.com/Qrlew/docs/blob/main/tutorials/range_propagation.ipynb)\n", "[Open in Colab](https://colab.research.google.com/github/Qrlew/pyqrlew/blob/main/examples/range_propagation.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "MzGGFbQtm1UG" }, "source": [ "When one wants to release aggregate statistics with a guarantee that the output reveals little about any individual in the data, [differential privacy](https://en.wikipedia.org/wiki/Differential_privacy) is the way to go.\n", "Many [differentially private mechanisms](https://en.wikipedia.org/wiki/Differential_privacy) consist of sums of terms known to be bounded (so that the *sensitivity* is easy to compute), to which noise is added, usually [Laplace](https://en.wikipedia.org/wiki/Additive_noise_differential_privacy_mechanisms#Laplace_Mechanism) or [Gaussian](https://en.wikipedia.org/wiki/Additive_noise_differential_privacy_mechanisms#Gaussian_Mechanism).\n", "For these mechanisms and others, it is crucial to be able to bound the values involved.\n", "\n", "*Bounding* can be achieved in several ways:\n", "\n", "- Bounds can be *forced* by clipping values, but then the resulting statistics may be biased.\n", "- Bounds can be *inferred* by range propagation: the range of the values is propagated across successive transformations.\n", "\n", "The tradeoff between *clipping* and *propagating ranges* is particularly difficult for values with a few remote outliers.\n", "If ranges are simply propagated, the outliers force the sensitivity to be large, so the added noise drastically reduces the utility of the result.\n", "Clipping the values reduces the sensitivity and hence the noise, but the outliers are then clamped to the clipping bounds and the statistics are biased.\n", "\n", "In this notebook, we'll focus on *range 
propagation* using [`qrlew`](https://qrlew.github.io/)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "swnCxCAXm1UH" }, "outputs": [], "source": [ "%%capture\n", "!sudo apt-get -y -qq update\n", "!sudo apt-get -y -qq install graphviz\n", "!pip install graphviz\n", "!pip install pyqrlew" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "e0wcoOf2m1UG" }, "outputs": [], "source": [ "# Silence log messages at INFO level and below\n", "import logging\n", "logging.disable(logging.INFO)" ] }, { "cell_type": "markdown", "metadata": { "id": "ktdEFvqpm1UI" }, "source": [ "We load a CSV extract of [Kuzak Dempsy's dataset](https://data.world/kudem):" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "CwWxexE5m1UI" }, "outputs": [], "source": [ "import pyqrlew as pq\n", "from pyqrlew.io.utils import from_csv\n", "qdb = from_csv(\n", " table_name=\"heart_data\",\n", " csv_file=\"https://storage.googleapis.com/qrlew-demo-data/heart_data.csv\"\n", ")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "Bp3E4wwWm1UI", "outputId": "9d841bbc-7d41-42c2-d058-946f1ac99988" }, "outputs": [ { "data": { "text/html": [ "
| \n", " | id | \n", "age | \n", "gender | \n", "height | \n", "weight | \n", "ap_hi | \n", "ap_lo | \n", "cholesterol | \n", "gluc | \n", "smoke | \n", "alco | \n", "active | \n", "cardio | \n", "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | \n", "0 | \n", "18393 | \n", "2 | \n", "168 | \n", "62.0 | \n", "110 | \n", "80 | \n", "1 | \n", "1 | \n", "0 | \n", "0 | \n", "1 | \n", "0 | \n", "
| 1 | \n", "1 | \n", "20228 | \n", "1 | \n", "156 | \n", "85.0 | \n", "140 | \n", "90 | \n", "3 | \n", "1 | \n", "0 | \n", "0 | \n", "1 | \n", "1 | \n", "
| 2 | \n", "2 | \n", "18857 | \n", "1 | \n", "165 | \n", "64.0 | \n", "130 | \n", "70 | \n", "3 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "1 | \n", "
| 3 | \n", "3 | \n", "17623 | \n", "2 | \n", "169 | \n", "82.0 | \n", "150 | \n", "100 | \n", "1 | \n", "1 | \n", "0 | \n", "0 | \n", "1 | \n", "1 | \n", "
| 4 | \n", "4 | \n", "17474 | \n", "1 | \n", "156 | \n", "56.0 | \n", "100 | \n", "60 | \n", "1 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "