# Information and Estimation Theoretic Approaches to Data Privacy

Asoodeh, Shahab

thesis

eng

## Keywords

Information theory, Estimation theory, Data privacy, Privacy-preserving mechanism design

## Abstract

In the 1960s, Warner [145] proposed a simple mechanism, now known as the randomized response model, as a remedy for what he termed “evasive answer bias” in survey sampling. The setting is as follows: $n$ people participate in a survey, and a statistician asks each of them a sensitive yes-no question in order to estimate the ratio of "yes" responses. To protect privacy, each individual is given a biased coin that comes up heads with probability $a\in(0,\frac{1}{2})$ and flips it in private: on heads they lie, and on tails they tell the truth. Warner derived an unbiased maximum likelihood estimator of the true ratio of "yes" responses from the reported responses. The parameter of interest can thus be estimated accurately while preserving the privacy of each respondent and avoiding evasive answer bias.

In this thesis, we generalize Warner's randomized response model in several directions: (i) we assume that the response of each individual consists of private and non-private data, and the goal is to generate a response that carries as much "information" about the non-private data as possible while limiting the "information leakage" about the private data; (ii) we propose mathematically well-founded metrics to quantify the tradeoff between how much the response leaks about the private data and how much information it conveys about the non-private data; (iii) we make no assumptions on the alphabets of the private and non-private data; and (iv) we design optimal response mechanisms that achieve the fundamental tradeoffs. Unlike the large body of recent privacy research, which has focused on reducing disclosure risk, this thesis formulates and studies the tradeoff between utility (e.g., statistical efficiency) and privacy (e.g., information leakage). Our approach, which is two-fold (information-theoretic and estimation-theoretic), and our results shed light on the fundamental limits of the utility-privacy tradeoff.
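As an illustration of the randomized response setting described above, the following is a minimal Python sketch, not code from the thesis. It assumes the standard form of Warner's estimator: if a fraction $\pi$ of respondents truly answer "yes" and each lies with probability $a$, a reported answer is "yes" with probability $\lambda = a + (1-2a)\pi$, so the observed fraction $\hat{\lambda}$ gives the unbiased estimate $\hat{\pi} = (\hat{\lambda} - a)/(1-2a)$. The function names `randomized_response`, `estimate_yes_ratio`, and `simulate_survey` are hypothetical and chosen only for this example.

```python
import random


def randomized_response(truth: bool, a: float, rng: random.Random) -> bool:
    """Report one answer: lie with probability a (heads), else tell the truth."""
    heads = rng.random() < a
    return (not truth) if heads else truth


def estimate_yes_ratio(responses, a: float) -> float:
    """Estimate the true 'yes' ratio from reported answers.

    Assumes each answer was flipped independently with probability a,
    so the reported 'yes' rate is a + (1 - 2a) * (true ratio).
    """
    if not 0.0 < a < 0.5:
        raise ValueError("a must lie in (0, 1/2)")
    if len(responses) == 0:
        raise ValueError("need at least one response")
    reported_yes = sum(1 for r in responses if r) / len(responses)
    # Invert the linear relation; the result is unbiased but, for small
    # samples, may fall slightly outside [0, 1].
    return (reported_yes - a) / (1.0 - 2.0 * a)


def simulate_survey(n: int, true_ratio: float, a: float, seed: int = 0) -> float:
    """Simulate n respondents and return the estimated 'yes' ratio."""
    rng = random.Random(seed)
    truths = [rng.random() < true_ratio for _ in range(n)]
    reports = [randomized_response(t, a, rng) for t in truths]
    return estimate_yes_ratio(reports, a)
```

For example, `simulate_survey(100_000, 0.3, 0.25)` should return a value close to 0.3, even though no individual report reveals that respondent's true answer with certainty. Values of $a$ closer to $\frac{1}{2}$ give each respondent more deniability but inflate the variance of the estimate through the factor $1/(1-2a)$, which is a simple instance of the utility-privacy tradeoff studied in the thesis.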