CMI Seminar: Adam Smith
Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. This work answers a basic question about testing simple hypotheses subject to a stability, or "privacy", constraint: given two distributions P and Q and a privacy level ε, how many i.i.d. samples are needed to distinguish P from Q by a test whose acceptance probability changes by at most a multiplicative factor of e^ε when any one of the samples is changed (the condition of ε-differential privacy), and what sort of tests achieve the optimal sample complexity? We characterize this sample complexity up to constant factors in terms of the structure of P and Q and the privacy level ε, and show that it is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. Our result is an analogue of the classical Neyman-Pearson lemma in the setting of private hypothesis testing. Our analysis also sheds light on classical hypothesis testing, giving a new interpretation of the Hellinger distance between distributions.
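To illustrate the flavor of test the result concerns, here is a minimal Python sketch of a noisy, clamped log-likelihood ratio test. It is an assumption-laden illustration, not the exact test analyzed in the paper: the clamping range, Laplace noise scale, and threshold are chosen only so that the decision is ε-differentially private, and logp/logq are hypothetical callables returning pointwise log-probabilities.

```python
import numpy as np

def noisy_clamped_llr_test(samples, logp, logq, eps, threshold=0.0, rng=None):
    """Decide "samples ~ P" vs. "samples ~ Q" under eps-differential privacy.

    Illustrative variant (not the paper's exact test): clamp each
    per-sample log-likelihood ratio to [-eps, eps], sum, and add
    Laplace noise calibrated to the sum's sensitivity.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Clamping bounds each term in [-eps, eps], so changing any single
    # sample moves the sum by at most 2*eps (its sensitivity).
    llr = np.clip(logp(samples) - logq(samples), -eps, eps)
    # Laplace noise with scale = sensitivity / eps = 2 makes the noisy
    # statistic eps-DP; thresholding is post-processing, so the final
    # accept/reject decision is eps-DP as well.
    noisy_stat = llr.sum() + rng.laplace(scale=2.0)
    return noisy_stat > threshold  # True: accept P; False: accept Q

# Example: distinguishing Bernoulli(0.6) from Bernoulli(0.4).
if __name__ == "__main__":
    p, q = 0.6, 0.4
    logp = lambda x: np.where(x == 1, np.log(p), np.log(1 - p))
    logq = lambda x: np.where(x == 1, np.log(q), np.log(1 - q))
    x = np.random.default_rng(0).binomial(1, p, size=2000)
    print(noisy_clamped_llr_test(x, logp, logq, eps=0.1))  # likely True
```

For context, the classical (non-private) sample complexity of this problem is Θ(1/H²(P,Q)), where H²(P,Q) = ½ Σ_x (√P(x) − √Q(x))² is the squared Hellinger distance; the paper's characterization gives the privacy-constrained analogue of this quantity.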
Joint work with Clément Canonne, Gautam Kamath, Audra McMillan, and Jonathan Ullman. To appear in STOC 2019. Available at https://arxiv.org/abs/1811.11148