Theory of Computing Seminar
Abstract:
Suppose we would like to estimate the mean of a
high-dimensional Gaussian. It is easy to see that the sample mean is an
accurate estimator of the mean. Now suppose that an adversary corrupts a
small fraction of our samples. Can we still recover the true mean, up to
a small error? A moment's thought reveals that the sample mean is not a
good estimator in this noisy setting.
The concept of the Tukey median is a robust estimator of the mean in the
noisy setting under quite general conditions. Unfortunately, it is hard
to compute in the worst-case, even approximately. This prompts the
following question: Can we reconcile robustness
andcomputationalefficiency in high-dimensional distribution estimation?
In this talk, I will describe a polynomial time spectral technique to
robustly estimate the mean of a high-dimensional Gaussian, up to
*near-optimal* accuracy. Time permitting, I will also discuss a
plausible computational barrier towards improving the performance of
this algorithm.
The aforementioned spectral technique forms the basis of a general
recipe to efficiently detect corruptions in high dimensions. As an
application of this recipe, we obtain the first polynomial time
algorithms for robustly learning various families of high-dimensional
distributions, including Gaussians (with unknown mean and covariance),
discrete product distributions, mixtures thereof (under some natural
conditions), and certain families of graphical models.
The talk will be based on joint works with (different subsets of) G.
Kamath, D. Kane, J. Li, A. Moitra, and A. Stewart.
high-dimensional Gaussian. It is easy to see that the sample mean is an
accurate estimator of the mean. Now suppose that an adversary corrupts a
small fraction of our samples. Can we still recover the true mean, up to
a small error? A moment's thought reveals that the sample mean is not a
good estimator in this noisy setting.
The concept of the Tukey median is a robust estimator of the mean in the
noisy setting under quite general conditions. Unfortunately, it is hard
to compute in the worst-case, even approximately. This prompts the
following question: Can we reconcile robustness
andcomputationalefficiency in high-dimensional distribution estimation?
In this talk, I will describe a polynomial time spectral technique to
robustly estimate the mean of a high-dimensional Gaussian, up to
*near-optimal* accuracy. Time permitting, I will also discuss a
plausible computational barrier towards improving the performance of
this algorithm.
The aforementioned spectral technique forms the basis of a general
recipe to efficiently detect corruptions in high dimensions. As an
application of this recipe, we obtain the first polynomial time
algorithms for robustly learning various families of high-dimensional
distributions, including Gaussians (with unknown mean and covariance),
discrete product distributions, mixtures thereof (under some natural
conditions), and certain families of graphical models.
The talk will be based on joint works with (different subsets of) G.
Kamath, D. Kane, J. Li, A. Moitra, and A. Stewart.
For more information, please contact Thomas Vidick by email at [email protected].
Event Series
Theory of Computing Seminar Series