Mathematics & Machine Learning Seminar
The median of means (MOM) estimator and its variants have become a go-to method for problems involving heavy-tailed and outlier-contaminated data. Examples include robust versions of mean and covariance estimators, linear regression, and k-means clustering, among others. Achieving the best possible performance for the MOM estimator in the simplest univariate case has positive implications for many of these problems. In the first part of the talk, we will demonstrate how to obtain an efficient version of the MOM estimator that satisfies deviation inequalities with sharp constants, requiring only minimal assumptions on the underlying distribution. Moreover, we will discuss the interplay between this question and the theory of U-statistics.
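As a rough illustration (not part of the talk itself), the classical univariate MOM estimator splits the sample into blocks, averages each block, and reports the median of the block means. The function name and the choice of block count `k` below are illustrative, not from the abstract:

```python
import numpy as np

def median_of_means(x, k):
    """Univariate median-of-means: split the sample into k blocks,
    compute each block's mean, and return the median of those means."""
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(x, k)          # k nearly equal-sized blocks
    return float(np.median([b.mean() for b in blocks]))
```

The median step makes the estimate insensitive to a small number of corrupted or heavy-tailed blocks, which is the source of the deviation guarantees discussed in the talk.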
The second part of the talk will be devoted to the multivariate version of the MOM estimator based on the geometric median. We will demonstrate that for large classes of heavy-tailed distributions, the geometric MOM estimator attains sub-exponential deviation guarantees, improving the known bounds in many cases. A new analysis of this estimator reveals interesting connections with small ball probabilities and with questions about negative moments of norms.
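For concreteness, a minimal sketch of the geometric MOM construction: block means are combined via the geometric median, here computed with Weiszfeld's iteration. The function names, iteration count, and tolerance are assumptions for illustration, not details from the talk:

```python
import numpy as np

def geometric_median(points, n_iter=200, eps=1e-9):
    """Weiszfeld's algorithm: the geometric median minimizes the sum of
    Euclidean distances to the given points."""
    y = points.mean(axis=0)                       # start from the sample mean
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(points - y, axis=1), eps)
        w = 1.0 / d                               # inverse-distance weights
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

def geometric_mom(X, k):
    """Multivariate MOM: split rows of X into k blocks, average each block,
    and return the geometric median of the block means."""
    block_means = np.array([b.mean(axis=0) for b in np.array_split(X, k)])
    return geometric_median(block_means)
```

Unlike the coordinate-wise median, the geometric median is equivariant under rotations, which is one reason it is the natural multivariate analogue in this setting.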