Poster
Robust Sparsification via Sensitivity
Chansophea Wathanak In · Yi Li · David Woodruff · Xuan Wu
Thu 17 Jul, 4:30 p.m. – 7 p.m. PDT
Abstract:
Robustness to outliers is important in machine learning. Many classical problems, including subspace embedding, clustering, and low-rank approximation, lack scalable, outlier-resilient algorithms. This paper considers machine learning problems of the form $\min_{x\in \mathbb{R}^d} F(x)$, where $F(x)=\sum_{i=1}^n F_i(x)$, and their robust counterparts $\min_{x\in\mathbb{R}^d} F^{(m)}(x)$, where $F^{(m)}(x)$ denotes the sum of all but the $m$ largest $F_i(x)$ values. We develop a general framework for constructing $\epsilon$-coresets for such robust problems, where an $\epsilon$-coreset is a weighted subset of the functions $\{F_1(x),\dots,F_n(x)\}$ that provides a $(1+\epsilon)$-approximation to $F(x)$. Specifically, if the original problem $F$ has total sensitivity $T$ and admits a vanilla $\epsilon$-coreset of size $S$, our algorithm constructs an $\epsilon$-coreset of size $\tilde{O}(\frac{mT}{\epsilon})+S$ for the robust objective $F^{(m)}$. This coreset size can be shown to be near-tight for $\ell_2$ subspace embedding. Our coreset algorithm has scalable running time and leads to new or improved algorithms for these robust optimization problems. Empirical evaluations demonstrate that our coresets outperform uniform sampling on real-world data sets.
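To make the objective and the general idea of sensitivity sampling concrete, here is a minimal Python sketch. It is not the paper's algorithm: the helpers `robust_objective` and `sensitivity_sample` are hypothetical names, the sensitivity bounds are assumed to be given, and the example cost function (squared distances to a center) is chosen only for illustration.

```python
import numpy as np

def robust_objective(F_values, m):
    """Evaluate F^(m)(x): the sum of all but the m largest F_i(x) values.

    F_values: array of the n per-function costs F_i(x) at a fixed x.
    """
    if m == 0:
        return F_values.sum()
    # Discard the m largest costs (the presumed outliers) and sum the rest.
    return np.sort(F_values)[:-m].sum()

def sensitivity_sample(sensitivities, k, rng=None):
    """Generic sensitivity (importance) sampling for a coreset sketch.

    sensitivities: upper bounds s_i on sup_x F_i(x) / F(x); their sum is the
    total sensitivity T. Sample k indices with probability p_i = s_i / T and
    weight each sampled function by 1 / (k * p_i) so the weighted sum is an
    unbiased estimate of F(x).
    """
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(sensitivities, dtype=float)
    p = s / s.sum()
    idx = rng.choice(len(s), size=k, replace=True, p=p)
    weights = 1.0 / (k * p[idx])
    return idx, weights

# Illustrative use: robust sum of squared distances to a center x,
# ignoring m = 10 planted outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
X[:10] += 50.0                       # plant 10 outliers
x = X.mean(axis=0)
costs = ((X - x) ** 2).sum(axis=1)   # F_i(x)
print(robust_objective(costs, m=10))
```

The sketch only shows the two ingredients named in the abstract, evaluating $F^{(m)}$ and importance sampling by sensitivity; how the paper combines them to obtain the $\tilde{O}(\frac{mT}{\epsilon})+S$ coreset for the robust objective is described in the full text.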