“Is Memorization Compatible with Learning?”
Speaker: Alexander (Sasha) Rakhlin, Associate Professor, Massachusetts Institute of Technology
Center for Statistics, IDSS, Department of Brain & Cognitive Sciences, Laboratory for Information & Decision Systems, Center for Brains, Minds, and Machines
Abstract: One of the key tenets taught in courses on Statistics and Machine Learning is that fitting the data too well inevitably leads to overfitting and poor prediction performance. Yet over-parametrized neural networks appear to defy this “rule”. We will provide theoretical evidence that challenges this common wisdom. In particular, we will consider the minimum-norm interpolant in a reproducing kernel Hilbert space and show its good generalization properties in certain high-dimensional regimes. Furthermore, our estimates suggest a counterintuitive “multiple descent” phenomenon, whereby more data leads to alternating phases of better and worse performance.
Since gradient dynamics for wide randomly-initialized neural networks provably converge to a minimum-norm interpolant (with respect to a certain kernel), our results imply generalization and consistency for such neural networks. We will contrast our approach with the classical techniques based on uniform convergence and Rademacher averages and argue that these techniques are not sufficient for analyzing the memorization regime.
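For concreteness, the minimum-norm interpolant discussed in the abstract has a closed form: given a kernel k and training data (X, y), the function of smallest RKHS norm that fits the data exactly is f(x) = k(x, X) K⁻¹ y, where K is the kernel matrix. The following is a minimal numpy sketch of this construction (the Gaussian kernel, bandwidth, and toy data are illustrative choices, not the specific setting analyzed in the talk):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

# Toy training data: n points in d dimensions with noisy labels.
rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# Minimum-RKHS-norm interpolant: solve K alpha = y, then f(x) = k(x, X) @ alpha.
# (No ridge penalty -- this is "ridgeless" regression, which interpolates exactly.)
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K, y)

def f_hat(X_new):
    return rbf_kernel(X_new, X) @ alpha

# On the training points the interpolant reproduces y up to numerical error.
err = np.max(np.abs(f_hat(X) - y))
```

The talk's results concern when such an exact-fit ("memorizing") estimator nevertheless generalizes well in high dimensions.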
Joint work with Tengyuan Liang and Xiyu Zhai.
Alexander (Sasha) Rakhlin is an Associate Professor at MIT. His research is in Statistics and Machine Learning. He received his bachelor’s degrees in mathematics and computer science from Cornell University and his doctoral degree in computational neuroscience from MIT. He was a postdoc at UC Berkeley EECS before joining the University of Pennsylvania, where he was an Associate Professor in the Department of Statistics before joining MIT. http://www.mit.edu/~rakhlin/