YINS Seminar Archives: Suvrit Sra (Nov. 20, 2019)

Talk Summary: 

“ReLU nets are powerful memorizers: a tight analysis of finite sample expressive power”

 
Speaker: Suvrit Sra, Associate Professor, Department of Electrical Engineering & Computer Science, Institute for Data, Systems & Society (IDSS), Massachusetts Institute of Technology
 
I will talk about finite sample expressivity, a.k.a. the memorization power, of ReLU networks. Recent results showed (unsurprisingly) that arbitrary input data can be perfectly memorized by a shallow ReLU network with a single hidden layer of N hidden nodes. I will describe a more careful construction that trades off width for depth to show that a ReLU network with 2 hidden layers, each with 2*sqrt(N) hidden nodes, can perfectly memorize arbitrary datasets. Moreover, we prove that a width of Θ(sqrt(N)) is both necessary and sufficient for perfect memorization power. A notable corollary of this result is that mild overparametrization suffices to permit a neural network to achieve zero training loss!
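As a quick empirical companion to the abstract (this sketch is our illustration, not the talk's explicit construction, which assigns weights analytically): the PyTorch snippet below trains a ReLU network with two hidden layers of width roughly 2*sqrt(N) on N randomly labeled points and drives the training loss toward zero. The dataset sizes, optimizer, and learning rate are assumptions chosen for illustration.

    # Minimal sketch (not from the talk): a 2-hidden-layer ReLU net of
    # width ~2*sqrt(N) fitting N arbitrary (input, label) pairs.
    # All hyperparameters below are illustrative assumptions.
    import math
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    N, d = 400, 10                       # N samples in dimension d
    width = 2 * math.ceil(math.sqrt(N))  # ~2*sqrt(N) nodes per hidden layer

    X = torch.randn(N, d)                # arbitrary inputs
    y = torch.randn(N, 1)                # arbitrary real-valued labels

    net = nn.Sequential(
        nn.Linear(d, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, 1),
    )

    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for step in range(5000):
        opt.zero_grad()
        loss = ((net(X) - y) ** 2).mean()  # squared error over all N points
        loss.backward()
        opt.step()

    print(f"final training MSE: {loss.item():.2e}")  # approaches zero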
 
We also extend our results to deep networks and, combined with recent results on the VC-dimension of deep nets, show that our bounds on memorization power are nearly tight. Time permitting, I will mention expressive power results for ResNets, as well as how SGD behaves when optimizing such networks.
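For intuition on why the bound is nearly tight (our gloss, using the known VC-dimension bound for piecewise-linear networks of Bartlett, Harvey, Liaw & Mehrabian, 2019, rather than the talk's exact argument): a ReLU network with W parameters and L layers satisfies

    \mathrm{VCdim} = O(W L \log W),

and a network that memorizes every labeling of N points shatters them, so \mathrm{VCdim} \ge N, forcing W L \log W = \Omega(N). Since the two-hidden-layer, width-2\sqrt{N} construction has W = \Theta(N) parameters, no construction can improve on it by more than logarithmic factors.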
 
This presentation was a part of the YINS Weekly Seminar Series and took place on November 20, 2019.
Speaker: 
Suvrit Sra
Bio: 

Suvrit Sra is an Associate Professor in the EECS Department at MIT, a core faculty member of the Laboratory for Information and Decision Systems (LIDS) and the Institute for Data, Systems, and Society (IDSS), as well as a member of the MIT Machine Learning and Statistics groups. He obtained his PhD in Computer Science from the University of Texas at Austin. Before moving to MIT, he was a Senior Research Scientist at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, and during 2013-2014 he held visiting faculty positions at UC Berkeley (EECS) and Carnegie Mellon University (Machine Learning Department). His research bridges a number of mathematical areas, including differential geometry, matrix analysis, convex analysis, probability theory, and optimization, with machine learning. He founded the OPT (Optimization for Machine Learning) series of workshops, held from 2008 to 2017 at the NeurIPS (formerly NIPS) conference, and co-edited a book of the same name (MIT Press, 2011). He is also a co-founder and chief scientist of a healthcare-AI startup.