YINS Distinguished Lecturer Series: Jeffrey Bilmes
“The Science of Data Management”
Abstract: The recent growth of available data is both a blessing and a curse for the field of data science. Large data sets can lead to improved predictive accuracy, create research opportunities in parallel computing, and (as we will discuss) expose holistic knowledge. Such data sets can also be plagued with redundancy, leading to wasted computation. In this talk we will discuss a class of approaches to data management based on submodular functions, a powerful class of discrete functions that have properties analogous to both convexity and concavity. We will see how a form of “combinatorial dependence” over data sets can be naturally induced via submodular functions, and how the resulting submodular programs (which often have approximation guarantees) can yield practical, high-quality data management strategies, such as data summarization and data partitioning for large-scale parallel computing. The effectiveness of these approaches will be demonstrated via results from a range of applications, including computer vision, natural language processing, functional genomics, and distributed parallel computation.