Seminars & Colloquia
Cynthia Rudin
Electrical & Computer Engineering Department, Duke University
"On the Solutions of Some Discrete Optimization Problems in Statistics: Optimal Sparse Decision Trees and Matching in Causal Inference"
Thursday, March 29, 2018, 2:00 PM
Location: 4101 Talley Student Union, NCSU Main Campus
This talk is part of the Bioinformatics Seminar Series
Abstract: There are fundamental problems in statistics that require solving hard discrete optimization problems. We take an approach at the intersection of machine learning and computer systems that allows us to solve some of these problems in ways that are very different from the usual approaches. I will first discuss work on Certifiably Optimal RulE ListS (CORELS). CORELS produces one-sided decision trees for categorical data that are provably optimal with respect to accuracy, regularized by sparsity (as measured by the depth of the tree). CORELS models are not constructed greedily like CART or C4.5 models; they are constructed using a combination of several theorems, carefully chosen data structures, custom bit-vector libraries, computational reuse, and the exploitation of symmetry. Because CORELS optimizes over the set of rule lists, we can construct exotic one-sided decision trees, called 'Falling Rule Lists,' that obey monotonicity constraints. We can also construct optimal treatment regimes that take into account the costs of false positives, false negatives, and gathering information, called Cost-Effective Interpretable Treatment Regimes (CITR).

In the second part of the talk I will discuss a matching method for causal inference called Fast Large Almost Matching Exactly (FLAME). FLAME tries to match treatment and control units exactly on as many important covariates as possible. It uses techniques that are natural for query processing in database management, together with bit-vector techniques, and it quickly produces high-quality matched groups for large datasets.

Work on CORELS is joint with postdoc Elaine Angelino, students Nicholas Larus-Stone and Daniel Alabi, and colleague Margo Seltzer. Work on Falling Rule Lists is joint with students Fulton Wang and Chaofan Chen, and work on CITR is joint with student Himabindu Lakkaraju. Work on FLAME is joint with student Tianyu Wang and colleagues Alex Volfovsky and Sudeepa Roy.
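To make the rule-list idea concrete, here is a minimal Python sketch of how an ordered list of if-then rules classifies an example, and of the kind of bit-vector bookkeeping that makes evaluating many candidate rule lists cheap in a branch-and-bound search such as the one behind CORELS. This is an illustration only, not the CORELS implementation; all names here (Rule, RuleList, capture_vectors) are hypothetical.

# Minimal sketch of rule-list prediction with bit-vector bookkeeping.
# Illustrative only, not the CORELS codebase; all names are hypothetical.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    condition: Callable[[dict], bool]  # e.g. lambda x: x["age"] == "18-25"
    prediction: int                    # label predicted when the rule fires
    description: str


class RuleList:
    """An ordered list of if-then rules with a default prediction."""

    def __init__(self, rules: List[Rule], default: int):
        self.rules = rules
        self.default = default

    def predict(self, x: dict) -> int:
        # A rule list is a one-sided decision tree: the first rule whose
        # condition holds determines the prediction.
        for rule in self.rules:
            if rule.condition(x):
                return rule.prediction
        return self.default

    def capture_vectors(self, data: List[dict]) -> List[int]:
        """Bitmask of the examples each rule captures (first rule to fire).

        A search over rule lists evaluates many prefixes on the same data, so
        representing "which examples reach this rule" as an integer bitmask
        keeps support and accuracy counts down to bitwise ANDs and popcounts,
        and lets results be reused across prefixes.
        """
        remaining = (1 << len(data)) - 1  # all examples still uncaptured
        captures = []
        for rule in self.rules:
            fires = 0
            for i, x in enumerate(data):
                if rule.condition(x):
                    fires |= 1 << i
            captured = fires & remaining   # only previously uncaptured rows
            remaining &= ~captured
            captures.append(captured)
        return captures

A falling rule list additionally requires the estimated probability of the positive class to be non-increasing from the first rule to the last, which is the monotonicity constraint mentioned above.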
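The matching step in a FLAME-style procedure can be pictured as a database group-by: units are grouped on a subset of covariates, a group counts as matched when it contains both treated and control units, and covariates are then dropped one at a time so that more of the remaining units can be matched. The sketch below illustrates that idea under simplified assumptions (covariates are dropped in a fixed order rather than by FLAME's learned importance measure); it is not the FLAME implementation, and the function names are hypothetical.

# Illustrative sketch of exact matching on covariate subsets, in the spirit
# of a FLAME-style procedure. This mirrors a SQL GROUP BY over the covariate
# columns; it is not the FLAME implementation, and all names are hypothetical.

from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Unit = Dict[str, object]  # covariates plus a 'treated' flag and an 'outcome'


def match_exactly(units: List[Unit], covariates: Sequence[str]
                  ) -> Dict[Tuple, List[Unit]]:
    """Return matched groups keyed by their shared covariate values."""
    groups: Dict[Tuple, List[Unit]] = defaultdict(list)
    for u in units:
        key = tuple(u[c] for c in covariates)   # exact match on this subset
        groups[key].append(u)
    # Keep only groups containing at least one treated and one control unit.
    return {k: g for k, g in groups.items()
            if any(u["treated"] for u in g)
            and any(not u["treated"] for u in g)}


def flame_style_pass(units: List[Unit], covariates: List[str]) -> List[Tuple]:
    """Match on all covariates first, then drop covariates one at a time.

    FLAME chooses which covariate to drop using a learned importance measure;
    here covariates are dropped in the given order to keep the sketch short.
    """
    unmatched = list(units)
    matched_keys = []
    remaining = list(covariates)
    while unmatched and remaining:
        groups = match_exactly(unmatched, remaining)
        matched = {id(u) for g in groups.values() for u in g}
        matched_keys.extend(groups.keys())
        unmatched = [u for u in unmatched if id(u) not in matched]
        remaining.pop()  # proxy for dropping the least important covariate
    return matched_keys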
Short Bio: Cynthia Rudin is an associate professor of computer science, electrical and computer engineering, and statistics at Duke University, where she directs the Prediction Analysis Lab. Previously, Prof. Rudin held positions at MIT, Columbia, and NYU. She holds an undergraduate degree from the University at Buffalo and a PhD in applied and computational mathematics from Princeton University. She is the recipient of the 2013 and 2016 INFORMS Innovative Applications in Analytics Awards and an NSF CAREER award; in 2015 she was named one of the “Top 40 Under 40” by Poets and Quants and one of the 12 most impressive professors at MIT by Business Insider. Work from her lab has won 10 best paper awards in the last 5 years. She is past chair of the INFORMS Data Mining Section and currently chairs the Statistical Learning and Data Science Section of the American Statistical Association. She serves, or has served, on committees for DARPA, the National Institute of Justice, the National Academy of Sciences (for both statistics and criminology/law), and AAAI.
Host: Yi-Hui Zhou, NCSU Biological Sciences