Seminars & Colloquia
Mayank Goswami
Queens College, City University of New York
"Searching in the Age of Big Data: Filters, Nearest Neighbor Search, and the Role of Geometry"
Monday, March 15, 2021, 09:00 AM
Location: Zoom, EB2 NCSU Centennial Campus
Zoom Meeting Info (Visitor parking instructions)
We will then move on to faster, hashing-based methods, particularly in the context of big data. Most data today can’t be stored on a single machine, necessitating (a) succinct sketches that fit in RAM and (b) I/O-efficient algorithms (also known as external-memory algorithms). I will talk about several variants of Bloom filters, a sketching-based data structure ubiquitous in databases today. While Bloom filters solve the approximate membership problem (is the query in the database?), the related similarity search problem (is something similar to the query in the database?) has gained a lot of importance, and I will describe the latest results on the development of such similarity filters. This is closely related to the k-nearest neighbor (kNN) problem, and I will end with results on its I/O complexity and applications to machine learning.
Host: Anthony Fortune-Linton, CSC