Seminars & Colloquia

Cong Guo

Duke University

"Unlocking New Opportunities in Quantization and Sparsity Co-Design for Large Language Models"

Friday, March 7, 2025, 12:00 PM
Location: 3211, EB2, NCSU Centennial Campus

This talk is part of the System Research Seminar series.


Abstract: The rapid growth of Large Language Models (LLMs) has significantly advanced natural language processing. However, as these models grow larger, their inference costs are rising faster than acceleration hardware is improving. To address this challenge, researchers have proposed structured sparsity patterns that improve execution efficiency and adaptive numerical data types that balance precision against performance. Other methods manage outliers in model data to preserve accuracy during quantization. Building on these foundations, I will introduce our latest work, **Transitive Array**, a novel framework that unifies quantization and sparsity. Transitive Array minimizes redundant computation and optimizes memory usage, offering a hardware-friendly solution for efficient LLM inference. This work opens a new opportunity to co-design quantization and sparsity in LLMs, helping bridge the gap between growing model complexity and the limits of current hardware.

Short Bio: Cong Guo is a Postdoctoral Associate at Duke University, working with Professors Hai Li and Yiran Chen. He received his Ph.D. in Computer Science from Shanghai Jiao Tong University and was honored with the 2023 Shanghai Jiao Tong University Outstanding Doctoral Dissertation Award. His research interests lie in computer architecture and high-performance computing, with a focus on software-hardware co-optimization for efficient artificial intelligence applications. His work includes designing novel architectures and systems for neural networks, particularly for sparsity and quantization. Over the past five years, he has published more than 10 papers at leading conferences such as ISCA, MICRO, HPCA, and ASPLOS. His work received an Honorable Mention in the 2022 IEEE Micro Top Picks.

Host: Justin Bradley, CSC
