Behavior-Aware Data Valuation for LLMs at Scale

CS AI Seminar Series
EB2 3001, 890 Oval Drive, Raleigh

Title: Behavior-Aware Data Valuation for LLMs at Scale

Abstract: Large Language Models (LLMs) depend on massive datasets whose quality and influence remain largely opaque. Data valuation offers principled methods to quantify how training data contributes to model performance and behavior. Yet, scaling classical approaches such as influence functions to trillion-token corpora continues to be a…