Xiaohui (Helen) Gu

Professor

3274 Engineering Building II (EB2)

919-515-7045

Bio

Xiaohui (Helen) Gu is a professor in the Department of Computer Science at NC State University. Her research focuses on computer systems, with particular emphasis on autonomous system management using machine learning, large-scale data stream processing, service overlay networks, peer-to-peer systems, and mobile wireless systems.

Before joining NC State, Gu was a research staff member at the IBM T. J. Watson Research Center in Hawthorne, New York, from 2004 to 2007. She has received numerous honors throughout her career, including the ILLIAC Fellowship, the David J. Kuck Best Master’s Thesis Award, and the Saburo Muroga Fellowship at the University of Illinois at Urbana-Champaign. She also received IBM Invention Achievement Awards in 2004, 2006, and 2007.

Education

Ph.D. Computer Science University of Illinois at Urbana-Champaign 2004

M.S. Computer Science University of Illinois at Urbana-Champaign 2001

B.S. Computer Science Peking University, China 1999

Area(s) of Expertise

Architecture and Operating Systems
Cloud Computing
Data Sciences and Analytics
Embedded and Real-Time Systems
Networking and Performance Evaluation
Parallel and Distributed Systems
Scientific and High Performance Computing

Grants

Date: 08/16/21 - 8/15/23
Amount: $155,033.00
Funding Agencies: Cisco Systems, Inc.

Production computing infrastructures, particularly multi-tenant cloud infrastructures, have become increasingly complex and require constant monitoring and maintenance. Cloud service providers face both high operating costs and daunting service downtime penalties. Existing monitoring tools continuously collect large amounts of metric and log data but still fail to answer the key operational questions of when and why a cloud infrastructure experiences a problem. In this project, we propose to develop a set of new machine learning techniques to automatically detect and diagnose performance and security bugs in production cloud environments.

Date: 08/01/15 - 7/31/21
Amount: $518,000.00
Funding Agencies: National Science Foundation (NSF)

Hosting infrastructures provide users with cost-effective computing solutions by obviating the need for users to maintain complex computing infrastructures themselves. Unfortunately, due to their inherent complexity and sharing nature, hosting infrastructures are prone to performance anomalies caused by various external or internal faults. The goal of this project is to investigate a holistic, cross-site, hybrid performance anomaly debugging framework that intelligently integrates production-site black-box diagnosis and developer-site white-box analysis into a more powerful performance anomaly debugging system.

Date: 01/01/12 - 12/31/18
Amount: $450,000.00
Funding Agencies: National Science Foundation (NSF)

Large-scale virtualized hosting infrastructures have become the fundamental platform for many real-world systems such as cloud computing, enterprise data centers, and educational computing labs. However, due to their inherent complexity and sharing nature, hosting infrastructures are prone to various runtime problems such as performance anomalies and software/hardware failures. The overarching objective of this CAREER project is to explore innovative runtime reliability management techniques for large-scale virtualized hosting infrastructures. Our research will focus on handling latent, non-crashing distributed system performance problems that are often very difficult to reproduce offline. We propose to combine the power of online learning, knowledge-driven recovery, and in-situ diagnosis to handle unexpected runtime problems more efficiently and effectively.

Date: 01/02/15 - 8/15/16
Amount: $49,567.00
Funding Agencies: Credit Suisse Securities, LLC

Cloud computing infrastructure provides an elastic application deployment environment. Applications can be dynamically instantiated on different physical hosts on demand. However, to fully exploit the elasticity of the cloud infrastructure, applications should be able to configure themselves automatically when their components are placed in or migrated to data centers in geographically distributed regions. Unfortunately, today's technology does not provide such automatic configuration support. Application developers still need to deal with the hassle of configuring their applications manually. The objective of this proposed project is to develop an automatic application configuration management framework that can decouple the configuration management from the application logic. I propose to conduct the following tasks: 1) identifying the configuration problems in existing cloud platforms and container models; 2) designing the configuration management framework and interfaces that can decouple the configuration management from the application, using a multi-tier online auction application as a case study; and 3) developing the designed configuration management framework and optimizing its performance.

Date: 07/01/10 - 8/15/14
Amount: $300,000.00
Funding Agencies: US Army - Army Research Office

Large-scale virtualized computing infrastructures have become important platforms for many real-world systems such as cloud computing, virtual computing labs, and massive information processing. However, due to their inherent complexity and sharing nature, virtualized computing infrastructures are inevitably prone to various system anomalies such as software/hardware failures, performance anomalies, and malicious attacks. The goal of this project is to develop a new predictive anomaly management system to enhance the resilience of virtualized computing infrastructures. The major contribution will be an integrated framework consisting of four synergistic techniques: 1) scalable runtime virtual machine monitoring; 2) self-evolving online anomaly prediction; 3) speculative anomaly diagnosis; and 4) online anomaly correction.

Date: 09/01/09 - 12/31/13
Amount: $320,000.00
Funding Agencies: National Science Foundation (NSF)

We propose to explore the new computing model of offering computation- and/or data-intensive cloud services on active nodes serving on-demand utility computing users. More specifically, we plan to (1) assess the efficacy of resource sharing between foreground interactive utility computing workloads and background high-throughput cloud computing workloads on multi-core servers, in terms of energy saving and performance interference; (2) develop a scheduling and load management middleware that performs dynamic background workload distribution considering the energy-performance tradeoff; and (3) exploit GPGPUs for cloud services on active nodes running foreground workloads mainly on the CPUs.

Date: 09/01/09 - 8/31/13
Amount: $405,000.00
Funding Agencies: National Science Foundation (NSF)

Large-scale hosting infrastructures have become important platforms for many real-world systems such as cloud computing, virtual computing labs, enterprise data centers, and web hosting services. However, system administrators are often overwhelmed by the task of correcting various system anomalies such as performance bottlenecks, resource hotspots, and service level objective (SLO) violations. The goal of this project is to develop novel online anomaly prediction and diagnosis techniques to achieve robust continuous system operation. The major contribution will be an integrated framework consisting of three synergistic techniques: i) self-compressing information tracking to achieve low-cost continuous system monitoring; ii) online anomaly prediction that can raise advance alerts of impending anomalies; and iii) just-in-time anomaly diagnosis that can perform online anomaly diagnosis while the system approaches the anomaly state.

Date: 03/01/10 - 2/28/13
Amount: $549,999.00
Funding Agencies: National Science Foundation (NSF)

Scalability is one of the key challenges to computing with hundreds if not thousands of processors. System software solutions generally remain untested with respect to robustness, efficiency, or even correctness at scale. The inability to change system software at will in large-scale computing installations thus impedes progress in system software. This proposal seeks support to create a mid-size computational infrastructure, called ARC (A Root Cluster), that directly supports research into scalability for system-level software solutions. ARC temporarily grants users administrator (root) rights and allows them to replace arbitrary components of the software stack. ARC ultimately enables a multitude of research directions to be assessed at scale. The objective of this effort is to establish a viable platform to study scalability in practice at mid-level granularity to convince large-scale operators to adopt significant advances in system software development.

Date: 07/01/08 - 12/31/09
Amount: $40,000.00
Funding Agencies: NCSU Institute for Next Generation IT Systems (ITng) [formerly CACC]

The goal of this project is to develop efficient and lightweight VM management techniques that greatly improve the scalability and resource efficiency of the distributed virtual computing infrastructure.

Date: 07/01/08 - 6/30/09
Amount: $6,000.00
Funding Agencies: NCSU Faculty Research & Professional Development Fund

Large-scale virtual hosting infrastructures have become important platforms for many systems such as virtual computing labs, corporate data centers, and multi-tier web servers. However, system administrators are often overwhelmed by the task of correcting various system health problems such as performance bottlenecks, resource hotspots, service level objective (SLO) violations, and various software/hardware failures. We are addressing this challenge through three synergistic techniques: i) an intelligent information management system that can adaptively and selectively collect important information and provide query support with low monitoring cost; ii) context-aware distributed anomaly prediction that can raise advance alerts for impending system anomalies; and iii) just-in-time anomaly remediation and diagnosis tools that can dynamically alleviate impending anomalies and produce informative diagnosis reports for the system administrator by applying time-traveling execution and analysis techniques to abnormal system components. Our research will lead to a fundamentally new online health management approach that offers a more cost-effective self-healing solution for large-scale virtual computing environments than previous reactive or proactive approaches. We are developing prototypes of the proposed system and will evaluate them on the VCL infrastructure at NCSU.


Honors and Awards

  • National Science Foundation Faculty Early CAREER Award - 2011
  • IBM Faculty Award - 2008, 2010, 2011
  • IBM Exploratory Stream Analytics Innovation Award - 2008