- Cloud Computing and Resource Management
- Software-Defined Networks and 5G
- Network Traffic and Congestion Control
- Advanced Graph Neural Networks
- Various Chemistry Research Topics
- Machine Learning in Materials Science
- Computational Drug Discovery Methods
- Scientific Computing and Data Management
- IoT-based Smart Home Systems
- Advanced Neural Network Applications
- Technology and Data Analysis
- Brain Tumor Detection and Classification
- Context-Aware Activity Recognition Systems
- Robotics and Automated Systems
- Experimental Learning in Engineering
- Online Learning and Analytics
- QR Code Applications and Technologies
- Machine Learning in Healthcare
- Advanced Data Storage Technologies
- Internet Traffic Analysis and Secure E-voting
- Data Stream Mining Techniques
- Graph Theory and Algorithms
- Parallel Computing and Optimization Techniques
- Advanced Memory and Neural Computing
- Energy Efficient Wireless Sensor Networks
Hong Kong University of Science and Technology
2020-2025
University of Hong Kong
2020-2025
Imperial College London
2022-2023
Guangzhou Experimental Station
2022-2023
Recent years have witnessed a plethora of learning-based solutions for congestion control (CC) that demonstrate better performance over traditional TCP schemes. However, they fail to provide consistently good convergence properties, including fairness, fast convergence, and stability, due to the mismatch between their objective functions and these properties. Despite being intuitive, integrating these properties into existing learning-based CC is challenging, because: 1) training environments are designed for the optimization of a single flow...
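Fairness among competing flows, one of the convergence properties named above, is commonly quantified with Jain's fairness index; a minimal sketch (the function name is ours, not from the paper):

```python
def jain_fairness(rates):
    """Jain's fairness index over per-flow throughputs: 1.0 when all
    flows get equal rates, approaching 1/n when one flow dominates."""
    n = len(rates)
    total = sum(rates)
    return total * total / (n * sum(r * r for r in rates))

# Four flows sharing a link equally are perfectly fair.
print(jain_fairness([10.0, 10.0, 10.0, 10.0]))  # 1.0
```

A learning-based CC scheme whose reward only tracks per-flow throughput can score well while this index stays low, which is the objective mismatch the abstract points at.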
Graph Neural Networks (GNNs) have emerged as powerful tools to capture structural information from graph-structured data, achieving state-of-the-art performance on applications such as recommendation, knowledge graphs, and search. Graphs in these domains typically contain hundreds of millions of nodes and billions of edges. However, previous GNN systems demonstrate poor scalability because the large, interleaved computation dependencies in training cause significant overhead under current parallelization methods. We...
Mixture-of-Expert (MoE) models outperform conventional models by selectively activating different subnets, named \emph{experts}, on a per-token basis. This gated computation generates dynamic communication patterns that cannot be determined beforehand, challenging existing GPU interconnects, which remain \emph{static} during the distributed training process. In this paper, we advocate for a first-of-its-kind system, called mFabric, that unlocks topology reconfiguration \emph{during} MoE training. Towards this vision, we first...
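The per-token gating the abstract describes can be sketched as top-k expert selection; this is an illustrative sketch of the general MoE routing idea, not mFabric's actual gate:

```python
import math

def topk_gate(logits, k=2):
    """Pick the k highest-scoring experts for one token and
    softmax-normalize their weights. Because the winning experts
    differ token to token, the resulting all-to-all traffic pattern
    is data-dependent and unknown before the batch arrives."""
    ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    mx = max(logits[i] for i in ids)
    exps = [math.exp(logits[i] - mx) for i in ids]
    s = sum(exps)
    return ids, [e / s for e in exps]

# Experts 1 and 3 win for this token; another token may route elsewhere.
print(topk_gate([0.1, 2.0, 0.3, 1.5]))
```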
Parameter/gradient exchange plays an important role in large-scale distributed machine learning (DML). However, prior solutions such as parameter server (PS) or ring-allreduce (Ring) fall short since they are not resilient to uncertainties such as oversubscription, congestion, and failures that may occur in datacenter networks (DCNs).
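For reference, the ring-allreduce baseline mentioned above can be simulated in a few lines: a reduce-scatter phase followed by an all-gather phase, each taking n-1 steps around the ring. This is a single-process simulation for illustration, not a distributed implementation:

```python
def ring_allreduce(grads):
    """Simulate ring-allreduce over n workers, each holding a gradient
    vector of length n (one chunk per worker). After reduce-scatter
    plus all-gather, every worker holds the elementwise sum."""
    n = len(grads)
    bufs = [list(g) for g in grads]
    # Reduce-scatter: at each step, worker r sends chunk (r - step) mod n
    # to its successor, which adds it in. Snapshot sends first so all
    # transfers in a step happen "simultaneously".
    for step in range(n - 1):
        sends = [(r, (r - step) % n, bufs[r][(r - step) % n]) for r in range(n)]
        for r, c, v in sends:
            bufs[(r + 1) % n][c] += v
    # All-gather: circulate the fully reduced chunks around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, bufs[r][(r + 1 - step) % n]) for r in range(n)]
        for r, c, v in sends:
            bufs[(r + 1) % n][c] = v
    return bufs
```

Note that a single slow or failed link stalls the whole ring, which illustrates why the scheme is fragile under the congestion and failures the abstract highlights.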
In Machine Learning (ML) system research, efficient resource scheduling and utilization have always been an important topic given the compute-intensive nature of ML applications. In this paper, we introduce the design of TACC, a full-stack cloud infrastructure that efficiently manages and executes large-scale machine learning applications in compute clusters. TACC implements a 4-layer application workflow abstraction through which optimization techniques can be dynamically combined and applied to various types...
Communication overhead poses an important obstacle to distributed DNN training and has drawn increasing attention in recent years. Despite continuous efforts, prior solutions such as gradient compression/reduction, compute/communication overlapping, layer-wise flow scheduling, etc., are still coarse-grained and insufficient for efficient training, especially when the network is under pressure. We present DLCP, a novel solution exploiting domain-specific properties of deep learning to optimize communication...
Distributed GNN training tends to generate huge volumes of communication. To reduce communication cost, state-of-the-art sampling-based techniques sample and retrieve only a subset of nodes. However, our analysis shows that current sampling algorithms are still inefficient in network usage for distributed training, mainly because of three problems: first, they overlook the locality of sampled neighbor nodes within a cluster; second, they transfer data at the coarse-grained graph-node level; third, some mechanisms adopted...
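The sampling step under discussion is, in its plain form, uniform neighbor sampling; a minimal sketch (function and parameter names are ours) of why it drives network traffic in distributed training:

```python
import random

def sample_neighbors(adj, seeds, fanout, seed=0):
    """Uniform neighbor sampling as used by sampling-based GNN training:
    for each seed node, draw up to `fanout` neighbors. In a distributed
    setting, the union of sampled nodes that live on remote machines is
    exactly what must be fetched over the network."""
    rng = random.Random(seed)
    sampled = {}
    for v in seeds:
        nbrs = adj.get(v, [])
        sampled[v] = rng.sample(nbrs, min(fanout, len(nbrs)))
    return sampled
```

Because the draw ignores where neighbors physically reside, two sampled neighbors on the same remote machine are fetched as separate node-level requests, which is the locality and granularity inefficiency the abstract identifies.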
A rate limiter is required by the RDMA NIC (RNIC) to enforce the rate limits calculated by congestion control. The RNIC expects it to be accurate and scalable: it should precisely shape traffic for numerous flows with minimized resource consumption, thereby mitigating incasts and congestion and improving network performance. Previous works, however, fail to meet these performance requirements while achieving both accuracy and scalability.
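A common software reference point for such a shaper is the token bucket; the sketch below is illustrative of the general mechanism, not the paper's hardware design:

```python
class TokenBucket:
    """Minimal token-bucket shaper. `rate` is tokens (e.g. bytes) added
    per second; `burst` caps how many tokens can accumulate, bounding
    how much traffic may be released at once."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0

    def allow(self, size, now):
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False
```

Keeping one such bucket per flow is what makes per-flow accuracy costly at scale: with thousands of flows, an RNIC must store and update all the per-bucket state with very limited on-chip resources.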
Rapid and accurate prediction of molecular properties is a fundamental task in drug discovery. In recent years, deep learning-based property prediction methods have received much attention, and their successes have shown that learning representations of molecular structures by applying graph neural networks (GNNs) can achieve better results. However, most previous approaches typically focus on atomic embedding; in this paper, we propose a novel method based on atom-pair embedding and apply it to two types of tasks. Firstly, embedding is done...
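To make the atomic-versus-pair distinction concrete: an atomic embedding learns one vector per atom, whereas a pair-based method operates over the atom pairs of a molecule. A trivial enumeration sketch (purely illustrative, not the paper's method):

```python
from itertools import combinations

def atom_pairs(atoms):
    """Enumerate the unordered atom pairs of a molecule; a pair-based
    embedding learns a representation per such pair rather than per
    atom, capturing pairwise relations directly."""
    return list(combinations(atoms, 2))

# Three atoms yield three unordered pairs.
print(atom_pairs(["C", "O", "N"]))  # [('C', 'O'), ('C', 'N'), ('O', 'N')]
```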