- Cloud Computing and Resource Management
- Caching and Content Delivery
- Software-Defined Networks and 5G
- Peer-to-Peer Network Technologies
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Mobile Ad Hoc Networks
- Parallel Computing and Optimization Techniques
- Advanced Neural Network Applications
- Wireless Networks and Protocols
- IoT and Edge/Fog Computing
- Network Traffic and Congestion Control
- Cooperative Communication and Network Coding
- Opportunistic and Delay-Tolerant Networks
- Stochastic Gradient Optimization Techniques
- Network Packet Processing and Optimization
- Network Security and Intrusion Detection
- Distributed Systems and Fault Tolerance
- Hydrocarbon Exploration and Reservoir Analysis
- Advanced Wireless Network Optimization
- Domain Adaptation and Few-Shot Learning
- Internet Traffic Analysis and Secure E-voting
- Distributed and Parallel Computing Systems
- Software System Performance and Reliability
- Ultra-Wideband Communications Technology
Microsoft Research (United Kingdom)
2006-2025
Microsoft Research Asia (China)
2016-2025
Wenzhou Medical University
2024-2025
First Affiliated Hospital of Wenzhou Medical University
2024-2025
Second Affiliated Hospital of Xi'an Jiaotong University
2022-2024
Chinese Academy of Sciences
2019-2023
Guangzhou Institute of Geochemistry
2019-2023
University of Chinese Academy of Sciences
2022
Arizona State University
2020-2022
Microsoft (United States)
2005-2011
Highly flexible software network functions (NFs) are crucial components to enable multi-tenancy in the clouds. However, packet processing on a commodity server has limited capacity and induces high latency. While NFs could scale out using more servers, doing so adds significant cost. This paper focuses on accelerating NFs with programmable hardware, i.e., FPGA, which is now a mature technology and inexpensive for datacenters. FPGA is predominately programmed in low-level hardware description languages (HDLs),...
Performance of in-memory key-value store (KVS) continues to be of great importance as modern KVS goes beyond the traditional object-caching workload and becomes a key infrastructure to support distributed main-memory computation in data centers. Recent years have witnessed a rapid increase of network bandwidth in data centers, shifting the bottleneck of most KVS from the network to the CPU. RDMA-capable NIC partly alleviates the problem, but the primitives provided by the RDMA abstraction are rather limited. Meanwhile, programmable NICs become...
Application-layer overlay networks have recently emerged as a promising solution for live media multicast on the Internet. A tree is probably the most natural structure for a multicast overlay, but it is vulnerable in the presence of dynamic end-hosts. Data-driven approaches form a mesh out of overlay nodes to exchange data, which greatly enhances the resilience. It however suffers from an efficiency-latency tradeoff, given that the data have to be pulled from mesh neighbors with periodical notifications. In this paper, we suggest a novel hybrid tree/mesh...
Clos-based networks including Fat-tree and VL2 are being built in data centers, but existing per-flow based routing causes low network utilization and a long latency tail. In this paper, by studying the structural properties of Fat-tree and VL2, we propose a per-packet round-robin based routing algorithm called Digit-Reversal Bouncing (DRB). DRB achieves perfect packet interleaving. Our analysis and simulations show that, compared with random-based load-balancing algorithms, DRB results in smaller and bounded queues even when traffic load...
This paper presents a systematic in-depth study on the existence, importance, and application of stable nodes in peer-to-peer live video streaming. Using traces from a real large-scale system as well as analytical models, we show that, while the number of stable nodes is small throughout a whole session, their longer lifespans make them constitute a significant portion in a per-snapshot view of a peer-to-peer overlay. As a result, they have substantially affected the performance of the overall system. Inspired by this, we propose a tiered overlay...
As one of the fundamental infrastructures for cloud computing, data center networks (DCN) have recently been studied extensively. We currently use pure software-based systems, FPGA based platforms, e.g., NetFPGA, or OpenFlow switches, to implement and evaluate various DCN designs including topology design, control plane and routing, and congestion control. However, software-based approaches suffer from high CPU overhead and processing latency; FPGA based platforms are difficult to program and incur high cost; and OpenFlow focuses on functions at...
There have been some serious concerns about the TCP performance in data center networks, including the long completion time of short TCP flows in competition with long TCP flows, and the congestion due to TCP incast. In this paper, we show that a properly tuned instant queue length based Explicit Congestion Notification (ECN) at the intermediate switches can alleviate both problems. Compared with previous work, our approach is appealing as it can be supported on current commodity switches with a simple parameter setting and it does not need any...
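The idea of marking on instantaneous queue length can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `EcnQueue`, the packet representation (a dict with a `ce` flag), and the capacity and threshold values are all hypothetical, chosen only to show a switch queue that sets the ECN Congestion Experienced mark when the current occupancy reaches a single threshold, rather than using an averaged queue length.

```python
from collections import deque


class EcnQueue:
    """Toy FIFO queue that marks packets based on instantaneous occupancy.

    All names and parameters here are illustrative assumptions.
    """

    def __init__(self, capacity_pkts, mark_threshold_pkts):
        self.capacity = capacity_pkts          # hypothetical buffer size, in packets
        self.threshold = mark_threshold_pkts   # hypothetical marking threshold, in packets
        self.queue = deque()

    def enqueue(self, pkt):
        """Return False if the packet is dropped, True if it is queued."""
        occupancy = len(self.queue)            # instantaneous length, no averaging
        if occupancy >= self.capacity:
            return False                       # buffer full: tail drop
        if occupancy >= self.threshold:
            pkt["ce"] = True                   # set the ECN Congestion Experienced mark
        self.queue.append(pkt)
        return True

    def dequeue(self):
        """Remove and return the oldest packet, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None


# Usage: five back-to-back arrivals into a queue of 4 packets with threshold 2.
q = EcnQueue(capacity_pkts=4, mark_threshold_pkts=2)
packets = [{"id": i, "ce": False} for i in range(5)]
accepted = [q.enqueue(p) for p in packets]
marks = [p["ce"] for p in packets]
```

In this sketch the only tunable is the threshold, which mirrors the abstract's point that a simple parameter setting on the switch is enough; how that threshold should be tuned is not covered here.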
Recently, application-layer overlay networks have been suggested as a promising solution for live video streaming over the Internet. To organize a multicast overlay, a natural structure is a tree, which, however, is known to be vulnerable to end-host dynamics. Data-driven approaches address this problem by employing a mesh structure, which enables data exchanges among multiple neighbors, and thus, greatly improves the overlay resilience. It unfortunately suffers from an efficiency-delay trade-off, because the data have to be pulled...
During recent years, the Internet has witnessed a rapid growth in deployment of data-driven (or swarming based) peer-to-peer (P2P) media streaming. In these applications, each node independently selects some other nodes as its neighbors (i.e., gossip-style overlay construction), and exchanges streaming data with the neighbors (i.e., data scheduling). To improve the performance of such a protocol, many existing works focus on the overlay construction issue. However, few of them concentrate on optimizing the data scheduling to maximize throughput...
Commodity switches are becoming increasingly important as they are the basic building blocks for enterprise and data center networks. With the availability of all-in-one switching ASICs, these switches almost universally adopt a single switching ASIC design. However, such a design also brings two major limitations, i.e., a limited forwarding table for flow-based forwarding schemes such as OpenFlow and a shallow buffer for bursty traffic patterns. In this paper, we propose to use the CPU in the switches to handle not only control plane but also data plane traffic. We show that this can provide large...
The link speed in datacenters is growing fast, from 1 Gbps to 100 Gbps. However, the buffer size of commodity switches increases slowly, and is thus significantly outpaced by the link speed. In such extremely shallow-buffered datacenter networks, prior TCP/ECN solutions suffer from either excessive packet losses or significant throughput degradation. Motivated by this, we introduce BCC, a simple yet effective solution with only one more ECN configuration (shared ECN/RED) at commodity switches. BCC operates based on real-time...
Driven by explosive demand on computing power and the slowdown of Moore's law, cloud providers have started to deploy FPGAs into datacenters for workload offloading and acceleration. In this paper, we propose an operating system for FPGA, called Feniks, to facilitate large scale FPGA deployment in datacenters. Feniks provides an abstracted interface for FPGA accelerators, so that FPGA developers can get rid of underlying hardware details. In addition, Feniks also provides (1) a development and runtime environment for accelerators to share an FPGA chip...
Ideally, minimizing the flow completion time (FCT) requires millions of priorities supported by the underlying network so that each flow has its unique priority. However, in production datacenters, the available switch priority queues for flow scheduling are very limited (merely 2 or 3). This practical constraint seriously degrades the performance of previous approaches. In this paper, we introduce Explicit Priority Notification (EPN), a novel scheduling mechanism which emulates fine-grained priorities (i.e., desired priorities, DP) using only two...
This paper presents URSA, a hybrid block store that provides virtual disks for various applications to run efficiently on cloud VMs. Trace analysis shows that the I/O patterns served by block storage have limited locality to exploit. Therefore, instead of using SSDs as a cache layer, URSA proposes an SSD-HDD-hybrid storage structure that directly stores primary replicas on SSDs and replicates backup replicas on HDDs, using journals to bridge the performance gap between SSDs and HDDs. URSA integrates the hybrid structure with designs for high reliability, scalability, and availability....
The link speed in production datacenters is growing fast, from 1 Gbps to 40 Gbps or even 100 Gbps. However, the buffer size of commodity switches increases slowly, e.g., from 4 MB at 1 Gbps to 16 MB at 100 Gbps, and is thus significantly outpaced by the link speed. In such extremely shallow-buffered networks, today's TCP/ECN solutions, such as DCTCP, suffer from either excessive packet losses or significant throughput degradation. Motivated by this, we introduce BCC,...
Sparsely-gated mixture-of-experts (MoE) has been widely adopted to scale deep learning models to trillion-plus parameters with fixed computational cost. The algorithmic performance of MoE relies on its token routing mechanism that forwards each input token to the right sub-models or experts. While token routing dynamically determines the amount of expert workload at runtime, existing systems suffer from inefficient computation due to their static execution, namely static parallelism and pipelining, which does not adapt to the dynamic workload....
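To make the notion of dynamic expert workload concrete, here is a minimal sketch of top-k token routing in plain Python. It is an illustration under stated assumptions, not the system described in the abstract: the function name `route_tokens`, the list-of-lists score format, and the example scores are hypothetical. Each token is sent to the `k` experts with the highest gate scores, and the resulting per-expert load depends entirely on the input.

```python
def route_tokens(gate_scores, k=1):
    """Assign each token to its top-k experts by gate score.

    gate_scores: one list of per-expert scores for each token (illustrative format).
    Returns (assignments, load), where load[e] is the number of tokens sent to expert e.
    """
    num_experts = len(gate_scores[0])
    load = [0] * num_experts
    assignments = []
    for scores in gate_scores:
        # Rank expert indices by descending score; ties keep the lower index first.
        ranked = sorted(range(num_experts), key=lambda e: scores[e], reverse=True)
        chosen = ranked[:k]
        assignments.append(chosen)
        for expert in chosen:
            load[expert] += 1
    return assignments, load


# Usage: three tokens, three experts, top-1 routing (scores are made up).
scores = [
    [0.9, 0.1, 0.0],
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
]
assignments, load = route_tokens(scores, k=1)
```

With these made-up scores, expert 0 receives two tokens and expert 2 receives none, which is the kind of runtime imbalance that a fixed, pre-planned execution layout cannot anticipate.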
Background: For a long time, the prevailing viewpoint has been that shorter telomeres contribute to chromosomal instability, which is a shared characteristic of both aging and cancer. The newest research showed that T cell immune deficiency rather than chromosome instability predisposes patients with short telomere syndromes to some cancers. However, the relationship between genetically determined telomere length (TL) and T cells remains unclear. Methods: A two‐sample Mendelian randomization analysis was conducted...