- Cloud Computing and Resource Management
- Software System Performance and Reliability
- Software-Defined Networks and 5G
- Advanced Memory and Neural Computing
- Cloud Data Security Solutions
- Interconnection Networks and Systems
- Semiconductor Materials and Devices
- Parallel Computing and Optimization Techniques
Duke University
2023-2024
Remote Procedure Call (RPC) is a widely used abstraction for cloud computing. The programmer specifies type information for each remote procedure, and a compiler generates stub code linked into the application to marshal and unmarshal arguments into message buffers. Increasingly, however, service operations teams need a high degree of visibility and control over the flow of RPCs between services, leading many installations to use sidecars or mesh proxies for manageability and policy flexibility. These typically involve inspection...
High-speed RDMA networks are being rapidly adopted in industry for their low latency and reduced CPU overhead. To verify that RDMA can be used in production, system administrators need to understand the set of application workloads that can potentially trigger abnormal performance behaviors (e.g., unexpected throughput, PFC pause frame storms). We design and implement Collie, a tool that lets users systematically uncover anomalies in RDMA subsystems without access to hardware internal designs. Instead of individually testing each...
Intra-host networks, including heterogeneous devices and interconnect fabrics, have become increasingly complex and crucial. However, intra-host networks today do not provide sufficient manageability. This prevents data center operators from running a reliable and efficient end-to-end network, especially for multi-tenant clouds. In this paper, we analyze the main manageability deficiencies of intra-host networks and argue that a systematic solution should be implemented to bridge this function gap. We propose two key building...