- Cloud Computing and Resource Management
- Software-Defined Networks and 5G
- Parallel Computing and Optimization Techniques
- Network Traffic and Congestion Control
- Peer-to-Peer Network Technologies
- Distributed and Parallel Computing Systems
- Interconnection Networks and Systems
- Advanced Optical Network Technologies
- Privacy-Preserving Technologies in Data
- Advanced Data Storage Technologies
- Caching and Content Delivery
- IoT and Edge/Fog Computing
- Cooperative Communication and Network Coding
- Distributed Sensor Networks and Detection Algorithms
Hong Kong University of Science and Technology
2016-2018
University of Hong Kong
2016-2018
Fair and efficient coflow scheduling improves application-level networking performance in today's datacenters. Ideally, a coflow scheduler should provide isolation guarantees on the minimum progress to achieve predictable networking performance. Network operators, on the other hand, strive to decrease the average coflow completion time (CCT). Unfortunately, optimal isolation guarantees and minimum average CCT are conflicting objectives and cannot be achieved at the same time. Existing schedulers either optimize isolation guarantees at the expense of long CCTs (e.g., HUG [1]), or optimize CCT without isolation guarantees (e.g., Varys and Aalo [2],...
BBR is a new congestion-based congestion control algorithm proposed by Google. A BBR flow sequentially measures the bottleneck bandwidth and round-trip delay of the network pipe, and uses the measured results to govern its sending behavior, maximizing the delivery bandwidth while minimizing the delay. However, our deployment of BBR in geo-distributed cloud servers reveals a severe RTT fairness problem: a BBR flow with longer RTT dominates a competing flow with shorter RTT. Somewhat surprisingly, deployment on the Internet and in an in-house cluster unearthed a consistent disparity...
Even with the recent proliferation of in-memory computation in data-parallel frameworks (such as Spark), transfers over the network are still time-consuming. Similar to computation, network transfers serve as the main roadblocks as we try to minimize job completion times. Existing schedulers were designed as isolated solutions that focused on computation or network performance only. Without any coordination, the utilization of computation and network resources may become unbalanced, leading to a reduced level of overall resource utilization. In this paper, we design, implement,...
BBR is a congestion-based congestion control algorithm recently proposed by Google. It proactively measures the bottleneck bandwidth and round trip times (RTTs) of the connection pipe, based on which it governs its sending behaviors. Despite significant throughput gains and latency reduction, some experimental studies reveal that BBR may result in a salient RTT-fairness problem, in which short-RTT flows can be starved of bandwidth allocation when competing with long-RTT flows. In this paper, we study BBR's RTT-fairness problem from...
Guaranteed performance for data-parallel applications is important for both service providers and cloud data centers that host such services. A job of such applications involves communication among multiple machines to transmit intermediate results. Such communication comprises a collection of parallel flows, which is abstracted as a coflow in recent proposals. In this paper, we study the problem of meeting deadlines of coflows in data center networks. Existing flow-level scheduling schemes are insufficient to guarantee coflow-level performance, since...
Tasks in a data-parallel job communicate with each other through a number of concurrent flows, which is described as a coflow. These flows are correlated in the sense that the performance of a coflow is dictated by the flow that takes the longest time to complete. Minimizing coflow completion times, however, turns out to be a challenge, given the correlation across flows and how they are routed collectively in the datacenter network. In this paper, we propose Tailor, a simple yet effective mechanism with the objective of trimming coflow completion times. To achieve our objective, Tailor...
Link utilization has received extensive attention since data centers have become the most pervasive platform for data-parallel applications. A specific job of such applications involves communication among multiple machines. The recently proposed coflow abstraction depicts such communication through a group of parallel flows, and captures application performance through the corresponding requirements. Existing techniques to improve link utilization, however, either restrict themselves to achieving work conservation, or merely focus...
Data-parallel applications, especially those associated with user-facing web services, have struggled to enhance their worst-case performance. It is therefore important to improve the minimum amount of resources guaranteed for applications in a cluster. Existing cluster management frameworks, however, provide isolation of computation resources (such as CPU) only, and are oblivious to network guarantees. In this paper, we design, implement and evaluate Libra, a new cluster management framework that helps to maximize the guarantee of network bandwidth...
With the advent of big data processing frameworks, the performance of data-parallel applications is heavily affected by the time it takes to read input data, making it important to improve data locality. Existing methods in achieving data locality have primarily focused on selecting machines to place tasks of data-parallel applications. Nevertheless, the set of machines that an application can choose from is determined by a cluster manager, which is oblivious to the location of input data in existing resource sharing frameworks. In this paper, we design, implement and evaluate Custody,...
User selection has become crucial for decreasing the communication costs of federated learning (FL) over wireless networks. However, centralized user selection causes additional system complexity. This study proposes a network-intrinsic approach of distributed user selection that leverages the radio resource competition mechanism in random access. Taking carrier sensing multiple access (CSMA) as an example of random access, we manipulate the contention window (CW) size to prioritize certain users for obtaining radio resources in each round of training....
Link utilization has received extensive attention since datacenters have become the most prevalent platform for data-parallel computing applications. A specific job of such applications involves communication among multiple machines. The coflow abstraction depicts such communication and captures application performance through corresponding network requirements. Existing techniques to improve link utilization, however, either restrict themselves to work conservation, or merely focus on flow-level metrics and ignore...