- Software-Defined Networks and 5G
- Network Traffic and Congestion Control
- Cloud Computing and Resource Management
- Interconnection Networks and Systems
- Caching and Content Delivery
- Distributed Systems and Fault Tolerance
- Advanced Optical Network Technologies
- Blockchain Technology Applications and Security
- Parallel Computing and Optimization Techniques
- Data Management and Algorithms
- Advanced Data Storage Technologies
- Wireless Networks and Protocols
- Peer-to-Peer Network Technologies
- Advanced Database Systems and Queries
- Data Stream Mining Techniques
- Software System Performance and Reliability
- Internet Traffic Analysis and Secure E-voting
- Advanced Wireless Network Optimization
- IoT and Edge/Fog Computing
- Machine Learning and Algorithms
- Image and Video Quality Assessment
- Reinforcement Learning in Robotics
- Machine Learning and Data Classification
- Video Coding and Compression Technologies
- Optimization and Search Problems
Amirkabir University of Technology
2024-2025
Massachusetts Institute of Technology
2015-2025
IIT@MIT
2022-2024
Hebrew University of Jerusalem
2024
Tarbiat Modares University
2023
University of California, Berkeley
2022
Berkeley College
2022
Moscow Institute of Thermal Technology
2016-2019
Cisco Systems (United States)
2014
Cisco Systems (China)
2014
Cloud data centers host diverse applications, mixing workloads that require small predictable latency with others requiring large sustained throughput. In this environment, today's state-of-the-art TCP protocol falls short. We present measurements of a 6000 server production cluster and reveal impairments that lead to high application latencies, rooted in TCP's demands on the limited buffer space available in data center switches. For example, bandwidth hungry "background" flows build up queues at...
In this paper we present pFabric, a minimalistic datacenter transport design that provides near theoretically optimal flow completion times even at the 99th percentile for short flows, while still minimizing average flow completion time for long flows. Moreover, pFabric delivers this performance with a very simple design that is based on a key conceptual insight: datacenter transport should decouple flow scheduling from rate control. For flow scheduling, packets carry a single priority number set independently by each flow; switches have very small buffers and...
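The scheduling idea in the abstract above (each packet carries one priority number, and switches keep only a small buffer) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the pFabric implementation: the names `Packet` and `PriorityPort` are hypothetical, a lower number is assumed to mean a more urgent packet, and the drop rule (evict the least urgent packet on overflow) is an assumption, since the abstract is truncated before it describes switch behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    flow_id: str
    priority: int  # assumed convention: lower number = more urgent


class PriorityPort:
    """Toy switch port with a small buffer (hypothetical, for illustration)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer: List[Packet] = []

    def enqueue(self, pkt: Packet) -> Optional[Packet]:
        """Buffer a packet; return the evicted packet if the buffer overflows."""
        self.buffer.append(pkt)
        if len(self.buffer) <= self.capacity:
            return None
        # Assumed drop rule: evict the least urgent packet (largest number).
        victim = max(self.buffer, key=lambda p: p.priority)
        self.buffer.remove(victim)
        return victim

    def dequeue(self) -> Optional[Packet]:
        """Transmit the most urgent buffered packet, or None if empty."""
        if not self.buffer:
            return None
        best = min(self.buffer, key=lambda p: p.priority)
        self.buffer.remove(best)
        return best
```

With a two-packet buffer, enqueueing packets of priority 100, 1, and 10 evicts the priority-100 packet, and the remaining packets leave in the order 1, then 10. Rate control is deliberately absent from the sketch, which mirrors the abstract's point that scheduling can be decoupled from it.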
We present the design, implementation, and evaluation of CONGA, a network-based distributed congestion-aware load balancing mechanism for datacenters. CONGA exploits recent trends including the use of regular Clos topologies and overlays for network virtualization. It splits TCP flows into flowlets, estimates real-time congestion on fabric paths, and allocates flowlets to paths based on feedback from remote switches. This enables CONGA to efficiently balance load and seamlessly handle asymmetry, without requiring any...
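The two steps named in the abstract above (splitting a flow into flowlets, then placing each flowlet on a path using congestion estimates) can be sketched as follows. This is an illustrative sketch, not CONGA's switch logic: the function names, the `gap` threshold, and the congestion values are all hypothetical, and a flowlet is assumed to be a burst of packets separated from the next burst by an idle gap longer than `gap`.

```python
from typing import Dict, List


def split_into_flowlets(arrival_times: List[float], gap: float) -> List[List[float]]:
    """Group sorted packet arrival times into flowlets.

    A new flowlet starts whenever the idle time since the previous
    packet exceeds `gap` (an assumed, tunable threshold).
    """
    flowlets: List[List[float]] = []
    for t in arrival_times:
        if flowlets and t - flowlets[-1][-1] <= gap:
            flowlets[-1].append(t)  # still inside the current burst
        else:
            flowlets.append([t])  # idle gap exceeded: start a new flowlet
    return flowlets


def pick_path(congestion: Dict[str, float]) -> str:
    """Return the path with the lowest congestion estimate."""
    return min(congestion, key=congestion.get)


if __name__ == "__main__":
    times = [0.0, 0.1, 0.2, 1.0, 1.05, 3.0]
    for flowlet in split_into_flowlets(times, gap=0.5):
        path = pick_path({"path_a": 0.7, "path_b": 0.2, "path_c": 0.5})
        print(len(flowlet), "packets ->", path)
```

In a real fabric the congestion estimates would change between flowlets as feedback arrives; the fixed dictionary here only keeps the example self-contained.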
Congestion control (CC) is the key to achieving ultra-low latency, high bandwidth and network stability in high-speed networks. From years of experience operating large-scale and high-speed RDMA networks, we find the existing high-speed CC schemes have inherent limitations for reaching these goals. In this paper, we present HPCC (High Precision Congestion Control), a new high-speed CC mechanism which achieves the three goals simultaneously. HPCC leverages in-network telemetry (INT) to obtain precise link load information and controls traffic precisely. By...
Homa is a new transport protocol for datacenter networks. It provides exceptionally low latency, especially for workloads with a high volume of very short messages, and it also supports large messages and high network utilization. Homa uses in-network priority queues to ensure low latency for short messages; priority allocation is managed dynamically by each receiver and integrated with a receiver-driven flow control mechanism. Homa also uses controlled overcommitment of receiver downlinks to ensure efficient bandwidth utilization at high load. Our implementation of Homa delivers 99th percentile...
Many algorithms for congestion control, scheduling, network measurement, active queue management, and traffic engineering require custom processing of packets in the data plane of a network switch. To run at line rate, these data-plane algorithms must be implemented in hardware. With today's switch hardware, algorithms cannot be changed, nor new algorithms installed, after a switch has been built.
Network performance monitoring today is restricted by existing switch support for measurement, forcing operators to rely heavily on endpoints with poor visibility into the network core. Switch vendors have added progressively more monitoring features to switches, but the current trajectory of adding specific features is unsustainable given the ever-changing demands of network operators. Instead, we ask what switch hardware primitives are required to support an expressive language of network performance questions. We believe that the resulting switch hardware design could address a wide variety...
Switches today provide a small menu of scheduling algorithms. While we can tweak scheduling parameters, we cannot modify algorithmic logic, or add a completely new algorithm, after the switch has been designed. This paper presents a design for a programmable packet scheduler, which allows scheduling algorithms (potentially algorithms that are unknown today) to be programmed into a switch without requiring hardware redesign.
Query optimization remains one of the most challenging problems in data management systems. Recent efforts to apply machine learning techniques to query optimization challenges have been promising, but have shown few practical gains due to substantive training overhead, inability to adapt to changes, and poor tail performance. Motivated by these difficulties and drawing upon a long history of research in multi-armed bandits, we introduce Bao (the BAndit Optimizer). Bao takes advantage of the wisdom built into existing query optimizers by providing...
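For readers unfamiliar with the multi-armed bandit setting that the abstract above draws on, the sketch below shows a standard bandit algorithm, UCB1. It is a generic textbook technique, not a description of Bao's method, which the truncated abstract does not spell out. The class name and the idea that each arm stands for one candidate optimizer configuration are assumptions made for illustration, and rewards are assumed to lie in [0, 1].

```python
import math
from typing import List


class UCB1:
    """Generic UCB1 bandit (illustrative; not Bao's algorithm)."""

    def __init__(self, n_arms: int) -> None:
        self.counts: List[int] = [0] * n_arms      # pulls per arm
        self.totals: List[float] = [0.0] * n_arms  # summed reward per arm

    def select(self) -> int:
        """Pick the arm with the highest upper confidence bound."""
        # Try every arm once before using the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        t = sum(self.counts)
        scores = [
            self.totals[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(t) / self.counts[a])
            for a in range(len(self.counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        """Record the observed reward for the chosen arm."""
        self.counts[arm] += 1
        self.totals[arm] += reward


if __name__ == "__main__":
    # Hypothetical fixed rewards, e.g. normalized query speedups per arm.
    rewards = [0.2, 0.9, 0.5]
    bandit = UCB1(n_arms=len(rewards))
    for _ in range(500):
        arm = bandit.select()
        bandit.update(arm, rewards[arm])
    print(bandit.counts)
```

The exploration bonus shrinks as an arm is pulled more often, so the loop concentrates on the best-performing arm while still revisiting the others occasionally, which is the adapt-to-change property the abstract emphasizes.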
Cloud computing, social networking and information networks (for search, news feeds, etc.) are driving interest in the deployment of large data centers. TCP is the dominant Layer 3 transport protocol in these networks. However, the operating conditions (very high bandwidth links, low round-trip times, small-buffered switches) and traffic patterns cause TCP to perform very poorly. The Data Center TCP (DCTCP) algorithm has recently been proposed as a TCP variant for data centers and addresses these shortcomings.
We present dRMT (disaggregated Reconfigurable Match-Action Table), a new architecture for programmable switches. dRMT overcomes two important restrictions of RMT, the predominant pipeline-based architecture for programmable switches: (1) table memory is local to an RMT pipeline stage, implying that memory not used by one stage cannot be reclaimed by another, and (2) RMT is hardwired to always sequentially execute matches followed by actions as packets traverse pipeline stages. We show that these restrictions make it difficult to execute programs efficiently on RMT.
This paper presents a practical approach to rapidly introducing new dataplane functionality into networks: End-hosts embed tiny programs into packets to actively query and manipulate a network's internal state. We show how this "tiny packet program" (TPP) interface gives end-hosts unprecedented visibility into network behavior, enabling them to work with the network to achieve a desired functionality. Our design leverages what each component does best: (a) switches forward and execute tiny packet programs (at most 5 instructions) in-band at line...
Modern data center networks must support a multitude of diverse and demanding workloads at low cost, and even the most simple architectural choices can impact mission-critical application performance. This forces network architects to continually evaluate tradeoffs between ideal designs and pragmatic, cost effective solutions. In real commercial environments the number of parameters that the architect can control is fairly limited and typically includes only the choice of topology, link speeds, oversubscription, and switch buffer...
Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query execution plans....
Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order), multi-dimensional indexes, and, for high selectivity queries, secondary indexes. However, these schemes are hard to tune and their performance is inconsistent. Recent work on learned multi-dimensional indexes has introduced the idea of automatically optimizing an index for a particular dataset and workload. However, the performance of that work suffers in...
Query driven cardinality estimation models learn from a historical log of queries. They are lightweight, having low storage requirements, fast inference and training, and are easily adaptable for any kind of query. Unfortunately, such models can suffer unpredictably bad performance under workload drift, i.e., if the query pattern or data changes. This makes them unreliable and hard to deploy. We analyze the reasons why models become unpredictable due to workload drift, and introduce modifications to the query representation and neural network training techniques...
Data Center Networks present a novel, unique and rich environment for algorithm development and deployment. Projects are underway in the IEEE 802.1 standards body, especially in the Data Center Bridging Task Group, to define new switched Ethernet functions for data center use. One such project is IEEE 802.1Qau, the Congestion Notification project, whose aim is to develop an Ethernet congestion control algorithm for hardware implementation. A major contribution of this paper is the description and analysis of the congestion control algorithm QCN (Quantized Congestion Notification), which has been developed...
Data Center Networks represent the convergence of computing and networking, of data and storage networks, and of packet transport mechanisms in Layers 2 and 3. Congestion control algorithms are a key component of data transport in this type of network. Recently, a Layer 2 congestion management algorithm, called QCN (Quantized Congestion Notification), has been adopted for the IEEE 802.1 Data Center Bridging standard: IEEE 802.1Qau. The QCN algorithm has been designed to be stable, responsive, and simple to implement. However, it does not provide weighted fairness, where the weights can be set...
We present xFabric, a novel datacenter transport design that provides flexible and fast bandwidth allocation control. xFabric is flexible: it enables operators to specify how bandwidth is allocated amongst contending flows to optimize for different service-level objectives such as minimizing flow completion times, weighted allocations, different notions of fairness, etc. xFabric is also very fast: it converges to the specified allocation one to two orders of magnitude faster than prior schemes. Underlying xFabric is a novel distributed algorithm that uses in-network...