- Cloud Computing and Resource Management
- Network Security and Intrusion Detection
- Distributed and Parallel Computing Systems
- Privacy-Preserving Technologies in Data
- Software-Defined Networks and 5G
- Internet Traffic Analysis and Secure E-voting
- Network Traffic and Congestion Control
- Stochastic Gradient Optimization Techniques
- Sparse and Compressive Sensing Techniques
- Advanced Queuing Theory Analysis
- Distributed Sensor Networks and Detection Algorithms
- Anomaly Detection Techniques and Applications
- Software System Performance and Reliability
- Advanced Optical Network Technologies
- Advanced Wireless Network Optimization
- Distributed Systems and Fault Tolerance
- Network Packet Processing and Optimization
- Data Stream Mining Techniques
- Cryptography and Data Security
- Parallel Computing and Optimization Techniques
- Age of Information Optimization
- Interconnection Networks and Systems
- Advanced Malware Detection Techniques
- Advanced MIMO Systems Optimization
- IoT and Edge/Fog Computing
Broadcom (Israel)
2023-2024
Technion – Israel Institute of Technology
2016-2022
Herzliya Medical Center
2020-2022
Kitware (United States)
2021
IBM Research - Haifa
2016-2017
We present dRMT (disaggregated Reconfigurable Match-Action Table), a new architecture for programmable switches. dRMT overcomes two important restrictions of RMT, the predominant pipeline-based architecture for programmable switches: (1) table memory is local to an RMT pipeline stage, implying that memory not used by one stage cannot be reclaimed by another, and (2) RMT is hardwired to always sequentially execute matches followed by actions as packets traverse pipeline stages. We show that these restrictions make it difficult to execute programs efficiently on RMT.
Machine learning is widely used to solve networking challenges, ranging from traffic classification and anomaly detection to network configuration. However, machine learning also requires significant processing and often increases the load on both networks and servers. The introduction of in-network computing, enabled by programmable network devices, has made it possible to run applications within the network, providing higher throughput and lower latency. Soon after, in-network machine learning solutions started to emerge, enabling machine learning functionality within the network itself. This survey...
The soaring use of machine learning leads to increasing processing demands. As data volume keeps growing, providing classification services with good machine learning performance, high throughput, low latency, and minimal equipment overheads becomes a challenge. Offloading machine learning tasks to network switches can be a scalable solution to this problem, providing high throughput and low latency. However, network devices are resource constrained and lack support for machine learning functionality. In this paper, we introduce IIsy, a novel tool for mapping machine learning models to off-the-shelf switches....
New congestion control algorithms are rapidly improving datacenters by reducing latency, overcoming incast, and increasing throughput and fairness. Ideally, the operating system in every server and virtual machine is updated to support the new congestion control algorithms. However, legacy applications often cannot be upgraded to a new operating system version, which means the advances are off-limits to them. Worse, as we show, legacy applications can be squeezed out, which in the worst case prevents the entire network from adopting new algorithms.
Using programmable network devices to aid in-network machine learning has been the focus of significant research. However, most of the research was of a limited scope, providing a proof of concept or describing a closed-source algorithm. To date, no general solution has been provided for mapping machine learning algorithms to programmable network devices. In this paper, we present Planter, an open-source, modular framework for mapping trained machine learning models to programmable devices. Planter supports a wide range of machine learning models and multiple targets, and can be easily extended. The evaluation compares different...
The rat race between user-generated data and data-processing systems is currently won by data. The increased use of machine learning leads to a further increase in processing requirements, while data volume keeps growing. To win the race, machine learning needs to be applied to the data as it goes through the network. In-network classification of data can reduce the load on servers, reduce response time, and increase scalability. In this paper, we introduce IIsy, implementing machine learning classification models in a hybrid fashion using off-the-shelf network devices. IIsy targets three main challenges...
In-network machine learning inference provides high throughput and low latency. It is ideally located within the network, is power efficient, and improves applications' performance. Despite its advantages, the bar to in-network machine learning research is high, requiring significant expertise in programmable data planes, in addition to knowledge of machine learning and the application area. Existing solutions are mostly one-time efforts that are hard to reproduce, change, or port across platforms. In this paper, we present Planter: a modular and efficient...
Disaggregated Large Language Model (LLM) inference has gained popularity as it separates the computation-intensive prefill stage from the memory-intensive decode stage, avoiding prefill-decode interference and improving resource utilization. However, transmitting Key-Value (KV) data between the two stages can be a bottleneck, especially for long prompts. Additionally, computation time overhead is key to optimizing Job Completion Time (JCT), and KV data size can become prohibitive for long prompts and sequences. Existing...
Due to the large data volume and number of distinct elements, space is often the bottleneck of many stream processing systems. The data structures used by these systems often consist of counters whose optimization yields significant memory savings. The challenge lies in balancing the size of the counters: too small, and they overflow; too large, and memory capacity limits their number. In this work, we suggest an efficient encoding scheme that sizes each counter according to its needs. Our approach uses fixed-sized pools (e.g., a single word or 64...
Secure aggregation is commonly used in federated learning (FL) to alleviate privacy concerns related to the central aggregator seeing all parameter updates in the clear. Unfortunately, most existing secure aggregation schemes ignore two critical orthogonal research directions that aim to (i) significantly reduce client-server communication and (ii) mitigate the impact of malicious clients. However, both of these additional properties are essential to facilitate cross-device FL with thousands or even millions of (mobile)...
Cloud operators require real-time identification of Heavy Hitters (HH) and Hierarchical Heavy Hitters (HHH) for applications such as load balancing, traffic engineering, and attack mitigation. However, existing techniques are slow in detecting new heavy hitters.
Nowadays, the efficiency and even the feasibility of traditional load-balancing policies are challenged by the rapid growth of cloud infrastructure and the increasing levels of server heterogeneity. In such heterogeneous systems with many load balancers, traditional solutions, such as JSQ, incur a prohibitively large communication overhead and detrimental incast effects due to herd behavior. Alternative low-communication policies, such as JSQ(d) and the recently proposed JIQ, are either unstable or provide poor performance. We introduce Local Shortest...
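For readers unfamiliar with the baseline policies named in this abstract, the following is a minimal sketch of JSQ (join the shortest queue) and JSQ(d) (sample d servers, join the shortest of those). It is illustrative only and is not the policy the paper introduces; the function names and the list-of-queue-lengths representation are assumptions made for this example.

```python
import random

def jsq(queue_lengths):
    """JSQ: inspect every server's queue and pick the shortest.

    Needs the state of all servers for each decision, which is the
    communication overhead the abstract refers to.
    """
    return min(range(len(queue_lengths)), key=lambda i: queue_lengths[i])

def jsq_d(queue_lengths, d, rng=random):
    """JSQ(d): sample d distinct servers at random, pick the shortest of those.

    Only d queue lengths are queried per decision.
    """
    sampled = rng.sample(range(len(queue_lengths)), d)
    return min(sampled, key=lambda i: queue_lengths[i])

if __name__ == "__main__":
    queues = [4, 1, 7, 3]  # hypothetical queue lengths for four servers
    print("JSQ picks server", jsq(queues))
    print("JSQ(2) picks server", jsq_d(queues, d=2, rng=random.Random(0)))
```

JSQ always finds the globally shortest queue, while JSQ(d) trades decision quality for fewer queries; with d equal to the number of servers the two coincide.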
Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures. This paper takes a different approach. We propose a simple and general method called SALSA...
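As background for the fixed-size-counter design this abstract starts from, here is a minimal sketch of a Count-Min sketch whose counters saturate at a fixed bit width. It is a generic textbook structure, not SALSA; the class name, the default parameters, and the use of salted BLAKE2b as the hash family are assumptions chosen for this example.

```python
import hashlib

class CountMinSketch:
    """Count-Min sketch with fixed-size, saturating counters (illustrative)."""

    def __init__(self, rows=4, width=64, counter_bits=32):
        self.rows = rows
        self.width = width
        self.max_value = (1 << counter_bits) - 1  # largest value a counter holds
        self.table = [[0] * width for _ in range(rows)]

    def _index(self, item, row):
        # One salted hash per row; the row number serves as the salt.
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for r in range(self.rows):
            i = self._index(item, r)
            # Saturate instead of wrapping around when the counter is full.
            self.table[r][i] = min(self.table[r][i] + count, self.max_value)

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best estimate.
        return min(self.table[r][self._index(item, r)] for r in range(self.rows))

if __name__ == "__main__":
    sketch = CountMinSketch()
    for _ in range(5):
        sketch.add("flow-a")
    print(sketch.estimate("flow-a"))
```

The trade-off described in the abstract is visible in `counter_bits`: a small value saves space but saturates quickly, while a large value wastes space on counters that never grow.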
Consistent hashing is a central building block in many networking applications, such as maintaining connection affinity of TCP flows. However, current consistent hashing solutions do not ensure full consistency under arbitrary changes or scale poorly in terms of memory footprint, update time and key lookup complexity. We present AnchorHash, a scalable and fully-consistent hashing algorithm. AnchorHash achieves a high key lookup rate, a low memory footprint and a low update time. We formally establish its strong theoretical guarantees, and present an advanced...
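To make the notion of consistency concrete, the sketch below shows the classic ring-based form of consistent hashing with virtual nodes. It is not AnchorHash; the `HashRing` class, the `vnodes` parameter, and the choice of SHA-256 are assumptions made for illustration.

```python
import bisect
import hashlib

def _h(key):
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    """Ring-based consistent hashing with virtual nodes (illustrative)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for v in range(self.vnodes):
            bisect.insort(self._points, (_h(f"{node}#{v}"), node))

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def lookup(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's position, wrapping around.
        i = bisect.bisect_left(self._points, (_h(key),))
        return self._points[i % len(self._points)][1]

if __name__ == "__main__":
    ring = HashRing(["server-a", "server-b", "server-c"])
    print(ring.lookup("10.0.0.1:443->10.0.0.2:51234"))
```

Removing a node only remaps the keys that were assigned to it, which is the property that preserves connection affinity. The per-node virtual points also show where the memory footprint and update cost of this classic design come from.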
Direct memory access (DMA) renders a system vulnerable to DMA attacks, in which I/O devices access memory regions not intended for their use. Hardware input-output memory management units (IOMMU) can be used to provide protection. However, an IOMMU cannot prevent all DMA attacks because it only restricts DMA at page-level granularity, leading to sub-page vulnerabilities.
Counters are a fundamental building block for networking applications such as load balancing, traffic engineering, and intrusion detection, which require estimating flow sizes and identifying heavy hitter flows. Existing works suggest replacing counters with shorter multiplicative error estimators that improve the accuracy by fitting more of them within a given space. However, such estimators impose a computational overhead that degrades the measurement throughput. Instead, we propose additive error estimators, which are simpler, faster,...
Integrating optical circuit switches in data-centers is an on-going research challenge. In recent years, state-of-the-art solutions have introduced hybrid packet/circuit architectures for different optical circuit switch technologies, control techniques, and traffic rerouting methods. These solutions are based on separated packet and circuit planes, which do not have the ability to utilize an optical circuit with flows that do not arrive from or are not delivered to switches directly connected to the circuit's end-points. Moreover, current SDN-based elephant flow rerouting methods require a...
Backpressure schemes are known to stabilize stochastic networks through the use of congestion gradients in routing and resource allocation decisions. Nonetheless, these schemes share a significant drawback, namely, the delay guarantees are obtained only in terms of average values. As a result, arbitrary packets may never reach their destination due to both the starvation and last-packet problems. These problems occur because in backpressure schemes, packet scheduling needs a subsequent stream of packets to produce the required congestion gradient for...
Hybrid switching combines a high-bandwidth optical circuit switch in parallel with a low-bandwidth electronic packet switch. It presents an appealing solution for scaling datacenter architectures. Unfortunately, it does not fit many traffic patterns produced by typical datacenter applications, and in particular the skewed traffic patterns that involve highly intensive one-to-many and many-to-one communications.
A parallel server system is considered in which a dispatcher routes incoming jobs to a fixed number of heterogeneous servers, each with its own queue. Much effort has been previously made to design policies that use limited state information (e.g., the queue lengths in a small subset of the set of servers, or the identity of the idle servers). However, existing policies either do not achieve the stability region or perform poorly in terms of job completion time. We introduce Persistent-Idle (PI), a new, perhaps counterintuitive, load-distribution policy...
Cloud operators require timely identification of Heavy Hitters (HH) and Hierarchical Heavy Hitters (HHH) for applications such as load balancing, traffic engineering, and attack mitigation. However, existing techniques are slow in detecting new heavy hitters. In this paper, we present the case for identifying heavy hitters through sliding windows. Sliding windows are quicker and more accurate to detect new heavy hitters than current...
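The idea of detecting heavy hitters over a sliding window can be sketched with an exact, memory-hungry baseline: keep the last W items and report every flow whose share of the window reaches a threshold. This is not the paper's algorithm (which is not described in the truncated abstract); the class name, parameters, and threshold semantics are assumptions for this example.

```python
from collections import Counter, deque

class SlidingWindowHeavyHitters:
    """Exact heavy-hitter detection over the last `window` items (illustrative)."""

    def __init__(self, window, threshold):
        self.window = window        # number of most recent items considered
        self.threshold = threshold  # minimum fraction of the window, e.g. 0.5
        self.items = deque()
        self.counts = Counter()

    def add(self, flow):
        self.items.append(flow)
        self.counts[flow] += 1
        if len(self.items) > self.window:
            # Expire the oldest item so stale traffic stops counting.
            old = self.items.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def heavy_hitters(self):
        cutoff = self.threshold * len(self.items)
        return {flow for flow, count in self.counts.items() if count >= cutoff}

if __name__ == "__main__":
    detector = SlidingWindowHeavyHitters(window=4, threshold=0.5)
    for flow in ["a", "a", "a", "a", "b", "b", "b"]:
        detector.add(flow)
    print(detector.heavy_hitters())
```

Because old items expire, a flow that only recently became heavy overtakes a previously dominant one as soon as it fills enough of the window. This exact version stores the whole window, so its memory grows with the window size.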