- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Security and Verification in Computing
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Advanced Malware Detection Techniques
- Software-Defined Networks and 5G
- Distributed Systems and Fault Tolerance
- Caching and Content Delivery
- Advanced Neural Network Applications
- Network Packet Processing and Optimization
- Interconnection Networks and Systems
- Cloud Data Security Solutions
- Physical Unclonable Functions (PUFs) and Hardware Security
- Advanced Memory and Neural Computing
- Embedded Systems Design Techniques
- Network Security and Intrusion Detection
- Algorithms and Data Compression
- Advanced Image and Video Retrieval Techniques
- Radiation Effects in Electronics
- Tensor Decomposition and Applications
- Adversarial Robustness in Machine Learning
- Cryptographic Implementations and Security
- Gene Expression and Cancer Classification
- Genomics and Rare Diseases
Technion – Israel Institute of Technology
2016-2025
Elkhorn Slough Foundation
2015
The University of Texas at Austin
2012-2014
Bnai Zion Medical Center
2012
University of Ottawa
2012
Israel Institute
2012
IBM Research - Tokyo
2002
We propose a new set of OS abstractions to support GPUs and other accelerator devices as first-class computing resources. These abstractions, collectively called the PTask API, support a dataflow programming model. Because a PTask graph consists of OS-managed objects, the kernel has sufficient visibility and control to provide system-wide guarantees like fairness and performance isolation, and can streamline data movement in ways that are impossible under current GPU programming models.
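As a rough, user-level illustration of the dataflow idea (a graph of GPU tasks whose dependencies and data movement are tracked by the runtime rather than the application), the sketch below builds a two-stage kernel pipeline with CUDA's stream-capture graph API; it assumes a CUDA 12+ toolkit and is not the PTask API itself, which is an OS-level interface.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stage 1 produces intermediate data; stage 2 consumes it.
__global__ void scale(const float* in, float* mid, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) mid[i] = 2.0f * in[i];
}
__global__ void offset(const float* mid, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = mid[i] + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *in, *mid, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&mid, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    cudaStream_t s;
    cudaStreamCreate(&s);

    // Capture the two dependent launches as a reusable dataflow graph.
    cudaGraph_t graph;
    cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
    scale<<<n / 256, 256, 0, s>>>(in, mid, n);
    offset<<<n / 256, 256, 0, s>>>(mid, out, n);
    cudaStreamEndCapture(s, &graph);

    // CUDA 12+ signature; older toolkits take extra error-node/log arguments.
    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, 0);

    // The instantiated graph can be launched repeatedly; the runtime,
    // not the application, tracks the inter-kernel dependency.
    for (int iter = 0; iter < 10; ++iter)
        cudaGraphLaunch(exec, s);
    cudaStreamSynchronize(s);
    printf("done\n");

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(s);
    cudaFree(in); cudaFree(mid); cudaFree(out);
    return 0;
}
```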
Intel Software Guard Extensions (SGX) enable secure and trusted execution of user code in an isolated enclave to protect against a powerful adversary. Unfortunately, running I/O-intensive, memory-demanding server applications in enclaves leads to significant performance degradation. Such applications put substantial load on the in-enclave system call and secure paging mechanisms, which turn out to be the main reason for application slowdown. In addition to the high direct cost of thousands-of-cycles-long SGX management instructions,...
GPU hardware is becoming increasingly general purpose, quickly outgrowing the traditional but constrained GPU-as-coprocessor programming model. To make GPUs easier to program and to integrate with existing systems, we propose making the host's file system directly accessible from GPU code. GPUfs provides a POSIX-like API for GPU programs, exploits GPU parallelism for efficiency, and optimizes GPU file access by extending the host's buffer cache into GPU memory. Our experiments, based on a set of real benchmarks adapted to use our system,...
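The sketch below illustrates what GPU-side file access of this kind can look like. The gopen/gread/gclose names and signatures are assumptions modeled on the paper's description of a POSIX-like API, stubbed out here so the example is self-contained; they are not the verbatim GPUfs interface.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for a GPUfs-style GPU-side file API. In a real build
// these would be device functions backed by a buffer cache extended into GPU
// memory; here they are stubs so the sketch compiles on its own.
__device__ int    gopen(const char* path, int flags)             { return 3; }
__device__ size_t gread(int fd, size_t off, size_t n, char* dst) { return 0; }
__device__ int    gclose(int fd)                                 { return 0; }

// Each thread block scans its own region of the file; there is no CPU-side
// read/copy code, since the file is opened and read from inside the kernel.
__global__ void count_at(const char* path, size_t chunk, unsigned* hits) {
    __shared__ int fd;
    if (threadIdx.x == 0) fd = gopen(path, /*O_RDONLY*/ 0);
    __syncthreads();

    char buf[128];
    size_t off = (size_t)blockIdx.x * chunk + (size_t)threadIdx.x * sizeof(buf);
    size_t n = gread(fd, off, sizeof(buf), buf);
    unsigned local = 0;
    for (size_t i = 0; i < n; ++i)
        if (buf[i] == '@') ++local;
    if (local) atomicAdd(hits, local);

    __syncthreads();
    if (threadIdx.x == 0) gclose(fd);
}

int main() {
    char* path; unsigned* hits;
    cudaMallocManaged(&path, 16);
    cudaMallocManaged(&hits, sizeof(unsigned));
    snprintf(path, 16, "input.txt");   // hypothetical file name
    *hits = 0;
    count_at<<<64, 128>>>(path, 128 * 128, hits);
    cudaDeviceSynchronize();
    printf("matches: %u\n", *hits);
    return 0;
}
```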
We present a technique for designing memory-bound algorithms with high data reuse on Graphics Processing Units (GPUs) equipped with close-to-ALU software-managed memory. The approach is based on the efficient use of this memory through the implementation of a software-managed cache. We also present an analytical model for performance analysis of such algorithms.
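As a generic instance of the technique (not the paper's specific cache design), the sketch below stages a 1D stencil's input tile in CUDA shared memory, the close-to-ALU software-managed memory, so each element is fetched from DRAM once and reused by several threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define RADIUS 4
#define TILE   256

// 1D stencil: each output element reuses 2*RADIUS+1 inputs. Staging the tile
// (plus halos) in shared memory turns that reuse into on-chip hits instead of
// repeated DRAM loads -- a minimal software-managed cache.
__global__ void stencil(const float* in, float* out, int n) {
    __shared__ float tile[TILE + 2 * RADIUS];
    int gid = blockIdx.x * TILE + threadIdx.x;
    int lid = threadIdx.x + RADIUS;

    // Populate the "cache": the body element plus halo elements at the edges.
    tile[lid] = (gid < n) ? in[gid] : 0.0f;
    if (threadIdx.x < RADIUS) {
        int left = gid - RADIUS, right = gid + TILE;
        tile[lid - RADIUS] = (left >= 0) ? in[left]  : 0.0f;
        tile[lid + TILE]   = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();

    if (gid < n) {
        float acc = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            acc += tile[lid + k];          // all reads hit shared memory
        out[gid] = acc / (2 * RADIUS + 1);
    }
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    stencil<<<(n + TILE - 1) / TILE, TILE>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("out[12345] = %f\n", out[12345]);
    return 0;
}
```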
Erasure coding schemes provide higher durability at lower storage cost, and thus constitute an attractive alternative to replication in distributed storage systems, in particular for storing rarely accessed "cold" data. These schemes, however, require an order of magnitude more recovery bandwidth to maintain a constant level of durability in the face of node failures. In this paper we propose lazy recovery, a technique to reduce recovery bandwidth demands down to the level of replicated storage. The key insight is that a careful adjustment of the recovery rate substantially reduces...
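A minimal host-side sketch of the insight, under assumed parameters (a k=10, m=4 code and an illustrative repair threshold of 3 lost fragments): repair is deferred until a stripe has lost enough fragments that durability is actually at risk, so each wide k-fragment read rebuilds several losses at once.

```cuda
#include <cstdio>
#include <vector>

// Illustrative parameters, not the paper's: a (k=10, m=4) erasure code, with
// lazy recovery repairing a stripe only once 3 fragments have been lost.
constexpr int K = 10, M = 4;
constexpr int LAZY_THRESHOLD = 3;   // eager recovery would use 1

struct Stripe { int lost = 0; };

// Returns the recovery bandwidth spent, in fragment reads. Repairing a stripe
// of an MDS (K, M) code reads K surviving fragments, regardless of how many
// lost fragments are rebuilt in that one repair.
long maybe_recover(Stripe& s, int threshold) {
    if (s.lost < threshold) return 0;   // defer: durability still acceptable
    long reads = K;                     // one wide read rebuilds all lost fragments
    s.lost = 0;
    return reads;
}

int main() {
    std::vector<Stripe> eager(100000), lazy(100000);
    long eager_reads = 0, lazy_reads = 0;

    // Toy failure trace: every step, 1% of the stripes lose one fragment.
    for (int step = 0; step < 50; ++step) {
        for (size_t i = 0; i < eager.size(); i += 100) {
            eager[i].lost++; lazy[i].lost++;
            eager_reads += maybe_recover(eager[i], 1);
            lazy_reads  += maybe_recover(lazy[i], LAZY_THRESHOLD);
        }
    }
    printf("eager recovery reads: %ld\n", eager_reads);
    printf("lazy  recovery reads: %ld\n", lazy_reads);
    return 0;
}
```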
Despite the popularity of GPUs in high-performance and scientific computing, and despite their increasingly general-purpose hardware capabilities, using GPUs in network servers or distributed systems poses significant challenges. GPUnet is a native GPU networking layer that provides a socket abstraction and high-level networking APIs for GPU programs. We use GPUnet to streamline the development of high-performance, distributed applications like in-GPU-memory MapReduce and a new class of low-latency, high-throughput GPU-native network services such as a face verification server.
As GPU hardware becomes increasingly general-purpose, it is quickly outgrowing the traditional, constrained GPU-as-coprocessor programming model. This article advocates for extending standard operating system services and abstractions to GPUs in order to facilitate program development and enable the harmonious integration of GPUs into computing systems. As an example, we describe the design and implementation of GPUfs, a software layer which provides support for accessing host files directly from GPU programs. GPUfs provides a POSIX-like API,...
This paper explores new opportunities afforded by the growing deployment of compute and I/O accelerators to improve the performance and efficiency of hardware-accelerated computing services in data centers.
Many scientists perform extensive computations by executing large bags of similar tasks (BoTs) in mixtures of computational environments, such as grids and clouds. Although the reliability and cost may vary considerably across these environments, no tool exists to assist in selecting environments that can both fulfill deadlines and fit budgets. To address this situation, we introduce the Expert BoT scheduling framework. Our framework systematically selects from a large search space the Pareto-efficient scheduling strategies, that is, strategies...
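A minimal sketch of the selection step only, using made-up strategy data rather than the framework's actual performance model: from candidate (cost, makespan) points, keep the Pareto-efficient ones and pick the cheapest that still meets the user's deadline.

```cuda
#include <cstdio>
#include <vector>

// A candidate scheduling strategy, summarized by its predicted cost ($) and
// makespan (hours); names and numbers are invented for illustration.
struct Strategy { const char* name; double cost; double makespan; };

// s is dominated if another strategy is no worse in both metrics and strictly
// better in at least one.
bool dominated(const Strategy& s, const std::vector<Strategy>& all) {
    for (const auto& o : all)
        if (o.cost <= s.cost && o.makespan <= s.makespan &&
            (o.cost < s.cost || o.makespan < s.makespan))
            return true;
    return false;
}

int main() {
    std::vector<Strategy> cands = {
        {"grid-only",        1.0, 30.0},
        {"cloud-only",      20.0,  5.0},
        {"grid+cloud-tail",  6.0,  8.0},
        {"overprovisioned", 25.0,  6.0},   // dominated by cloud-only
    };

    // Keep only the Pareto front.
    std::vector<Strategy> front;
    for (const auto& s : cands)
        if (!dominated(s, cands)) front.push_back(s);

    // User constraint: cheapest Pareto-efficient strategy meeting the deadline.
    double deadline = 10.0;
    const Strategy* best = nullptr;
    for (const auto& s : front)
        if (s.makespan <= deadline && (!best || s.cost < best->cost))
            best = &s;

    for (const auto& s : front)
        printf("pareto: %-16s cost=%5.1f makespan=%5.1f\n",
               s.name, s.cost, s.makespan);
    if (best) printf("chosen: %s\n", best->name);
    return 0;
}
```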
Motivation: The use of dense single nucleotide polymorphism (SNP) data in genetic linkage analysis of large pedigrees is impeded by significant technical, methodological and computational challenges. Here we describe Superlink-Online SNP, a new powerful online system that streamlines the linkage analysis of SNP data. It features a fully integrated, flexible processing workflow comprising both well-known and novel tools, including SNP clustering, filtering of erroneous data, and exact and approximate LOD calculations...
GPUs have become an integral part of modern systems, but their implications for system security are not yet clear. This paper demonstrates both that discrete GPUs cannot be used as secure co-processors and that they provide a stealthy platform for malware. First, we examine a recent proposal to use GPUs as secure co-processors and show that the proposed guarantees do not hold on the GPUs we investigate. Second, we demonstrate that (under certain circumstances) it is possible to bypass IOMMU protections and create stealthy, long-lived GPU-based malware. We present a novel attack that compromises...
We present GPUrdma, a GPU-side library for performing Remote Direct Memory Accesses (RDMA) across the network directly from GPU kernels. The library executes no code on the CPU, accessing the Host Channel Adapter (HCA) InfiniBand hardware for both control and data. The slow single-thread performance of GPUs and the intricacies of GPU-to-network-adapter interaction pose a significant challenge. We describe several design options and analyze their implications in detail.
We introduce ZNSwap, a novel swap subsystem optimized for the recent Zoned Namespace (ZNS) SSDs. ZNSwap leverages ZNS's explicit control over data management on the drive and introduces a space-efficient host-side Garbage Collector (GC) for swap storage co-designed with the OS swap logic. ZNSwap enables cross-layer optimizations, such as direct access to in-kernel swap usage statistics by the GC to enable fine-grain storage management, and correct accounting of GC bandwidth in the OS resource isolation mechanisms to improve performance in multi-tenant environments....
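A host-side sketch of the greedy victim-selection idea behind such a GC, using invented structures and parameters: the fully written zone with the fewest valid swap blocks is chosen, its valid blocks are relocated to the open zone, and the zone is reset. The coupling to in-kernel swap statistics described above is not modeled here.

```cuda
#include <cstdio>
#include <vector>

// Toy model of a zoned device: each zone holds BLOCKS_PER_ZONE swap blocks,
// written strictly sequentially, and is reclaimed only by a whole-zone reset.
constexpr int ZONES = 8;
constexpr int BLOCKS_PER_ZONE = 1024;

struct Zone {
    int valid = 0;     // blocks still referenced by the swap subsystem
    int written = 0;   // blocks written since the last reset
};

// One greedy GC step: pick the fully written zone with the fewest valid
// blocks, copy its valid blocks out (the GC's write amplification), reset it.
long gc_step(std::vector<Zone>& zones, int open_zone) {
    int victim = -1;
    for (int i = 0; i < ZONES; ++i) {
        if (i == open_zone || zones[i].written < BLOCKS_PER_ZONE) continue;
        if (victim < 0 || zones[i].valid < zones[victim].valid) victim = i;
    }
    if (victim < 0) return 0;

    long relocated = zones[victim].valid;
    zones[open_zone].written += relocated;   // valid data moves to the open zone
    zones[open_zone].valid   += relocated;
    zones[victim] = Zone{};                  // zone reset: space reclaimed at once
    return relocated;
}

int main() {
    std::vector<Zone> zones(ZONES);
    int open_zone = 0;
    // Fill all but the open zone; pretend 25% of each zone's blocks are valid.
    for (int i = 1; i < ZONES; ++i) {
        zones[i].valid = BLOCKS_PER_ZONE / 4;
        zones[i].written = BLOCKS_PER_ZONE;
    }
    long moved = gc_step(zones, open_zone);
    printf("relocated %ld valid blocks before resetting the victim zone\n", moved);
    return 0;
}
```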
We present a holistic approach for the efficient execution of bags-of-tasks (BOTs) on multiple grids, clusters, and volunteer computing grids virtualized as a single computing platform. The challenge is twofold: to assemble this compound environment and to employ it for a mixture of throughput- and performance-oriented BOTs, with dozens of millions of tasks each. Our generic mechanism allows per-BOT specification of dynamic, arbitrary scheduling and replication policies as a function of the system state and priority.
Data stream processing applications such as stock exchange data analysis, VoIP streaming, and sensor data processing pose two conflicting challenges: short per-stream latency, to satisfy the milliseconds-long, hard real-time constraints of each stream, and high throughput, to enable efficient processing of as many streams as possible. High-throughput programmable accelerators such as modern GPUs hold the potential to speed up such computations. However, their use for low-latency stream processing is complicated by slow communications with CPUs and by processing latency that varies non-linearly with the input...
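A toy model of the underlying tension, with arbitrary constants: collecting more stream items into one GPU launch amortizes the fixed CPU-GPU communication cost (raising throughput) but makes the oldest item in a batch wait longer (raising per-stream latency).

```cuda
#include <cstdio>

// Illustrative latency model (assumed constants): every GPU launch pays a
// fixed overhead for CPU-GPU communication and dispatch, plus a per-item
// processing cost; items arrive at a fixed rate.
int main() {
    const double launch_overhead_us = 50.0;   // fixed cost per batch (assumed)
    const double per_item_us        = 0.2;    // GPU processing cost per item (assumed)
    const double arrival_gap_us     = 10.0;   // one new item every 10 us

    printf("%8s %14s %16s\n", "batch", "latency(us)", "items/sec");
    for (int batch = 1; batch <= 1024; batch *= 4) {
        // The first item of a batch waits for the rest of the batch to arrive,
        // then for the whole batch to be processed.
        double wait_us    = (batch - 1) * arrival_gap_us;
        double process_us = launch_overhead_us + batch * per_item_us;
        double latency_us = wait_us + process_us;
        double throughput = batch / process_us * 1e6;
        printf("%8d %14.1f %16.0f\n", batch, latency_us, throughput);
    }
    return 0;
}
```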
Modern discrete GPUs have been the processors of choice for accelerating compute-intensive applications, but using them in large-scale data processing is extremely challenging. Unfortunately, they do not provide important I/O abstractions long established in the CPU context, such as memory-mapped files, which shield programmers from the complexity of buffer and I/O device management. However, implementing these abstractions on GPUs poses a problem: the limited GPU virtual memory system provides no address space management and page fault...
Speculative vulnerabilities such as Spectre and Meltdown expose speculative execution state that can be exploited to leak information across security domains via side-channels. Such vulnerabilities often stay undetected for a long time, as we lack the tools for systematic testing of CPUs to find them.
Disaggregated heterogeneous data centers promise higher efficiency, lower total cost of ownership, and more flexibility for data-center operators. However, current software stacks can levy a high tax on application performance. Applications and OSes are designed for systems where local PCIe-connected devices are centrally managed by CPUs, but this centralization introduces unnecessary messages through the shared data-center network in a disaggregated system.