- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Embedded Systems Design Techniques
- Interconnection Networks and Systems
- Low-power high-performance VLSI design
- Cloud Computing and Resource Management
- Distributed and Parallel Computing Systems
- Physical Unclonable Functions (PUFs) and Hardware Security
- Radiation Effects in Electronics
- Advanced Neural Network Applications
- Distributed systems and fault tolerance
- Adversarial Robustness in Machine Learning
- Security and Verification in Computing
- Domain Adaptation and Few-Shot Learning
- Semiconductor materials and devices
- Integrated Circuits and Semiconductor Failure Analysis
- Anomaly Detection Techniques and Applications
- Advanced Memory and Neural Computing
- Real-Time Systems Scheduling
- Advanced Image and Video Retrieval Techniques
- VLSI and Analog Circuit Testing
- Advanced Malware Detection Techniques
- Caching and Content Delivery
- Advancements in Semiconductor Devices and Circuit Design
- Cryptographic Implementations and Security
Technion – Israel Institute of Technology
2016-2025
Nanyang Technological University
2019-2022
Microsoft Research (United Kingdom)
2022
The University of Texas at Austin
2020
University of Lisbon
2020
Cornell University
2020
Taiwan Semiconductor Manufacturing Company (Taiwan)
2020
Korea Advanced Institute of Science and Technology
2020
Intel (United States)
2001-2020
Arizona State University
2020
The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a "warm-up obstacle": the inability of standard warm-up stages to train high-quality feature extractors and avert memorization of noisy labels. We propose "Contrast to Divide" (C2D), a simple framework that solves this problem by pre-training the feature extractor in a self-supervised fashion. Using self-supervised pre-training boosts the performance of existing LNL approaches drastically...
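To make the two-stage idea concrete, here is a minimal sketch, assuming a SimCLR-style contrastive task stands in for the self-supervised stage; the toy encoder, augmentations, and hyperparameters are illustrative, not the paper's recipe.

```python
# Stage 1: self-supervised pre-training (labels never touched);
# stage 2: the usual LNL warm-up then starts from pre-trained features.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 128))  # toy encoder
proj = nn.Linear(128, 64)                                           # projection head

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style contrastive loss over two augmented views."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau - 1e9 * torch.eye(2 * n)   # mask self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

opt = torch.optim.SGD(list(encoder.parameters()) + list(proj.parameters()), lr=0.1)

for _ in range(10):                                  # stand-in pre-training loop
    x = torch.rand(16, 3, 32, 32)
    v1 = x + 0.1 * torch.randn_like(x)               # crude "augmentations"
    v2 = x + 0.1 * torch.randn_like(x)
    loss = nt_xent(proj(encoder(v1)), proj(encoder(v2)))
    opt.zero_grad()
    loss.backward()
    opt.step()

classifier = nn.Linear(128, 10)   # warm-up on noisy labels would begin here
```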
In the past several decades, the world of computers, and especially that of microprocessors, has witnessed phenomenal advances. Computers have exhibited ever-increasing performance and decreasing costs, making them more affordable and, in turn, accelerating additional software and hardware development that fueled this process even more. The technology that enabled this exponential growth is a combination of advancements in process technology, microarchitecture, architecture, and design tools. While the pace of progress has been quite impressive over the last...
We study the tradeoffs between many-core machines like Intel's Larrabee and many-thread machines like Nvidia and AMD GPGPUs. We define a unified model describing a superposition of the two architectures, and use it to identify operation zones for which each machine is more suitable. Moreover, we identify an intermediate zone in which both machines deliver inferior performance. We show the shape of this "performance valley" and provide insights on how it can be avoided.
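The "performance valley" can be reproduced with a hedged toy model, not the paper's exact unified equations: cache hit rate falls as more threads share a fixed cache, while multithreading only hides memory latency once enough threads are in flight. All constants below are assumed.

```python
CACHE = 4096      # cache lines shared by all threads (assumed)
MEM_LAT = 200     # memory latency in cycles (assumed)
CPI_EXE = 1.0     # per-thread CPI with a perfect cache (assumed)
MPI = 0.05        # memory accesses per instruction (assumed)
MAX_IPC = 8.0     # aggregate pipeline limit of the chip (assumed)

def throughput(n):
    h = min(1.0, (CACHE / n) ** 0.5 / 64)    # hit rate falls as threads share the cache
    cpi = CPI_EXE + MPI * (1 - h) * MEM_LAT  # per-thread CPI including memory stalls
    return min(n / cpi, MAX_IPC)             # more threads hide latency, up to the cap

for n in (1, 2, 4, 16, 64, 256, 1024):
    print(f"{n:5d} threads -> {throughput(n):5.2f} IPC")
```

Few threads with a warm cache do well, a sea of threads hides latency, and the middle dips: the valley.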
Translation Lookaside Buffers (TLBs) are ubiquitously used in modern architectures to cache virtual-to-physical mappings and, as they are looked up on every memory access, they are paramount to performance scalability. The emergence of chip multiprocessors (CMPs) with per-core TLBs has brought the problem of TLB coherence to the front stage. TLBs are kept coherent at the software level by the operating system (OS). Whenever the OS modifies page permissions in a page table, it must initiate a coherency transaction among TLBs, a process known as a TLB shootdown...
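For readers unfamiliar with the mechanism, here is an illustrative sketch of the software-level shootdown the abstract refers to; it is a toy simulation, not kernel code, and the class and function names are invented.

```python
# The initiating core updates the page table, then interrupts every core
# that may cache the stale mapping and waits for acknowledgements.
import threading

class Core:
    def __init__(self, cid):
        self.cid = cid
        self.tlb = {}                      # vpn -> (pfn, perms)

    def ipi_invalidate(self, vpn, ack):
        self.tlb.pop(vpn, None)            # drop the stale entry
        ack.release()                      # acknowledge the shootdown

def shootdown(initiator, others, page_table, vpn, new_perms):
    page_table[vpn] = new_perms            # 1. modify the page table
    initiator.tlb.pop(vpn, None)           # 2. flush the local TLB entry
    ack = threading.Semaphore(0)
    for core in others:                    # 3. send IPIs to possible sharers
        threading.Thread(target=core.ipi_invalidate, args=(vpn, ack)).start()
    for _ in others:                       # 4. block until every core acks
        ack.acquire()
```

The wait in step 4 is why frequent shootdowns hurt scalability: the initiator stalls on the slowest responder.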
This article presents a taxonomy and represents a repository of open problems in computing for numerically and logically intensive problems in a number of disciplines that have to synergize for the best performance of simulation-based feasibility studies in nature-oriented engineering in general, and civil engineering in particular. Topics include, but are not limited to: nature-based construction, genomics supporting nature-based construction, earthquake engineering, other types of geophysical disaster prevention activities, as well as processes...
The client computing platform is moving towards a heterogeneous architecture consisting of a combination of cores focused on scalar performance and a set of throughput-oriented cores. The throughput-oriented cores (e.g. a GPU) may be connected over both coherent and non-coherent interconnects, and may have different ISAs. This paper describes a programming model for such heterogeneous platforms. We discuss the language constructs, the runtime implementation, and the memory model of such a programming environment. We implemented this environment in an x86 simulator and ported a number...
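As a loose conceptual analogue only (the paper's model targets C/C++ on x86 plus a GPU, not Python), an offload-style construct can be sketched as a decorator that routes a data-parallel kernel to a worker pool standing in for the throughput-oriented cores; the partitioning scheme and pool size are arbitrary.

```python
import concurrent.futures

_throughput_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)

def offload(fn):
    """Mark a data-parallel kernel for the 'throughput' cores."""
    def launch(data):
        chunks = [data[i::8] for i in range(8)]              # naive partition
        futures = [_throughput_pool.submit(fn, c) for c in chunks]
        return [f.result() for f in futures]                 # implicit join
    return launch

@offload
def scale(chunk, k=2.0):
    return [k * x for x in chunk]

print(scale(list(range(10))))   # kernel runs across the worker pool
```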
The exponential growth of digital data has introduced massively-parallel systems, special orchestration layers, and new scale-out applications. While recent works suggest that the characteristics of these workloads are different from those of traditional ones, their root causes are not understood. Such understanding is extremely important for improving efficiency; even a 1% performance gain per core can have a large impact on the datacenter as a whole. This paper studies a Big Data Analytics (BDA) workload on a modern cloud server...
This study introduces a novel, practical approach for designing a hierarchical online anomaly detection system for industrial cyber-physical systems. The proposed method utilizes the Hierarchical Temporal Memory (HTM) unsupervised learning algorithm, which requires input data to be encoded as sparse binary distributed representations (SDRs). A new SDR encoding termed the temporal sequence encoder (TSSE) is presented to convert sensor outputs into SDRs. It enables HTM to retain high memory capacity and robust performance...
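The TSSE itself is the paper's contribution, so the sketch below shows only the generic ingredient it builds on: a scalar-to-SDR encoder in the style of classic HTM encoders, where nearby values share active bits. The vector size and sparsity are assumed.

```python
import numpy as np

def scalar_to_sdr(value, vmin=0.0, vmax=100.0, size=400, active=21):
    """Encode a scalar as a wide, sparse binary vector (contiguous run of 1s)."""
    span = size - active
    start = int(round(span * (value - vmin) / (vmax - vmin)))
    start = max(0, min(span, start))          # clamp out-of-range inputs
    sdr = np.zeros(size, dtype=np.uint8)
    sdr[start:start + active] = 1
    return sdr

a, b = scalar_to_sdr(42.0), scalar_to_sdr(43.0)
print(int((a & b).sum()))   # similar inputs share most active bits
```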
Jailbreak attacks aim to exploit large language models (LLMs) and pose a significant threat to their proper conduct; they seek to bypass the models' safeguards and often provoke transgressive behaviors. However, existing automatic jailbreak attacks require extensive computational resources and are prone to converge on suboptimal solutions. In this work, we propose \textbf{C}ompliance \textbf{R}efusal \textbf{I}nitialization (CRI), a novel, attack-agnostic framework that efficiently initializes the optimization in...
This paper explores the possibility of using program profiling to enhance the efficiency of value prediction. Value prediction attempts to eliminate true-data dependencies by predicting the outcome values of instructions at run-time and executing dependent instructions based on that prediction. So far, all published papers in this area have examined hardware-only value prediction mechanisms. In order to enhance value prediction, it is proposed to employ program profiling to collect information that describes the tendency of an instruction to be value-predictable. The compiler acts as a mediator and can pass...
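A hedged sketch of the profiling side of this idea: replay a (pc, value) trace from a profiling run, score how often each static instruction repeats its last outcome, and tag the highly predictable ones for the hardware. The trace format and the threshold are invented for illustration.

```python
from collections import defaultdict

def profile_predictability(trace):
    """trace: iterable of (pc, value) pairs recorded during a profiling run."""
    last, hits, total = {}, defaultdict(int), defaultdict(int)
    for pc, value in trace:
        total[pc] += 1
        if last.get(pc) == value:          # instruction repeated its last outcome
            hits[pc] += 1
        last[pc] = value
    return {pc: hits[pc] / total[pc] for pc in total}

trace = [(0x40, 7), (0x40, 7), (0x40, 7), (0x44, 1), (0x44, 2), (0x40, 7)]
scores = profile_predictability(trace)
hints = {pc for pc, s in scores.items() if s >= 0.75}   # assumed hint threshold
print(scores, hints)
```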
This article presents an experimental and analytical study of value prediction and its impact on speculative execution in superscalar microprocessors. Value prediction is a new paradigm that suggests predicting the outcome values of operations (at run-time) and using these predicted values to trigger the execution of true-data-dependent operations speculatively. As a result, stalls to memory locations can be reduced and the amount of instruction-level parallelism can be extended beyond the limits of the program's dataflow graph. The article examines the characteristics of the value prediction concept from two...
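A toy stride predictor, one common hardware scheme in this line of work, shows the mechanism: predict the next outcome as the last value plus the observed stride, which captures loop induction variables. The table organization is heavily simplified.

```python
class StridePredictor:
    def __init__(self):
        self.table = {}                       # pc -> (last value, stride)

    def predict(self, pc):
        if pc not in self.table:
            return None                       # no history yet
        last, stride = self.table[pc]
        return last + stride

    def update(self, pc, value):
        last, _ = self.table.get(pc, (value, 0))
        self.table[pc] = (value, value - last)

pred, correct, total = StridePredictor(), 0, 0
for value in range(0, 40, 4):                 # an i += 4 induction variable
    guess = pred.predict(pc=0x10)
    correct += (guess == value)
    total += 1
    pred.update(pc=0x10, value=value)
print(f"stride predictor accuracy: {correct}/{total}")
```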
We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise at training time. As a by-product of injecting noise to the weights, we find that activations can also be quantized to as low as 8-bit with only minor accuracy degradation. Our quantization approach provides an alternative to existing uniform quantization techniques for neural networks. We further propose a complexity metric counting the number of bit operations performed (BOPs), and show...
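The two ingredients the abstract names, a k-quantile quantizer and training-time noise injection, can be sketched as follows; the exact noise model and training loop in UNIQ may differ from this stand-in.

```python
import torch

def kquantile_quantize(w, k=4):
    """Quantize to k levels placed at the quantiles of the weight distribution."""
    qs = torch.quantile(w.flatten(), torch.linspace(0, 1, k + 1))
    centers = (qs[:-1] + qs[1:]) / 2          # one representative per quantile bin
    idx = torch.bucketize(w, qs[1:-1])        # assign each weight to its bin
    return centers[idx]

def noisy_weights(w, k=4):
    """Training-time surrogate: add noise of quantization-error magnitude."""
    step = (w.max() - w.min()) / k
    return w + (torch.rand_like(w) - 0.5) * step

w = torch.randn(6, 6)
print(kquantile_quantize(w, k=4).unique())    # only 4 distinct values remain
```

Quantile placement means every level is used by roughly the same fraction of weights, which is the "non-uniform" part.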
A detailed analysis of power consumption at low system levels becomes important as a means for reducing the overall power consumption and its thermal hot spots. This work presents a new power estimation method that allows understanding the power breakdown of an application when running on a modern processor architecture, such as the newly released Intel Skylake processor. The work also provides a performance characterization report for the SPEC CPU2006 benchmarks, power data using side-by-side breakdowns, as well as a few interesting case studies.
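At a much coarser grain than the paper's per-unit breakdown, package-level energy can be read from the Linux RAPL powercap counters; this sketch assumes an Intel CPU exposing that interface (reading may require elevated privileges) and ignores counter wraparound.

```python
import time

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"   # package energy domain

def read_uj():
    with open(RAPL) as f:
        return int(f.read())                          # cumulative microjoules

def measure(fn):
    start_e, start_t = read_uj(), time.time()
    fn()
    joules = (read_uj() - start_e) / 1e6
    return joules, joules / (time.time() - start_t)   # energy (J), avg power (W)

energy, watts = measure(lambda: sum(i * i for i in range(10**7)))
print(f"{energy:.2f} J at {watts:.1f} W")
```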
This article describes a teaching strategy that synergizes computing and management, aimed at the running of complex projects in industry and academia, in areas such as civil engineering, physics, geosciences, and a number of other related fields. The course derived from this strategy includes four parts: (a) computing with a selected set of modern paradigms; the stress is on the Control Flow and Data Flow paradigms, but paradigms conditionally referred to as Energy and Diffusion are also covered; (b) project management that is holistic; the wide...
The EARtH algorithm finds the optimal voltage and frequency operating point of a processor in order to achieve minimum energy for the computing platform. The algorithm is based on a theoretical model employing a small number of parameters, which are extracted from real systems using off-line and run-time methods. The model has been validated on 45nm, 32nm and 22nm Intel® Core processors. The algorithm can save up to 44% energy compared with commonly used fixed-frequency policies.
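A toy energy-vs-frequency model in the spirit of (but not identical to) EARtH illustrates why an interior minimum-energy point exists: dynamic energy per unit of work grows with V^2 as frequency (and thus voltage) rises, while static energy grows with runtime at low frequency. All coefficients below are invented.

```python
def energy(f, work=1e9, p_static=2.0, c=1e-9, v0=0.6, kv=0.25):
    """Total energy for 'work' cycles at frequency f (GHz), toy model."""
    v = v0 + kv * f                  # assumed linear voltage/frequency relation
    t = work / (f * 1e9)             # runtime in seconds
    return c * v**2 * work + p_static * t   # dynamic + static energy (J)

freqs = [0.5 + 0.25 * i for i in range(15)]       # 0.5 .. 4.0 GHz ladder
best = min(freqs, key=energy)
print(f"minimum-energy point: {best:.2f} GHz")
```

Running too slowly wastes static power; running too fast wastes voltage-squared dynamic power; the optimum sits in between, which is the point EARtH hunts for.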
Deep neural networks (DNNs) are used by different applications that are executed on a range of computer architectures, from IoT devices to supercomputers. The footprint of these networks is huge, as are their computational and communication needs. In order to ease the pressure on resources, research indicates that in many cases a low-precision representation (1-2 bits per parameter) of weights and other parameters can achieve similar accuracy while requiring fewer resources. Using quantized values enables the use of FPGAs to run NNs,...
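One reason 1-bit parameters map so well to FPGAs: a dot product over {-1, +1} vectors reduces to XNOR plus popcount on packed bits. A plain-Python illustration of that identity:

```python
def pack(bits):
    """Pack a list of +1/-1 values into an int bitmask (+1 -> set bit)."""
    mask = 0
    for i, b in enumerate(bits):
        if b == 1:
            mask |= 1 << i
    return mask

def binary_dot(a, b, n):
    agree = ~(pack(a) ^ pack(b)) & ((1 << n) - 1)   # XNOR, keep low n bits
    pop = bin(agree).count("1")                     # positions that match
    return 2 * pop - n                              # dot product over {-1, +1}

a = [1, -1, 1, 1, -1, 1]
b = [1, 1, -1, 1, -1, -1]
assert binary_dot(a, b, 6) == sum(x * y for x, y in zip(a, b))
```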
The need to reduce power and complexity will increase the interest in switch-on-event multithreading (coarse-grained multithreading). Switch-on-event multithreading is a low-power mechanism to improve processor throughput by switching threads on execution stalls. Fairness may, however, become a problem in such a multithreaded processor. Unless fairness is properly handled, some threads may starve while others consume all of the cycles. Heuristics that were devised in order to achieve fairness in simultaneous multithreading are not applicable to switch-on-event multithreading. This paper defines a fairness metric using...
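A toy simulation of switch-on-event scheduling makes the starvation problem visible: a thread that rarely stalls keeps the pipeline while the others run dry. The fairness metric itself is defined in the paper; this sketch only exposes the raw cycle shares.

```python
import random

def soe_run(stall_prob, cycles=100_000, seed=0):
    """Simulate switch-on-event scheduling; returns each thread's cycle share."""
    rng = random.Random(seed)
    used = [0] * len(stall_prob)
    cur = 0
    for _ in range(cycles):
        used[cur] += 1
        if rng.random() < stall_prob[cur]:        # long-latency event occurs
            cur = (cur + 1) % len(stall_prob)     # switch on the event
    return [u / cycles for u in used]

# A thread that rarely stalls hogs the core and starves the others:
print(soe_run([0.01, 0.20, 0.20]))
```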
Power and thermal limits are major constraints for delivering compute performance in high-end CPUs and are expected to remain so in the future. CMP is becoming important as a way to deliver more performance within power constraints. Dynamic Voltage and Frequency Scaling (DVFS) has been studied in past work as a means to increase power savings and improve the overall processor's performance while meeting total power and/or thermal constraints. For such systems, power delivery limitations are a significant practical design consideration; unfortunately, this aspect was almost ignored in many research works. This paper...
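To see how a shared power-delivery budget interacts with per-core DVFS, here is a hedged toy allocator: it greedily raises the frequency of whichever core gains the most performance per extra watt until the budget is exhausted. The cubic power model and the frequency ladder are assumed, not taken from the paper.

```python
def power(f):
    """Assumed per-core power at frequency f (GHz): static + cubic dynamic term."""
    return 1.0 + 0.5 * f**3

def allocate(n_cores=4, budget=40.0, f_levels=(1.0, 1.5, 2.0, 2.5, 3.0)):
    freqs = [f_levels[0]] * n_cores
    used = sum(power(f) for f in freqs)
    while True:
        options = []
        for i, f in enumerate(freqs):
            nxt = f_levels.index(f) + 1
            if nxt < len(f_levels):
                extra = power(f_levels[nxt]) - power(f)   # added watts
                gain = f_levels[nxt] - f                  # perf proxy: frequency
                if used + extra <= budget:                # respect power delivery
                    options.append((gain / extra, extra, i, f_levels[nxt]))
        if not options:
            return freqs, used
        _, extra, i, fnew = max(options)                  # best perf per watt
        freqs[i], used = fnew, used + extra

print(allocate())
```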