- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Embedded Systems Design Techniques
- Interconnection Networks and Systems
- Low-power high-performance VLSI design
- Cloud Computing and Resource Management
- Distributed and Parallel Computing Systems
- Physical Unclonable Functions (PUFs) and Hardware Security
- Radiation Effects in Electronics
- Advanced Neural Network Applications
- Distributed systems and fault tolerance
- Adversarial Robustness in Machine Learning
- Security and Verification in Computing
- Domain Adaptation and Few-Shot Learning
- Semiconductor materials and devices
- Integrated Circuits and Semiconductor Failure Analysis
- Anomaly Detection Techniques and Applications
- Advanced Memory and Neural Computing
- Real-Time Systems Scheduling
- Advanced Image and Video Retrieval Techniques
- VLSI and Analog Circuit Testing
- Advanced Malware Detection Techniques
- Caching and Content Delivery
- Advancements in Semiconductor Devices and Circuit Design
- Cryptographic Implementations and Security
Technion – Israel Institute of Technology
2016-2025
Nanyang Technological University
2019-2022
Microsoft Research (United Kingdom)
2022
The University of Texas at Austin
2020
University of Lisbon
2020
Cornell University
2020
Taiwan Semiconductor Manufacturing Company (Taiwan)
2020
Korea Advanced Institute of Science and Technology
2020
Intel (United States)
2001-2020
Arizona State University
2020
The success of learning with noisy labels (LNL) methods relies heavily on the success of a warm-up stage where standard supervised training is performed using the full (noisy) training set. In this paper, we identify a "warm-up obstacle": the inability of standard warm-up stages to train high-quality feature extractors and avert memorization of noisy labels. We propose "Contrast to Divide" (C2D), a simple framework that solves this problem by pre-training the feature extractor in a self-supervised fashion. Using self-supervised pre-training boosts the performance of existing LNL approaches drastically...
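To make the two-stage idea concrete, here is a minimal sketch, assuming a SimCLR-style contrastive task stands in for the self-supervised stage; the toy encoder, augmentations, and hyperparameters are illustrative, not the paper's recipe.

```python
# Stage 1: self-supervised pre-training (labels never touched);
# stage 2: the usual LNL warm-up then starts from pre-trained features.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 128))  # toy encoder
proj = nn.Linear(128, 64)                                           # projection head

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style contrastive loss over two augmented views."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau - 1e9 * torch.eye(2 * n)   # mask self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

opt = torch.optim.SGD(list(encoder.parameters()) + list(proj.parameters()), lr=0.1)

for _ in range(10):                                  # stand-in pre-training loop
    x = torch.rand(16, 3, 32, 32)
    v1 = x + 0.1 * torch.randn_like(x)               # crude "augmentations"
    v2 = x + 0.1 * torch.randn_like(x)
    loss = nt_xent(proj(encoder(v1)), proj(encoder(v2)))
    opt.zero_grad()
    loss.backward()
    opt.step()

classifier = nn.Linear(128, 10)   # warm-up on noisy labels would begin here
```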
In the past several decades, the world of computers, and especially that of microprocessors, has witnessed phenomenal advances. Computers have exhibited ever-increasing performance and decreasing costs, making them more affordable and, in turn, accelerating additional software and hardware development that fueled this process even more. The technology that enabled this exponential growth is a combination of advancements in process technology, microarchitecture, architecture, and design tools. While the pace of progress has been quite impressive over the last...
We study the tradeoffs between many-core machines like Intel's Larrabee and many-thread machines like Nvidia and AMD GPGPUs. We define a unified model describing a superposition of the two architectures, and use it to identify operation zones for which each machine is more suitable. Moreover, we identify an intermediate zone in which both machines deliver inferior performance. We show the shape of this "performance valley" and provide insights on how it can be avoided.
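The "performance valley" can be reproduced with a hedged toy model, not the paper's exact unified equations: cache hit rate falls as more threads share a fixed cache, while multithreading only hides memory latency once enough threads are in flight. All constants below are assumed.

```python
CACHE = 4096      # cache lines shared by all threads (assumed)
MEM_LAT = 200     # memory latency in cycles (assumed)
CPI_EXE = 1.0     # per-thread CPI with a perfect cache (assumed)
MPI = 0.05        # memory accesses per instruction (assumed)
MAX_IPC = 8.0     # aggregate pipeline limit of the chip (assumed)

def throughput(n):
    h = min(1.0, (CACHE / n) ** 0.5 / 64)    # hit rate falls as threads share the cache
    cpi = CPI_EXE + MPI * (1 - h) * MEM_LAT  # per-thread CPI including memory stalls
    return min(n / cpi, MAX_IPC)             # more threads hide latency, up to the cap

for n in (1, 2, 4, 16, 64, 256, 1024):
    print(f"{n:5d} threads -> {throughput(n):5.2f} IPC")
```

Few threads with a warm cache do well, a sea of threads hides latency, and the middle dips: the valley.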
Translation Lookaside Buffers (TLBs) are ubiquitously used in modern architectures to cache virtual-to-physical mappings and, as they are looked up on every memory access, they are paramount to performance scalability. The emergence of chip multiprocessors (CMPs) with per-core TLBs has brought the problem of TLB coherence to the front stage. TLBs are kept coherent at the software level by the operating system (OS). Whenever the OS modifies page permissions in a page table, it must initiate a coherency transaction among TLBs, a process known as a TLB shootdown...
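For readers unfamiliar with the mechanism, here is an illustrative sketch of the software-level shootdown the abstract refers to; it is a toy simulation, not kernel code, and the class and function names are invented.

```python
# The initiating core updates the page table, then interrupts every core
# that may cache the stale mapping and waits for acknowledgements.
import threading

class Core:
    def __init__(self, cid):
        self.cid = cid
        self.tlb = {}                      # vpn -> (pfn, perms)

    def ipi_invalidate(self, vpn, ack):
        self.tlb.pop(vpn, None)            # drop the stale entry
        ack.release()                      # acknowledge the shootdown

def shootdown(initiator, others, page_table, vpn, new_perms):
    page_table[vpn] = new_perms            # 1. modify the page table
    initiator.tlb.pop(vpn, None)           # 2. flush the local TLB entry
    ack = threading.Semaphore(0)
    for core in others:                    # 3. send IPIs to possible sharers
        threading.Thread(target=core.ipi_invalidate, args=(vpn, ack)).start()
    for _ in others:                       # 4. block until every core acks
        ack.acquire()
```

The wait in step 4 is why frequent shootdowns hurt scalability: the initiator stalls on the slowest responder.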
This article presents a taxonomy and represents a repository of open problems in computing for numerically and logically intensive problems in a number of disciplines that have to synergize for the best performance of simulation-based feasibility studies in nature-oriented engineering in general, and civil engineering in particular. Topics include, but are not limited to: nature-based construction, genomics supporting nature-based construction, earthquake engineering, other types of geophysical disaster prevention activities, as well as processes...
The client computing platform is moving towards a heterogeneous architecture consisting of a combination of cores focused on scalar performance and a set of throughput-oriented cores. The throughput-oriented cores (e.g. a GPU) may be connected over both coherent and non-coherent interconnects, and may have different ISAs. This paper describes a programming model for such heterogeneous platforms. We discuss the language constructs, the runtime implementation, and the memory model of such a programming environment. We implemented this environment in an x86 simulator and ported a number...
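As a loose conceptual analogue only (the paper's model targets C/C++ on x86 plus a GPU, not Python), an offload-style construct can be sketched as a decorator that routes a data-parallel kernel to a worker pool standing in for the throughput-oriented cores; the partitioning scheme and pool size are arbitrary.

```python
import concurrent.futures

_throughput_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)

def offload(fn):
    """Mark a data-parallel kernel for the 'throughput' cores."""
    def launch(data):
        chunks = [data[i::8] for i in range(8)]              # naive partition
        futures = [_throughput_pool.submit(fn, c) for c in chunks]
        return [f.result() for f in futures]                 # implicit join
    return launch

@offload
def scale(chunk, k=2.0):
    return [k * x for x in chunk]

print(scale(list(range(10))))   # kernel runs across the worker pool
```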
The exponential growth of digital data has introduced massively-parallel systems, special orchestration layers, and new scale-out applications. While recent works suggest that the characteristics of these workloads are different from those of traditional ones, their root causes are not understood. Such understanding is extremely important for improving efficiency; even a 1% performance gain per core can have a large impact on the datacenter as a whole. This paper studies a Big Data Analytics (BDA) workload on a modern cloud server...
This study introduces a novel, practical approach for designing a hierarchical online anomaly detection system for industrial cyber-physical systems. The proposed method utilizes the Hierarchical Temporal Memory (HTM) unsupervised learning algorithm, which requires input data to be encoded as sparse binary distributed representations (SDRs). A new SDR encoding termed the temporal sequence encoder (TSSE) is presented to convert sensor outputs into SDRs. It enables HTM to retain high memory capacity and robust performance...
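The TSSE itself is the paper's contribution, so the sketch below shows only the generic ingredient it builds on: a scalar-to-SDR encoder in the style of classic HTM encoders, where nearby values share active bits. The vector size and sparsity are assumed.

```python
import numpy as np

def scalar_to_sdr(value, vmin=0.0, vmax=100.0, size=400, active=21):
    """Encode a scalar as a wide, sparse binary vector (contiguous run of 1s)."""
    span = size - active
    start = int(round(span * (value - vmin) / (vmax - vmin)))
    start = max(0, min(span, start))          # clamp out-of-range inputs
    sdr = np.zeros(size, dtype=np.uint8)
    sdr[start:start + active] = 1
    return sdr

a, b = scalar_to_sdr(42.0), scalar_to_sdr(43.0)
print(int((a & b).sum()))   # similar inputs share most active bits
```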
Jailbreak attacks aim to exploit large language models (LLMs) and pose a significant threat to their proper conduct; they seek to bypass the models' safeguards and often provoke transgressive behaviors. However, existing automatic jailbreak attacks require extensive computational resources and are prone to converge on suboptimal solutions. In this work, we propose \textbf{C}ompliance \textbf{R}efusal \textbf{I}nitialization (CRI), a novel, attack-agnostic framework that efficiently initializes the optimization in...
This paper explores the possibility of using program profiling to enhance the efficiency of value prediction. Value prediction attempts to eliminate true-data dependencies by predicting the outcome values of instructions at run-time and executing dependent instructions based on that prediction. So far, all published papers in this area have examined hardware-only value prediction mechanisms. In order to enhance value prediction, it is proposed to employ program profiling to collect information that describes the tendency of an instruction to be value-predictable. The compiler acts as a mediator and can pass...
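A hedged sketch of the profiling side of this idea: replay a (pc, value) trace from a profiling run, score how often each static instruction repeats its last outcome, and tag the highly predictable ones for the hardware. The trace format and the threshold are invented for illustration.

```python
from collections import defaultdict

def profile_predictability(trace):
    """trace: iterable of (pc, value) pairs recorded during a profiling run."""
    last, hits, total = {}, defaultdict(int), defaultdict(int)
    for pc, value in trace:
        total[pc] += 1
        if last.get(pc) == value:          # instruction repeated its last outcome
            hits[pc] += 1
        last[pc] = value
    return {pc: hits[pc] / total[pc] for pc in total}

trace = [(0x40, 7), (0x40, 7), (0x40, 7), (0x44, 1), (0x44, 2), (0x40, 7)]
scores = profile_predictability(trace)
hints = {pc for pc, s in scores.items() if s >= 0.75}   # assumed hint threshold
print(scores, hints)
```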
This article presents an experimental and analytical study of value prediction and its impact on speculative execution in superscalar microprocessors. Value prediction is a new paradigm that suggests predicting the outcome values of operations (at run-time) and using these predicted values to trigger the execution of true-data-dependent operations speculatively. As a result, stalls to memory locations can be reduced and the amount of instruction-level parallelism can be extended beyond the limits of the program's dataflow graph. The article examines the characteristics of the value prediction concept from two...
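A toy stride predictor, one common hardware scheme in this line of work, shows the mechanism: predict the next outcome as the last value plus the observed stride, which captures loop induction variables. The table organization is heavily simplified.

```python
class StridePredictor:
    def __init__(self):
        self.table = {}                       # pc -> (last value, stride)

    def predict(self, pc):
        if pc not in self.table:
            return None                       # no history yet
        last, stride = self.table[pc]
        return last + stride

    def update(self, pc, value):
        last, _ = self.table.get(pc, (value, 0))
        self.table[pc] = (value, value - last)

pred, correct, total = StridePredictor(), 0, 0
for value in range(0, 40, 4):                 # an i += 4 induction variable
    guess = pred.predict(pc=0x10)
    correct += (guess == value)
    total += 1
    pred.update(pc=0x10, value=value)
print(f"stride predictor accuracy: {correct}/{total}")
```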
We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise at training time. As a by-product of injecting noise to the weights, we find that activations can also be quantized to as low as 8-bit with only minor accuracy degradation. Our quantization approach provides an alternative to existing uniform quantization techniques for neural networks. We further propose a complexity metric counting the number of bit operations performed (BOPs), and show...
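The two ingredients the abstract names, a k-quantile quantizer and training-time noise injection, can be sketched as follows; the exact noise model and training loop in UNIQ may differ from this stand-in.

```python
import torch

def kquantile_quantize(w, k=4):
    """Quantize to k levels placed at the quantiles of the weight distribution."""
    qs = torch.quantile(w.flatten(), torch.linspace(0, 1, k + 1))
    centers = (qs[:-1] + qs[1:]) / 2          # one representative per quantile bin
    idx = torch.bucketize(w, qs[1:-1])        # assign each weight to its bin
    return centers[idx]

def noisy_weights(w, k=4):
    """Training-time surrogate: add noise of quantization-error magnitude."""
    step = (w.max() - w.min()) / k
    return w + (torch.rand_like(w) - 0.5) * step

w = torch.randn(6, 6)
print(kquantile_quantize(w, k=4).unique())    # only 4 distinct values remain
```

Quantile placement means every level is used by roughly the same fraction of weights, which is the "non-uniform" part.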
A detailed analysis of power consumption at low system levels becomes important as a means for reducing the overall power consumption and its thermal hot spots. This work presents a new power estimation method that allows understanding the power breakdown of an application when running on a modern processor architecture, such as the newly released Intel Skylake processor. The work also provides a performance characterization report for the SPEC CPU2006 benchmarks, power data using side-by-side breakdowns, as well as a few interesting case studies.
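At a much coarser grain than the paper's per-unit breakdown, package-level energy can be read from the Linux RAPL powercap counters; this sketch assumes an Intel CPU exposing that interface (reading may require elevated privileges) and ignores counter wraparound.

```python
import time

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"   # package energy domain

def read_uj():
    with open(RAPL) as f:
        return int(f.read())                          # cumulative microjoules

def measure(fn):
    start_e, start_t = read_uj(), time.time()
    fn()
    joules = (read_uj() - start_e) / 1e6
    return joules, joules / (time.time() - start_t)   # energy (J), avg power (W)

energy, watts = measure(lambda: sum(i * i for i in range(10**7)))
print(f"{energy:.2f} J at {watts:.1f} W")
```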
This article describes a teaching strategy that synergizes computing and management, aimed at the running of complex projects in industry and academia, in areas such as civil engineering, physics, geosciences, and a number of other related fields. The course derived from this strategy includes four parts: (a) computing with a selected set of modern paradigms; the stress is on the Control Flow and Data Flow paradigms, but paradigms conditionally referred to as Energy and Diffusion are also covered; (b) project management that is holistic; the wide...
The EARtH algorithm finds the optimal voltage and frequency operating point of a processor in order to achieve minimum energy for the computing platform. The algorithm is based on a theoretical model employing a small number of parameters, which are extracted from real systems using off-line and run-time methods. The model has been validated on 45nm, 32nm and 22nm Intel® Core processors. The algorithm can save up to 44% energy compared with commonly used fixed-frequency policies.
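A toy energy-vs-frequency model in the spirit of (but not identical to) EARtH illustrates why an interior minimum-energy point exists: dynamic energy per unit of work grows with V^2 as frequency (and thus voltage) rises, while static energy grows with runtime at low frequency. All coefficients below are invented.

```python
def energy(f, work=1e9, p_static=2.0, c=1e-9, v0=0.6, kv=0.25):
    """Total energy for 'work' cycles at frequency f (GHz), toy model."""
    v = v0 + kv * f                  # assumed linear voltage/frequency relation
    t = work / (f * 1e9)             # runtime in seconds
    return c * v**2 * work + p_static * t   # dynamic + static energy (J)

freqs = [0.5 + 0.25 * i for i in range(15)]       # 0.5 .. 4.0 GHz ladder
best = min(freqs, key=energy)
print(f"minimum-energy point: {best:.2f} GHz")
```

Running too slowly wastes static power; running too fast wastes voltage-squared dynamic power; the optimum sits in between, which is the point EARtH hunts for.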
Deep neural networks (DNNs) are used by different applications that are executed on a range of computer architectures, from IoT devices to supercomputers. The footprint of these networks is huge, as are their computational and communication needs. In order to ease the pressure on resources, research indicates that in many cases a low-precision representation (1-2 bits per parameter) of weights and other parameters can achieve similar accuracy while requiring fewer resources. Using quantized values enables the use of FPGAs to run NNs,...
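One reason 1-bit parameters map so well to FPGAs: a dot product over {-1, +1} vectors reduces to XNOR plus popcount on packed bits. A plain-Python illustration of that identity:

```python
def pack(bits):
    """Pack a list of +1/-1 values into an int bitmask (+1 -> set bit)."""
    mask = 0
    for i, b in enumerate(bits):
        if b == 1:
            mask |= 1 << i
    return mask

def binary_dot(a, b, n):
    agree = ~(pack(a) ^ pack(b)) & ((1 << n) - 1)   # XNOR, keep low n bits
    pop = bin(agree).count("1")                     # positions that match
    return 2 * pop - n                              # dot product over {-1, +1}

a = [1, -1, 1, 1, -1, 1]
b = [1, 1, -1, 1, -1, -1]
assert binary_dot(a, b, 6) == sum(x * y for x, y in zip(a, b))
```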
The need to reduce power and complexity will increase the interest in switch-on-event multithreading (coarse-grained multithreading). Switch-on-event multithreading is a low-power mechanism to improve processor throughput by switching threads on execution stalls. Fairness may, however, become a problem in such a multithreaded processor. Unless fairness is properly handled, some threads may starve while others consume all of the cycles. Heuristics that were devised in order to achieve fairness in simultaneous multithreading are not applicable to switch-on-event multithreading. This paper defines a fairness metric using...
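A toy simulation of switch-on-event scheduling makes the starvation problem visible: a thread that rarely stalls keeps the pipeline while the others run dry. The fairness metric itself is defined in the paper; this sketch only exposes the raw cycle shares.

```python
import random

def soe_run(stall_prob, cycles=100_000, seed=0):
    """Simulate switch-on-event scheduling; returns each thread's cycle share."""
    rng = random.Random(seed)
    used = [0] * len(stall_prob)
    cur = 0
    for _ in range(cycles):
        used[cur] += 1
        if rng.random() < stall_prob[cur]:        # long-latency event occurs
            cur = (cur + 1) % len(stall_prob)     # switch on the event
    return [u / cycles for u in used]

# A thread that rarely stalls hogs the core and starves the others:
print(soe_run([0.01, 0.20, 0.20]))
```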
Power and thermal limits are major constraints for delivering compute performance in high-end CPUs and are expected to remain so in the future. CMP is becoming important as a way to deliver more performance within power constraints. Dynamic Voltage and Frequency Scaling (DVFS) has been studied in past work as a means to increase power savings and improve the overall processor's performance while meeting total power and/or thermal constraints. For such systems, power delivery limitations are a significant practical design consideration; unfortunately, this aspect was almost ignored in many research works. This paper...
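To see how a shared power-delivery budget interacts with per-core DVFS, here is a hedged toy allocator: it greedily raises the frequency of whichever core gains the most performance per extra watt until the budget is exhausted. The cubic power model and the frequency ladder are assumed, not taken from the paper.

```python
def power(f):
    """Assumed per-core power at frequency f (GHz): static + cubic dynamic term."""
    return 1.0 + 0.5 * f**3

def allocate(n_cores=4, budget=40.0, f_levels=(1.0, 1.5, 2.0, 2.5, 3.0)):
    freqs = [f_levels[0]] * n_cores
    used = sum(power(f) for f in freqs)
    while True:
        options = []
        for i, f in enumerate(freqs):
            nxt = f_levels.index(f) + 1
            if nxt < len(f_levels):
                extra = power(f_levels[nxt]) - power(f)   # added watts
                gain = f_levels[nxt] - f                  # perf proxy: frequency
                if used + extra <= budget:                # respect power delivery
                    options.append((gain / extra, extra, i, f_levels[nxt]))
        if not options:
            return freqs, used
        _, extra, i, fnew = max(options)                  # best perf per watt
        freqs[i], used = fnew, used + extra

print(allocate())
```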