- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Physical Unclonable Functions (PUFs) and Hardware Security
- Distributed and Parallel Computing Systems
- Interconnection Networks and Systems
- Cloud Computing and Resource Management
- Low-power high-performance VLSI design
- Quantum Computing Algorithms and Architecture
- Embedded Systems Design Techniques
- Integrated Circuits and Semiconductor Failure Analysis
- Advanced Memory and Neural Computing
- Adversarial Robustness in Machine Learning
- Advancements in Semiconductor Devices and Circuit Design
- Distributed systems and fault tolerance
- Advanced Malware Detection Techniques
- Quantum Information and Cryptography
- Ferroelectric and Negative Capacitance Devices
- Electrostatic Discharge in Electronics
- Quantum and electron transport phenomena
- Neuroscience and Neural Engineering
- Radiation Effects in Electronics
- VLSI and Analog Circuit Testing
- Experimental Learning in Engineering
- AI in cancer detection
- Optical Network Technologies
New Mexico State University
2016-2024
Miami University
2023-2024
Los Alamos National Laboratory
2018-2023
Sandia National Laboratories
2019
Zewail City of Science and Technology
2019
National Tsing Hua University
2019
Hiroshima University of Economics
2019
Arkansas Tech University
2013-2016
George Washington University
2014-2016
Valparaiso University
2015
We present a divide-and-conquer approach to deterministically prepare Dicke states |D<sub>k</sub><sup>n</sup>> (i.e., equal-weight superpositions of all n-qubit basis states with Hamming weight k) on quantum computers. In an experimental evaluation for up to n=6 qubits on the IBM Quantum Sydney and Montreal devices, we achieve significantly higher state fidelity compared to previous results [Mukherjee et al., TQE'2020; Cruz et al., QuTe'2019]. The gains are achieved through several techniques: Our circuits first divide the...
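The target state itself is straightforward to enumerate classically, which is handy when checking measured distributions against the ideal. A minimal Python sketch (illustrative only, not the circuit construction described above):

```python
from itertools import combinations
from math import comb, sqrt

def dicke_state(n, k):
    """Amplitude vector (length 2**n) of the Dicke state |D_k^n>:
    an equal-weight superposition of every n-qubit basis state
    with Hamming weight k."""
    amp = 1.0 / sqrt(comb(n, k))
    state = [0.0] * (2 ** n)
    for ones in combinations(range(n), k):
        idx = sum(1 << q for q in ones)  # basis index with 1s at positions 'ones'
        state[idx] = amp
    return state

state = dicke_state(6, 3)
nonzero = [a for a in state if a]  # C(6,3) = 20 equal amplitudes
```

Comparing the squared amplitudes against device measurement counts gives a quick classical-fidelity sanity check.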
The growing necessity for enhanced processing capabilities in edge devices with limited resources has led us to develop effective methods for improving high-performance computing (HPC) applications. In this paper, we introduce LASP (Lightweight Autotuning of Scientific Application Parameters), a novel strategy designed to address the parameter search space challenge on such devices. Our approach employs a multi-armed bandit (MAB) technique focused on online exploration and exploitation. Notably, it takes dynamic...
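As a rough illustration of the MAB idea behind such autotuning, here is an epsilon-greedy sketch with a made-up `measure` callback; it is not LASP's actual algorithm, just the exploration/exploitation loop it builds on:

```python
import random

def mab_autotune(candidates, measure, trials=100, eps=0.2, seed=0):
    """Epsilon-greedy multi-armed bandit over a parameter search space.
    'candidates' is a list of parameter settings; 'measure' runs the
    application once and returns a reward (e.g., negative runtime)."""
    rng = random.Random(seed)
    counts = [0] * len(candidates)
    values = [0.0] * len(candidates)  # running mean reward per arm
    for _ in range(trials):
        if rng.random() < eps:
            arm = rng.randrange(len(candidates))                 # explore
        else:
            arm = max(range(len(candidates)), key=values.__getitem__)  # exploit
        reward = measure(candidates[arm])
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]      # incremental mean
    return candidates[max(range(len(candidates)), key=values.__getitem__)]

# toy usage: pick the tile size whose (noisy) runtime is lowest
best = mab_autotune([16, 32, 64, 128],
                    lambda t: -abs(t - 64) + random.random())
```

Each "arm" is one parameter setting; online measurement replaces an exhaustive offline sweep of the search space.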
Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critical to effectively using these units. Understanding this capability is important for anyone writing compute-intensive, high-performance, and portable code. We tested several compilers on x86 and ARM. We used the TSVC2 suite, with modifications that made it more representative of real-world code. On x86, GCC reported 54% of the loops in the suite as...
Moore's law for traditional electric integrated circuits is facing increasingly more challenges in both physics and economics. Among those is the fact that the bandwidth per compute on chip is dropping, whereas the energy needed for data movement keeps rising. We benchmark various interconnect technologies, including electrical, photonic, and plasmonic options. We contrast them with hybrid photonic-plasmonic interconnects (HyPPIs), where we consider plasmonics for active manipulation devices and photonics for passive...
Performance modeling is a challenging problem due to the complexities of hardware architectures. In this paper, we present PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications in a fast manner on different GPUs. PPT-GPU is part of the open-source Performance Prediction Toolkit (PPT) developed at Los Alamos National Laboratory. We extend the old GPU model in PPT that predicted runtimes of computational physics codes to offer better prediction accuracy, for which we add...
GPUs are prevalent in modern computing systems at all scales. They consume a significant fraction of the energy in these systems. However, vendors do not publish the actual cost of the power/energy overhead of their internal microarchitecture. In this paper, we accurately measure the energy consumption of various PTX instructions found in NVIDIA GPUs. We provide an exhaustive comparison of more than 40 instructions for four high-end GPUs from different generations (Maxwell, Pascal, Volta, and Turing). Furthermore, we show the effect of the CUDA compiler...
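The common differencing idea behind such per-instruction measurements can be sketched as follows; the function and numbers are hypothetical, and the real methodology involves carefully constructed PTX microbenchmarks and on-board power sensors:

```python
def energy_per_instruction(power_samples_mw, sample_period_s, baseline_j, n_instructions):
    """Estimate per-instruction energy: integrate sampled power (in mW) over a
    microbenchmark that issues the target instruction n_instructions times,
    then subtract the energy of an otherwise-identical empty-loop baseline."""
    total_j = sum(p * 1e-3 * sample_period_s for p in power_samples_mw)
    return (total_j - baseline_j) / n_instructions

# hypothetical numbers: 10 samples at 100 W, 1 ms apart, 0.5 J baseline,
# one million dynamic instances of the instruction under test
e = energy_per_instruction([100_000] * 10, 0.001, 0.5, 1_000_000)
```

Subtracting the baseline isolates the instruction's marginal energy from static power and loop overhead.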
Network-on-Chips (NoCs) have been widely used as a scalable communication solution in the design of multiprocessor system-on-chips (MPSoCs). NoCs enable communication between on-chip Intellectual Property (IP) cores and allow processing cores to achieve higher performance by outsourcing their communication tasks. The NoC paradigm is based on the idea of resource sharing, in which hardware resources, including buffers, links, routers, etc., are shared among all IPs in the MPSoC. In fact, the data being routed through each router might not be related...
In recent decades, power consumption has become an essential factor attracting the attention of integrated circuit (IC) designers. Multiple-valued logic (MVL) and approximate computing are two techniques that can be applied to circuits to make power-efficient systems. By utilizing MVL instead of binary logic, the information conveyed by digital signals increases, and this reduces the required interconnections and power consumption. On the other hand, approximate computing is a class of arithmetic techniques used in systems where the accuracy of the computation...
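The interconnect saving from higher-radix signaling follows directly from information content: a ternary wire carries log2(3) ≈ 1.585 bits rather than 1. A quick check in Python:

```python
from math import ceil, log2

def wires(n_values, radix):
    """Signal lines needed to encode n_values distinct values
    when each line carries one digit of the given radix."""
    return ceil(log2(n_values) / log2(radix))

binary_wires = wires(256, 2)   # binary lines to carry one byte
ternary_wires = wires(256, 3)  # ternary lines carrying the same information
```

For 256 values this drops from 8 binary lines to 6 ternary ones, which is where the interconnection and routing-power savings cited above come from.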
Software prefetching and locality optimizations are techniques for overcoming the speed gap between processor and memory. In this paper, we evaluate the impact of memory trends on the effectiveness of software prefetching and locality optimizations for three types of applications: regular scientific codes, irregular scientific codes, and pointer-chasing codes. We find that for many applications, software prefetching outperforms locality optimizations when there is sufficient bandwidth, but locality optimizations outperform prefetching under bandwidth-limited conditions. The break-even point (for 1 GHz processors) occurs at roughly 2.5 GBytes/sec on today's...
The proliferation of mobile and IoT devices, coupled with the advances in the wireless communication capabilities of these devices, has urged the need for novel paradigms such as heterogeneous hybrid networks. Researchers have proposed opportunistic routing as a means to leverage the potential offered by such networks. While several proposals for multiple opportunistic routing protocols exist, only a few have explored fuzzy logic to evaluate the status of links in the network and construct stable, faster paths towards destinations. We propose FQ-AGO, a Fuzzy Logic Q-learning Based Asymmetric...
Graphics Processing Units (GPUs) are now considered the leading hardware to accelerate general-purpose workloads such as AI, data analytics, and HPC. Over the last decade, researchers have focused on demystifying and evaluating the microarchitecture features of various GPU architectures beyond what vendors reveal. This line of work is necessary to understand the hardware better and build more efficient applications. Many works have studied recent Nvidia architectures, such as Volta and Turing, comparing them to their successor, Ampere. However,...
Traditional silicon binary circuits continue to face major challenges such as high leakage power dissipation and the large area of interconnections. Multiple-Valued Logic (MVL) and nano-devices are two feasible solutions to overcome these problems. In this paper, a novel method is presented to design ternary logic circuits based on Carbon Nanotube Field Effect Transistors (CNFETs). The proposed designs use the unique properties of CNFETs, adjusting the Carbon Nanotube (CNT) diameters to obtain the desired threshold voltage while having the same...
The last decade has seen a shift in the computer systems industry where heterogeneous computing has become prevalent. Graphics Processing Units (GPUs) are now present from supercomputers to mobile phones and tablets. GPUs are used for graphics operations as well as general-purpose computing (GPGPU) to boost the performance of compute-intensive applications. However, the percentage of undisclosed characteristics beyond what vendors provide is not small. In this paper, we introduce a very low overhead and portable analysis for exposing...
In this paper, we introduce an accurate and scalable memory modeling framework for General Purpose Graphics Processing Units (GPGPUs), PPT-GPU-Mem, which stands for Performance Prediction Toolkit for GPUs' Cache Memories. PPT-GPU-Mem predicts the performance of different GPUs' cache hierarchies (L1 & L2) based on reuse profiles. We extract a trace of each GPU kernel once in its lifetime using the recently released binary instrumentation tool, NVBIT. The extraction is architecture-independent and can be done on any available...
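Reuse (stack) distance profiles relate directly to cache hits: under a fully associative LRU cache of C lines, an access hits exactly when its reuse distance is below C. A small, inefficient-but-clear Python sketch of this relationship (not PPT-GPU-Mem's actual model):

```python
def reuse_distances(trace):
    """Reuse (stack) distance of each access: the number of distinct
    addresses touched since the previous access to the same address
    (inf for cold, first-time accesses)."""
    last_seen = {}
    dists = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            dists.append(len(set(trace[last_seen[addr] + 1 : i])))
        else:
            dists.append(float("inf"))
        last_seen[addr] = i
    return dists

def lru_hit_rate(trace, cache_lines):
    """Fully associative LRU cache of 'cache_lines' lines: an access
    hits iff its reuse distance is strictly less than the capacity."""
    d = reuse_distances(trace)
    return sum(1 for x in d if x < cache_lines) / len(d)
```

For the toy trace A B C A B C, a 4-line cache hits the second round of accesses (hit rate 0.5) while a 2-line cache misses everything, since every reuse distance is 2.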
In this paper, we present PPT-GPU, a scalable performance prediction toolkit for GPUs. PPT-GPU achieves scalability through a hybrid high-level modeling approach in which some computations are extrapolated and multiple parts of the model are parallelized. The tool's primary models use pre-collected memory and instruction traces of workloads to accurately capture the dynamic behavior of kernels.
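A deliberately simplified analytical model in this spirit (a roofline-style sketch with an assumed overlap factor, not PPT-GPU's actual model) might look like:

```python
def kernel_time_model(flop_count, bytes_moved, peak_flops, peak_bw, overlap=1.0):
    """Roofline-style estimate of kernel runtime. 'overlap' in [0, 1] is an
    assumed degree of compute/memory overlap: 1.0 gives max(t_c, t_m),
    0.0 gives the fully serialized sum t_c + t_m."""
    t_c = flop_count / peak_flops  # time if purely compute-bound
    t_m = bytes_moved / peak_bw    # time if purely memory-bound
    return max(t_c, t_m) + (1.0 - overlap) * min(t_c, t_m)
```

High-level models of this shape stay fast because they evaluate closed-form expressions over trace-derived counts instead of simulating every cycle.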
Network-on-chip (NoC) is widely used as an efficient communication architecture in multi-core and many-core System-on-Chips (SoCs). However, the shared resources of the NoC platform, e.g., channels, buffers, and routers, might be used to conduct attacks compromising the security of NoC-based SoCs. Most of the encryption-based protection methods proposed in the literature require leaving some parts of the packet unencrypted to allow the routers to process/forward packets accordingly. This reveals the source/destination information to malicious routers, which...
This paper utilizes Reinforcement Learning (RL) as a means to automate the Hardware Trojan (HT) insertion process and eliminate the inherent human biases that limit the development of robust HT detection methods. An RL agent explores the design space and finds circuit locations that are best for keeping inserted HTs hidden. To achieve this, the digital circuit is converted into an environment in which the agent inserts HTs such that the cumulative reward is maximized. Our toolset can insert combinational HTs into the ISCAS-85 benchmark suite with variations in size...
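The RL loop in this setting can be caricatured as tabular, single-step Q-learning over candidate insertion locations; the stealth rewards below are made-up placeholders for real detection-test feedback:

```python
import random

def q_learning_insertion(rewards, episodes=500, alpha=0.5, eps=0.3, seed=1):
    """Tabular single-step Q-learning over candidate Trojan insertion
    locations: the agent picks a location, receives a stealth reward
    (standing in for detection-test results), and updates its Q value."""
    rng = random.Random(seed)
    q = [0.0] * len(rewards)
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.randrange(len(rewards))              # explore a location
        else:
            a = max(range(len(rewards)), key=q.__getitem__)  # exploit best so far
        r = rewards[a]                                   # hypothetical stealth score
        q[a] += alpha * (r - q[a])                       # Q-update toward the reward
    return max(range(len(q)), key=q.__getitem__)         # stealthiest location found
```

In a real toolflow the reward would come from running the detection methods against the modified netlist rather than from a fixed table.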
Existing Hardware Trojan (HT) detection methods face several critical limitations: logic testing struggles with scalability and coverage for large designs, side-channel analysis requires golden reference chips, and formal verification suffers from state-space explosion. The emergence of Large Language Models (LLMs) offers a promising new direction for HT detection by leveraging their natural language understanding and reasoning capabilities. For the first time, this paper explores the potential of general-purpose LLMs...
Parallel computers are becoming deeply hierarchical. Locality-aware programming models allow programmers to control locality at one level through establishing affinity between data and executing activities. This, however, does not enable exploitation of the other levels. Therefore, we must conceive an efficient abstraction of hierarchical locality and develop techniques to exploit it. Techniques applied directly by programmers, beyond the first level, burden the programmer and hinder productivity. In this article, we propose...