Abdel‐Hameed A. Badawy

ORCID: 0000-0001-8027-1449
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Physical Unclonable Functions (PUFs) and Hardware Security
  • Distributed and Parallel Computing Systems
  • Interconnection Networks and Systems
  • Cloud Computing and Resource Management
  • Low-power high-performance VLSI design
  • Quantum Computing Algorithms and Architecture
  • Embedded Systems Design Techniques
  • Integrated Circuits and Semiconductor Failure Analysis
  • Advanced Memory and Neural Computing
  • Adversarial Robustness in Machine Learning
  • Advancements in Semiconductor Devices and Circuit Design
  • Distributed systems and fault tolerance
  • Advanced Malware Detection Techniques
  • Quantum Information and Cryptography
  • Ferroelectric and Negative Capacitance Devices
  • Electrostatic Discharge in Electronics
  • Quantum and electron transport phenomena
  • Neuroscience and Neural Engineering
  • Radiation Effects in Electronics
  • VLSI and Analog Circuit Testing
  • Experimental Learning in Engineering
  • AI in cancer detection
  • Optical Network Technologies

New Mexico State University
2016-2024

Miami University
2023-2024

Los Alamos National Laboratory
2018-2023

Sandia National Laboratories
2019

Zewail City of Science and Technology
2019

National Tsing Hua University
2019

Hiroshima University of Economics
2019

Arkansas Tech University
2013-2016

George Washington University
2014-2016

Valparaiso University
2015

We present a divide-and-conquer approach to deterministically prepare Dicke states |D_k^n⟩ (i.e., equal-weight superpositions of all n-qubit states with Hamming weight k) on quantum computers. In an experimental evaluation for up to n=6 qubits on IBM Quantum Sydney and Montreal devices, we achieve significantly higher state fidelity compared to previous results [Mukherjee et al. TQE'2020, Cruz et al. QuTe'2019]. The gains are achieved through several techniques: Our circuits first divide the...

10.1109/tqe.2022.3174547 article EN cc-by-nc-nd IEEE Transactions on Quantum Engineering 2022-01-01
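For reference, the Dicke state named in the abstract above can be written out explicitly. This is the standard textbook definition, not a formula quoted from the paper; the symbol wt(x) for the Hamming weight of a bit string x is notation assumed here.

```latex
|D_k^n\rangle \;=\; \binom{n}{k}^{-1/2}
  \sum_{\substack{x \in \{0,1\}^n \\ \mathrm{wt}(x) = k}} |x\rangle ,
\qquad \text{e.g.} \qquad
|D_1^3\rangle \;=\; \tfrac{1}{\sqrt{3}}\bigl(|001\rangle + |010\rangle + |100\rangle\bigr).
```

The normalization follows from counting: exactly \binom{n}{k} bit strings of length n have Hamming weight k, and each appears with equal amplitude.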

10.1515/nanoph-2016-0185 article RO cc-by Nanophotonics 2017-05-12

The growing necessity for enhanced processing capabilities in edge devices with limited resources has led us to develop effective methods for improving high-performance computing (HPC) applications. In this paper, we introduce LASP (Lightweight Autotuning of Scientific Application Parameters), a novel strategy designed to address the parameter search space challenge in edge devices. Our strategy employs a multi-armed bandit (MAB) technique focused on online exploration and exploitation. Notably, LASP takes a dynamic...

10.48550/arxiv.2501.01057 preprint EN arXiv (Cornell University) 2025-01-01
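As a rough illustration of how a multi-armed bandit can drive online parameter tuning, the sketch below uses the classic UCB1 rule. It is not LASP's algorithm: the abstract does not specify which bandit policy is used, and the names `ucb1_select`, `autotune`, and `measure_reward`, as well as the tile-size example, are hypothetical. Rewards are assumed to be normalized to the range [0, 1].

```python
import math


def ucb1_select(counts, totals, step):
    """Return the index of the arm to try next under the UCB1 rule."""
    # Exploration phase: try every arm once before using confidence bounds.
    for arm, pulls in enumerate(counts):
        if pulls == 0:
            return arm
    # Exploitation with an optimism bonus that shrinks as an arm is pulled more.
    scores = [
        totals[arm] / counts[arm] + math.sqrt(2.0 * math.log(step) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def autotune(configs, measure_reward, budget):
    """Spend `budget` (>= 1) measurements choosing among candidate configs.

    `measure_reward` is a hypothetical callback that runs the application
    with one configuration and returns a reward in [0, 1] (higher is better).
    Returns the configuration with the best observed mean reward and the
    number of times each configuration was tried.
    """
    counts = [0] * len(configs)
    totals = [0.0] * len(configs)
    for step in range(1, budget + 1):
        arm = ucb1_select(counts, totals, step)
        counts[arm] += 1
        totals[arm] += measure_reward(configs[arm])
    tried = [arm for arm in range(len(configs)) if counts[arm] > 0]
    best = max(tried, key=lambda arm: totals[arm] / counts[arm])
    return configs[best], counts


if __name__ == "__main__":
    # Hypothetical tile sizes; the synthetic reward function favors size 4.
    tile_sizes = [1, 2, 4, 8]
    best, counts = autotune(tile_sizes, lambda t: 1.0 if t == 4 else 0.2, 200)
    print(best, counts)
```

Because each measurement updates the running statistics immediately, a policy of this kind can tune while the application runs rather than requiring an offline sweep of the search space.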

Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critical to effectively using these units. Understanding this capability is important for anyone writing compute-intensive, high-performance, and portable code. We tested several compilers on x86 and ARM. We used the TSVC2 suite, with modifications that made it more representative of real-world code. On x86, GCC reported 54% of the loops in the suite as...

10.48550/arxiv.2502.11906 preprint EN arXiv (Cornell University) 2025-02-17

Moore's law for traditional electric integrated circuits is facing increasingly more challenges in both physics and economics. Among those challenges is the fact that the bandwidth per compute on the chip is dropping, whereas the energy needed for data movement keeps rising. We benchmark various interconnect technologies, including electrical, photonic, and plasmonic options. We contrast them with hybrid photonic-plasmonic interconnect(s) [HyPPI(s)], where we consider plasmonics for active manipulation devices and photonics for passive...

10.1109/jphot.2015.2496357 article EN cc-by-nc-nd IEEE photonics journal 2015-10-30

Performance modeling is a challenging problem due to the complexities of hardware architectures. In this paper, we present PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications in a fast and accurate manner on different GPU architectures. PPT-GPU is part of the open source project, Performance Prediction Toolkit (PPT), developed at the Los Alamos National Laboratory. We extend the old GPU model in PPT, which predicts the runtimes of computational physics codes, to offer better prediction accuracy, for which we add...

10.1109/lca.2019.2904497 article EN publisher-specific-oa IEEE Computer Architecture Letters 2019-01-01

GPUs are prevalent in modern computing systems at all scales. They consume a significant fraction of the energy in these systems. However, vendors do not publish the actual cost of the power/energy overhead of their internal microarchitecture. In this paper, we accurately measure the energy consumption of various PTX instructions found in modern NVIDIA GPUs. We provide an exhaustive comparison of more than 40 instructions for four high-end NVIDIA GPUs from different generations (Maxwell, Pascal, Volta, and Turing). Furthermore, we show the effect of the CUDA compiler...

10.1145/3387902.3392613 preprint EN 2020-05-11

Network-on-Chips (NoCs) have been widely used as a scalable communication solution in the design of multiprocessor system-on-chips (MPSoCs). NoCs enable communications between on-chip Intellectual Property (IP) cores and allow processing cores to achieve higher performance by outsourcing their tasks. The NoC paradigm is based on the idea of resource sharing, in which hardware resources, including buffers, links, routers, etc., are shared between all IPs of the MPSoC. In fact, the data being routed by each router might not be related...

10.1109/access.2021.3100540 article EN cc-by-nc-nd IEEE Access 2021-01-01

In recent decades, power consumption has become an essential factor in attracting the attention of integrated circuit (IC) designers. Multiple-valued logic (MVL) and approximate computing are some techniques that could be applied to circuits to make power-efficient systems. By utilizing MVL-based circuits instead of binary logic, the information conveyed by digital signals increases, and this reduces the required interconnections and power consumption. On the other hand, approximate computing is a class of arithmetic used in systems where the accuracy of the computation...

10.3390/electronics9040643 article EN Electronics 2020-04-14

Software prefetching and locality optimizations are techniques for overcoming the speed gap between processor and memory. In this paper, we evaluate the impact of memory trends on the effectiveness of software prefetching and locality optimizations for three types of applications: regular scientific codes, irregular scientific codes, and pointer-chasing codes. We find that for many applications, software prefetching outperforms locality optimizations when there is sufficient memory bandwidth, but locality optimizations outperform software prefetching under bandwidth-limited conditions. The break-even point (for 1 GHz processors) occurs at roughly 2.5 GBytes/sec on today's...

10.1145/377792.377906 article EN 2001-06-17

The proliferation of mobile and IoT devices, coupled with the advances in the wireless communication capabilities of these devices, has urged the need for novel communication paradigms for such heterogeneous hybrid networks. Researchers have proposed opportunistic routing as a means to leverage the potentials offered by such networks. While several proposals for multiple opportunistic routing protocols exist, only a few have explored fuzzy logic to evaluate the status of links in the network and construct stable and faster paths towards destinations. We propose FQ-AGO, a Fuzzy Logic Q-learning Based Asymmetric...

10.3390/electronics9040576 article EN Electronics 2020-03-29

Graphics Processing Units (GPUs) are now considered the leading hardware to accelerate general-purpose workloads such as AI, data analytics, and HPC. Over the last decade, researchers have focused on demystifying and evaluating the microarchitecture features of various GPU architectures beyond what vendors reveal. This line of work is necessary to understand the hardware better and build more efficient applications. Many works have studied the recent Nvidia architectures, such as Volta and Turing, comparing them to their successor, Ampere. However,...

10.1109/hpec55821.2022.9926299 article EN 2022-09-19

Traditional silicon binary circuits continue to face major challenges such as high leakage power dissipation and area of interconnections. Multiple-Valued Logic (MVL) and nano-devices are two feasible solutions to overcome these problems. In this paper, a novel method is presented to design ternary logic based on Carbon Nanotube Field Effect Transistors (CNFETs). The proposed designs use the unique properties of CNFETs, such as adjusting the Carbon Nanotube (CNT) diameters to have the desired threshold voltage and having the same...

10.1109/nano.2017.8117467 article EN 2017-07-01

The last decade has seen a shift in the computer systems industry where heterogeneous computing has become prevalent. Graphics Processing Units (GPUs) are now present in systems ranging from supercomputers to mobile phones and tablets. GPUs are used for graphics operations as well as general-purpose computing (GPGPUs) to boost the performance of compute-intensive applications. However, the percentage of undisclosed characteristics beyond what vendors provide is not small. In this paper, we introduce a very low overhead and portable analysis for exposing...

10.1109/hpec.2019.8916466 article EN 2019-09-01

In this paper, we introduce an accurate and scalable memory modeling framework for General Purpose Graphics Processor units (GPGPUs), PPT-GPU-Mem (Performance Prediction Tool-Kit for GPUs Cache Memories). PPT-GPU-Mem predicts the performance of different GPUs' cache memory hierarchy (L1 & L2) based on reuse profiles. We extract a memory trace for each GPU kernel once in its lifetime using the recently released binary instrumentation tool, NVBIT. The memory trace extraction is architecture-independent and can be done on any available...

10.1145/3392717.3392761 article EN 2020-06-29
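To make the idea of a reuse profile concrete, the sketch below computes reuse distances from an address trace and uses them to predict the hit rate of a fully associative LRU cache. It is a minimal illustration under simplifying assumptions (one cache level, full associativity, a toy trace of symbolic addresses), not the PPT-GPU-Mem model; the function names are hypothetical.

```python
def reuse_distances(trace):
    """Reuse distance of each access in `trace`.

    The reuse distance is the number of distinct other addresses touched
    since the previous access to the same address; first-time accesses
    get None.
    """
    stack = []  # LRU stack: least recently used first, most recent last
    distances = []
    for addr in trace:
        if addr in stack:
            position = stack.index(addr)
            # Everything above `addr` in the stack was touched since its last use.
            distances.append(len(stack) - 1 - position)
            stack.pop(position)
        else:
            distances.append(None)
        stack.append(addr)
    return distances


def lru_hit_rate(trace, capacity):
    """Predicted hit rate of a fully associative LRU cache holding `capacity` lines.

    An access hits exactly when its reuse distance is smaller than the capacity.
    """
    distances = reuse_distances(trace)
    hits = sum(1 for d in distances if d is not None and d < capacity)
    return hits / len(distances)


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "d", "a"]
    print(reuse_distances(trace))
    print(lru_hit_rate(trace, capacity=3))
```

The profile depends only on the trace, so the same trace can be replayed against several cache capacities without re-running the program, which fits the abstract's point that the trace is extracted once and is architecture-independent.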

In this paper, we present PPT-GPU, a scalable performance prediction toolkit for GPUs. PPT-GPU achieves scalability through a hybrid high-level modeling approach where some computations are extrapolated and multiple parts of the model are parallelized. The tool's primary prediction models use pre-collected memory and instruction traces of the workloads to accurately capture the dynamic behavior of the kernels.

10.1145/3458817.3476221 article EN 2021-10-21

Network-on-chip (NoC) is widely used as an efficient communication architecture in multi-core and many-core System-on-chips (SoCs). However, the shared communication resources in an NoC platform, e.g., channels, buffers, and routers, might be used to conduct attacks compromising the security of NoC-based SoCs. Most of the proposed encryption-based protection methods in the literature require leaving some parts of the packet unencrypted to allow the routers to process/forward packets accordingly. This reveals the source/destination information of the packet to malicious routers, which...

10.1145/3592798 article EN ACM Journal on Emerging Technologies in Computing Systems 2023-04-18

This paper utilizes Reinforcement Learning (RL) as a means to automate the Hardware Trojan (HT) insertion process to eliminate the inherent human biases that limit the development of robust HT detection methods. An RL agent explores the design space and finds circuit locations that are best for keeping inserted HTs hidden. To achieve this, a digital circuit is converted to an environment in which an RL agent inserts HTs such that the cumulative reward is maximized. Our toolset can insert combinational HTs into the ISCAS-85 benchmark suite with variations in HT size...

10.1145/3526241.3530379 article EN Proceedings of the Great Lakes Symposium on VLSI 2022 2022-06-02

Existing Hardware Trojan (HT) detection methods face several critical limitations: logic testing struggles with scalability and coverage for large designs, side-channel analysis requires golden reference chips, and formal verification methods suffer from state-space explosion. The emergence of Large Language Models (LLMs) offers a promising new direction for HT detection by leveraging their natural language understanding and reasoning capabilities. For the first time, this paper explores the potential of general-purpose LLMs...

10.48550/arxiv.2412.07636 preprint EN arXiv (Cornell University) 2024-12-10

Parallel computers are becoming deeply hierarchical. Locality-aware programming models allow programmers to control locality at one level through establishing affinity between data and executing activities. This, however, does not enable locality exploitation at other levels. Therefore, we must conceive an efficient abstraction of hierarchical locality and develop techniques to exploit it. Techniques applied directly by programmers, beyond the first level, burden the programmer and hinder productivity. In this article, we propose...

10.1145/2897783 article EN ACM Transactions on Architecture and Code Optimization 2016-06-14