- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Physical Unclonable Functions (PUFs) and Hardware Security
- Distributed and Parallel Computing Systems
- Interconnection Networks and Systems
- Cloud Computing and Resource Management
- Low-power high-performance VLSI design
- Quantum Computing Algorithms and Architecture
- Embedded Systems Design Techniques
- Integrated Circuits and Semiconductor Failure Analysis
- Advanced Memory and Neural Computing
- Adversarial Robustness in Machine Learning
- Advancements in Semiconductor Devices and Circuit Design
- Distributed systems and fault tolerance
- Advanced Malware Detection Techniques
- Quantum Information and Cryptography
- Ferroelectric and Negative Capacitance Devices
- Electrostatic Discharge in Electronics
- Quantum and electron transport phenomena
- Neuroscience and Neural Engineering
- Radiation Effects in Electronics
- VLSI and Analog Circuit Testing
- Experimental Learning in Engineering
- AI in cancer detection
- Optical Network Technologies
New Mexico State University
2016-2024
Miami University
2023-2024
Los Alamos National Laboratory
2018-2023
Sandia National Laboratories
2019
Zewail City of Science and Technology
2019
National Tsing Hua University
2019
Hiroshima University of Economics
2019
Arkansas Tech University
2013-2016
George Washington University
2014-2016
Valparaiso University
2015
We present a divide-and-conquer approach to deterministically prepare Dicke states |D<sub>k</sub><sup>n</sup>> (i.e., equal-weight superpositions of all n-qubit basis states with Hamming weight k) on quantum computers. In an experimental evaluation for up to n=6 qubits on the IBM Quantum Sydney and Montreal devices, we achieve significantly higher state fidelity compared to previous results [Mukherjee et al., TQE'2020; Cruz et al., QuTe'2019]. The gains are achieved through several techniques: Our circuits first divide the...
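The target state itself is straightforward to enumerate classically, which is handy when checking measured distributions against the ideal. A minimal Python sketch (illustrative only, not the circuit construction described above):

```python
from itertools import combinations
from math import comb, sqrt

def dicke_state(n, k):
    """Amplitude vector (length 2**n) of the Dicke state |D_k^n>:
    an equal-weight superposition of every n-qubit basis state
    with Hamming weight k."""
    amp = 1.0 / sqrt(comb(n, k))
    state = [0.0] * (2 ** n)
    for ones in combinations(range(n), k):
        idx = sum(1 << q for q in ones)  # basis index with 1s at positions 'ones'
        state[idx] = amp
    return state

state = dicke_state(6, 3)
nonzero = [a for a in state if a]  # C(6,3) = 20 equal amplitudes
```

Comparing the squared amplitudes against device measurement counts gives a quick classical-fidelity sanity check.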
The growing necessity for enhanced processing capabilities in edge devices with limited resources has led us to develop effective methods for improving high-performance computing (HPC) applications. In this paper, we introduce LASP (Lightweight Autotuning of Scientific Application Parameters), a novel strategy designed to address the parameter search space challenge on such devices. Our approach employs a multi-armed bandit (MAB) technique focused on online exploration and exploitation. Notably, it takes dynamic...
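As a rough illustration of the MAB idea behind such autotuning, here is an epsilon-greedy sketch with a made-up `measure` callback; it is not LASP's actual algorithm, just the exploration/exploitation loop it builds on:

```python
import random

def mab_autotune(candidates, measure, trials=100, eps=0.2, seed=0):
    """Epsilon-greedy multi-armed bandit over a parameter search space.
    'candidates' is a list of parameter settings; 'measure' runs the
    application once and returns a reward (e.g., negative runtime)."""
    rng = random.Random(seed)
    counts = [0] * len(candidates)
    values = [0.0] * len(candidates)  # running mean reward per arm
    for _ in range(trials):
        if rng.random() < eps:
            arm = rng.randrange(len(candidates))                 # explore
        else:
            arm = max(range(len(candidates)), key=values.__getitem__)  # exploit
        reward = measure(candidates[arm])
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]      # incremental mean
    return candidates[max(range(len(candidates)), key=values.__getitem__)]

# toy usage: pick the tile size whose (noisy) runtime is lowest
best = mab_autotune([16, 32, 64, 128],
                    lambda t: -abs(t - 64) + random.random())
```

Each "arm" is one parameter setting; online measurement replaces an exhaustive offline sweep of the search space.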
Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critical to effectively using these units. Understanding this capability is important for anyone writing compute-intensive, high-performance, and portable code. We tested several compilers on x86 and ARM. We used the TSVC2 suite, with modifications that made it more representative of real-world code. On x86, GCC reported 54% of the loops in the suite as...
Moore's law for traditional electric integrated circuits is facing increasingly more challenges in both physics and economics. Among those is the fact that the bandwidth per compute on chip is dropping, whereas the energy needed for data movement keeps rising. We benchmark various interconnect technologies, including electrical, photonic, and plasmonic options. We contrast them with hybrid photonic-plasmonic interconnects (HyPPIs), where we consider plasmonics for active manipulation devices and photonics for passive...
Performance modeling is a challenging problem due to the complexities of hardware architectures. In this paper, we present PPT-GPU, a scalable and accurate simulation framework that enables GPU code developers and architects to predict the performance of applications in a fast manner on different GPUs. PPT-GPU is part of the open-source Performance Prediction Toolkit (PPT) developed at Los Alamos National Laboratory. We extend the old GPU model in PPT that predicted runtimes of computational physics codes to offer better prediction accuracy, for which we add...
GPUs are prevalent in modern computing systems at all scales. They consume a significant fraction of the energy in these systems. However, vendors do not publish the actual cost of the power/energy overhead of their internal microarchitecture. In this paper, we accurately measure the energy consumption of various PTX instructions found in NVIDIA GPUs. We provide an exhaustive comparison of more than 40 instructions for four high-end GPUs from different generations (Maxwell, Pascal, Volta, and Turing). Furthermore, we show the effect of the CUDA compiler...
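The common differencing idea behind such per-instruction measurements can be sketched as follows; the function and numbers are hypothetical, and the real methodology involves carefully constructed PTX microbenchmarks and on-board power sensors:

```python
def energy_per_instruction(power_samples_mw, sample_period_s, baseline_j, n_instructions):
    """Estimate per-instruction energy: integrate sampled power (in mW) over a
    microbenchmark that issues the target instruction n_instructions times,
    then subtract the energy of an otherwise-identical empty-loop baseline."""
    total_j = sum(p * 1e-3 * sample_period_s for p in power_samples_mw)
    return (total_j - baseline_j) / n_instructions

# hypothetical numbers: 10 samples at 100 W, 1 ms apart, 0.5 J baseline,
# one million dynamic instances of the instruction under test
e = energy_per_instruction([100_000] * 10, 0.001, 0.5, 1_000_000)
```

Subtracting the baseline isolates the instruction's marginal energy from static power and loop overhead.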
Network-on-Chips (NoCs) have been widely used as a scalable communication solution in the design of multiprocessor system-on-chips (MPSoCs). NoCs enable communication between on-chip Intellectual Property (IP) cores and allow processing cores to achieve higher performance by outsourcing their communication tasks. The NoC paradigm is based on the idea of resource sharing, in which hardware resources, including buffers, links, routers, etc., are shared among all IPs in the MPSoC. In fact, the data being routed through each router might not be related...
In recent decades, power consumption has become an essential factor attracting the attention of integrated circuit (IC) designers. Multiple-valued logic (MVL) and approximate computing are two techniques that can be applied to circuits to make power-efficient systems. By utilizing MVL instead of binary logic, the information conveyed by digital signals increases, and this reduces the required interconnections and power consumption. On the other hand, approximate computing is a class of arithmetic techniques used in systems where the accuracy of the computation...
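The interconnect saving from higher-radix signaling follows directly from information content: a ternary wire carries log2(3) ≈ 1.585 bits rather than 1. A quick check in Python:

```python
from math import ceil, log2

def wires(n_values, radix):
    """Signal lines needed to encode n_values distinct values
    when each line carries one digit of the given radix."""
    return ceil(log2(n_values) / log2(radix))

binary_wires = wires(256, 2)   # binary lines to carry one byte
ternary_wires = wires(256, 3)  # ternary lines carrying the same information
```

For 256 values this drops from 8 binary lines to 6 ternary ones, which is where the interconnection and routing-power savings cited above come from.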
Software prefetching and locality optimizations are techniques for overcoming the speed gap between processor and memory. In this paper, we evaluate the impact of memory trends on the effectiveness of software prefetching and locality optimizations for three types of applications: regular scientific codes, irregular scientific codes, and pointer-chasing codes. We find that for many applications, software prefetching outperforms locality optimizations when there is sufficient bandwidth, but locality optimizations outperform prefetching under bandwidth-limited conditions. The break-even point (for 1 GHz processors) occurs at roughly 2.5 GBytes/sec on today's...
The proliferation of mobile and IoT devices, coupled with the advances in the wireless communication capabilities of these devices, has urged the need for novel paradigms such as heterogeneous hybrid networks. Researchers have proposed opportunistic routing as a means to leverage the potential offered by such networks. While several proposals for multiple opportunistic routing protocols exist, only a few have explored fuzzy logic to evaluate the status of links in the network and construct stable, faster paths towards destinations. We propose FQ-AGO, a Fuzzy Logic Q-learning Based Asymmetric...
Graphics Processing Units (GPUs) are now considered the leading hardware to accelerate general-purpose workloads such as AI, data analytics, and HPC. Over the last decade, researchers have focused on demystifying and evaluating the microarchitecture features of various GPU architectures beyond what vendors reveal. This line of work is necessary to understand the hardware better and build more efficient applications. Many works have studied recent Nvidia architectures, such as Volta and Turing, comparing them to their successor, Ampere. However,...
Traditional silicon binary circuits continue to face major challenges such as high leakage power dissipation and the large area of interconnections. Multiple-Valued Logic (MVL) and nano-devices are two feasible solutions to overcome these problems. In this paper, a novel method is presented to design ternary logic circuits based on Carbon Nanotube Field Effect Transistors (CNFETs). The proposed designs use the unique properties of CNFETs, adjusting the Carbon Nanotube (CNT) diameters to obtain the desired threshold voltage while having the same...
The last decade has seen a shift in the computer systems industry where heterogeneous computing has become prevalent. Graphics Processing Units (GPUs) are now present from supercomputers to mobile phones and tablets. GPUs are used for graphics operations as well as general-purpose computing (GPGPU) to boost the performance of compute-intensive applications. However, the percentage of undisclosed characteristics beyond what vendors provide is not small. In this paper, we introduce a very low overhead and portable analysis for exposing...
In this paper, we introduce an accurate and scalable memory modeling framework for General Purpose Graphics Processing Units (GPGPUs), PPT-GPU-Mem, which stands for Performance Prediction Toolkit for GPUs' Cache Memories. PPT-GPU-Mem predicts the performance of different GPUs' cache hierarchies (L1 & L2) based on reuse profiles. We extract a trace of each GPU kernel once in its lifetime using the recently released binary instrumentation tool, NVBIT. The extraction is architecture-independent and can be done on any available...
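Reuse (stack) distance profiles relate directly to cache hits: under a fully associative LRU cache of C lines, an access hits exactly when its reuse distance is below C. A small, inefficient-but-clear Python sketch of this relationship (not PPT-GPU-Mem's actual model):

```python
def reuse_distances(trace):
    """Reuse (stack) distance of each access: the number of distinct
    addresses touched since the previous access to the same address
    (inf for cold, first-time accesses)."""
    last_seen = {}
    dists = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            dists.append(len(set(trace[last_seen[addr] + 1 : i])))
        else:
            dists.append(float("inf"))
        last_seen[addr] = i
    return dists

def lru_hit_rate(trace, cache_lines):
    """Fully associative LRU cache of 'cache_lines' lines: an access
    hits iff its reuse distance is strictly less than the capacity."""
    d = reuse_distances(trace)
    return sum(1 for x in d if x < cache_lines) / len(d)
```

For the toy trace A B C A B C, a 4-line cache hits the second round of accesses (hit rate 0.5) while a 2-line cache misses everything, since every reuse distance is 2.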
In this paper, we present PPT-GPU, a scalable performance prediction toolkit for GPUs. PPT-GPU achieves scalability through a hybrid high-level modeling approach in which some computations are extrapolated and multiple parts of the model are parallelized. The tool's primary models use pre-collected memory and instruction traces of workloads to accurately capture the dynamic behavior of kernels.
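A deliberately simplified analytical model in this spirit (a roofline-style sketch with an assumed overlap factor, not PPT-GPU's actual model) might look like:

```python
def kernel_time_model(flop_count, bytes_moved, peak_flops, peak_bw, overlap=1.0):
    """Roofline-style estimate of kernel runtime. 'overlap' in [0, 1] is an
    assumed degree of compute/memory overlap: 1.0 gives max(t_c, t_m),
    0.0 gives the fully serialized sum t_c + t_m."""
    t_c = flop_count / peak_flops  # time if purely compute-bound
    t_m = bytes_moved / peak_bw    # time if purely memory-bound
    return max(t_c, t_m) + (1.0 - overlap) * min(t_c, t_m)
```

High-level models of this shape stay fast because they evaluate closed-form expressions over trace-derived counts instead of simulating every cycle.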
Network-on-chip (NoC) is widely used as an efficient communication architecture in multi-core and many-core System-on-Chips (SoCs). However, the shared resources of the NoC platform, e.g., channels, buffers, and routers, might be used to conduct attacks compromising the security of NoC-based SoCs. Most of the encryption-based protection methods proposed in the literature require leaving some parts of the packet unencrypted to allow the routers to process/forward packets accordingly. This reveals the source/destination information to malicious routers, which...
This paper utilizes Reinforcement Learning (RL) as a means to automate the Hardware Trojan (HT) insertion process and eliminate the inherent human biases that limit the development of robust HT detection methods. An RL agent explores the design space and finds circuit locations that are best for keeping inserted HTs hidden. To achieve this, the digital circuit is converted into an environment in which the agent inserts HTs such that the cumulative reward is maximized. Our toolset can insert combinational HTs into the ISCAS-85 benchmark suite with variations in size...
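The RL loop in this setting can be caricatured as tabular, single-step Q-learning over candidate insertion locations; the stealth rewards below are made-up placeholders for real detection-test feedback:

```python
import random

def q_learning_insertion(rewards, episodes=500, alpha=0.5, eps=0.3, seed=1):
    """Tabular single-step Q-learning over candidate Trojan insertion
    locations: the agent picks a location, receives a stealth reward
    (standing in for detection-test results), and updates its Q value."""
    rng = random.Random(seed)
    q = [0.0] * len(rewards)
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.randrange(len(rewards))              # explore a location
        else:
            a = max(range(len(rewards)), key=q.__getitem__)  # exploit best so far
        r = rewards[a]                                   # hypothetical stealth score
        q[a] += alpha * (r - q[a])                       # Q-update toward the reward
    return max(range(len(q)), key=q.__getitem__)         # stealthiest location found
```

In a real toolflow the reward would come from running the detection methods against the modified netlist rather than from a fixed table.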
Existing Hardware Trojan (HT) detection methods face several critical limitations: logic testing struggles with scalability and coverage for large designs, side-channel analysis requires golden reference chips, and formal verification suffers from state-space explosion. The emergence of Large Language Models (LLMs) offers a promising new direction for HT detection by leveraging their natural language understanding and reasoning capabilities. For the first time, this paper explores the potential of general-purpose LLMs...
Parallel computers are becoming deeply hierarchical. Locality-aware programming models allow programmers to control locality at one level through establishing affinity between data and executing activities. This, however, does not enable exploitation of the other levels. Therefore, we must conceive an efficient abstraction of hierarchical locality and develop techniques to exploit it. Techniques applied directly by programmers, beyond the first level, burden the programmer and hinder productivity. In this article, we propose...