- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Neural Networks and Reservoir Computing
- Low-power high-performance VLSI design
- Semiconductor materials and devices
- Parallel Computing and Optimization Techniques
- Advanced Neural Network Applications
- Neuroscience and Neural Engineering
- Phase-change materials and chalcogenides
- VLSI and Analog Circuit Testing
- IoT and Edge/Fog Computing
- Interconnection Networks and Systems
- Energy Harvesting in Wireless Networks
- VLSI and FPGA Design Techniques
- Quantum Computing Algorithms and Architecture
- Neural dynamics and brain function
- Numerical Methods and Algorithms
- Age of Information Optimization
- Distributed and Parallel Computing Systems
- Integrated Circuits and Semiconductor Failure Analysis
- Embedded Systems Design Techniques
University of South Carolina
2021-2023
Chungbuk National University
2020
Nile University
2018-2020
Efficient communication is central to both biological and artificial intelligence (AI) systems. In brains, the challenge of long-range communication across regions is addressed through sparse, spike-based signaling, minimizing energy consumption and latency. In contrast, modern AI workloads, which keep scaling to ever larger distributed compute systems, are increasingly constrained by bandwidth limitations, creating bottlenecks that hinder scalability and efficiency. Inspired by the brain's efficient communication strategies, we propose SNAP,...
In this paper, a Static Noise Margin (SNM) analysis for the 2T2M RRAM cell is investigated. The proposed analysis is done using a mathematical formulation and verified by SPICE simulations. The cell is tested in both write and read modes. Moreover, the analysis is applied to diverse types of cells, and a comparison between the performance of such cells is discussed. Additionally, the effects of the exponential memristor model on cell behaviour, in terms of switching speed and resistance range, are discussed in detail. The circuit designs and simulations were carried out in TSMC 130 nm CMOS...
We propose an approximate tensor processing unit (APTPU), which includes two main components: (1) approximate processing elements (APEs), each consisting of a low-precision multiplier and adder, and (2) pre-approximate units (PAUs) that are shared among the APEs in the APTPU's systolic array, functioning as steering logic to pre-process the operands and feed them to the APEs. We conduct extensive experiments to evaluate the performance of the APTPU across various configurations and workloads. The results show that the array achieves up to...
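As a minimal behavioral sketch of the approximate-MAC idea this abstract describes (the function names and the truncation scheme here are illustrative assumptions, not the paper's actual APE/PAU logic), low-precision multiply-accumulate can be modeled in Python by truncating each operand to its top few bits before multiplying:

```python
# Illustrative sketch of approximate MAC via operand truncation.
# (Hypothetical model, not the APTPU's real APE/PAU implementation.)

def pre_approximate(x: int, bits: int = 4) -> int:
    """PAU-like step: keep only the top `bits` significant bits of x."""
    if x == 0:
        return 0
    shift = max(x.bit_length() - bits, 0)
    return (x >> shift) << shift

def approximate_mac(weights, activations, bits: int = 4) -> int:
    """APE-like element: accumulate products of truncated operands."""
    acc = 0
    for w, a in zip(weights, activations):
        acc += pre_approximate(w, bits) * pre_approximate(a, bits)
    return acc

exact = sum(w * a for w, a in zip([117, 63, 250], [12, 45, 7]))
approx = approximate_mac([117, 63, 250], [12, 45, 7])
print(exact, approx)  # approximation trades accuracy for cheaper multipliers
```

The trade-off mirrors the abstract's claim: smaller multipliers and shared pre-processing logic reduce hardware cost at the price of bounded numerical error.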
Transformer models have become a dominant architecture in the world of machine learning. From natural language processing to more recent computer vision applications, Transformers have shown remarkable results and established new state-of-the-art performance in many domains. However, this increase in performance has come at the cost of ever-increasing model sizes requiring more resources to deploy. Machine learning (ML) models are used in real-world systems, such as robotics, mobile devices, and internet of things (IoT) devices, that require fast...
Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators, utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional accelerators, like graphical processing units (GPUs), being designed specifically to perform the multiply-accumulate (MAC) operations required by the matrix-matrix and matrix-vector multiplies extensively present throughout the execution of deep neural networks (DNNs). Such advantages include maximizing data reuse...
In this paper, we develop an in-memory analog computing (IMAC) architecture realizing both synaptic behavior and activation functions within non-volatile memory arrays. Spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices are leveraged to realize sigmoidal neurons as well as binarized synapses. First, it is shown that the proposed IMAC architecture can be utilized as a multilayer perceptron (MLP) classifier, achieving orders of magnitude performance improvement compared to previous mixed-signal and digital...
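A toy functional model of the building blocks named in this abstract can make the computation concrete. This is a purely behavioral sketch (no device-level SOT-MRAM physics; the function names are assumptions): binarized synapses with weights in {-1, +1} feeding sigmoidal neurons.

```python
import math

# Behavioral sketch of an IMAC-style layer: binarized synapses (+1/-1)
# followed by sigmoidal neurons. Not a device-level SOT-MRAM model.

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def imac_layer(inputs, binary_weights):
    """binary_weights[j][i] in {-1, +1}; one sigmoidal neuron per row."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in binary_weights]

out = imac_layer([0.5, -1.0, 2.0], [[1, -1, 1], [-1, 1, 1]])
print(out)  # two neuron activations, each in (0, 1)
```

In the architecture the abstract describes, both the weighted sum and the sigmoid would be produced inside the memory array rather than in software as shown here.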
Convolutional Neural Networks (CNNs) for Artificial Intelligence (AI) algorithms have been widely used in many applications, especially image recognition. However, the growth of CNN-based recognition has raised the challenge of executing millions of Multiply and Accumulate (MAC) operations in state-of-the-art CNNs. Therefore, GPUs, FPGAs, and ASICs are feasible solutions for balancing processing speed and power consumption. In this paper, we propose an efficient hardware architecture for CNNs that provides high speed, low...
Conventional in-memory computing (IMC) architectures consist of analog memristive crossbars to accelerate matrix-vector multiplication (MVM), and digital functional units to realize nonlinear vector (NLV) operations in deep neural networks (DNNs). These designs, however, require energy-hungry signal conversion units which can dissipate more than 95% of the total power of the system. Fully-analog IMC circuits remove the need for converters by realizing both MVM and NLV operations in the analog domain, leading to significant energy savings....
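The crossbar MVM that these abstracts repeatedly rely on reduces to Ohm's law plus Kirchhoff's current law: each column current is the dot product of the row voltages with that column's conductances. A minimal numerical sketch (idealized; real arrays contend with parasitics, noise, and device variation):

```python
# Idealized crossbar MVM: column current I_j = sum_i V_i * G[i][j].
# Inputs are encoded as voltages, weights as device conductances.

def crossbar_mvm(voltages, conductances):
    """voltages: row voltages in volts; conductances: G[i][j] in siemens.
    Returns per-column output currents (amps) by Kirchhoff's current law."""
    cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(cols)]

V = [0.1, 0.2]                 # input activations as voltages
G = [[1e-3, 2e-3],             # weight matrix as conductances
     [3e-3, 4e-3]]
print(crossbar_mvm(V, G))      # column currents, ~[7e-4, 1e-3] A
```

The signal-conversion cost the abstract mentions arises when these analog column currents must be digitized before the nonlinear activation; fully-analog designs keep the currents in the analog domain instead.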
Fully-analog in-memory computing (IMC) architectures that implement both matrix-vector multiplication and non-linear vector operations within the same memory array have shown promising performance benefits over conventional IMC systems due to the removal of energy-hungry signal conversion units. However, maintaining computation in the analog domain for the entire deep neural network (DNN) comes with a potential sensitivity to interconnect parasitics. Thus, in this paper, we investigate the effect of wire parasitic...
We propose an analog implementation of the transcendental activation function leveraging two spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices and a CMOS inverter. The proposed neuron circuit consumes 1.8-27x less power and occupies a 2.5-4931x smaller area compared to state-of-the-art digital implementations. Moreover, the developed neuron can be readily integrated with memristive crossbars without requiring any intermediate signal conversion units. The architecture-level analyses...
Tensor processing units (TPUs), specialized hardware accelerators for machine learning tasks, have shown significant performance improvements when executing convolutional layers in convolutional neural networks (CNNs). However, they struggle to maintain the same efficiency in fully connected (FC) layers, leading to suboptimal hardware utilization. In-memory analog computing (IMAC) architectures, on the other hand, have demonstrated notable speedup in computing FC layers. This paper introduces a novel, heterogeneous, mixed-signal, and...
Area and power consumption are the main challenges in Network on Chip (NoC) design. Indeed, First-In First-Out (FIFO) memory is a key element of the NoC. Increasing the FIFO depth produces an increase in the performance of the NoC, but at the cost of area and power consumption. This paper proposes a new hybrid CMOS-memristor based architecture that consumes low power and has a small size compared to conventional CMOS-based FIFOs. The predicted area is approximately equal to half of that wasted by the conventional implementation. The controller module is implemented using HDL. Moreover,...
With the increased attention to memristive-based in-memory analog computing (IMAC) architectures as an alternative for energy-hungry computer systems in machine learning applications, a tool that enables exploring their device- and circuit-level design space can significantly boost research and development in this area. Thus, in this paper, we develop IMAC-Sim, a simulator for the design space exploration of IMAC architectures. IMAC-Sim is a Python-based simulation framework, which creates the SPICE netlist of the IMAC circuit based on various...
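To illustrate the kind of Python-to-SPICE flow this abstract describes, here is a hedged sketch that emits a resistive-crossbar netlist from a conductance matrix. The function name, node names, and netlist layout are illustrative assumptions, not IMAC-Sim's actual API or output format:

```python
# Hypothetical netlist-generation sketch in the spirit of IMAC-Sim:
# one resistive device per crosspoint, row node in<i> to column node out<j>.
# (Names and format are illustrative, not IMAC-Sim's real interface.)

def emit_crossbar_netlist(conductances, title="imac_crossbar"):
    lines = [f"* {title}"]
    for i, row in enumerate(conductances):
        for j, g in enumerate(row):
            # SPICE resistor card: Rname node+ node- resistance_ohms
            lines.append(f"R{i}_{j} in{i} out{j} {1.0 / g:.6g}")
    lines.append(".end")
    return "\n".join(lines)

netlist = emit_crossbar_netlist([[1e-3, 2e-3], [5e-4, 1e-3]])
print(netlist)
```

A real framework of this kind would additionally parameterize device models, interconnect parasitics, and peripheral circuitry before handing the netlist to a SPICE engine.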