- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Neural Networks and Reservoir Computing
- Low-power high-performance VLSI design
- Semiconductor materials and devices
- Parallel Computing and Optimization Techniques
- Advanced Neural Network Applications
- Neuroscience and Neural Engineering
- Phase-change materials and chalcogenides
- VLSI and Analog Circuit Testing
- IoT and Edge/Fog Computing
- Interconnection Networks and Systems
- Energy Harvesting in Wireless Networks
- VLSI and FPGA Design Techniques
- Quantum Computing Algorithms and Architecture
- Neural dynamics and brain function
- Numerical Methods and Algorithms
- Age of Information Optimization
- Distributed and Parallel Computing Systems
- Integrated Circuits and Semiconductor Failure Analysis
- Embedded Systems Design Techniques
University of South Carolina
2021-2023
Chungbuk National University
2020
Nile University
2018-2020
Efficient communication is central to both biological and artificial intelligence (AI) systems. In brains, the challenge of long-range communication across regions is addressed through sparse, spike-based signaling, minimizing energy consumption and latency. In contrast, modern AI workloads, which keep scaling to ever larger distributed compute systems, are increasingly constrained by bandwidth limitations, creating bottlenecks that hinder scalability and efficiency. Inspired by the brain's efficient communication strategies, we propose SNAP,...
In this paper, a Static Noise Margin (SNM) analysis for the 2T2M RRAM cell is investigated. The proposed analysis is done using a mathematical formulation and verified by SPICE simulations. The cell is tested in both write and read modes. Moreover, the analysis is applied to diverse types of cells, and a comparison between the performance of such cells is discussed. Additionally, the effects of the exponential memristor model on cell behaviour, in terms of switching speed and resistance range, are discussed in detail. The circuit designs and simulations were carried out in TSMC 130 nm CMOS...
We propose an approximate tensor processing unit (APTPU), which includes two main components: (1) approximate processing elements (APEs), each consisting of a low-precision multiplier and adder, and (2) pre-approximate units (PAUs) that are shared among the APEs in the APTPU's systolic array, functioning as steering logic to pre-process the operands and feed them to the APEs. We conduct extensive experiments to evaluate the performance of the APTPU across various configurations and workloads. The results show that the array achieves up to...
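As a minimal behavioral sketch of the approximate-MAC idea this abstract describes (the function names and the truncation scheme here are illustrative assumptions, not the paper's actual APE/PAU logic), low-precision multiply-accumulate can be modeled in Python by truncating each operand to its top few bits before multiplying:

```python
# Illustrative sketch of approximate MAC via operand truncation.
# (Hypothetical model, not the APTPU's real APE/PAU implementation.)

def pre_approximate(x: int, bits: int = 4) -> int:
    """PAU-like step: keep only the top `bits` significant bits of x."""
    if x == 0:
        return 0
    shift = max(x.bit_length() - bits, 0)
    return (x >> shift) << shift

def approximate_mac(weights, activations, bits: int = 4) -> int:
    """APE-like element: accumulate products of truncated operands."""
    acc = 0
    for w, a in zip(weights, activations):
        acc += pre_approximate(w, bits) * pre_approximate(a, bits)
    return acc

exact = sum(w * a for w, a in zip([117, 63, 250], [12, 45, 7]))
approx = approximate_mac([117, 63, 250], [12, 45, 7])
print(exact, approx)  # approximation trades accuracy for cheaper multipliers
```

The trade-off mirrors the abstract's claim: smaller multipliers and shared pre-processing logic reduce hardware cost at the price of bounded numerical error.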
Transformer models have become a dominant architecture in the world of machine learning. From natural language processing to more recent computer vision applications, Transformers have shown remarkable results and established new state-of-the-art performance in many domains. However, this increase in performance has come at the cost of ever-increasing model sizes requiring more resources to deploy. Machine learning (ML) models are used in real-world systems, such as robotics, mobile devices, and internet of things (IoT) devices, that require fast...
Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators, utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional accelerators, like graphical processing units (GPUs), being designed specifically to perform the multiply-accumulate (MAC) operations required by the matrix-matrix and matrix-vector multiplies extensively present throughout the execution of deep neural networks (DNNs). Such advantages include maximizing data reuse...
In this paper, we develop an in-memory analog computing (IMAC) architecture realizing both synaptic behavior and activation functions within non-volatile memory arrays. Spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices are leveraged to realize sigmoidal neurons as well as binarized synapses. First, it is shown that the proposed IMAC architecture can be utilized as a multilayer perceptron (MLP) classifier, achieving orders of magnitude performance improvement compared to previous mixed-signal and digital...
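A toy functional model of the building blocks named in this abstract can make the computation concrete. This is a purely behavioral sketch (no device-level SOT-MRAM physics; the function names are assumptions): binarized synapses with weights in {-1, +1} feeding sigmoidal neurons.

```python
import math

# Behavioral sketch of an IMAC-style layer: binarized synapses (+1/-1)
# followed by sigmoidal neurons. Not a device-level SOT-MRAM model.

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def imac_layer(inputs, binary_weights):
    """binary_weights[j][i] in {-1, +1}; one sigmoidal neuron per row."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in binary_weights]

out = imac_layer([0.5, -1.0, 2.0], [[1, -1, 1], [-1, 1, 1]])
print(out)  # two neuron activations, each in (0, 1)
```

In the architecture the abstract describes, both the weighted sum and the sigmoid would be produced inside the memory array rather than in software as shown here.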
Convolutional Neural Networks (CNNs) for Artificial Intelligence (AI) algorithms have been widely used in many applications, especially image recognition. However, the growth of CNN-based recognition has raised the challenge of executing millions of Multiply and Accumulate (MAC) operations in state-of-the-art CNNs. Therefore, GPUs, FPGAs, and ASICs are feasible solutions for balancing processing speed and power consumption. In this paper, we propose an efficient hardware architecture for CNNs that provides high speed, low...
Conventional in-memory computing (IMC) architectures consist of analog memristive crossbars to accelerate matrix-vector multiplication (MVM), and digital functional units to realize nonlinear vector (NLV) operations in deep neural networks (DNNs). These designs, however, require energy-hungry signal conversion units which can dissipate more than 95% of the total power of the system. Fully-analog IMC circuits remove the need for converters by realizing both MVM and NLV operations in the analog domain, leading to significant energy savings....
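The crossbar MVM that these abstracts repeatedly rely on reduces to Ohm's law plus Kirchhoff's current law: each column current is the dot product of the row voltages with that column's conductances. A minimal numerical sketch (idealized; real arrays contend with parasitics, noise, and device variation):

```python
# Idealized crossbar MVM: column current I_j = sum_i V_i * G[i][j].
# Inputs are encoded as voltages, weights as device conductances.

def crossbar_mvm(voltages, conductances):
    """voltages: row voltages in volts; conductances: G[i][j] in siemens.
    Returns per-column output currents (amps) by Kirchhoff's current law."""
    cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(cols)]

V = [0.1, 0.2]                 # input activations as voltages
G = [[1e-3, 2e-3],             # weight matrix as conductances
     [3e-3, 4e-3]]
print(crossbar_mvm(V, G))      # column currents, ~[7e-4, 1e-3] A
```

The signal-conversion cost the abstract mentions arises when these analog column currents must be digitized before the nonlinear activation; fully-analog designs keep the currents in the analog domain instead.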
Fully-analog in-memory computing (IMC) architectures that implement both matrix-vector multiplication and non-linear vector operations within the same memory array have shown promising performance benefits over conventional IMC systems due to the removal of energy-hungry signal conversion units. However, maintaining computation in the analog domain for the entire deep neural network (DNN) comes with a potential sensitivity to interconnect parasitics. Thus, in this paper, we investigate the effect of wire parasitic...
We propose an analog implementation of the transcendental activation function leveraging two spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices and a CMOS inverter. The proposed neuron circuit consumes 1.8-27x less power and occupies a 2.5-4931x smaller area compared to state-of-the-art digital implementations. Moreover, the developed neuron can be readily integrated with memristive crossbars without requiring any intermediate signal conversion units. The architecture-level analyses...
Tensor processing units (TPUs), specialized hardware accelerators for machine learning tasks, have shown significant performance improvements when executing convolutional layers in convolutional neural networks (CNNs). However, they struggle to maintain the same efficiency in fully connected (FC) layers, leading to suboptimal hardware utilization. In-memory analog computing (IMAC) architectures, on the other hand, have demonstrated notable speedup in computing FC layers. This paper introduces a novel, heterogeneous, mixed-signal, and...
Area and power consumption are the main challenges in Network on Chip (NoC) design. Indeed, First-In First-Out (FIFO) memory is a key element of the NoC. Increasing the FIFO depth produces an increase in the performance of the NoC, but at the cost of area and power consumption. This paper proposes a new hybrid CMOS-memristor based architecture that consumes low power and has a small size compared to conventional CMOS-based FIFOs. The predicted area is approximately equal to half of that wasted by the conventional implementation. The controller module is implemented using HDL. Moreover,...
With the increased attention to memristive-based in-memory analog computing (IMAC) architectures as an alternative for energy-hungry computer systems in machine learning applications, a tool that enables exploring their device- and circuit-level design space can significantly boost research and development in this area. Thus, in this paper, we develop IMAC-Sim, a simulator for the design space exploration of IMAC architectures. IMAC-Sim is a Python-based simulation framework, which creates the SPICE netlist of the IMAC circuit based on various...
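To illustrate the kind of Python-to-SPICE flow this abstract describes, here is a hedged sketch that emits a resistive-crossbar netlist from a conductance matrix. The function name, node names, and netlist layout are illustrative assumptions, not IMAC-Sim's actual API or output format:

```python
# Hypothetical netlist-generation sketch in the spirit of IMAC-Sim:
# one resistive device per crosspoint, row node in<i> to column node out<j>.
# (Names and format are illustrative, not IMAC-Sim's real interface.)

def emit_crossbar_netlist(conductances, title="imac_crossbar"):
    lines = [f"* {title}"]
    for i, row in enumerate(conductances):
        for j, g in enumerate(row):
            # SPICE resistor card: Rname node+ node- resistance_ohms
            lines.append(f"R{i}_{j} in{i} out{j} {1.0 / g:.6g}")
    lines.append(".end")
    return "\n".join(lines)

netlist = emit_crossbar_netlist([[1e-3, 2e-3], [5e-4, 1e-3]])
print(netlist)
```

A real framework of this kind would additionally parameterize device models, interconnect parasitics, and peripheral circuitry before handing the netlist to a SPICE engine.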