Mohammed Elbtity

ORCID: 0000-0002-3282-0076
Research Areas
  • Advanced Memory and Neural Computing
  • Ferroelectric and Negative Capacitance Devices
  • Neural Networks and Reservoir Computing
  • Low-power high-performance VLSI design
  • Semiconductor materials and devices
  • Parallel Computing and Optimization Techniques
  • Advanced Neural Network Applications
  • Neuroscience and Neural Engineering
  • Phase-change materials and chalcogenides
  • VLSI and Analog Circuit Testing
  • IoT and Edge/Fog Computing
  • Interconnection Networks and Systems
  • Energy Harvesting in Wireless Networks
  • VLSI and FPGA Design Techniques
  • Quantum Computing Algorithms and Architecture
  • Neural dynamics and brain function
  • Numerical Methods and Algorithms
  • Age of Information Optimization
  • Distributed and Parallel Computing Systems
  • Integrated Circuits and Semiconductor Failure Analysis
  • Embedded Systems Design Techniques

University of South Carolina
2021-2023

Chungbuk National University
2020

Nile University
2018-2020

Efficient communication is central to both biological and artificial intelligence (AI) systems. In brains, the challenge of long-range communication across regions is addressed through sparse, spike-based signaling, which minimizes energy consumption and latency. In contrast, modern AI workloads, which keep scaling to ever larger distributed compute systems, are increasingly constrained by bandwidth limitations, creating bottlenecks that hinder scalability and efficiency. Inspired by the brain's efficient communication strategies, we propose SNAP,...

10.48550/arxiv.2501.08645 preprint EN arXiv (Cornell University) 2025-01-15
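
As a rough illustration of the bandwidth argument above: sending only the indices and values of active units (a spike-like code) instead of a dense activation vector cuts the bytes on the wire roughly in proportion to sparsity. The vector size, threshold, and message format below are illustrative assumptions, not the SNAP protocol itself.

```python
import numpy as np

# Hypothetical sizes and threshold; a spike-like sparse code, not the
# actual SNAP message format.
rng = np.random.default_rng(0)
acts = rng.standard_normal(4096).astype(np.float32)
acts[np.abs(acts) < 1.5] = 0.0          # most units stay silent

dense_bytes = acts.nbytes               # cost of shipping the dense vector
idx = np.flatnonzero(acts)              # only active "spikes" travel
sparse_bytes = idx.size * (4 + 4)       # 4-byte index + 4-byte payload each

print(f"dense: {dense_bytes} B, sparse: {sparse_bytes} B, "
      f"saved: {1 - sparse_bytes / dense_bytes:.1%}")
```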

In this paper, a Static Noise Margin (SNM) analysis for the 2T2M RRAM cell is investigated. The proposed analysis is done using a mathematical formulation and verified by SPICE simulations. The cell is tested in both write and read modes. Moreover, the analysis is applied to diverse types of cells, and a comparison between the performance of such cells is discussed. Additionally, the effect of the exponential memristor model on the cell's behaviour in terms of switching speed and resistance range is discussed in detail. The circuit designs and simulations were carried out in TSMC 130 nm CMOS...

10.1016/j.mejo.2018.01.001 article EN Microelectronics Journal 2018-02-03
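
For readers unfamiliar with SNM extraction, the sketch below implements the generic butterfly-curve method: the side of the largest square that fits between a transfer curve and its mirror image. The logistic transfer curve is a stand-in assumption, not the paper's 2T2M RRAM cell model.

```python
import numpy as np

# Generic butterfly-curve SNM extraction: the side of the largest square
# that fits between a transfer curve and its mirror image. The logistic
# curve is a stand-in assumption, not the paper's 2T2M RRAM cell model.
vdd = 1.2
v = np.linspace(0.0, vdd, 400)
f = vdd / (1.0 + np.exp(12.0 * (v - vdd / 2)))   # forward transfer curve
g = np.interp(v, f[::-1], v[::-1])               # mirrored (inverse) curve

snm = 0.0
for x0, y0, f0 in zip(v, g, f):
    if y0 >= f0:
        continue                                  # outside the butterfly lobe
    s = np.linspace(0.0, vdd - x0, 200)
    fits = s[y0 + s <= np.interp(x0 + s, v, f)]   # square still below f
    if fits.size:
        snm = max(snm, fits.max())
print(f"SNM ~ {snm * 1e3:.0f} mV")
```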

We propose an approximate tensor processing unit (APTPU), which includes two main components: (1) approximate processing elements (APEs) consisting of a low-precision multiplier and adder, and (2) pre-approximate units (PAUs) that are shared among the APEs in the APTPU's systolic array, functioning as steering logic to pre-process operands and feed them to the APEs. We conduct extensive experiments to evaluate the performance of the APTPU across various configurations and workloads. The results show that the APTPU's systolic array achieves up to...

10.1109/tcsi.2022.3206262 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2022-09-23
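
A minimal sketch of the approximate-MAC idea, assuming the pre-approximate step truncates each operand to its top few significant bits before a cheap low-precision multiply. The 4-bit width and operand values are assumptions, not the APTPU's actual PAU/APE design.

```python
# Illustrative approximate MAC: a shared pre-approximate step truncates
# operands to their top bits before a cheap low-precision multiply.
def pre_approximate(x: int, keep_bits: int = 4) -> int:
    """Steering-logic stand-in: keep only the top `keep_bits` of x."""
    if x == 0:
        return 0
    shift = max(x.bit_length() - keep_bits, 0)
    return (x >> shift) << shift

def approx_mac(a: int, b: int, acc: int = 0) -> int:
    """One approximate processing element: truncate, multiply, accumulate."""
    return acc + pre_approximate(a) * pre_approximate(b)

pairs = [(93, 58), (200, 41), (17, 250)]
exact = sum(a * b for a, b in pairs)
approx = 0
for a, b in pairs:
    approx = approx_mac(a, b, approx)
print(exact, approx, f"relative error {abs(exact - approx) / exact:.2%}")
```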

Transformer models have become a dominant architecture in the world of machine learning. From natural language processing to more recent computer vision applications, Transformers have shown remarkable results and established a new state-of-the-art in many domains. However, this increase in performance has come at the cost of ever-increasing model sizes requiring more resources to deploy. Machine learning (ML) models are used in real-world systems, such as robotics, mobile devices, and internet of things (IoT) devices, that require fast...

10.1109/rtas58335.2023.00036 article EN 2023-05-01

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators, utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML accelerators, like graphical processing units (GPUs), being designed specifically to perform the multiply-accumulate (MAC) operations required in the matrix-matrix and matrix-vector multiplies extensively present throughout the execution of deep neural networks (DNNs). Such advantages include maximizing data reuse...

10.48550/arxiv.2407.08700 preprint EN arXiv (Cornell University) 2024-07-11
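
Since the paragraph above leans on the systolic-array dataflow, here is a toy cycle-style model of a weight-stationary array: each PE holds one weight, activations stream across, and partial sums accumulate hop by hop. The array dimensions and skewed schedule are illustrative assumptions.

```python
import numpy as np

# Toy weight-stationary systolic matmul: W[i, j] stays resident in
# PE (i, j); activation row r reaches PE (i, j) at cycle r + i + j.
def systolic_matmul(A, W):
    m, k = A.shape
    k2, n = W.shape
    assert k == k2
    acc = np.zeros((m, n))
    for t in range(m + k + n):          # enough cycles to drain the array
        for i in range(k):
            for j in range(n):
                r = t - i - j           # activation row arriving at PE(i, j)
                if 0 <= r < m:
                    acc[r, j] += A[r, i] * W[i, j]
    return acc

A = np.arange(6).reshape(2, 3).astype(float)
W = np.ones((3, 4))
assert np.allclose(systolic_matmul(A, W), A @ W)
```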

In this paper, we develop an in-memory analog computing (IMAC) architecture realizing both synaptic behavior and activation functions within non-volatile memory arrays. Spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices are leveraged to realize sigmoidal neurons as well as binarized synapses. First, it is shown that the proposed IMAC architecture can be utilized as a multilayer perceptron (MLP) classifier, achieving orders of magnitude performance improvement compared to previous mixed-signal and digital...

10.1109/isvlsi51109.2021.00043 preprint EN 2021-07-01
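
A functional (not device-level) sketch of the idea, assuming +1/-1 binarized synaptic weights and a sigmoidal neuron applied directly to the crossbar's weighted sum; the layer sizes and random weights are placeholders, not the paper's trained MLP.

```python
import numpy as np

# Functional sketch only: binarized synapses and sigmoidal neurons.
rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def imac_layer(x, w_binary):
    # the analog array computes the weighted sum and applies the sigmoid
    # in place, with no ADC/DAC between the two steps
    return sigmoid(x @ w_binary)

w1 = np.sign(rng.standard_normal((784, 128)))   # binarized synapses
w2 = np.sign(rng.standard_normal((128, 10)))
x = rng.random(784)
out = imac_layer(imac_layer(x, w1), w2)
print(out.round(3))
```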

Convolutional Neural Networks (CNNs) for Artificial Intelligence (AI) algorithms have been widely used in many applications, especially image recognition. However, the growth of CNN-based recognition has raised the challenge of executing millions of Multiply and Accumulate (MAC) operations in state-of-the-art CNNs. Therefore, GPUs, FPGAs, and ASICs are feasible solutions for balancing processing speed and power consumption. In this paper, we propose an efficient hardware architecture for CNNs that provides high speed, low...

10.1109/isocc50952.2020.9333013 article EN 2020-10-21
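
To make the MAC-count challenge concrete, a quick back-of-the-envelope sketch; the VGG-style 3x3 layer shape is an illustrative assumption, not a layer from the paper.

```python
# Counting the multiply-accumulate (MAC) work in one convolutional layer,
# the quantity the proposed hardware targets.
def conv_macs(h_out, w_out, c_in, c_out, k):
    # every output pixel needs k*k*c_in MACs, for each of c_out filters
    return h_out * w_out * c_out * (k * k * c_in)

macs = conv_macs(h_out=224, w_out=224, c_in=64, c_out=64, k=3)
print(f"{macs / 1e9:.2f} GMACs for a single layer")
```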

Conventional in-memory computing (IMC) architectures consist of analog memristive crossbars to accelerate matrix-vector multiplication (MVM), and digital functional units to realize nonlinear vector (NLV) operations in deep neural networks (DNNs). These designs, however, require energy-hungry signal conversion units which can dissipate more than 95% of the total power of the system. Fully-analog IMC circuits remove the need for signal converters by realizing both MVM and NLV operations in the analog domain, leading to significant energy savings....

10.1109/jetcas.2022.3222966 article EN IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2022-11-17
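
A hedged numerical sketch of the crossbar MVM that these designs accelerate: weights map to conductances (here a differential pair of columns per weight column), inputs to word-line voltages, and bit-line currents sum the products. The conductance range and mapping are assumptions.

```python
import numpy as np

# Crossbar MVM model: Kirchhoff's current law sums the products on each
# bit line. G_on/G_off are an assumed device conductance range.
G_on, G_off = 1e-4, 1e-6                 # siemens (assumed)
W = np.array([[0.5, -0.2], [0.8, 0.1]])  # signed weights

# Differential mapping: separate columns for positive and negative parts.
scale = (G_on - G_off) / np.abs(W).max()
G_pos = np.where(W > 0, W, 0) * scale + G_off
G_neg = np.where(W < 0, -W, 0) * scale + G_off

v = np.array([0.3, 0.7])                 # word-line input voltages
i_out = v @ G_pos - v @ G_neg            # bit-line current difference
print(i_out / scale)                     # recovers v @ W up to the mapping
```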

Fully-analog in-memory computing (IMC) architectures that implement both matrix-vector multiplication and non-linear vector operations within the same memory array have shown promising performance benefits over conventional IMC systems due to the removal of energy-hungry signal conversion units. However, maintaining the computation in the analog domain for an entire deep neural network (DNN) comes with a potential sensitivity to interconnect parasitics. Thus, in this paper, we investigate the effect of wire parasitic...

10.1109/iscas48785.2022.9937884 article EN 2022 IEEE International Symposium on Circuits and Systems (ISCAS) 2022-05-28
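
A first-order sketch of why wire parasitics matter, assuming each crossbar segment adds a small series resistance that attenuates the current contributed by far cells; the resistance values and array size are illustrative, not the paper's extracted parasitics.

```python
# First-order parasitics model: far cells see more series wire resistance
# and contribute less current than ideal. Values below are assumed.
R_wire = 2.0      # ohms per wire segment (assumed)
R_cell = 10e3     # ohms, programmed device resistance (assumed)
n = 64            # cells along one bit line
v_in = 1.0

ideal = n * v_in / R_cell
# cell k sees roughly k word-line segments plus k bit-line segments
actual = sum(v_in / (R_cell + 2 * k * R_wire) for k in range(1, n + 1))
print(f"relative MVM error ~ {1 - actual / ideal:.2%}")
```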

We propose an analog implementation of the transcendental activation function leveraging two spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices and a CMOS inverter. The proposed analog neuron circuit consumes 1.8-27x less power and occupies 2.5-4931x smaller area compared to state-of-the-art digital implementations. Moreover, the developed neuron can be readily integrated with memristive crossbars without requiring any intermediate signal conversion units. The architecture-level analyses...

10.1145/3526241.3530376 article EN Proceedings of the Great Lakes Symposium on VLSI 2022 2022-06-02
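
A purely behavioral sketch, not the paper's transistor-level circuit: an inverter biased near threshold by two resistive devices yields a smooth sigmoid-shaped transfer curve. The gain, supply, and threshold values are assumptions.

```python
import numpy as np

# Behavioral stand-in for the analog neuron: a smoothed inverter
# switching around v_th behaves like a sigmoid. Parameters are assumed.
def neuron(v_in, vdd=1.0, gain=8.0, v_th=0.5):
    return vdd / (1.0 + np.exp(gain * (v_in - v_th)))

print(neuron(np.linspace(0, 1, 5)).round(3))
```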

Tensor processing units (TPUs), specialized hardware accelerators for machine learning tasks, have shown significant performance improvements when executing convolutional layers in convolutional neural networks (CNNs). However, they struggle to maintain the same efficiency in fully connected (FC) layers, leading to suboptimal hardware utilization. In-memory analog computing (IMAC) architectures, on the other hand, have demonstrated notable speedup in executing FC layers. This paper introduces a novel, heterogeneous, mixed-signal, and...

10.1145/3583781.3590256 article EN Proceedings of the Great Lakes Symposium on VLSI 2023 2023-05-31
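
The heterogeneity argument reduces to a dispatch rule; a minimal sketch, assuming a simple layer-type descriptor, is below. The routing policy matches the paper's high-level idea; the code itself is illustrative.

```python
# Route conv layers to the systolic TPU, FC layers to the IMAC array.
def dispatch(layer_type: str) -> str:
    # convs keep the systolic MACs busy; FC layers underutilize them,
    # so they go to the in-memory analog unit instead
    return "TPU" if layer_type == "conv" else "IMAC"

model = ["conv", "conv", "conv", "fc", "fc"]
print([(layer, dispatch(layer)) for layer in model])
```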

Area and power consumption are the main challenges in Network on Chip (NoC) design. Indeed, the First Input First Output (FIFO) memory is a key element of the NoC. Increasing the FIFO depth increases the performance of the NoC but at the cost of area and power consumption. This paper proposes a new hybrid CMOS-memristor based FIFO architecture that consumes low power and has a small size compared to conventional CMOS-based FIFOs. The predicted area is approximately equal to half of that wasted by the CMOS-based implementation, and the controller module is implemented using HDL. Moreover,...

10.1109/iscas.2018.8351645 article EN 2018 IEEE International Symposium on Circuits and Systems (ISCAS) 2018-05-01
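
A behavioral sketch of the FIFO's role in a router port, showing the depth/back-pressure trade-off that motivates a denser memory cell; the depth and traffic pattern are illustrative assumptions.

```python
from collections import deque

# Behavioral FIFO for one NoC router port: depth buys throughput under
# bursts at the price of area, the cost the hybrid cell aims to halve.
class FIFO:
    def __init__(self, depth: int):
        self.depth = depth
        self.buf = deque()

    def push(self, flit) -> bool:
        if len(self.buf) >= self.depth:
            return False                 # full: assert back-pressure
        self.buf.append(flit)
        return True

    def pop(self):
        return self.buf.popleft() if self.buf else None

fifo = FIFO(depth=4)
stalled = sum(not fifo.push(i) for i in range(6))  # flits 4 and 5 stall
print(stalled, fifo.pop())                         # -> 2 0
```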

With the increased attention to memristive-based in-memory analog computing (IMAC) architectures as an alternative for energy-hungry computer systems in machine learning applications, a tool that enables exploring their device- and circuit-level design space can significantly boost research and development in this area. Thus, in this paper, we develop IMAC-Sim, a simulator for the exploration of IMAC architectures. IMAC-Sim is a Python-based simulation framework, which creates a SPICE netlist of the IMAC circuit based on various...

10.48550/arxiv.2304.09252 preprint EN other-oa arXiv (Cornell University) 2023-01-01
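
In the spirit of the described flow, a toy netlist generator is sketched below: Python parameters expand into a SPICE crossbar description. The element names and resistor-only cell model are assumptions, not IMAC-Sim's actual output format.

```python
# Toy SPICE netlist generation from Python parameters (illustrative).
def crossbar_netlist(rows: int, cols: int, r_device: float = 10e3) -> str:
    lines = [f"* {rows}x{cols} memristive crossbar (toy model)"]
    for i in range(rows):
        for j in range(cols):
            # a resistor from word line i to bit line j stands in for a cell
            lines.append(f"Rcell_{i}_{j} wl{i} bl{j} {r_device:.0f}")
    for i in range(rows):
        lines.append(f"Vin{i} wl{i} 0 DC 1.0")   # drive each word line
    return "\n".join(lines)

print(crossbar_netlist(2, 2))
```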

Tensor processing units (TPUs), specialized hardware accelerators for machine learning tasks, have shown significant performance improvements when executing convolutional layers in convolutional neural networks (CNNs). However, they struggle to maintain the same efficiency in fully connected (FC) layers, leading to suboptimal hardware utilization. In-memory analog computing (IMAC) architectures, on the other hand, have demonstrated notable speedup in executing FC layers. This paper introduces a novel, heterogeneous, mixed-signal, and...

10.48550/arxiv.2304.09258 preprint EN other-oa arXiv (Cornell University) 2023-01-01

With the increased attention to memristive-based in-memory analog computing (IMAC) architectures as an alternative for energy-hungry computer systems in machine learning applications, a tool that enables exploring their device- and circuit-level design space can significantly boost research and development in this area. Thus, in this paper, we develop IMAC-Sim, a simulator for the exploration of IMAC architectures. IMAC-Sim is a Python-based simulation framework, which creates a SPICE netlist of the IMAC circuit based on various...

10.1145/3583781.3590264 article EN Proceedings of the Great Lakes Symposium on VLSI 2023 2023-05-31

Fully-analog in-memory computing (IMC) architectures that implement both matrix-vector multiplication and non-linear vector operations within the same memory array have shown promising performance benefits over conventional IMC systems due to the removal of energy-hungry signal conversion units. However, maintaining the computation in the analog domain for an entire deep neural network (DNN) comes with a potential sensitivity to interconnect parasitics. Thus, in this paper, we investigate the effect of wire parasitic...

10.48550/arxiv.2201.12480 preprint EN other-oa arXiv (Cornell University) 2022-01-01

With the increased attention to memristive-based in-memory analog computing (IMAC) architectures as an alternative for energy-hungry computer systems in data-intensive applications, a tool that enables exploring their device- and circuit-level design space can significantly boost research and development in this area. Thus, in this paper, we develop IMAC-Sim, a simulator for the exploration and multi-objective optimization of IMAC architectures. IMAC-Sim is a Python-based simulation framework, which creates a SPICE netlist...

10.48550/arxiv.2210.17410 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Conventional in-memory computing (IMC) architectures consist of analog memristive crossbars to accelerate matrix-vector multiplication (MVM), and digital functional units to realize nonlinear vector (NLV) operations in deep neural networks (DNNs). These designs, however, require energy-hungry signal conversion units which can dissipate more than 95% of the total power of the system. In-Memory Analog Computing (IMAC) circuits, on the other hand, remove the need for signal converters by realizing both MVM and NLV operations in the analog domain, leading...

10.48550/arxiv.2211.00590 preprint EN other-oa arXiv (Cornell University) 2022-01-01