- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Semiconductor materials and devices
- Advanced Neural Network Applications
- Parallel Computing and Optimization Techniques
- CCD and CMOS Imaging Sensors
- Advancements in Semiconductor Devices and Circuit Design
- Low-power high-performance VLSI design
- Neural Networks and Reservoir Computing
- Photonic and Optical Devices
- Neuroscience and Neural Engineering
- Analytic Number Theory Research
- Tensor decomposition and applications
- Advanced Mathematical Identities
- Real-Time Systems Scheduling
- Analog and Mixed-Signal Circuit Design
- Power Systems and Technologies
- Electronic and Structural Properties of Oxides
- Advanced Computational Techniques and Applications
- Proteins in Food Systems
- Advanced Combinatorial Mathematics
- Advancements in PLL and VCO Technologies
- Machine Learning in Materials Science
- Radio Frequency Integrated Circuit Design
- Transition Metal Oxide Nanomaterials
Southeast University
2021-2025
The Synergetic Innovation Center for Advanced Materials
2024
Ningxia University
2022
University of Electronic Science and Technology of China
2019-2022
National Tsing Hua University
2018-2022
Computation-in-memory (CIM) is a promising avenue to improve the energy efficiency of multiply-and-accumulate (MAC) operations in AI chips. Multi-bit CNNs are required for high inference accuracy in many applications [1–5]. There are challenges and tradeoffs in SRAM-based CIM: (1) the tradeoff between signal margin, cell stability, and area overhead; (2) process variation in the high-weighted bit dominates the end-result error rate; (3) the trade-off among input bandwidth, speed, and area. Previous SRAM CIM macros were limited to binary MAC operations for fully...
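For context on the step beyond binary MAC that these macros target, the sketch below models how a multi-bit MAC decomposes into binary dot products recombined by shift-and-add. It is a generic bit-serial scheme under illustrative bit widths, not the circuit of any specific macro; `multibit_mac` and its parameters are hypothetical names.

```python
# Minimal sketch (not from the paper): a multi-bit MAC decomposed into
# binary MAC operations, as a bit-serial SRAM-CIM macro would evaluate it.
import numpy as np

def multibit_mac(inputs, weights, in_bits=4, w_bits=4):
    """MAC of unsigned multi-bit inputs/weights via bitwise partial sums."""
    acc = 0
    for i in range(in_bits):            # one input bit-plane per cycle
        in_plane = (inputs >> i) & 1
        for j in range(w_bits):         # one weight bit-column per cell group
            w_plane = (weights >> j) & 1
            # binary MAC: one wordline/bitline dot product in the array
            partial = int(np.dot(in_plane, w_plane))
            acc += partial << (i + j)   # shift-add recombination
    return acc

rng = np.random.default_rng(0)
x = rng.integers(0, 16, size=8)
w = rng.integers(0, 16, size=8)
assert multibit_mac(x, w) == int(np.dot(x, w))
```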
For deep-neural-network (DNN) processors [1-4], the product-sum (PS) operation dominates the computational workload for both convolution (CNVL) and fully-connected (FCNL) neural-network (NN) layers. This hinders the adoption of DNNs on edge artificial-intelligence (AI) devices, which require low-power, low-cost, fast inference. Binary DNNs [5-6] are used to reduce the computation and hardware costs of AI devices; however, a memory bottleneck still remains. In Fig. 31.5.1, conventional PE arrays exploit...
Advanced AI edge chips require multibit input (IN), weight (W), and output (OUT) precision for CNN multiply-and-accumulate (MAC) operations to achieve an inference accuracy that is sufficient for practical applications. Computing-in-memory (CIM) is an attractive approach to improve the energy efficiency ($\mathrm{EF}_{\mathrm{MAC}}$) of MAC operations under a memory-wall constraint. Previous SRAM-CIM macros demonstrated...
Computation-in-memory (CIM) is a promising candidate to improve the energy efficiency of multiply-and-accumulate (MAC) operations in artificial intelligence (AI) chips. This work presents a static random access memory (SRAM) CIM unit-macro using: 1) compact-rule-compatible twin-8T (T8T) cells for weighted MAC operations to reduce area overhead and vulnerability to process variation; 2) an even–odd dual-channel (EODC) input mapping scheme to extend input bandwidth; 3) a two's complement weight-mapping (C2WM) scheme to enable MAC operations using positive...
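A minimal sketch of the two's-complement weight-mapping idea named in point 3, assuming only the standard identity that a two's-complement word equals its unsigned bit columns with the MSB weighted by $-2^{n-1}$; the function name and bit widths are illustrative, not the paper's exact scheme.

```python
# Hedged sketch of a C2WM-style signed MAC (illustrative, not the paper's
# circuit): signed weights are stored as unsigned two's-complement bit
# columns, and only the MSB column is recombined with a negative weight,
# so the array itself accumulates only positive bit-products.
import numpy as np

def c2wm_mac(inputs, weights, w_bits=4):
    """Signed MAC using two's-complement bit columns of the weights."""
    w_u = weights & ((1 << w_bits) - 1)      # two's-complement encoding
    acc = 0
    for j in range(w_bits):
        col = (w_u >> j) & 1                 # one stored bit column
        partial = int(np.dot(inputs, col))   # positive partial sum
        scale = -(1 << j) if j == w_bits - 1 else (1 << j)
        acc += scale * partial               # MSB column weighted negatively
    return acc

rng = np.random.default_rng(1)
x = rng.integers(0, 8, size=16)              # unsigned inputs
w = rng.integers(-8, 8, size=16)             # signed 4-bit weights
assert c2wm_mac(x, w) == int(np.dot(x, w))
```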
Many AI edge devices require local intelligence to achieve fast computing time ($t_{\mathrm{AC}}$), high energy efficiency (EF), and privacy. Transfer learning is a popular solution for such chips, wherein a model trained in the cloud is fine-tuned (re-trained) on the device over a few of its neural layers. This enables the dynamic incorporation of data from in-situ environments or private information. Computing-in-memory (CIM)...
Recent SRAM-based computation-in-memory (CIM) macros enable mid-to-high-precision multiply-and-accumulate (MAC) operations with improved energy efficiency using small/ultra-small-capacity (0.4-8 KB) memory devices. However, advanced CIM-based edge-AI chips favor multiple mid/large SRAM-CIM macros with high input (IN) and weight (W) capacity to reduce the frequency of data reloads from external DRAM and to avoid the need for additional SRAM buffers or ultra-large on-chip buffers. Enlarging capacity to increase throughput, though, increases delay...
Non-volatile computing-in-memory macros that are based on two-dimensional arrays of memristors are of use in the development of artificial intelligence edge devices. Scaling such systems to three-dimensional arrays could provide higher parallelism, capacity, and density for the necessary vector–matrix multiplication operations. However, scaling to three dimensions is challenging due to manufacturing and device-variability issues. Here we report a two-kilobit non-volatile computing-in-memory macro based on vertical resistive random-access...
Computing-in-memory (CIM) is a promising approach to reduce the latency and improve the energy efficiency of deep neural network (DNN) artificial intelligence (AI) edge processors. However, SRAM-based CIM (SRAM-CIM) faces practical challenges in terms of area overhead, performance, energy efficiency, and yield against variations in data patterns and transistor performance. This paper employs a circuit-system co-design methodology to develop an SRAM-CIM unit-macro for the binary-based fully connected neural network (FCNN) layers of DNN AI edge processors. The...
Computing-in-Memory (CIM) is a promising solution for energy-efficient neural network (NN) processors. Previous CIM chips [1], [4] mainly focus on the memory macro itself, lacking insight into overall system integration. Recently, a CIM-based processor [5] for speech recognition demonstrated high energy efficiency. No prior work systematically explores sparsity optimization in a CIM processor. Directly mapping sparse NN models onto regular CIM macros is ineffective, since the data are usually randomly distributed and cannot be...
This article presents a computing-in-memory (CIM) structure aimed at improving the energy efficiency of edge devices running multi-bit multiply-and-accumulate (MAC) operations. The proposed scheme includes a 6T SRAM-based CIM (SRAM-CIM) macro capable of: 1) weight-bitwise MAC (WbwMAC) operations to expand the sensing margin and improve readout accuracy for high-precision operations; 2) a compact local computing cell to perform multiplication with suppressed sensitivity to process variation; 3) an...
Previous SRAM-based computing-in-memory (SRAM-CIM) macros suffer from small read margins for high-precision operations, large cell-array area overhead, and limited compatibility with many input and weight configurations. This work presents a 1-to-8-bit configurable SRAM CIM unit-macro using: 1) a hybrid structure that combines 6T-SRAM-based in-memory binary product-sum (PS) operations with digital near-memory computing for multibit PS accumulation, to increase accuracy and reduce area overhead; 2) a column-based...
SRAM-based computing-in-memory (SRAM-CIM) has been intensively studied and developed to improve the energy and area efficiency of AI devices. SRAM-CIMs have effectively implemented high-precision integer (INT) multiply-and-accumulate (MAC) operations with sufficient inference accuracy for various image classification tasks [1]–[3], [5], [6]. To realize more complex tasks, such as detection and segmentation, and to support on-chip training for better accuracy, floating-point MAC (FP-MAC) operations with high energy efficiency are required. However,...
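One common way to reuse integer CIM hardware for FP-MAC is to align mantissas to a shared maximum exponent before integer accumulation; the sketch below shows that alignment arithmetic as a generic illustration (function name, mantissa width, and rounding are assumptions, not any cited macro's design).

```python
# Illustrative sketch of why FP-MAC is harder than INT MAC for CIM: each
# product's mantissa is right-shifted to a shared maximum exponent so the
# array can accumulate integers, at the cost of small alignment error.
import numpy as np

def aligned_fp_mac(x, w, man_bits=8):
    """Approximate FP dot product via max-exponent mantissa alignment."""
    prod = x * w
    exps = np.frexp(prod)[1]                 # exponent of each product
    e_max = exps.max()
    # quantize each product's mantissa relative to the shared exponent
    mants = np.round(prod / 2.0 ** e_max * (1 << man_bits)).astype(np.int64)
    acc = int(mants.sum())                   # integer accumulation in-array
    return acc * 2.0 ** e_max / (1 << man_bits)

rng = np.random.default_rng(2)
x = rng.normal(size=32)
w = rng.normal(size=32)
print(aligned_fp_mac(x, w), float(np.dot(x, w)))  # close; small shift error
```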
Recent advances in deep neural networks (DNNs) have shown that Binary Neural Networks (BNNs) are able to provide reasonable accuracy on various image datasets with a significant reduction in computation and memory cost. In this paper, we explore two BNNs: hybrid BNN (HBNN) and XNOR-BNN, where the weights are binarized to +1/-1 while the neuron activations are binarized to 1/0 and +1/-1, respectively. Two SRAM bit-cell designs are proposed, namely, a 6T cell for HBNN and a customized 8T cell for XNOR-BNN. In our design, the high-precision multiply-and-accumulate (MAC) is...
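The two binarization styles can be checked in a few lines: HBNN's {1,0} activations make the product a masked sum, while XNOR-BNN's {+1,-1} encoding reduces MAC to XNOR followed by a popcount. The bit encodings below are the usual ones; variable names are illustrative.

```python
# Hedged sketch of the two BNN MAC styles named above.
import numpy as np

def hbnn_mac(act01, w_pm1):
    """HBNN: activations in {1,0}, weights in {+1,-1}."""
    return int(np.dot(act01, w_pm1))

def xnor_mac(act_bits, w_bits):
    """XNOR-BNN: both operands as bits, 1 -> +1, 0 -> -1 (XNOR-popcount)."""
    n = len(act_bits)
    matches = int(np.sum(~(act_bits ^ w_bits) & 1))  # XNOR, then popcount
    return 2 * matches - n                           # map back to a +/-1 sum

rng = np.random.default_rng(3)
a = rng.integers(0, 2, size=64)
wb = rng.integers(0, 2, size=64)
w_pm1 = 2 * wb - 1
assert xnor_mac(a, wb) == int(np.dot(2 * a - 1, w_pm1))
print(hbnn_mac(a, w_pm1))
```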
This paper presents a half-select disturb-free 11T static random access memory (SRAM) cell for ultralow-voltage operations. The proposed SRAM is well suited to bit-interleaving architectures, which helps to improve soft-error immunity with error-correction coding. The read static noise margin (RSNM) and write margin (WM) are significantly improved due to its built-in write/read-assist scheme. Experimental results in 40-nm standard CMOS technology indicate that, at a 0.5-V supply voltage, the RSNM of...
Computing-in-memory (CIM) is a promising architecture for energy-efficient neural network (NN) processors. Several CIM macros have demonstrated high energy efficiency, while CIM-based systems-on-a-chip are not well explored. This work presents an NN processor, named STICKER-IM, which is implemented with sophisticated system integration. Three key innovations are proposed. First, a CIM-friendly block-wise sparsity (BWS) scheme is designed, enabling both activation-sparsity-aware acceleration and...
SRAM-based computation-in-memory (CIM) has shown great potential in improving the energy efficiency of edge-AI devices. Most CIM work [3–4] targets MAC operations with higher input (IN), weight (W), and output (OUT) precision, which is suitable for standard-convolution and fully-connected layers. Edge-AI neural networks trade off inference accuracy against the number of network parameters. Depthwise (DW) convolution support is essential for many light-CNN models, such as MobileNet-V2. However, when applying...
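As a reference for what DW-convolution support means, the sketch below is a plain NumPy depthwise convolution: each channel is convolved with its own single filter and there is no cross-channel accumulation, which is why a CIM column sized for standard K×K×Cin filters sits mostly idle on DW layers. This is an illustrative model, not the paper's dataflow.

```python
# Minimal depthwise (DW) convolution reference, as used in light-CNN
# models such as MobileNet-V2 (valid padding, stride 1, illustrative).
import numpy as np

def depthwise_conv2d(x, w):
    """x: (C, H, W) feature map; w: (C, k, k), one filter per channel."""
    C, H, W = x.shape
    k = w.shape[1]
    out = np.zeros((C, H - k + 1, W - k + 1))
    for c in range(C):                      # no cross-channel accumulation
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(x[c, i:i+k, j:j+k] * w[c])
    return out

x = np.random.default_rng(4).normal(size=(8, 6, 6))
w = np.random.default_rng(5).normal(size=(8, 3, 3))
print(depthwise_conv2d(x, w).shape)  # (8, 4, 4)
```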
Computing-in-memory (CIM) based on SRAM is a promising approach to achieving energy-efficient multiply-and-accumulate (MAC) operations in artificial intelligence (AI) edge devices; however, existing SRAM-CIM chips support only DNN inference. The flow of training data requires that CIM arrays perform convolutional computation using transposed weight matrices. This article presents a two-way transpose (TWT) multiply cell with high resistance to process variation and a novel read scheme that uses...
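Why training needs transposed-weight reads follows from backpropagation: the forward pass computes y = Wx, while the error flows back through the transpose. The snippet below illustrates this with a plain matrix; a two-way-readable cell lets both products come from the same stored bits (illustrative code, not the TWT circuit).

```python
# Hedged illustration of forward vs. backward MAC on one stored array.
import numpy as np

rng = np.random.default_rng(6)
W = rng.normal(size=(16, 32))      # one weight array stored in CIM

x = rng.normal(size=32)
y = W @ x                          # forward MAC: read along rows

dy = rng.normal(size=16)           # error from the next layer
dx = W.T @ dy                      # backward MAC: read along columns (W^T)

# With a transposable cell, no physical weight rewrite is needed
# between the two read directions.
assert y.shape == (16,) and dx.shape == (32,)
```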
Computing-in-memory (CIM) improves energy efficiency by enabling parallel multiply-and-accumulate (MAC) operations and reducing memory accesses [1-4]. However, today's typical neural networks (NNs) usually exceed on-chip memory capacity. Thus, a CIM-based processor may encounter a memory bottleneck [5]. Tensor-train (TT) is a tensor decomposition method that decomposes a d-dimensional tensor into d 4D tensor-cores (TCs: G...
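For reference, the sketch below reconstructs single elements of a weight matrix from TT-cores by chaining rank contractions; the core shapes follow the usual (r_{k-1}, m_k, n_k, r_k) convention, and all sizes here are illustrative rather than the chip's configuration.

```python
# Minimal tensor-train (TT) sketch: a large weight matrix is replaced by
# d small 4D tensor-cores, and any element is rebuilt by chained products.
import numpy as np

def tt_element(cores, row_idx, col_idx):
    """Rebuild one element W[(i1..id), (j1..jd)] from TT-cores."""
    v = np.ones((1, 1))
    for G, i, j in zip(cores, row_idx, col_idx):
        v = v @ G[:, i, j, :]        # contract ranks left to right
    return float(v[0, 0])

rng = np.random.default_rng(7)
m, n, r = (2, 3, 2), (2, 2, 3), (1, 4, 4, 1)   # mode sizes and TT-ranks
cores = [rng.normal(size=(r[k], m[k], n[k], r[k + 1])) for k in range(3)]
print(tt_element(cores, (1, 2, 0), (0, 1, 2)))
# Storage cost: sum of core sizes vs. prod(m) * prod(n) for the full matrix.
```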
Resistive memory (RRAM) provides an ideal platform for developing embedded non-volatile computing-in-memory (nvCIM). However, it faces several critical challenges, including device non-idealities, large DC currents, and small signal margins. To address these issues, we propose a voltage-division (VD)-based computing approach and its circuit implementation in two-transistor-two-resistor (2T2R) RRAM cell arrays, which can realize energy-efficient, sign-aware, and robust deep neural network (DNN)...
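An idealized behavioral model of the voltage-division readout: the two resistors of a 2T2R cell form a divider, so the sensed quantity is a bounded voltage rather than a summed DC current, and complementary resistance states encode a signed weight. The resistance and voltage values below are illustrative assumptions, not device data.

```python
# Behavioral sketch of voltage-division readout on one 2T2R cell
# (idealized divider model, not the paper's circuit).
def vd_readout(v_read, r_top, r_bottom):
    """Divider output between the two complementary RRAM resistances."""
    return v_read * r_bottom / (r_top + r_bottom)

V = 0.9                  # read voltage (V), illustrative
R_LRS, R_HRS = 6e3, 6e5  # low/high resistance states (ohms), illustrative

w_plus1 = vd_readout(V, R_LRS, R_HRS)   # +1: output pulled near v_read
w_minus1 = vd_readout(V, R_HRS, R_LRS)  # -1: output pulled near ground
print(round(w_plus1, 3), round(w_minus1, 3))  # ~0.891 V vs ~0.009 V
```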
Advances in static random access memory (SRAM)-CIM devices are meant to increase capacity while improving energy efficiency (EF) and reducing computing latency ($T_{\mathrm{AC}}$). This work presents a novel SRAM-CIM structure using: 1) a segmented-bitline charge-sharing (SBCS) scheme for multiply-and-accumulate (MAC) operations...
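A charge-sharing readout can be modeled as a capacitance-weighted average: shorting bitline segments of capacitance C_i precharged to V_i yields V = (Σ C_i·V_i) / (Σ C_i). The toy function below illustrates that relation only; it is not the SBCS circuit, and the values are made up.

```python
# Idealized charge-sharing model: total charge is conserved when
# precharged segments are shorted, so the shared voltage is the
# capacitance-weighted average of the segment voltages.
def charge_share(caps, volts):
    q = sum(c * v for c, v in zip(caps, volts))   # total charge
    return q / sum(caps)                          # shared voltage

print(charge_share([1.0, 1.0, 2.0], [0.8, 0.0, 0.4]))  # 0.4 (V)
```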