Kai Li

ORCID: 0000-0003-3251-931X
Research Areas
  • Advanced Memory and Neural Computing
  • Advanced Neural Network Applications
  • Ferroelectric and Negative Capacitance Devices
  • Numerical Methods and Algorithms
  • Parallel Computing and Optimization Techniques
  • Low-power high-performance VLSI design
  • Neural Networks and Applications
  • CCD and CMOS Imaging Sensors
  • Advanced Data Storage Technologies
  • Advancements in Semiconductor Devices and Circuit Design
  • VLSI and Analog Circuit Testing
  • Topic Modeling
  • Time Series Analysis and Forecasting
  • Integrated Circuits and Semiconductor Failure Analysis
  • Anomaly Detection Techniques and Applications
  • Machine Learning and ELM

Southern University of Science and Technology
2022-2025

Multi-bit-width convolutional neural networks (CNNs) maintain a balance between accuracy and hardware efficiency, offering a promising route to accurate yet energy-efficient edge computing. In this work, we develop a state-of-the-art multi-bit-width accelerator for NAS-optimized deep learning networks. To process inference efficiently, multi-level optimizations are proposed. Firstly, differentiable Neural Architecture Search (NAS) is adopted for network generation. Secondly, hybrid...

10.1109/tcsi.2022.3178474 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2022-06-10
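The core idea behind multi-bit-width (mixed-precision) networks is that each layer can carry a different quantization width. A minimal sketch of uniform per-layer quantization, with hypothetical layer names and bit-widths standing in for whatever a NAS search would actually assign:

```python
def quantize(x, bits):
    """Uniformly quantize x in [-1, 1) to a signed `bits`-wide code,
    then return the value that code represents (dequantized)."""
    scale = 2 ** (bits - 1)
    q = max(-scale, min(scale - 1, round(x * scale)))  # round and clamp
    return q / scale

# Hypothetical per-layer bit-widths a mixed-precision search might assign.
layer_bits = {"conv1": 8, "conv2": 4, "conv3": 2}
w = 0.3141
for name, bits in layer_bits.items():
    print(name, bits, quantize(w, bits))
```

The printed values show the same weight losing precision as the width shrinks, which is the accuracy/efficiency trade the abstract refers to.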

There is an emerging need to design multi-precision floating-point (FP) accelerators for high-performance-computing (HPC) applications. The commonly used methods are based on high-precision-split (HPS) and low-precision-combination (LPC) structures, which suffer from low hardware utilization ratios and multiple clock-cycle processing periods. In this brief, a new FP processing element (PE) is developed with a proposed bit-partitioning method. Minimized redundant bits in the operands are achieved. The PE supports...

10.1109/tcsii.2022.3183007 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2022-06-14
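The general identity behind bit-partitioning a wide multiply (independent of this brief's specific partitioning scheme) is that a 2k-bit product decomposes into four k-bit sub-products, shifted and summed:

```python
def mul_partitioned(a, b, k):
    """Multiply two 2k-bit unsigned operands using only k-bit sub-multiplies."""
    mask = (1 << k) - 1
    ah, al = a >> k, a & mask   # split each operand into high/low k-bit halves
    bh, bl = b >> k, b & mask
    # Four k-bit partial products, aligned by shifts and summed.
    return (ah * bh << 2 * k) + ((ah * bl + al * bh) << k) + al * bl

# e.g. a 24-bit mantissa product assembled from 12-bit multiplier hardware
assert mul_partitioned(0xABCDEF, 0x123456, 12) == 0xABCDEF * 0x123456
```

A multi-precision PE exploits this by keeping only the k-bit multipliers in hardware and reconfiguring how the partial products are combined.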

Optimized deep neural network (DNN) models and energy-efficient hardware designs are of great importance in edge-computing applications. Neural architecture search (NAS) methods are employed for DNN model optimization with mixed-bitwidth networks. To satisfy the computation requirements, convolution accelerators with low-power, high-throughput performance are highly desired. Several methods exist to support multiply-accumulate (MAC) operations in accelerator designs. The low-bitwidth-combination (LBC) method improves...

10.1109/tvlsi.2022.3210069 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2022-11-03
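The point of a low-bitwidth-combination datapath is hardware reuse: one array of narrow multipliers serves the narrow mode directly and the wide mode by combination. A toy sketch (not the paper's circuit) with a 4x4 primitive serving both 4-bit and 8-bit MACs:

```python
def mul4(a, b):
    """The 4x4-bit multiplier primitive the array is built from."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b

def mac_lbc(acts, wts, bits):
    """MAC over 4-bit or 8-bit unsigned operands using only mul4 (LBC idea)."""
    acc = 0
    for a, w in zip(acts, wts):
        if bits == 4:
            acc += mul4(a, w)
        else:  # 8-bit mode: combine four 4-bit products with shifts
            ah, al = a >> 4, a & 0xF
            wh, wl = w >> 4, w & 0xF
            acc += (mul4(ah, wh) << 8) + ((mul4(ah, wl) + mul4(al, wh)) << 4) \
                   + mul4(al, wl)
    return acc

assert mac_lbc([200, 17], [99, 3], 8) == 200 * 99 + 17 * 3
```

In silicon the trade-off is that 8-bit mode consumes four primitives (and extra adders) per MAC, which is where the throughput/precision balance comes from.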

Multi-bit-width neural networks offer a promising route to high-performance yet energy-efficient edge computing due to their balance between software algorithm accuracy and hardware efficiency. To date, the FPGA has been one of the core platforms for deploying various networks. However, it is still difficult to make full use of the dedicated digital signal processing (DSP) blocks when accelerating a multi-bit-width network. In this work, we develop a state-of-the-art convolutional accelerator with a novel...

10.1145/3543622.3573209 article EN 2023-02-10
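The standard trick for exploiting a wide FPGA DSP multiplier with narrow operands is to pack two activations into one multiplier input, separated by enough guard bits that the two products never overlap. A minimal unsigned sketch (real DSP packing of signed operands needs a correction step this omits):

```python
def packed_two_products(a0, a1, w):
    """Compute a0*w and a1*w (8-bit unsigned operands) with a single wide
    multiplication, by packing a0 and a1 into one multiplier input."""
    S = 16  # a0*w < 2**16, so the low product cannot spill into the high one
    packed = (a1 << S) | a0
    p = packed * w               # one multiplication on the wide datapath
    return p & ((1 << S) - 1), p >> S   # extract the two products

assert packed_two_products(200, 45, 113) == (200 * 113, 45 * 113)
```

This doubles the MAC throughput per DSP block for low-bit-width layers, which is the utilization problem the abstract describes.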

Computing-in-memory (CIM) accelerators integrate storage and computing, which can effectively improve the efficiency of convolutional neural networks (CNNs). To improve throughput and computational energy efficiency while maintaining accuracy, this paper proposes an SRAM CIM accelerator with a capacitor-coupling method. The charge-domain accumulation scheme reduces the impact of multiply-accumulate (MAC) unit variations, making it possible to operate in a fully-parallel manner. Furthermore, the array size...

10.1109/aicas57966.2023.10168630 article EN 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS) 2023-06-11
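A toy behavioral model of a capacitor-coupled CIM column, under simplifying assumptions (binary inputs and weights, ideal unit capacitors, ideal ADC) that the actual circuit does not need: each cell drives its capacitor high when the bitwise product is 1, charge sharing averages the cell voltages, and an ADC digitizes the shared-node voltage.

```python
def cim_column_mac(inputs, weights, adc_bits=4, vdd=1.0):
    """Behavioral model of one capacitor-coupled SRAM CIM column.
    inputs/weights are 0/1 lists of equal length (one entry per cell)."""
    n = len(inputs)
    products = [i & w for i, w in zip(inputs, weights)]  # in-cell AND "multiply"
    v_out = sum(products) / n * vdd                      # charge sharing averages
    levels = 2 ** adc_bits - 1
    return round(v_out / vdd * levels)                   # ideal ADC code

inputs  = [1, 0, 1, 1, 1, 0, 1, 1]
weights = [1, 1, 1, 0, 1, 0, 0, 1]
print(cim_column_mac(inputs, weights))  # code proportional to the dot product
```

Because the accumulation happens as charge on one shared node, all cells contribute in a single step, which is the fully-parallel operation the abstract mentions.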

Approximate computing is an emerging and effective method for reducing energy consumption in digital circuits, which is critical to the energy-efficient performance improvement of edge-computing devices. In this paper, we propose a low-power DNN accelerator with a novel signed approximate multiplier based on a probability-optimized compressor and error compensation. A customized partial product matrix (PPM) is built for the operands, and the optimal logic circuit is obtained after probabilistic analysis and optimization. At the same time,...

10.1109/ojcas.2023.3279251 article EN cc-by IEEE Open Journal of Circuits and Systems 2024-01-01
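One simple way to see the partial-product-matrix trade-off (a cruder approximation than the paper's compressor-based design): drop the PPM bits in the lowest columns, which removes adder hardware at the cost of a small, bounded, always-negative error.

```python
def approx_mul(a, b, trunc=4):
    """8x8 unsigned multiplier that zeroes partial-product bits in the lowest
    `trunc` columns, a basic truncation-style approximate multiplier."""
    acc = 0
    for i in range(8):
        if (b >> i) & 1:
            pp = a << i
            acc += pp & ~((1 << trunc) - 1)  # drop the truncated columns
    return acc

exact = 173 * 219
approx = approx_mul(173, 219)
assert 0 <= exact - approx < 8 * (2 ** 4)  # error bound: 8 rows, 4 columns each
```

Compressor-based designs like the one in the abstract instead keep all columns but replace exact compressors with cheaper approximate ones, then shape and compensate the resulting error statistically.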

High-performance computing (HPC) can facilitate deep neural network (DNN) training and inference. Previous works have proposed multiple-precision floating-point and fixed-point designs, but most handle only one of the two independently. This brief proposes a novel reconfigurable processing element (PE) supporting both, with energy-efficient floating-point and fixed-point multiply-accumulate (MAC) operations. The PE supports...

10.1109/tcsii.2023.3322259 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2023-10-05
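The behavior of a dual-mode PE can be mimicked in software (this is an illustration of the mode split, not the brief's datapath): integer arithmetic for the fixed-point mode, and rounding every intermediate through IEEE-754 half precision, via the standard library's `struct` format `'e'`, for the FP mode.

```python
import struct

def to_fp16(x):
    """Round a Python float through IEEE-754 binary16 (half precision)."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def pe_mac(acc, a, b, mode):
    """One MAC step of a dual-mode PE: 'fixed' accumulates integers exactly,
    'float' rounds the product and the sum to fp16, as a half-precision
    MAC unit would."""
    if mode == 'fixed':
        return acc + a * b
    return to_fp16(acc + to_fp16(a * b))

assert pe_mac(10, 3, 4, 'fixed') == 22
assert pe_mac(0.0, 0.5, 0.5, 'float') == 0.25  # 0.25 is exact in fp16
```

In a real reconfigurable PE the two modes share multiplier and adder hardware rather than taking separate branches, which is where the area and energy savings come from.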