- Error Correcting Code Techniques
- Advanced Wireless Communication Techniques
- Advanced Neural Network Applications
- Coding Theory and Cryptography
- Cooperative Communication and Network Coding
- CCD and CMOS Imaging Sensors
- Advanced Memory and Neural Computing
- Cryptographic Implementations and Security
- Cryptography and Residue Arithmetic
- Image Processing Techniques and Applications
- Advanced Vision and Imaging
- Cryptography and Data Security
- Parallel Computing and Optimization Techniques
- Analog and Mixed-Signal Circuit Design
- Low-Power High-Performance VLSI Design
- Neural Networks and Applications
- Advanced Image and Video Retrieval Techniques
- Numerical Methods and Algorithms
- Advanced MIMO Systems Optimization
- Wireless Communication Networks Research
- DNA and Biological Computing
- Digital Filter Design and Implementation
- Advanced Data Storage Technologies
- Algorithms and Data Compression
- Topic Modeling
Nanjing University
2016-2025
Sun Yat-sen University
2023-2025
Chinese Academy of Sciences
2012-2024
Shenyang Institute of Automation
2012-2024
University of California, Santa Barbara
2023-2024
China Electronics Technology Group Corporation
2020-2022
Xijing University
2020
Wanfang Data (China)
2019
Hefei University of Technology
2017
Broadcom (United States)
2008-2016
The Internet of Things (IoT) has become part of everyday life across the globe; its nodes are able to sense, store, and transmit information wirelessly. However, IoT nodes based on von Neumann architectures realize memory, computing, and communication functions with physically separated devices, which results in severe power consumption and computation latency. In this study, a wireless multiferroic memristor consisting of a Metglas/Pb(Zr0.3Ti0.7)O3-1 mol% Mn/Metglas laminate is proposed, which integrates...
Recently, significant improvements have been achieved in the hardware architecture design of deep neural networks (DNNs). However, the implementation of the widely used softmax function in DNNs, which involves expensive division and exponentiation units, has not been much investigated. This paper presents an efficient hardware implementation of this function. Mathematical transformations and linear fitting are used to simplify the computation. Multiple algorithmic strength-reduction strategies and fast addition methods are employed to optimize the architecture. By using these...
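A minimal sketch of the strength-reduction idea described in this abstract, in NumPy; the max-subtraction, base-2 decomposition, and `1 + f` linear fit are common choices in such designs and are assumptions here, not necessarily the paper's exact scheme:

```python
import numpy as np

def softmax_hw_sketch(x):
    """Hardware-oriented softmax sketch (illustrative, not the paper's design).

    Typical strength reductions:
      1. Subtract the max so every exponent is <= 0.
      2. Replace e^x with 2^(x * log2(e)); 2^u then splits into an integer
         part (a pure shift in hardware) and a fractional part.
      3. Approximate 2^f for f in [0, 1) by the linear fit 2^f ~ 1 + f.
    """
    x = np.asarray(x, dtype=np.float64)
    u = (x - x.max()) * np.log2(np.e)   # e^x -> 2^u
    k = np.floor(u).astype(int)         # integer part: a shift
    f = u - k                           # fractional part in [0, 1)
    pow2 = (1.0 + f) * np.exp2(k)       # linear fit of 2^f, then shift
    return pow2 / pow2.sum()            # the division is itself optimized in hardware

print(softmax_hw_sketch(np.array([1.0, 2.0, 3.0])))
# ~[0.091, 0.255, 0.654] vs. exact softmax [0.090, 0.245, 0.665]
```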
The convolutional neural network (CNN) is a state-of-the-art deep learning approach employed in various applications. Real-time CNN implementations on resource-limited embedded systems have recently become highly desired. To ensure programmable flexibility and shorten the development period, the field-programmable gate array (FPGA) is an appropriate platform on which to implement CNN models. However, memory bandwidth and on-chip storage are the bottlenecks of CNN acceleration. In this paper, we propose efficient hardware architectures to accelerate CNN models. The theoretical...
The Transformer has become an indispensable staple in deep learning. However, for real-life applications, it is very challenging to deploy efficient Transformers due to the immense parameters and operations of these models. To relieve this burden, exploiting sparsity is an effective approach to accelerate Transformers. The newly emerging Ampere graphics processing units (GPUs) leverage a 2:4 sparsity pattern to achieve model acceleration, yet this pattern can hardly meet the diverse algorithm and hardware constraints encountered in deployment. By contrast,...
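For concreteness, this is what the 2:4 pattern mentioned above enforces; a small NumPy sketch, where the keep-the-two-largest-magnitudes selection is the usual heuristic and an assumption here:

```python
import numpy as np

def prune_2_4(w):
    """Apply a 2:4 structured-sparsity pattern (illustrative sketch):
    in every group of 4 consecutive weights along a row, keep the 2 with
    the largest magnitude and zero the rest."""
    w = np.asarray(w, dtype=np.float64)
    rows, cols = w.shape
    assert cols % 4 == 0, "columns must be a multiple of 4"
    groups = w.reshape(rows, cols // 4, 4)
    # Indices of the 2 smallest-magnitude weights in each group of 4.
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
    pruned = groups.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=-1)
    return pruned.reshape(rows, cols)

w = np.random.randn(2, 8)
print(prune_2_4(w))   # exactly 2 non-zeros in every group of 4
```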
Recurrent neural networks (RNNs) have achieved state-of-the-art performance on various sequence learning tasks due to their powerful modeling capability. However, RNNs usually require a large number of parameters and high computational complexity. Hence, it is quite challenging to implement complex RNNs on embedded devices with stringent memory and latency requirements. In this paper, we first present a novel hybrid compression method for the widely used RNN variant, long short-term memory (LSTM), to tackle these...
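Since the abstract is truncated before the method details, here is a hedged sketch of what a hybrid compression pipeline for an LSTM weight matrix typically combines, magnitude pruning plus uniform quantization; the `keep_ratio` and bit-width values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def compress_weight(w, keep_ratio=0.3, bits=4):
    """Hybrid compression sketch (assumed pipeline, not the paper's exact
    method): magnitude pruning followed by uniform fixed-point quantization
    of the surviving weights."""
    w = np.asarray(w, dtype=np.float64)
    # 1. Magnitude pruning: keep only the largest |w| entries.
    thresh = np.quantile(np.abs(w), 1.0 - keep_ratio)
    mask = np.abs(w) >= thresh
    # 2. Uniform quantization of survivors to `bits` bits.
    scale = np.abs(w[mask]).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).clip(-2 ** (bits - 1), 2 ** (bits - 1) - 1)
    return q * scale * mask, mask

w = np.random.randn(16, 16)              # e.g. one gate matrix of an LSTM cell
w_hat, mask = compress_weight(w)
print(mask.mean(), np.abs(w - w_hat).max())
```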
Binary-weight convolutional neural networks (BCNNs) can achieve near state-of-the-art classification accuracy and have far lower computational complexity compared with traditional CNNs that use high-precision weights. Due to their binary weights, BCNNs are well suited for vision-based Internet-of-Things systems that are sensitive to power consumption, and they make very high throughput possible at moderate power dissipation. In this paper, an energy-efficient BCNN architecture is proposed. It fully exploits the binary weights...
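To illustrate why binary weights remove multipliers: a sketch of XNOR-Net-style weight binarization (sign plus a per-filter scale), so each multiply-accumulate reduces to add/subtract in hardware; the per-output-channel mean-of-absolute-values scale is the common choice and an assumption here:

```python
import numpy as np

def binarize(w):
    """Binarize a convolution weight tensor: each filter becomes
    sign(w) times one per-filter scale alpha (illustrative sketch)."""
    alpha = np.abs(w).mean(axis=(1, 2, 3), keepdims=True)  # per output channel
    signs = np.sign(w)                                     # +1 / -1 weights
    return signs * alpha, signs, alpha

w = np.random.randn(8, 3, 3, 3)          # (out_ch, in_ch, kH, kW)
w_bin, signs, alpha = binarize(w)
print(np.unique(signs), float(np.abs(w - w_bin).mean()))
```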
The softmax function has been widely used in deep neural networks (DNNs), and studies on efficient hardware accelerators for DNNs have also attracted tremendous attention. However, it is very challenging to design such architectures because of the expensive exponentiation and division calculations involved. In this brief, the softmax function is firstly simplified by exploring algorithmic strength reductions. Afterwards, a hardware-friendly, precision-adjustable calculation method is proposed, which can meet different precision...
Designing hardware accelerators for deep neural networks (DNNs) has been much desired. Nonetheless, most existing accelerators are built for either convolutional neural networks (CNNs) or recurrent neural networks (RNNs). Recently, the Transformer model has been replacing the RNN in the natural language processing (NLP) area. However, because of the intensive matrix computations and complicated data flow involved, such an accelerator design had never been reported. In this paper, we propose the first accelerator for the two key components, i.e., the multi-head attention (MHA) ResBlock...
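For reference, the MHA dataflow such an accelerator must support, as a minimal NumPy sketch; the weight names and sizes are illustrative assumptions:

```python
import numpy as np

def multi_head_attention(x, wq, wk, wv, wo, heads):
    """Minimal multi-head attention (illustrative of the dataflow only).
    x: (seq_len, d_model); wq/wk/wv/wo: (d_model, d_model)."""
    seq, d = x.shape
    dh = d // heads
    # Project, then split the model dimension into independent heads.
    q, k, v = (x @ w for w in (wq, wk, wv))
    split = lambda t: t.reshape(seq, heads, dh).transpose(1, 0, 2)
    q, k, v = map(split, (q, k, v))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)      # (heads, seq, seq)
    p = np.exp(scores - scores.max(-1, keepdims=True))
    p /= p.sum(-1, keepdims=True)                        # row-wise softmax
    out = (p @ v).transpose(1, 0, 2).reshape(seq, d)     # concatenate heads
    return out @ wo

rng = np.random.default_rng(0)
ws = [rng.standard_normal((8, 8)) * 0.1 for _ in range(4)]
print(multi_head_attention(rng.standard_normal((4, 8)), *ws, heads=2).shape)
```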
Long Short-Term Memory (LSTM) and its variants have been widely adopted in many sequential learning tasks, such as speech recognition and machine translation. Significant accuracy improvements can be achieved using a complex LSTM model with a large memory requirement and high computational complexity, which is time-consuming and energy-demanding. The low-latency and energy-efficiency requirements of real-world applications make model compression and hardware acceleration for LSTMs an urgent need. In this paper, several...
The training of Deep Neural Networks (DNNs) brings enormous memory requirements and computational complexity, which makes it a challenge to train DNN models on resource-constrained devices. Training DNNs with a reduced-precision data representation is crucial to mitigate this problem. In this article, we conduct a thorough investigation of low-bit posit numbers, a Type-III universal number (Unum). Through a comprehensive analysis of quantization with various numeric formats, it is demonstrated that the posit format shows great potential...
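As background for the posit discussion, a minimal decoder for an n-bit posit with es exponent bits (sign, regime, exponent, fraction fields); this follows the standard Type-III Unum layout but omits rounding and handles NaR crudely:

```python
def posit_to_float(bits, n=8, es=1):
    """Decode an n-bit posit bit pattern to a float (illustrative sketch)."""
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float('nan')                     # NaR (not a real)
    sign = -1.0 if bits >> (n - 1) else 1.0
    if sign < 0:
        bits = (-bits) & ((1 << n) - 1)         # two's complement
    body = bits & ((1 << (n - 1)) - 1)          # drop the sign bit
    # Regime: run length of identical leading bits, then a terminator bit.
    first = (body >> (n - 2)) & 1
    run, i = 1, n - 3
    while i >= 0 and ((body >> i) & 1) == first:
        run, i = run + 1, i - 1
    k = run - 1 if first else -run
    rest_bits = max(i, 0)                       # bits after the terminator
    rest = body & ((1 << rest_bits) - 1)
    e_bits = min(es, rest_bits)                 # exponent field may be cut off
    e = (rest >> (rest_bits - e_bits)) << (es - e_bits) if e_bits else 0
    f_bits = rest_bits - e_bits
    frac = 1.0 + (rest & ((1 << f_bits) - 1)) / (1 << f_bits) if f_bits else 1.0
    return sign * 2.0 ** (k * (1 << es) + e) * frac

print([posit_to_float(b) for b in (0x40, 0x60, 0x20, 0xC0)])  # [1.0, 4.0, 0.25, -1.0]
```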
To enable efficient deployment of convolutional neural networks (CNNs) on embedded platforms for different computer vision applications, several convolution variants have been introduced, such as depthwise convolution (DWCV), transposed convolution (TPCV), and dilated convolution (DLCV). To address the utilization degradation issue that occurs when a general convolution engine runs these emerging operators, a highly flexible reconfigurable hardware accelerator is proposed to efficiently support various CNN-based tasks. Firstly, to avoid workload imbalance...
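The utilization issue stems from how differently these variants map onto one engine; the standard output-size formulas below show how dilation stretches the effective kernel and how transposed convolution upsamples (the helper function is illustrative, not from the paper):

```python
def conv_out_size(n, k, stride=1, pad=0, dilation=1, transposed=False):
    """Output size along one spatial dimension for the convolution variants
    named above (standard formulas; output_padding omitted for TPCV)."""
    ke = dilation * (k - 1) + 1                  # effective kernel (DLCV)
    if transposed:                               # TPCV: upsampling convolution
        return (n - 1) * stride - 2 * pad + ke
    return (n + 2 * pad - ke) // stride + 1

n, k = 16, 3
print(conv_out_size(n, k, pad=1))                             # standard/DWCV: 16
print(conv_out_size(n, k, pad=2, dilation=2))                 # DLCV: 16
print(conv_out_size(n, k, stride=2, pad=1, transposed=True))  # TPCV: 31
```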
The Vision Transformer (ViT) has emerged as a competitive alternative to convolutional neural networks for various computer vision applications. Specifically, ViTs' multi-head attention layers make it possible to embed information globally across the overall image. Nevertheless, computing and storing the attention matrices incurs a cost that is quadratic in the number of patches, limiting the achievable efficiency and scalability and prohibiting more extensive real-world ViT applications on resource-constrained devices...
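A quick back-of-the-envelope on that quadratic dependency, with typical ViT-Base settings (16x16 patches, 12 heads) assumed for illustration:

```python
def attn_matrix_elems(image, patch, heads):
    """Elements in the per-layer attention score matrices: one (n x n)
    matrix per head, where n is the number of patches."""
    n = (image // patch) ** 2
    return heads * n * n

for size in (224, 384, 512):     # common input resolutions
    print(size, attn_matrix_elems(size, 16, 12))
# 224 -> ~0.46M, 384 -> ~3.98M, 512 -> ~12.6M elements: quadratic growth
```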
Prior research efforts have focused on using BCH codes for error correction in multi-level cell (MLC) NAND flash memory. However, they often require highly parallel implementations to meet the throughput requirement. As a result, a large area is needed. In this paper, we propose using Reed-Solomon (RS) codes for MLC NAND flash. An (828, 820) RS code has almost the same rate and length in terms of bits as an (8248, 8192) BCH code. Moreover, it achieves at least the same error-correcting performance in memory applications. Nevertheless, with 70% of the area,...
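The rate/length comparison can be checked directly, assuming the RS code uses 10-bit symbols over GF(2^10), which is consistent with the stated lengths:

```python
# RS symbols are 10 bits (n = 828 <= 2^10 - 1); the BCH code is binary.
rs_n, rs_k, sym = 828, 820, 10
bch_n, bch_k = 8248, 8192

print(rs_n * sym, rs_k * sym)        # 8280 total bits, 8200 info bits
print(rs_k / rs_n, bch_k / bch_n)    # rates: ~0.9903 vs ~0.9932
```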
Power consumption is a major bottleneck of system performance and is listed as one of the top three challenges in the International Technology Roadmap for Semiconductors 2008. In practice, a large portion of on-chip power is consumed by the clock system, which is made up of the clock distribution network and flip-flops. In this paper, various design techniques for a low-power clocking system are surveyed. Among them, an effective way to reduce the capacitive load is minimizing the number of clocked transistors. To approach this, we propose a novel pair-shared flip-flop that reduces the local...
In the era of artificial intelligence (AI), deep neural networks (DNNs) have emerged as the most important and powerful AI technique. However, large DNN models are both storage and computation intensive, posing significant challenges for adopting DNNs in resource-constrained scenarios. Thus, model compression becomes a crucial technique to ensure the wide deployment of DNNs.
This paper proposes a generalized hyperbolic COordinate Rotation DIgital Computer (GH CORDIC) to directly compute logarithms and exponentials with an arbitrary fixed base. In hardware implementation, it is more efficient than the state of the art, which requires both a CORDIC and a constant multiplier. More specifically, we develop the theory of GH CORDIC by adding a new parameter, called the base, to the conventional hyperbolic CORDIC; it can be used to specify the base with respect to which logarithms and exponentials are computed. As a result, a constant multiplier is no longer needed to convert e...
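A software sketch of the idea: hyperbolic CORDIC in vectoring mode computes artanh((w-1)/(w+1)) = ln(w)/2, and the base parameter amounts to pre-scaling the angle table so the result comes out directly in the target base. The floating-point version below is illustrative only; a hardware design would use fixed-point shifts and a stored table:

```python
import math

def gh_cordic_log(w, base=2.0, iters=24):
    """log_base(w) via hyperbolic CORDIC, vectoring mode (sketch).
    The angle table stores atanh(2^-i)/ln(base), folding in the base
    conversion so no constant multiplier is needed afterwards."""
    assert w > 0
    x, y, z = w + 1.0, w - 1.0, 0.0
    i, repeat = 1, 4                  # hyperbolic CORDIC repeats i = 4, 13, 40, ...
    for _ in range(iters):
        d = 1.0 if y < 0 else -1.0    # vectoring mode drives y -> 0
        t = 2.0 ** -i
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t) / math.log(base)   # pre-scaled angle table
        if i == repeat:               # mandatory repeated iteration
            repeat = 3 * repeat + 1
        else:
            i += 1
    return 2.0 * z                    # 2 * artanh((w-1)/(w+1)) = ln(w), pre-scaled

print(gh_cordic_log(8.0, base=2.0), math.log2(8.0))   # both ~3.0
```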
Designing hardware accelerators for convolutional neural networks (CNNs) has recently attracted tremendous attention. Plenty of existing accelerators are built for dense CNNs or structured sparse CNNs. By contrast, unstructured sparse CNNs can achieve a higher compression ratio with equivalent accuracy. However, their corresponding implementations generally suffer from load imbalance and conflicting accesses to on-chip buffers, which result in under-utilization of the processing elements (PEs). To tackle these issues, we propose a...
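A tiny experiment showing the load-imbalance problem named above, assuming the naive mapping of one PE per output row of an unstructured sparse weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w[np.abs(w) < 1.2] = 0.0                    # roughly 3/4 unstructured sparsity

# Naive mapping: one PE per output row. Work per PE = nonzeros in its row,
# so the slowest PE sets the latency while the others idle.
work = (w != 0).sum(axis=1)
print(work.min(), work.max(), work.mean())  # spread => PE under-utilization
```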
Deformable convolutional networks (DCNs) have shown outstanding potential in video super-resolution with their powerful inter-frame feature alignment. However, deploying DCNs on resource-limited devices is challenging due to their high computational complexity and irregular memory accesses. In this work, an algorithm-hardware co-optimization framework is proposed to accelerate DCNs on a field-programmable gate array (FPGA). Firstly, at the algorithm level, an anchor-based lightweight deformable network (ALDNet)...
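The irregular memory accesses come from the bilinear sampling at learned fractional offsets that deformable convolution performs for every kernel tap; a minimal sketch (the offset values here are made up for illustration):

```python
import numpy as np

def deform_sample(feat, py, px):
    """Bilinear sampling at a fractional location: the core of deformable
    convolution. Each tap reads 4 neighbors at a data-dependent address,
    which is what makes the memory access pattern irregular."""
    h, w = feat.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    dy, dx = py - y0, px - x0
    def at(y, x):                     # zero padding outside the feature map
        return feat[y, x] if 0 <= y < h and 0 <= x < w else 0.0
    return ((1 - dy) * (1 - dx) * at(y0, x0) + (1 - dy) * dx * at(y0, x0 + 1)
            + dy * (1 - dx) * at(y0 + 1, x0) + dy * dx * at(y0 + 1, x0 + 1))

feat = np.arange(16, dtype=float).reshape(4, 4)
# One 3x3 tap at (1, 1) displaced by a learned offset of (+0.5, -0.25):
print(deform_sample(feat, 1 + 0.5, 1 - 0.25))   # 6.75
```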
Extreme edge platforms, such as in-vehicle smart devices, require efficient deployment of quantized deep neural networks (DNNs) to enable intelligent applications with limited amounts of energy, memory, and computing resources. However, many devices struggle to boost inference throughput across various DNNs due to the varying quantization levels, and these devices lack floating-point (FP) support for on-device learning, which prevents them from improving model accuracy while ensuring data privacy. To tackle these challenges...
The Swin Transformer achieves greater efficiency than the Vision Transformer by utilizing local self-attention and shifted windows. However, existing hardware accelerators designed for Transformers have not been optimized for the unique computation flow and data-reuse properties of the Swin Transformer, resulting in lower utilization and extra memory accesses. To address this issue, we develop SWAT, an efficient accelerator based on an FPGA. Firstly, to eliminate redundant computations between windows, a novel tiling strategy is employed, which helps the developed...
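For context, (shifted) window partitioning as introduced by Swin, in NumPy; the cyclic shift via np.roll follows the original Swin formulation, while SWAT's own tiling strategy is not shown here:

```python
import numpy as np

def windows(x, ws, shift=0):
    """Partition an (H, W) feature map into non-overlapping ws x ws windows,
    optionally cyclically shifted as in Swin, so self-attention is computed
    only within each window."""
    if shift:
        x = np.roll(x, (-shift, -shift), axis=(0, 1))
    h, w = x.shape
    return x.reshape(h // ws, ws, w // ws, ws).swapaxes(1, 2).reshape(-1, ws, ws)

x = np.arange(64).reshape(8, 8)
print(windows(x, 4).shape, windows(x, 4, shift=2).shape)  # (4, 4, 4) each
```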
All-in-one image restoration (IR) recovers images from various unknown distortions, such as rain, haze, and blur, with a single model. Transformer-based IR methods have significantly improved the visual quality of restored images. However, deploying such complex models on edge devices is challenging due to their massive parameters and intensive computations. Moreover, existing accelerators are typically customized for a single task, resulting in severe resource underutilization when executing multiple tasks...
Recently, large models, such as the Vision Transformer and BERT, have garnered significant attention due to their exceptional performance. However, their extensive computational requirements lead to considerable power and hardware resource consumption. Brain-inspired computing, characterized by its spike-driven methods, has emerged as a promising approach for low-power implementation. In this paper, we propose an efficient sparse accelerator for the Spike-driven Transformer. We first design a novel encoding method that...