Zhongfeng Wang

ORCID: 0000-0002-7227-4786
Research Areas
  • Error Correcting Code Techniques
  • Advanced Wireless Communication Techniques
  • Advanced Neural Network Applications
  • Coding theory and cryptography
  • Cooperative Communication and Network Coding
  • CCD and CMOS Imaging Sensors
  • Advanced Memory and Neural Computing
  • Cryptographic Implementations and Security
  • Cryptography and Residue Arithmetic
  • Image Processing Techniques and Applications
  • Advanced Vision and Imaging
  • Cryptography and Data Security
  • Parallel Computing and Optimization Techniques
  • Analog and Mixed-Signal Circuit Design
  • Low-power high-performance VLSI design
  • Neural Networks and Applications
  • Advanced Image and Video Retrieval Techniques
  • Numerical Methods and Algorithms
  • Advanced MIMO Systems Optimization
  • Wireless Communication Networks Research
  • DNA and Biological Computing
  • Digital Filter Design and Implementation
  • Advanced Data Storage Technologies
  • Algorithms and Data Compression
  • Topic Modeling

Nanjing University
2016-2025

Sun Yat-sen University
2023-2025

Chinese Academy of Sciences
2012-2024

Shenyang Institute of Automation
2012-2024

University of California, Santa Barbara
2023-2024

China Electronics Technology Group Corporation
2020-2022

Xijing University
2020

Wanfang Data (China)
2019

Hefei University of Technology
2017

Broadcom (United States)
2008-2016

The Internet of Things (IoT) has become part of everyday life across the globe; its nodes are able to sense, store, and transmit information wirelessly. However, IoT systems based on von Neumann architectures realize memory, computing, and communication functions with physically separated devices, which results in severe power consumption and computation latency. In this study, a wireless multiferroic memristor consisting of a Metglas/Pb(Zr0.3Ti0.7)O3-1 mol% Mn/Metglas laminate is proposed, which integrates...

10.1002/aelm.202200370 article EN Advanced Electronic Materials 2022-07-26

Recently, significant improvement has been achieved in hardware architecture design for deep neural networks (DNNs). However, the implementation of the widely used softmax function in DNNs has not been much investigated, even though it involves expensive division and exponentiation units. This paper presents an efficient hardware implementation of the softmax function. Mathematical transformations and linear fitting are used to simplify the function, and multiple algorithmic strength reduction strategies and fast addition methods are employed to optimize the architecture. By using these...

10.1109/apccas.2018.8605654 article EN 2018-10-01
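
A minimal sketch of the kind of strength reduction such designs rely on: subtract the maximum and switch to base-2 exponentials so the integer part of each exponent becomes a bit shift. The function name and the crude linear fit below are illustrative assumptions, not the exact scheme of this paper.

```python
import numpy as np

def softmax_base2(x):
    """Hardware-style softmax sketch: max subtraction plus base-2 exponentials.

    Rewriting e**x as 2**(x * log2(e)) lets the integer part of the exponent
    become a bit shift; only the fractional part needs a small lookup table or
    linear fit. This mirrors common strength-reduction ideas and is not claimed
    to be the exact scheme of the cited paper.
    """
    x = np.asarray(x, dtype=np.float64)
    z = (x - x.max()) * np.log2(np.e)   # z <= 0, so 2**z never overflows
    z_int = np.floor(z)                 # a right shift in fixed-point hardware
    z_frac = z - z_int                  # in [0, 1): small LUT / piecewise-linear fit
    pow2_frac = 1.0 + z_frac            # crude linear fit of 2**z_frac on [0, 1)
    p = (2.0 ** z_int) * pow2_frac
    return p / p.sum()                  # the division can itself be approximated

print(softmax_base2([1.0, 2.0, 3.0]))   # close to [0.090, 0.245, 0.665]
```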

The convolutional neural network (CNN) is the state-of-the-art deep learning approach employed in various applications. Real-time CNN implementations on resource-limited embedded systems have become highly desired recently. To ensure programmable flexibility and shorten the development period, the field-programmable gate array is appropriate for implementing CNN models. However, bandwidth and on-chip memory storage are the bottlenecks of CNN acceleration. In this paper, we propose efficient hardware architectures to accelerate CNN models. The theoretical...

10.1109/tcsi.2017.2767204 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2017-11-22

The Transformer has been an indispensable staple in deep learning. However, for real-life applications, it is very challenging to deploy efficient Transformers due to the immense parameters and operations of these models. To relieve this burden, exploiting sparsity is an effective approach to accelerate Transformers. Newly emerging Ampere graphics processing units (GPUs) leverage a 2:4 sparsity pattern to achieve model acceleration, but it can hardly meet the diverse algorithm and hardware constraints encountered when deploying models. By contrast,...

10.1109/tvlsi.2022.3197282 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2022-08-15
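
For reference, the 2:4 pattern supported by Ampere sparse tensor cores keeps the two largest-magnitude weights in every group of four. The sketch below (hypothetical helper `prune_2_of_4`) only illustrates that fixed pattern; the paper itself argues for more flexible sparsity than this.

```python
import numpy as np

def prune_2_of_4(w):
    """Enforce the 2:4 sparsity pattern used by Ampere sparse tensor cores:
    in every group of 4 consecutive weights, keep the 2 with the largest
    magnitude and zero the rest. Illustrative only; the cited paper targets
    more flexible patterns than this fixed one.
    """
    w = np.asarray(w, dtype=np.float32)
    flat = w.reshape(-1, 4)                          # assumes size divisible by 4
    mask = np.zeros_like(flat, dtype=bool)
    keep = np.argsort(np.abs(flat), axis=1)[:, -2:]  # indices of 2 largest per group
    np.put_along_axis(mask, keep, True, axis=1)
    return (flat * mask).reshape(w.shape)

w = np.random.randn(2, 8).astype(np.float32)
print(prune_2_of_4(w))   # exactly 2 nonzeros in every group of 4
```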

Recurrent neural networks (RNNs) have achieved state-of-the-art performance on various sequence learning tasks due to their powerful modeling capability. However, RNNs usually require a large number of parameters and high computational complexity. Hence, it is quite challenging to implement complex RNNs on embedded devices with stringent memory and latency requirements. In this paper, we first present a novel hybrid compression method for the widely used RNN variant, long short-term memory (LSTM), to tackle these...

10.1109/tvlsi.2017.2717950 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2017-07-03

Binary-weight convolutional neural networks (BCNNs) can achieve near state-of-the-art classification accuracy and have far less computational complexity compared with traditional CNNs using high-precision weights. Due to their binary weights, BCNNs are well suited for vision-based Internet-of-Things systems that are sensitive to power consumption, making very high throughput possible with moderate power dissipation. In this paper, an energy-efficient BCNN architecture is proposed. It fully exploits the binary weights...

10.1109/tvlsi.2017.2767624 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2017-11-10
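
As background, a common way to obtain binary weights is filter-wise scaling of the sign, w ≈ alpha * sign(w) with alpha = mean(|w|) (XNOR-Net-style). The snippet below is a generic illustration under that assumption, not necessarily the quantization used by this accelerator.

```python
import numpy as np

def binarize_weights(w):
    """Binarize a weight tensor filter-wise: w ~ alpha * sign(w), with
    alpha = mean(|w|) over each output filter (XNOR-Net-style scaling).
    Generic BCNN quantization sketch, not necessarily the exact scheme
    of the cited accelerator.
    """
    w = np.asarray(w, dtype=np.float32)
    flat = w.reshape(w.shape[0], -1)                 # one row per output filter
    alpha = np.abs(flat).mean(axis=1, keepdims=True) # per-filter scale factor
    w_bin = np.where(flat >= 0, 1.0, -1.0)           # +1/-1 weights: adds/subtracts only
    return (alpha * w_bin).reshape(w.shape)

w = np.random.randn(4, 3, 3, 3).astype(np.float32)   # (out_ch, in_ch, kH, kW)
print(binarize_weights(w)[0, 0])
```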

The softmax function has been widely used in deep neural networks (DNNs), and studies on efficient hardware accelerators for DNNs have also attracted tremendous attention. However, it is very challenging to design efficient softmax architectures because of the expensive exponentiation and division calculations involved. In this brief, the softmax function is firstly simplified by exploring algorithmic strength reductions. Afterwards, a hardware-friendly precision-adjustable calculation method is proposed, which can meet different precision...

10.1109/tcsii.2020.3002564 article EN IEEE Transactions on Circuits and Systems II Express Briefs 2020-06-16

Designing hardware accelerators for deep neural networks (DNNs) has been much desired. Nonetheless, most existing accelerators are built for either convolutional neural networks (CNNs) or recurrent neural networks (RNNs). Recently, the Transformer model has been replacing RNNs in the natural language processing (NLP) area. However, because intensive matrix computations and a complicated data flow are involved, such a design has never been reported. In this paper, we propose the first hardware accelerator for two key components, i.e., the multi-head attention (MHA) ResBlock...

10.1109/socc49529.2020.9524802 article EN 2020-09-08

Long Short-Term Memory (LSTM) and its variants have been widely adopted in many sequential learning tasks, such as speech recognition and machine translation. Significant accuracy improvements can be achieved using complex LSTM models with a large memory requirement and high computational complexity, which is time-consuming and energy demanding. The low-latency and energy-efficiency requirements of real-world applications make model compression and hardware acceleration for LSTMs an urgent need. In this paper, several...

10.1109/jetcas.2019.2911739 article EN IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2019-04-17

The training of Deep Neural Networks (DNNs) brings enormous memory requirements and computational complexity, which makes it a challenge to train DNN models on resource-constrained devices. Training DNNs with a reduced-precision data representation is crucial to mitigate this problem. In this article, we conduct a thorough investigation of low-bit posit numbers, a Type-III universal number (Unum). Through a comprehensive analysis of quantization with various formats, it is demonstrated that the posit format shows great potential...

10.1109/tc.2020.2985971 article EN IEEE Transactions on Computers 2020-04-14
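
For readers unfamiliar with posits: an n-bit posit with es exponent bits encodes (-1)^s * useed^k * 2^e * (1 + f), where useed = 2^(2^es) and the run-length-coded regime field gives k. The decoder below is a minimal reference sketch of that format definition; the paper itself studies low-bit posit quantization for DNN training rather than this decoder.

```python
def decode_posit(bits, n=8, es=1):
    """Decode an n-bit posit (Type-III unum) given as an unsigned integer.
    Value = (-1)^s * useed^k * 2^e * (1 + f), with useed = 2^(2^es).
    Minimal reference decoder for illustration only.
    """
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")                      # NaR (not a real)
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & ((1 << n) - 1)          # two's complement for negatives
    body = [(bits >> i) & 1 for i in range(n - 2, -1, -1)]   # bits after the sign
    r = body[0]                                  # regime run value
    run = 1
    while run < len(body) and body[run] == r:
        run += 1
    k = run - 1 if r == 1 else -run              # regime exponent
    rest = body[run + 1:]                        # skip the regime terminator bit
    exp_bits = rest[:es] + [0] * max(0, es - len(rest))
    e = int("".join(map(str, exp_bits)), 2) if es else 0
    frac_bits = rest[es:]
    f = sum(b * 2.0 ** -(i + 1) for i, b in enumerate(frac_bits))
    useed = 2.0 ** (2 ** es)
    return (-1.0) ** sign * useed ** k * 2.0 ** e * (1.0 + f)

print(decode_posit(0b01000000))   # 1.0
print(decode_posit(0b01010000))   # 2.0 (regime k=0, exponent 1) for es=1
```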

To enable efficient deployment of convolutional neural networks (CNNs) on embedded platforms for different computer vision applications, several convolution variants have been introduced, such as depthwise convolution (DWCV), transposed convolution (TPCV), and dilated convolution (DLCV). To address the utilization degradation that occurs when a general convolution engine runs these emerging operators, a highly flexible and reconfigurable hardware accelerator is proposed to efficiently support various CNN-based tasks. Firstly, to avoid workload imbalance...

10.1109/tcsi.2021.3131581 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2021-12-07

The Vision Transformer (ViT) has emerged as a competitive alternative to convolutional neural networks for various computer vision applications. Specifically, ViTs' multi-head attention layers make it possible to embed information globally across the overall image. Nevertheless, computing and storing such attention matrices incurs a quadratic cost dependency on the number of patches, limiting the achievable efficiency and scalability and prohibiting more extensive real-world ViT applications on resource-constrained devices....

10.1109/hpca56546.2023.10071081 article EN 2023-02-01
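
For context, the quadratic dependency comes from standard scaled dot-product attention over N patch tokens with head dimension d (generic accounting, not a result of this paper): doubling the number of patches quadruples both terms below.

```latex
\mathrm{Attn}(Q,K,V)=\mathrm{softmax}\!\left(\tfrac{QK^{\top}}{\sqrt{d}}\right)V,
\qquad Q,K,V\in\mathbb{R}^{N\times d}
\;\Longrightarrow\;
O(N^{2}d)\ \text{multiply-accumulates and}\ O(N^{2})\ \text{score storage per head.}
```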

Prior research efforts have been focusing on using BCH codes for error correction in multi-level cell (MLC) NAND flash memory. However, BCH codes often require highly parallel implementations to meet the throughput requirement. As a result, a large area is needed. In this paper, we propose to use Reed-Solomon (RS) codes for MLC NAND flash memory. An (828, 820) RS code has almost the same rate and length in terms of bits as an (8248, 8192) BCH code. Moreover, it achieves at least the same error-correcting performance in memory applications. Nevertheless, with 70% of the area,...

10.1109/sips.2008.4671744 article EN 2008-10-01
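
A quick size check makes the comparison concrete, assuming 10-bit RS symbols (GF(2^10) is the smallest field admitting length-828 codewords) and using the standard relation 2t = n - k for the RS correction capability; these are textbook relations, not figures quoted from the paper.

```latex
\begin{aligned}
&\text{(828, 820) RS over } \mathrm{GF}(2^{10})\ \text{(10-bit symbols):}\\
&\quad n = 828 \times 10 = 8280\ \text{bits},\quad k = 820 \times 10 = 8200\ \text{bits},\quad
R = 8200/8280 \approx 0.990,\\
&\quad 8\ \text{parity symbols} \;\Rightarrow\; t = 4\ \text{correctable symbol errors.}\\[4pt]
&\text{(8248, 8192) binary BCH:}\quad
n = 8248\ \text{bits},\quad k = 8192\ \text{bits},\quad R = 8192/8248 \approx 0.993.
\end{aligned}
```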

Power consumption is a major bottleneck of system performance and is listed as one of the top three challenges in the International Technology Roadmap for Semiconductors 2008. In practice, a large portion of on-chip power is consumed by the clock system, which is made up of the clock distribution network and flip-flops. In this paper, various design techniques for a low-power clocking system are surveyed. Among them, an effective way is to reduce the capacitive load by minimizing the number of clocked transistors. To approach this, we propose a novel clock-pair shared flip-flop which reduces the local...

10.1109/tvlsi.2009.2038705 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2010-01-22

In the era of artificial intelligence (AI), deep neural networks (DNNs) have emerged as the most important and powerful AI technique. However, large DNN models are both storage and computation intensive, posing significant challenges for adopting DNNs in resource-constrained scenarios. Thus, model compression becomes a crucial technique to ensure the wide deployment of DNNs.

10.1145/3307650.3322258 article EN 2019-06-14

This paper proposes a generalized hyperbolic COordinate Rotation DIgital Computer (GH CORDIC) to directly compute logarithms and exponentials with an arbitrary fixed base. In hardware implementation, it is more efficient than the state of the art, which requires both a CORDIC module and a constant multiplier. More specifically, we develop the theory of GH CORDIC by adding a new parameter called base to the conventional hyperbolic CORDIC, which can be used to specify the base with respect to the computation of exponentials. As a result, a constant multiplier is no longer needed to convert e...

10.1109/tvlsi.2019.2919557 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2019-06-18
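
To ground the comparison, the conventional route computes ln(x) with vectoring-mode hyperbolic CORDIC and then multiplies by 1/ln(b) to change base; GH CORDIC removes that final multiply. The sketch below implements only the conventional base-e step (with the usual repeated iterations at i = 4 and 13), as an illustration of the baseline rather than the GH CORDIC iteration itself.

```python
import math

def cordic_ln(w, iters=20):
    """Compute ln(w) with vectoring-mode hyperbolic CORDIC:
    ln(w) = 2 * artanh((w - 1) / (w + 1)). Iterations 4 and 13 are repeated,
    which suffices for about 20 iterations. Valid roughly for w in [0.11, 9.4]
    without extra range reduction. Conventional baseline, not GH CORDIC.
    """
    x, y, z = w + 1.0, w - 1.0, 0.0
    seq, i = [], 1
    while len(seq) < iters:
        seq.append(i)
        if i in (4, 13):                   # repeat these indices once
            seq.append(i)
        i += 1
    for i in seq[:iters]:
        sigma = -1.0 if y > 0 else 1.0     # drive y toward 0 (x stays positive)
        x, y = x + sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atanh(2.0 ** -i)
    return 2.0 * z

print(cordic_ln(3.0), math.log(3.0))                  # both ~ 1.0986
# Changing base conventionally needs a final constant multiply by 1/ln(b):
print(cordic_ln(3.0) / math.log(2.0), math.log2(3.0))
```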

Designing hardware accelerators for convolutional neural networks (CNNs) has recently attracted tremendous attention. Plenty of existing accelerators are built for dense CNNs or structured sparse CNNs. By contrast, unstructured sparse CNNs can achieve a higher compression ratio with equivalent accuracy. However, their corresponding hardware implementations generally suffer from load imbalance and conflicting accesses to on-chip buffers, which results in underutilization of the processing elements (PEs). To tackle these issues, we propose a...

10.1109/tcsi.2021.3074300 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2021-05-20

Deformable convolutional networks (DCNs) have shown outstanding potential in video super-resolution with their powerful inter-frame feature alignment. However, deploying DCNs on resource-limited devices is challenging due to their high computational complexity and irregular memory accesses. In this work, an algorithm-hardware co-optimization framework is proposed to accelerate DCNs on a field-programmable gate array (FPGA). Firstly, at the algorithm level, an anchor-based lightweight deformable network (ALDNet)...

10.1109/tcsi.2023.3258446 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2023-03-23

Extreme edge platforms, such as in-vehicle smart devices, require efficient deployment of quantized deep neural networks (DNNs) to enable intelligent applications with limited amounts of energy, memory, and computing resources. However, many edge devices struggle to boost the inference throughput of various DNNs due to the varying quantization levels, and these devices lack floating-point (FP) support for on-device learning, which prevents them from improving model accuracy while ensuring data privacy. To tackle these challenges...

10.1109/asp-dac58780.2024.10473817 article EN 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) 2024-01-22

The Swin Transformer achieves greater efficiency than the Vision Transformer by utilizing local self-attention and shifted windows. However, existing hardware accelerators designed for Vision Transformers have not been optimized for the unique computation flow and data reuse property of the Swin Transformer, resulting in lower utilization and extra memory accesses. To address this issue, we develop SWAT, an efficient Swin Transformer Accelerator based on FPGA. Firstly, to eliminate redundant computations in shifted windows, a novel tiling strategy is employed, which helps the developed...

10.1109/asp-dac58780.2024.10473931 article EN 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC) 2024-01-22

All-in-one image restoration (IR) recovers images from various unknown distortions, such as rain, haze, and blur, with a single model. Transformer-based IR methods have significantly improved the visual effects of restored images. However, deploying complex Transformer models on edge devices is challenging due to their massive parameters and intensive computations. Moreover, existing accelerators are typically customized for a single task, resulting in severe resource underutilization when executing multiple tasks....

10.1109/tcsi.2024.3519532 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2025-01-01

10.1109/tvlsi.2024.3525184 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2025-01-01

Recently, large models, such as the Vision Transformer and BERT, have garnered significant attention due to their exceptional performance. However, their extensive computational requirements lead to considerable power and hardware resource consumption. Brain-inspired computing, characterized by its spike-driven methods, has emerged as a promising approach for low-power implementation. In this paper, we propose an efficient sparse accelerator for the Spike-driven Transformer. We first design a novel encoding method that...

10.48550/arxiv.2501.07825 preprint EN arXiv (Cornell University) 2025-01-13

10.1109/tcsi.2025.3527541 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2025-01-01