Peng Cao

ORCID: 0000-0003-2039-9031
Research Areas
  • Embedded Systems Design Techniques
  • Low-power high-performance VLSI design
  • Interconnection Networks and Systems
  • VLSI and FPGA Design Techniques
  • Parallel Computing and Optimization Techniques
  • Advancements in Semiconductor Devices and Circuit Design
  • VLSI and Analog Circuit Testing
  • Advanced Data Compression Techniques
  • Video Coding and Compression Technologies
  • Wireless Communication Networks Research
  • CCD and CMOS Imaging Sensors
  • Image and Signal Denoising Methods
  • Semiconductor materials and devices
  • Cryptographic Implementations and Security
  • IPv6, Mobility, Handover, Networks, Security
  • Advanced Algorithms and Applications
  • Advanced Memory and Neural Computing
  • Advanced Computational Techniques and Applications
  • Additive Manufacturing Materials and Processes
  • Network Packet Processing and Optimization
  • Advancements in PLL and VCO Technologies
  • Analog and Mixed-Signal Circuit Design
  • Adsorption and Cooling Systems
  • Advanced Vision and Imaging
  • Network Traffic and Congestion Control

Southeast University
2015-2025

South China Normal University
2010-2025

Hunan University
2024

Fuzhou University
2014-2024

Wuxi Institute of Technology
2023

Jiangsu University
2022-2023

Yangzhou University
2019-2020

Zhejiang University of Technology
2019

Southeast University
2012-2017

Guilin University of Technology
2015-2016

To improve the precision of keyword spotting (KWS) for individual users on edge devices, we propose an on-chip-training KWS (OCT-KWS) chip for private data protection while also achieving ultralow-power inference. Our main contributions are: 1) identity interchange and interleaved pipeline methods during backpropagation (BP), enabling pipelined execution of operations that traditionally had to be performed sequentially and reducing the cache requirements for loss values by 95.8%; 2) all-digital...

10.1109/tvlsi.2025.3525740 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2025-01-01

Placement is crucial in the physical design flow, with significant impact on later routability and ultimate manufacturability in terms of performance, power, and area (PPA), which may deviate from the optimal solution and/or lead to unnecessary iterations when suffering from interleaved optimization steps and inaccurate PPA estimation. To solve this issue, we propose a physical- and timing-related placement guidance framework that provides candidate gate sizing and buffer insertion solutions as well as path group for...

10.3390/electronics14020329 article EN Electronics 2025-01-15

10.1109/tcad.2025.3547806 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2025-01-01

A coarse-grained reconfigurable processing unit (RPU) consisting of 16 × 16 multi-functional processing elements (PEs) interconnected by an area-efficient line-switched mesh connect (LSMC) routing is implemented on a 5.4 mm × 3.1 mm die in TSMC 65 nm LP1P8M CMOS technology. A hierarchical configuration context (HCC) organization scheme is proposed to reduce the implementation overhead and the energy dissipation spent on fast reconfiguration. The RPU is integrated into two systems-on-a-chip (SoCs), targeting...

10.1109/tmm.2015.2463735 article EN IEEE Transactions on Multimedia 2015-08-03

Timing estimation prior to routing is of vital importance for optimization at the placement stage and for timing closure. Existing wire- or net-oriented learning-based methods limit the accuracy and efficiency of prediction due to the neglect of delay correlation along the path and the computational complexity of delay accumulation. In this paper, an efficient and accurate pre-routing timing prediction framework is proposed by employing a transformer network and a residual model, where physical information is extracted as sequence features while the residual is modeled to calibrate...

10.1109/asp-dac52403.2022.9712484 article EN 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC) 2022-01-17

In this paper, we introduce a coarse-grained dynamically reconfigurable fabric, named Reconfigurable Processing Unit (RPU), which is implemented on 5.4×3.1 mm² of silicon with TSMC 65 nm LP1P8M technology. This fabric consists of 16×16 multi-functional Processing Elements (PEs) interconnected by an area-efficient Line-Switched Mesh Connect (LSMC) routing. A Hierarchical Configuration Context (HCC)...

10.1109/cicc.2013.6658434 article EN 2013-09-01

The coarse-grained reconfigurable architecture (CGRA) has been proven to be energy efficient in several specific domains. In CGRAs, the on-chip memory hierarchy, which contains context and data organizations, should be well considered to achieve appropriate tradeoffs among three aspects: 1) performance; 2) area; and 3) power. In this paper, two techniques called hierarchical configuration context (HCC) and lifetime-based data-memory organization (LDO), focusing on context and data organizations, are proposed to compress the context space and reduce...

10.1109/tvlsi.2013.2263155 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2013-07-04
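The abstract does not describe how LDO maps data into memory. As a hedged illustration of the general idea behind lifetime-based organization, the sketch below applies classic greedy interval partitioning (left-edge style): buffers whose lifetimes do not overlap share one memory slot. The buffer names, the half-open [start, end) lifetimes in schedule steps, and the function name `allocate_slots` are hypothetical and are not taken from the paper.

```python
import heapq


def allocate_slots(lifetimes):
    """Greedy interval partitioning of data buffers into shared memory slots.

    lifetimes: list of (name, start, end) with half-open [start, end) intervals
    in hypothetical schedule steps (an assumption, not the paper's model).
    Returns (mapping name -> slot index, number of slots used).
    """
    order = sorted(lifetimes, key=lambda t: t[1])  # process buffers by start time
    busy = []          # min-heap of (end_time, slot) for occupied slots
    free_slots = []    # min-heap of released slot indices
    mapping = {}
    next_slot = 0
    for name, start, end in order:
        # Release every slot whose occupant is dead by the time this buffer starts.
        while busy and busy[0][0] <= start:
            _, slot = heapq.heappop(busy)
            heapq.heappush(free_slots, slot)
        if free_slots:
            slot = heapq.heappop(free_slots)   # reuse the lowest released slot
        else:
            slot = next_slot                   # no slot is free: open a new one
            next_slot += 1
        mapping[name] = slot
        heapq.heappush(busy, (end, slot))
    return mapping, next_slot


if __name__ == "__main__":
    demo = [("a", 0, 4), ("b", 1, 3), ("c", 3, 6), ("d", 4, 8), ("e", 6, 7)]
    print(allocate_slots(demo))
```

For these five buffers, at most two are live at once, so two slots suffice instead of five. Greedy allocation in start-time order is optimal for interval lifetimes, which is why the slot count equals the peak number of simultaneously live buffers.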

This paper proposes a novel sub-architecture to optimize the data flow of REMUS-II (REconfigurable MUltimedia System 2), a dynamically coarse-grain reconfigurable architecture. REMUS-II consists of a µPU (Micro-Processor Unit) and two RPUs (Reconfigurable Processor Units), which are used to speed up control-intensive tasks and data-intensive tasks, respectively. Its parallel computing capability and flexibility make it an excellent candidate to process multimedia applications, which require a large amount of memory accesses. In this...

10.1587/transinf.e95.d.374 article EN IEICE Transactions on Information and Systems 2012-01-01

The voltage scaling technique is widely employed in state-of-the-art low-power circuits, with excellent power reduction. However, scaling the voltage to the sub-threshold (STV) and near-threshold (NTV) domains introduces performance degradation and high process-variation sensitivity. Accurate modeling of the statistical characteristics, especially the probability distribution function (PDF) and cumulative distribution function (CDF), is urgently required. In this paper, a novel analytical model is derived based on the log-skew-normal (LSN) distribution to precisely...

10.1109/access.2019.2955091 article EN cc-by IEEE Access 2019-01-01
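The paper's own derivation is truncated here. As a point of reference, the sketch below evaluates the textbook log-skew-normal distribution: a positive quantity D (for example, a gate or path delay) whose logarithm is skew-normal with location `xi`, scale `omega`, and shape `alpha`. The PDF follows from the change of variables x = ln d, and the CDF is the skew-normal CDF Φ(z) − 2T(z, α), where T is Owen's T function, here integrated numerically with Simpson's rule. The parameter names and the numerical scheme are assumptions; this sketch does not reproduce the paper's fitted model.

```python
import math


def norm_pdf(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def owens_t(h, a, n=2000):
    """Owen's T(h, a) = 1/(2*pi) * integral_0^a exp(-h^2 (1+x^2)/2) / (1+x^2) dx.

    Composite Simpson's rule with n (even) intervals; a negative `a`
    yields a negative step and therefore -T(h, |a|), as expected.
    """
    if a == 0.0:
        return 0.0
    step = a / n
    f = lambda x: math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    total = f(0.0) + f(a)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * step)
    return total * step / 3.0 / (2.0 * math.pi)


def lsn_pdf(d, xi, omega, alpha):
    """PDF of D > 0 when ln(D) ~ SkewNormal(xi, omega, alpha)."""
    z = (math.log(d) - xi) / omega
    # Skew-normal density 2/omega * phi(z) * Phi(alpha z), times Jacobian 1/d.
    return 2.0 / (omega * d) * norm_pdf(z) * norm_cdf(alpha * z)


def lsn_cdf(d, xi, omega, alpha):
    """CDF of D: the skew-normal CDF Phi(z) - 2 T(z, alpha) evaluated at ln(d)."""
    z = (math.log(d) - xi) / omega
    return norm_cdf(z) - 2.0 * owens_t(z, alpha)


if __name__ == "__main__":
    # Hypothetical parameters for a delay normalized to 1.0 at nominal conditions.
    for d in (0.8, 1.0, 1.3):
        print(d, lsn_pdf(d, 0.0, 0.3, 2.0), lsn_cdf(d, 0.0, 0.3, 2.0))
```

Setting `alpha = 0` recovers the log-normal case, and at d = exp(xi) the CDF equals 1/2 − arctan(α)/π. Both properties give quick sanity checks when such a model is fitted to Monte Carlo delay samples.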

Differential power analysis (DPA) attacks have become a great threat to cryptographic chips. However, DPA resistance is difficult to evaluate at circuit design time. In this paper, a simulation test platform at design time and an experimental measurement platform are built to evaluate the DPA-resistant capability of cryptographic chips. The security is obtained by dynamic power simulation that takes timing behavior into account, using the time-based mode of PrimeTime Power Extension (PTPX) for accurate characterization. The effects of both platforms are verified on unprotected...

10.1109/tim.2013.2259754 article EN IEEE Transactions on Instrumentation and Measurement 2013-06-10
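Whether the traces come from PTPX simulation or from measurement, a DPA-resistance evaluation ultimately runs an attack on them. The sketch below shows correlation power analysis (CPA), a common DPA variant, on synthetic single-sample traces. It assumes a Hamming-weight leakage model with Gaussian noise and uses the 4-bit PRESENT S-box as a stand-in target. The cipher, noise level, trace count, and function names are illustrative assumptions and are not the paper's setup.

```python
import math
import random

# 4-bit PRESENT S-box: an assumed stand-in for the attacked nonlinear step.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(x):
    return bin(x).count("1")


def simulate_traces(key, n_traces, noise_sigma, seed=0):
    """Synthetic traces: HW(SBOX[p ^ key]) plus Gaussian noise (assumed leakage)."""
    rng = random.Random(seed)
    plaintexts = [i % 16 for i in range(n_traces)]  # balanced 4-bit inputs
    traces = [hamming_weight(SBOX[p ^ key]) + rng.gauss(0.0, noise_sigma)
              for p in plaintexts]
    return plaintexts, traces


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def cpa_rank_keys(plaintexts, traces):
    """Rank all 16 key guesses by |correlation| of predicted HW with the traces."""
    scores = []
    for guess in range(16):
        model = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        scores.append((abs(pearson(model, traces)), guess))
    scores.sort(reverse=True)
    return scores


if __name__ == "__main__":
    pts, tr = simulate_traces(key=0xB, n_traces=1600, noise_sigma=0.5, seed=1)
    print(cpa_rank_keys(pts, tr)[:3])  # best (|rho|, key guess) pairs
```

A resistance figure can be built on top of this, for instance by counting how many traces are needed before the correct key ranks first. That metric is a common choice, but the paper may use a different one.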

This paper presents a comprehensive review of near-threshold wide-voltage designs for memory, resilient logic designs, low-voltage Radio Frequency (RF) circuits, and timing analysis. With the prosperous development of wearable applications, power consumption has become one of the primary challenges for IC designs. To improve efficiency, a preferred scheme is to operate at an ultra-low, Near-Threshold Voltage (NTV). To address performance variation and degradation, a self-adaptive margin assignment technique is proposed in...

10.26599/tst.2022.9010064 article EN Tsinghua Science & Technology 2023-01-06

Critical path generation poses a significant challenge to the integrated circuit (IC) design flow in terms of huge computational complexity and crucial impact on optimization, and its early prediction is of vital importance for accelerating design closure, especially under multiple process-voltage-temperature (PVT) corners. In this work, a post-routing critical path prediction framework is proposed based on a Bidirectional Long Short-Term Memory (BiLSTM) network and a Multi-Layer Perceptron (MLP) to learn from sequential features and global...

10.1109/dac56929.2023.10247984 article EN 2023-07-09

Timing mismatch between different stages of physical design poses great challenges for circuit optimization to achieve the desired performance, power, and area (PPA) tradeoff. Inaccurate timing estimation prior to routing may lead to over-design with unwanted power consumption, or to iterating back to cell placement at the cost of turn-around time. Existing learning models cannot predict post-routing timing with satisfying accuracy and efficiency, due to their ignorance of delay correlation along the path and empirical feature...

10.1109/tcad.2022.3216752 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2022-11-03

This paper presents a novel architecture design to optimize the reconfiguration process of a coarse-grained reconfigurable architecture (CGRA) called Reconfigurable Multimedia System II (REMUS-II). In REMUS-II, tasks in multimedia applications are divided into two parts: computing-intensive tasks and control-intensive tasks. REMUS-II contains two Reconfigurable Processor Units (RPUs) for accelerating computing-intensive tasks and a Micro-Processor Unit (µPU) for control-intensive tasks. As a large-scale CGRA, REMUS-II can provide satisfying solutions in terms of both efficiency and flexibility...

10.1587/transinf.e95.d.1858 article EN IEICE Transactions on Information and Systems 2012-01-01

Combined experimental and numerical studies are conducted to study the ice storage performance of an ice storage tank with a finned tube. Axially arranged fins on the refrigerant tube are applied to enhance solidification. The evolution of the solid–liquid interface and the variation of temperature at typical positions are examined. The effect of natural convection is discussed in detail. In addition, the effects of the initial water temperature are analyzed. The results indicate that solidification is remarkably enhanced by the metal fins. The defect of poor heat transfer after ice is formed can be solved by applying...

10.3390/pr7050266 article EN Processes 2019-05-07

Laser cladding is mainly applied in component renovation, coating, stacking forming, etc. At present, the degree of automation of laser cladding is not very high. Machine vision technology is gradually becoming a good approach to improve automaticity, and utilizing machine vision to measure and control the molten pool has become an important research interest. A clear image of the molten pool was acquired with an appropriate image grabbing installation system together with image processing methods, and the molten pool area was calculated. The variation of the molten pool area as parameters changed was analyzed. Discussed...

10.2991/meic-14.2014.141 article EN cc-by-nc Advances in Engineering Research 2014-01-01
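The abstract only states that the molten pool area was calculated from the processed image. The sketch below shows one common way to do this. It assumes a grayscale image stored as nested lists, a fixed brightness threshold, a known calibrated area per pixel in mm², and that the pool is the largest bright 4-connected region, so that isolated spatter or reflections are ignored. The threshold, the calibration constant, and the name `molten_pool_area` are hypothetical and are not taken from the paper.

```python
from collections import deque


def molten_pool_area(img, threshold, pixel_area_mm2):
    """Area (mm^2) of the largest 4-connected region with intensity >= threshold.

    img: list of rows of grayscale values; pixel_area_mm2: assumed calibration.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    best = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] < threshold or seen[r][c]:
                continue
            # Breadth-first flood fill of one bright region.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and img[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)  # keep only the largest region (the pool)
    return best * pixel_area_mm2


if __name__ == "__main__":
    frame = [[0, 0, 0, 0, 0],
             [0, 200, 210, 0, 250],   # 250 is an isolated spatter pixel
             [0, 220, 230, 0, 0],
             [0, 0, 0, 0, 0]]
    print(molten_pool_area(frame, threshold=128, pixel_area_mm2=0.01))
```

Tracking this value frame by frame as laser power or scan speed changes gives the kind of area-versus-parameter curve the abstract describes.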

This paper presents a configuration compression approach for coarse-grain reconfigurable architectures (CGRAs) to reduce the context size in caches and therefore improve the reconfiguration efficiency of CGRAs. Firstly, some kernel sub-algorithms of radar signal processing, including FFT, FIR, and Matrix Inversion, are analyzed to explore the feature that contexts consist of repetitions of the same blocks. Then, a configuration compression approach is proposed to reduce these redundancies when contexts are loaded into the cache. The experimental results show that the approach can drastically reduce the context,...

10.1109/cyberc.2014.83 article EN 2014-10-01
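The abstract says that contexts consist of repetitions of the same blocks but does not give the encoding. The sketch below shows a minimal dictionary (deduplication) scheme that exploits that property: each distinct block is stored once, and the stream is kept as a list of block indices. Fixed-size blocks, 32-bit context words, the bit-cost formula, and all function names are assumptions for illustration only.

```python
def compress_contexts(words, block_size):
    """Split a context-word stream into fixed-size blocks and store each distinct block once.

    Returns (table of unique blocks, list of indices into the table).
    """
    if len(words) % block_size:
        raise ValueError("stream length must be a multiple of block_size")
    table, index_of, refs = [], {}, []
    for i in range(0, len(words), block_size):
        block = tuple(words[i:i + block_size])
        if block not in index_of:          # first occurrence: add to dictionary
            index_of[block] = len(table)
            table.append(block)
        refs.append(index_of[block])       # every occurrence: emit an index
    return table, refs


def decompress_contexts(table, refs):
    """Rebuild the original word stream from the dictionary and index list."""
    out = []
    for r in refs:
        out.extend(table[r])
    return out


def compressed_bits(table, refs, block_size, word_bits=32):
    """Hypothetical storage cost: unique blocks in full plus one index per block."""
    index_bits = max(1, (len(table) - 1).bit_length())
    return len(table) * block_size * word_bits + len(refs) * index_bits


if __name__ == "__main__":
    a, b = [1, 2, 3, 4], [5, 6, 7, 8]
    stream = a + b + a + a + b + a + a + a          # 32 words = 1024 bits raw
    table, refs = compress_contexts(stream, 4)
    print(refs, compressed_bits(table, refs, 4))    # 2 unique blocks + 8 indices
```

The savings depend entirely on how often blocks repeat. Kernels such as FFT or FIR, whose contexts the abstract describes as highly repetitive, are the favorable case for a scheme like this.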

Wide-voltage design provides tremendous benefits for state-of-the-art circuits in terms of power consumption reduction and energy efficiency enhancement. The traditional verification flow depends on standard cell libraries, which are only available from foundries for limited PVT (Process-Voltage-Temperature) corners near nominal voltages, leading to remarkable characterization effort and storage overhead. In this paper, a learning-based framework is proposed to predict path delays across multiple...

10.1145/3386263.3406918 article EN 2020-09-04