Chen Yang

ORCID: 0000-0002-8221-7670
Research Areas
  • Advanced Neural Network Applications
  • Parallel Computing and Optimization Techniques
  • Advanced Memory and Neural Computing
  • Interconnection Networks and Systems
  • Cryptography and Data Security
  • Coding theory and cryptography
  • CCD and CMOS Imaging Sensors
  • Embedded Systems Design Techniques
  • Cryptography and Residue Arithmetic
  • Low-power high-performance VLSI design
  • Cryptographic Implementations and Security
  • Adversarial Robustness in Machine Learning
  • Machine Fault Diagnosis Techniques
  • Nanopore and Nanochannel Transport Studies
  • Advanced Data Storage Technologies
  • Video Surveillance and Tracking Methods
  • Physical Unclonable Functions (PUFs) and Hardware Security
  • Fault Detection and Control Systems
  • Cloud Computing and Resource Management
  • Ferroelectric and Negative Capacitance Devices
  • Anomaly Detection Techniques and Applications
  • Chaos-based Image/Signal Encryption
  • VLSI and Analog Circuit Testing
  • VLSI and FPGA Design Techniques
  • Numerical Methods and Algorithms

Xi'an Jiaotong University
2017-2025

Northeastern University
2023

Dalian University of Technology
2022-2023

Donghua University
2023

Xi’an University of Posts and Telecommunications
2023

Guilin University of Electronic Technology
2023

Nanjing University of Science and Technology
2023

North China University of Water Resources and Electric Power
2022

University of Minnesota
2021

Harbin Institute of Technology
2021

FPGA-based CNN accelerators have advantages in flexibility and power efficiency, and so are being deployed by a number of cloud computing service providers, including Microsoft, Amazon, Tencent, and Alibaba. Given the increasing complexity of neural networks, however, it is becoming challenging to efficiently map CNNs to multi-FPGA platforms. In this work, we present a scalable framework, FPDeep, which helps engineers map a specific CNN's training logic to a cluster or build RTL implementations for the target network. With...

10.1109/fccm.2018.00021 article EN 2018-04-01

In order to deploy a secure WLAN mesh network, authentication of both users and APs is needed, and an authentication mechanism should be employed. However, some additional configurations of trusted third-party agencies are still needed on-site for the system. This paper proposes a new blockchain-based protocol for security access, to reduce the deployment costs and resolve the issues of requiring key delivery and a central server during IEEE 802.11X authentication. The method takes the user's request as a transaction, considers all records in the network...

10.32604/cmc.2019.03863 article EN Computers, Materials & Continua 2019-01-01

The ring learning with errors (RLWE)-based fully homomorphic encryption (FHE) scheme has become one of the most promising FHE schemes. However, its performance is limited by multiplication, especially polynomial multiplication, which occupies major computing resources. Therefore, an efficient implementation of polynomial multiplication is crucial for high-performance applications. In this article, we present an area-efficient and highly unified reconfigurable multicore number theoretic transform (NTT)/inverse NTT (INTT)...

10.1109/tvlsi.2022.3166355 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2022-04-22
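
As a rough illustration of why the NTT matters for the polynomial multiplication mentioned in the abstract above, the following sketch multiplies two polynomials modulo x^N + 1 by transforming, multiplying pointwise, and transforming back. It is not the paper's architecture: the tiny parameters (Q = 17, N = 4, PSI = 2) and all function names are assumptions chosen for readability, and the transform is written as a naive O(N^2) sum rather than a fast butterfly network.

```python
# Toy parameters (assumptions for illustration only, far too small for real FHE).
Q = 17                 # prime modulus
N = 4                  # polynomial degree bound; arithmetic is modulo x^N + 1
PSI = 2                # primitive 2N-th root of unity mod Q (2**4 % 17 == 16 == -1)
OMEGA = PSI * PSI % Q  # primitive N-th root of unity mod Q


def ntt(a, root):
    """Naive O(N^2) number theoretic transform of a with the given root."""
    return [sum(a[j] * pow(root, i * j, Q) for j in range(N)) % Q
            for i in range(N)]


def intt(a):
    """Inverse transform: forward transform with OMEGA^-1, then scale by N^-1."""
    n_inv = pow(N, -1, Q)
    return [x * n_inv % Q for x in ntt(a, pow(OMEGA, -1, Q))]


def negacyclic_mul(a, b):
    """Multiply a and b modulo (x^N + 1, Q) via the NTT."""
    # Pre-scale coefficient i by PSI^i so the cyclic product becomes negacyclic.
    a_s = [a[i] * pow(PSI, i, Q) % Q for i in range(N)]
    b_s = [b[i] * pow(PSI, i, Q) % Q for i in range(N)]
    # Pointwise product in the transform domain.
    prod = [x * y % Q for x, y in zip(ntt(a_s, OMEGA), ntt(b_s, OMEGA))]
    c_s = intt(prod)
    # Undo the pre-scaling with PSI^-i.
    psi_inv = pow(PSI, -1, Q)
    return [c_s[i] * pow(psi_inv, i, Q) % Q for i in range(N)]


def schoolbook_negacyclic(a, b):
    """Reference O(N^2) multiplication modulo (x^N + 1, Q)."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                # x^N wraps around to -1.
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q
    return c
```

Calling `negacyclic_mul([1, 2, 3, 4], [5, 6, 7, 8])` should agree with `schoolbook_negacyclic` on the same inputs. A hardware design would replace the naive sums with butterfly stages; the pre-scaling by powers of PSI is one common way to obtain the wrap-around sign for x^N + 1.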

To improve the flexibility and energy efficiency of Convolutional Neural Networks, a number of cloud computing service providers, including Microsoft, Amazon, and Alibaba, are using FPGA-based CNN accelerators. However, the growing size and complexity of neural networks, coupled with communication and off-chip memory bottlenecks, make it increasingly difficult for multi-FPGA designs to achieve high resource utilization and performance, especially when training. In this work, we present new results for a scalable framework,...

10.1109/fpl.2018.00074 article EN 2018-08-01

The implementation of Molecular Dynamics (MD) on FPGAs has received substantial attention. Previous work, however, has consisted of either proof-of-concept implementations of components, usually the range-limited force; full systems, but with much of the work shared by the host CPU; or prototype demonstrations, e.g., using OpenCL, that neither implement a whole system nor have competitive performance. In this paper, we present what we believe to be the first full-scale FPGA-based simulation engine, and show its...

10.1145/3295500.3356179 article EN 2019-11-07

Convolutional neural networks (CNNs) have demonstrated significant superiority in modern artificial intelligence (AI) applications. To accelerate the inference process of CNNs, reconfigurable CNN accelerators that support diverse networks are widely employed for AI systems. Given the ubiquitous deployment of these systems, there is a growing concern regarding security and the potential attacks they may face, including hardware Trojans. This paper proposes a hardware Trojan designed to attack a crucial component of FPGA-based...

10.3390/mi15010149 article EN cc-by Micromachines 2024-01-19

10.1145/3658617.3697775 article EN Proceedings of the 28th Asia and South Pacific Design Automation Conference 2025-01-20

The architecture of the Microsoft Catapult II cloud places the accelerator (FPGA) as a bump-in-the-wire on the way to the network and thus promises a dramatic reduction in latency, as layers of hardware and software are avoided. We demonstrate this capability with an implementation of the 3D FFT. Next we examine phased application elasticity, i.e., the use of a reduced set of nodes for some phases of an HPC application. We find that, for the 3D FFT phase within Molecular Dynamics, such contraction is beneficial, with a 13%–14% performance improvement. Turning...

10.23919/fpl.2017.8056853 article EN 2017-09-01

As convolutional neural networks (CNNs) become more and more diverse and complicated, acceleration of CNNs increasingly encounters a bottleneck in balancing performance, energy efficiency, and flexibility in a unified architecture. This paper proposes a Winograd-based, highly efficient and dynamically Reconfigurable Accelerator (named WRA) for quickly evolving CNN models. A cost-effective convolution decomposition method (CDW) is proposed, which extends the application of the fast Winograd algorithm. Based on CDW,...

10.1109/tcsi.2019.2928682 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2019-07-29

High inference latency seriously limits the deployment of DNNs in real-time domains such as autonomous driving, robotic control, and many others. To address this emerging challenge, researchers have proposed approximate DNNs with reduced precision, e.g., Binarized Neural Networks (BNNs). While BNNs can be built to have little loss in accuracy, latency reduction still has much room for improvement. In this paper, we propose a single-FPGA-based BNN accelerator that achieves microsecond-level ultra-low-latency inference on ImageNet,...

10.1109/asap.2019.00-43 article EN 2019-07-01

To reduce multiplication operations in the convolution of convolutional neural networks (CNNs), there are three widely used acceleration algorithms, i.e., Winograd, FFT, and FFA. However, current accelerators based on these algorithms have issues with flexibility and efficiency. Firstly, some accelerators utilized a combination of these algorithms and employed multiple types of computational units to achieve their respective advantages. As a result, some units are left unused when the best-performing unit is working, which causes much area inefficiency....

10.1109/tcsi.2020.2985727 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2020-04-22
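
To make the multiplication savings mentioned in the abstract above concrete, here is a minimal sketch of the well-known one-dimensional Winograd F(2,3) transform, which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. It illustrates the general algorithm only, not the accelerator described in the paper; the function names are assumptions.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over the 4-sample tile d, using 4 multiplies.

    The filter-side sums (g0 + g1 + g2) / 2 and (g0 - g1 + g2) / 2 can be
    precomputed once per filter, so they are not counted as per-tile work.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    g_sum = (g0 + g1 + g2) / 2
    g_alt = (g0 - g1 + g2) / 2

    # The four element-wise multiplications.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * g_sum
    m3 = (d2 - d1) * g_alt
    m4 = (d1 - d3) * g2

    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference computation with 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]
```

For example, `winograd_f23([1, 2, 3, 4], [1, 2, 3])` and `direct_f23([1, 2, 3, 4], [1, 2, 3])` should return the same two values. The saving grows for 2D tiles, at the cost of more additions and some numerical sensitivity with floating-point inputs.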

Fully homomorphic encryption (FHE) allows arbitrary computation on encrypted data and has great potential in privacy-preserving cloud computing and in securely outsourcing computational tasks. However, the excessive computational complexity is the key limitation restricting the practical application of FHE. In this paper, we propose an FPGA-based high-parallelism architecture to accelerate FHE schemes based on the ring learning with errors (RLWE) problem; specifically, we present a fast implementation of the leveled fully homomorphic encryption scheme BGV....

10.1109/access.2020.3023255 article EN cc-by IEEE Access 2020-01-01

Fully homomorphic encryption (FHE) allows arbitrary computation on encrypted data and thus has potential in privacy-preserving computing. However, efficiency is still the bottleneck. In this paper, we present an area-efficient and highly unified reconfigurable multi-core architecture (named ReMCA) for the full Residue Number System (RNS) variant of the Fan-Vercauteren and Brakerski's scheme (RNS-BFV), which employs a variable number of processing elements (PEs) and RNS channels. The PE unit can be flexibly...

10.1109/tcsi.2022.3163970 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2022-04-11

In this brief, a hybrid-grained reconfigurable architecture (HReA) is introduced to process the 13-Dwarfs. The proposed dynamically reconfigurable fabric consists of four 4 × 4 multi-functional processing element arrays, where the structure combines a 32-bit data path with a 1-bit data path to accommodate multiple computing granularities in the 13-Dwarfs. Aiming at the flexibility of 13-Dwarfs calculation, a directional broadcasting scheme for multi-bank memory, a cache partitioning mechanism, and prefetching methods are further proposed to improve HReA performance...

10.1109/tcsii.2017.2728814 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2017-07-18

Deep learning architectures have achieved amazing success in many areas with the recent advancements in convolutional neural networks (CNNs). However, real-time applications of CNNs are seriously hindered by significant storage and computational costs. Structured pruning is a promising method to compress and accelerate CNNs, and it does not need special hardware or software for an auxiliary calculation. Here, a simple strategy for a structured pruning approach is proposed to crop unimportant filters or neurons automatically during...

10.1109/access.2019.2933032 article EN cc-by IEEE Access 2019-01-01
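
The idea of structured pruning in the abstract above, removing whole filters rather than individual weights, can be sketched as follows. The abstract is truncated before it states how "unimportant" filters are identified, so this sketch substitutes a commonly used stand-in criterion (ranking filters by the L1 norm of their weights); the criterion, the `keep_ratio` parameter, and the function names are all assumptions and not the paper's method.

```python
def l1_norm(filt):
    """Sum of absolute weights of one filter, given as a list of rows."""
    return sum(abs(w) for row in filt for w in row)


def prune_filters(filters, keep_ratio):
    """Keep the highest-L1-norm filters; return (kept indices, kept filters).

    Whole filters are dropped, so the remaining layer stays dense and needs
    no sparse-matrix support at inference time.
    """
    n_keep = max(1, round(len(filters) * keep_ratio))
    # Rank filter indices from largest to smallest L1 norm.
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]),
                    reverse=True)
    # Restore the original order among the survivors.
    kept = sorted(ranked[:n_keep])
    return kept, [filters[i] for i in kept]


# Four hypothetical 2x2 filters; two have near-zero weights.
example_filters = [
    [[1.0, -2.0], [3.0, -4.0]],
    [[0.1, 0.1], [-0.1, 0.1]],
    [[2.0, 1.0], [-1.0, 2.0]],
    [[0.0, 0.1], [0.1, 0.0]],
]
kept_indices, kept_filters = prune_filters(example_filters, keep_ratio=0.5)
```

In a real network, removing a filter also removes the matching input channel of the next layer, and the model is usually fine-tuned afterwards; neither step is shown here.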

Simultaneous localization and mapping (SLAM) is considered a key technique in augmented reality (AR), robotics, and unmanned driving. In the field of SLAM, solutions based on monocular sensors have gradually become important due to their ability to recognize more environmental information with simple structures and low costs. Feature-based ORB-SLAM is popular in many applications, but it has limitations in complex indoor scenes. Firstly, camera pose estimation based on monocular images is greatly affected by the environment; secondly,...

10.1109/access.2022.3144845 article EN cc-by-nc-nd IEEE Access 2022-01-01

FPGA-centric clouds and clusters provide direct programmable interconnects (DPI) with obvious benefits for communication latency and bandwidth. One rarely studied aspect of DPI is that they facilitate application-aware routing: if communication patterns are static and known a priori, as is usually the case, then judicious routing can reduce congestion, latency, and the hardware required. In this study, we explore applying the method of offline/static routing to collective operations, in particular, multicast and reduction. An entirely new...

10.1145/3039902.3039904 article EN ACM SIGARCH Computer Architecture News 2017-01-11

Neural network pruning, which can be divided into unstructured pruning and structured pruning strategies, has been proven to be an efficient method to substantially reduce the number of computations of convolutional neural networks (CNNs). However, it remains difficult to combine the advantages of these two strategies. This article proposes a high-performance accelerator for sparse CNNs. First, a convolution-based filter selection and clustering method (FSCM) is proposed to reorder filters into uniform-size dense filters, eliminating...

10.1109/tvlsi.2022.3211665 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2022-11-03