NFDI4DS | UHH-SEMS - Publication Details

Chen Yang

ORCID: 0000-0002-8221-7670

About

Contact & Profiles

OpenAlex author ID: A5100719382

Research Areas

Advanced Neural Network Applications
Parallel Computing and Optimization Techniques
Advanced Memory and Neural Computing
Interconnection Networks and Systems
Cryptography and Data Security
Coding theory and cryptography
CCD and CMOS Imaging Sensors
Embedded Systems Design Techniques
Cryptography and Residue Arithmetic
Low-power high-performance VLSI design
Cryptographic Implementations and Security
Adversarial Robustness in Machine Learning
Machine Fault Diagnosis Techniques
Nanopore and Nanochannel Transport Studies
Advanced Data Storage Technologies
Video Surveillance and Tracking Methods
Physical Unclonable Functions (PUFs) and Hardware Security
Fault Detection and Control Systems
Cloud Computing and Resource Management
Ferroelectric and Negative Capacitance Devices
Anomaly Detection Techniques and Applications
Chaos-based Image/Signal Encryption
VLSI and Analog Circuit Testing
VLSI and FPGA Design Techniques
Numerical Methods and Algorithms

Xi'an Jiaotong University
2017-2025

Northeastern University
2023

Dalian University of Technology
2022-2023

Donghua University
2023

Xi’an University of Posts and Telecommunications
2023

Guilin University of Electronic Technology
2023

Nanjing University of Science and Technology
2023

North China University of Water Resources and Electric Power
2022

University of Minnesota
2021

Harbin Institute of Technology
2021

Trusted multi-source information fusion for fault diagnosis of electromechanical system with modified graph convolution network

OPENALEX - Publications

Kongliang Zhang, Hongkun Li, Shunxin Cao, Shai Lv, Chen Yang, and 1 more

10.1016/j.aei.2023.102088 article EN Advanced Engineering Informatics 2023-07-12

FPDeep: Acceleration and Load Balancing of CNN Training on FPGA Clusters

Tong Geng, Tianqi Wang, Ahmed Sanaullah, Chen Yang, Rui Xu, and 2 more

FPGA-based CNN accelerators have advantages in flexibility and power efficiency, and so are being deployed by a number of cloud computing service providers, including Microsoft, Amazon, Tencent, and Alibaba. Given the increasing complexity of neural networks, however, it is becoming challenging to efficiently map CNNs to multi-FPGA platforms. In this work, we present a scalable framework, FPDeep, which helps engineers map a specific CNN's training logic to a multi-FPGA cluster or build RTL implementations for the target network. With...

10.1109/fccm.2018.00021 article EN 2018-04-01

A Blockchain-Based Authentication Protocol for WLAN Mesh Security Access

Xin Jiang, Mingzhe Liu, Chen Yang, Yanhua Liu, Ruili Wang

In order to deploy a secure WLAN mesh network, authentication of both users and APs is needed, and a secure authentication mechanism should be employed. However, some additional configurations of trusted third-party agencies are still needed for the on-site system. This paper proposes a new blockchain-based authentication protocol for WLAN mesh security access, to reduce the deployment costs and resolve the issues of requiring key delivery and a central server during IEEE 802.11X authentication. The method takes the user's authentication request as a transaction, considers all the authentication records in the network...

10.32604/cmc.2019.03863 article EN Computers, Materials & Continua 2019-01-01

Integrated detection of citrus fruits and branches using a convolutional neural network

Chen Yang, Linyun Xiong, Zengfu Wang, Yinfan Wang, Guo-yang Shi, and 3 more

10.1016/j.compag.2020.105469 article EN Computers and Electronics in Agriculture 2020-05-29

Motor current signal analysis using hypergraph neural networks for fault diagnosis of electromechanical system

Kongliang Zhang, Hongkun Li, Shunxin Cao, Chen Yang, Fubiao Sun, and 1 more

10.1016/j.measurement.2022.111697 article EN Measurement 2022-08-04

A Highly Unified Reconfigurable Multicore Architecture to Speed Up NTT/INTT for Homomorphic Polynomial Multiplication

Yang Su, Bailong Yang, Chen Yang, Ze-Peng Yang, Yiwei Liu

The ring learning with error (RLWE)-based fully homomorphic encryption (FHE) scheme has become one of the most promising FHE schemes. However, its performance is limited by multiplication, especially polynomial multiplication, which occupies major computing resources. Therefore, an efficient implementation is crucial for high-performance applications. In this article, we present an area-efficient and highly unified reconfigurable multicore number theoretic transform (NTT)/inverse NTT (INTT)...

10.1109/tvlsi.2022.3166355 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2022-04-22

A Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Work and Weight Load Balancing

Tong Geng, Tianqi Wang, Ahmed Sanaullah, Chen Yang, Rushi Patel, and 1 more

To improve the flexibility and energy efficiency of Convolutional Neural Networks, a number of cloud computing service providers, including Microsoft, Amazon, and Alibaba, are using FPGA-based CNN accelerators. However, the growing size and complexity of neural networks, coupled with communication and off-chip memory bottlenecks, make it increasingly difficult for multi-FPGA designs to achieve high resource utilization and performance, especially when training. In this work, we present new results for a scalable framework,...

10.1109/fpl.2018.00074 article EN 2018-08-01

Fully integrated FPGA molecular dynamics simulations

Chen Yang, Tong Geng, Tianqi Wang, Rushi Patel, Qingqing Xiong, and 7 more

The implementation of Molecular Dynamics (MD) on FPGAs has received substantial attention. Previous work, however, has consisted of either proof-of-concept implementations of components, usually the range-limited force; full systems, but with much of the work shared by the host CPU; or prototype demonstrations, e.g., using OpenCL, that neither implement a whole system nor have competitive performance. In this paper, we present what we believe to be the first full-scale FPGA-based simulation engine, and show its...

10.1145/3295500.3356179 article EN 2019-11-07

Hardware Trojan Attacks on the Reconfigurable Interconnections of Field-Programmable Gate Array-Based Convolutional Neural Network Accelerators and a Physically Unclonable Function-Based Countermeasure Detection Technique

Jia Hou, Zichu Liu, Zepeng Yang, Chen Yang

Convolutional neural networks (CNNs) have demonstrated significant superiority in modern artificial intelligence (AI) applications. To accelerate the inference process of CNNs, reconfigurable CNN accelerators that support diverse networks are widely employed for AI systems. Given the ubiquitous deployment of these systems, there is a growing concern regarding their security and the potential attacks they may face, including hardware Trojans. This paper proposes a hardware Trojan designed to attack a crucial component of FPGA-based...

10.3390/mi15010149 article EN cc-by Micromachines 2024-01-19

Low Multiplicative Depth Polynomial Evaluation Architectures for Homomorphic Encrypted Data

Jianfei Wang, Jia Hou, F. Zhang, Yishuo Meng, Yang Su, and 1 more

10.1145/3658617.3697775 article EN Proceedings of the 30th Asia and South Pacific Design Automation Conference 2025-01-20

Rethinking the Designing of Convolution Engine for Reconfigurable CNN Accelerator Using Sparse-Based Design Scheme

Yishuo Meng, Jianfei Wang, Siwei Xiang, Jia Hou, Zhijie Lin, and 2 more

10.1109/tcsi.2025.3554332 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2025-01-01

HPC on FPGA clouds: 3D FFTs and implications for molecular dynamics

Jiayi Sheng, Chen Yang, Ahmed Sanaullah, Michael Papamichael, Adrian M. Caulfield, and 1 more

The architecture of the Microsoft Catapult II cloud places the accelerator (FPGA) as a bump-in-the-wire on the way to the network and thus promises a dramatic reduction in latency, as layers of hardware and software are avoided. We demonstrate this capability with an implementation of the 3D FFT. Next, we examine phased application elasticity, i.e., the use of a reduced set of nodes for some phases of an HPC application. We find that, for the FFT phase within Molecular Dynamics, such contraction is beneficial, with a 13%–14% performance improvement. Turning...

10.23919/fpl.2017.8056853 article EN 2017-09-01

WRA: A 2.2-to-6.3 TOPS Highly Unified Dynamically Reconfigurable Accelerator Using a Novel Winograd Decomposition Algorithm for Convolutional Neural Networks

Chen Yang, Yizhou Wang, Xiaoli Wang, Li Geng

As convolutional neural networks (CNNs) become more diverse and complicated, the acceleration of CNNs increasingly encounters a bottleneck in balancing performance, energy efficiency, and flexibility in a unified architecture. This paper proposes a Winograd-based, highly efficient, dynamically Reconfigurable Accelerator (named WRA) for quickly evolving CNN models. A cost-effective convolution decomposition method (CDW) is proposed, which extends the application of the fast Winograd algorithm. Based on CDW,...

10.1109/tcsi.2019.2928682 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2019-07-29

LP-BNN: Ultra-low-Latency BNN Inference with Layer Parallelism

Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Shuaiwen Leon Song, and 2 more

High inference latency seriously limits the deployment of DNNs in real-time domains such as autonomous driving, robotic control, and many others. To address this emerging challenge, researchers have proposed approximate DNNs with reduced precision, e.g., Binarized Neural Networks (BNNs). While BNNs can be built to have little loss in accuracy, latency reduction still has much room for improvement. In this paper, we propose a single-FPGA-based BNN accelerator that achieves microsecond-level ultra-low-latency inference of ImageNet,...

10.1109/asap.2019.00-43 article EN 2019-07-01

A Stride-Based Convolution Decomposition Method to Stretch CNN Acceleration Algorithms for Efficient and Flexible Hardware Implementation

Chen Yang, Yizhou Wang, Xiaoli Wang, Li Geng

To reduce multiplication operations in the convolution of convolutional neural networks (CNNs), there are three widely used acceleration algorithms, i.e., Winograd, FFT, and FFA. However, current accelerators based on these algorithms have issues with flexibility and efficiency. Firstly, some accelerators utilized a combination of these algorithms and employed multiple types of computational units to achieve their respective advantages. As a result, some units are left unused when the best-performing unit is working, which causes much area inefficiency....

10.1109/tcsi.2020.2985727 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2020-04-22

FPGA-Based Hardware Accelerator for Leveled Ring-LWE Fully Homomorphic Encryption

Yang Su, Bailong Yang, Chen Yang, Luogeng Tian

Fully homomorphic encryption (FHE) allows arbitrary computation on encrypted data and has great potential in privacy-preserving cloud computing and in securely outsourcing computational tasks. However, the excessive complexity is the key limitation restricting the practical application of FHE. In this paper, we propose an FPGA-based high-parallelism architecture to accelerate FHE schemes based on the ring learning with errors (RLWE) problem; specifically, we present a fast implementation of the leveled fully homomorphic encryption scheme BGV....

10.1109/access.2020.3023255 article EN cc-by IEEE Access 2020-01-01

ReMCA: A Reconfigurable Multi-Core Architecture for Full RNS Variant of BFV Homomorphic Evaluation

Yang Su, Bailong Yang, Chen Yang, Song-Yin Zhao

Fully homomorphic encryption (FHE) allows arbitrary computation on encrypted data and thus has potential in privacy-preserving computing. However, efficiency is still the bottleneck. In this paper, we present an area-efficient and highly unified reconfigurable multi-core architecture (named ReMCA) for the full Residue Number System (RNS) variant of the Fan-Vercauteren variant of Brakerski's scheme (RNS-BFV), which employs a variable number of processing elements (PEs) and RNS channels. The PE unit can be flexibly...

10.1109/tcsi.2022.3163970 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2022-04-11

HReA: An Energy-Efficient Embedded Dynamically Reconfigurable Fabric for 13-Dwarfs Processing

Leibo Liu, Zhaoshi Li, Chen Yang, Chenchen Deng, Shouyi Yin, and 1 more

In this brief, a hybrid-grained reconfigurable architecture (HReA) is introduced to process 13-Dwarfs. The proposed dynamically reconfigurable fabric consists of four 4 × 4 multi-functional processing element arrays, where the structure combines a 32-bit data path with a 1-bit data path to accommodate multiple computing granularities in 13-Dwarfs. Aiming at the flexibility of 13-Dwarfs calculation, a directional broadcasting scheme for multi-bank memory, a cache partitioning mechanism, and prefetching methods are introduced to further improve HReA performance...

10.1109/tcsii.2017.2728814 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2017-07-18

Structured Pruning of Convolutional Neural Networks via L1 Regularization

Chen Yang, Zhenghong Yang, Abdul Mateen Khattak, Yang Liu, Wenxin Zhang, and 2 more

Deep learning architecture has achieved amazing success in many areas with the recent advancements in convolutional neural networks (CNNs). However, real-time applications of CNNs are seriously hindered by significant storage and computational costs. Structured pruning is a promising method to compress and accelerate CNNs, and it does not need special hardware or software for an auxiliary calculation. Here, a simple strategy of structured pruning is proposed to crop unimportant filters or neurons automatically during...

10.1109/access.2019.2933032 article EN cc-by IEEE Access 2019-01-01

Domain adversarial-based multi-source deep transfer network for cross-production-line time series forecasting

Lei Chen, Chuang Peng, Chen Yang, Huiyuan Peng, Kuangrong Hao

10.1007/s10489-023-04729-8 article EN Applied Intelligence 2023-07-03

A Compact and Efficient Hardware Accelerator for RNS-CKKS En/Decoding and En/Decryption

Jianfei Wang, Chen Yang, Jia Hou, Fahong Zhang, Yishuo Meng, and 2 more

10.1109/tcsii.2024.3454024 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2024-01-01

SDF-SLAM: A Deep Learning Based Highly Accurate SLAM Using Monocular Camera Aiming at Indoor Map Reconstruction With Semantic and Depth Fusion

Chen Yang, Qi Chen, Yaoyao Yang, Jingyu Zhang, Minshun Wu, and 1 more

Simultaneous localization and mapping (SLAM) is considered a key technique in augmented reality (AR), robotics, and unmanned driving. In the field of SLAM, solutions based on monocular sensors have gradually become important due to their ability to recognize more environmental information with simple structures and low costs. Feature-based ORB-SLAM is popular in many applications, but it has limitations in complex indoor scenes. Firstly, camera pose estimation based on images is greatly affected by the environment; secondly,...

10.1109/access.2022.3144845 article EN cc-by-nc-nd IEEE Access 2022-01-01

A New Precipitation Prediction Method Based on CEEMDAN-IWOA-BP Coupling

Fuping Liu, Ying Liu, Chen Yang, Ruixun Lai

10.1007/s11269-022-03277-z article EN Water Resources Management 2022-08-10

Collective Communication on FPGA Clusters with Static Scheduling

Jiayi Sheng, Qingqing Xiong, Chen Yang, Martin C. Herbordt

FPGA-centric clouds and clusters provide direct programmable interconnects with obvious benefits for communication latency and bandwidth. One rarely studied aspect of DPI is that they facilitate application-aware routing: if communication patterns are static and known a priori, as is usually the case, then judicious routing can reduce congestion, latency, and the hardware required. In this study, we explore applying the method of offline/static routing to collective communication operations, in particular, multicast and reduction. An entirely new...

10.1145/3039902.3039904 article EN ACM SIGARCH Computer Architecture News 2017-01-11

A Sparse CNN Accelerator for Eliminating Redundant Computations in Intra- and Inter-Convolutional/Pooling Layers

Chen Yang, Yishuo Meng, Kaibo Huo, Jiawei Xi, Kuizhi Mei

Neural network pruning, which can be divided into unstructured pruning and structured pruning strategies, has been proven to be an efficient method to substantially reduce the number of computations of convolutional neural networks (CNNs). However, it remains difficult to combine the advantages of these two strategies. This article proposes a high-performance accelerator for sparse CNNs. First, a convolution-based filter selection and clustering method (FSCM) is proposed to reorder filters into uniform-size dense filters, eliminating...

10.1109/tvlsi.2022.3211665 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2022-11-03
