- Embedded Systems Design Techniques
- Interconnection Networks and Systems
- Parallel Computing and Optimization Techniques
- Advanced Neural Network Applications
- VLSI and FPGA Design Techniques
- CCD and CMOS Imaging Sensors
- VLSI and Analog Circuit Testing
- Real-Time Systems Scheduling
- Low-Power High-Performance VLSI Design
- Advanced Image and Video Retrieval Techniques
- Robotics and Sensor-Based Localization
- Plasma Diagnostics and Applications
- Cellular Automata and Applications
- Advanced Signal Processing Techniques
- Petri Nets in System Modeling
- Robotics and Automated Systems
- Innovation Diffusion and Forecasting
- Economic and Technological Innovation
- Network Security and Intrusion Detection
- Network Packet Processing and Optimization
- Sparse and Compressive Sensing Techniques
- Machine Learning and Data Classification
- Domain Adaptation and Few-Shot Learning
- Fusion Materials and Technologies
- Network Time Synchronization Technologies
- China Telecom (China), 2025
- China Telecom, 2025
- University of South China, 2025
- University of California, Los Angeles, 2018-2023
- Amazon (United States), 2023
- Beijing Institute of Technology, 2022
- Laboratoire d'Analyse et d'Architecture des Systèmes, 2021
- University of California System, 2020
- East China University of Technology, 2020
- Hebei University, 2019
In recent years, convolutional neural network (CNN) based methods have achieved great success in a large number of applications and are among the most powerful and widely used techniques in computer vision. However, CNN-based methods are computation-intensive and resource-consuming, and thus hard to integrate into embedded systems such as smart phones, smart glasses, and robots. The FPGA is one of the most promising platforms for accelerating CNNs, but its limited bandwidth and on-chip memory size constrain the performance of FPGA accelerators for CNNs.
With the pursuit of improving compute performance under strict power constraints, there is an increasing need to deploy applications to heterogeneous hardware architectures with accelerators, such as GPUs and FPGAs. However, although these computing platforms are becoming widely available, they are very difficult to program, especially FPGAs. As a result, their use has been limited to a small subset of programmers with specialized hardware knowledge. To tackle this challenge, we introduce HeteroCL, a programming infrastructure...
Automatic systolic array generation has long been an interesting topic due to the need to reduce the lengthy development cycles of manual designs. The existing automatic approach builds dependency graphs from algorithms and iteratively maps the computation nodes in the graph onto processing elements (PEs), with time stamps that specify the sequences in which they operate within each PE. A number of previous works have implemented this idea and generated designs for ASICs. However, all of them relied on human intervention and the resulting designs are usually inferior...
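For illustration only (a textbook-style example, not a mapping taken from the work above), matrix multiplication C[i][j] += A[i][k] * B[k][j] over the iteration domain (i, j, k) can be turned into a systolic design by such a space-time assignment:

```latex
% Illustrative space-time mapping for C[i][j] += A[i][k] * B[k][j];
% an automatic generator may derive a different projection and schedule.
\[
  \mathrm{PE}(i,j,k) = (i,\; j), \qquad t(i,j,k) = i + j + k .
\]
% Iteration (i, j, k) then executes on PE (i, j) at cycle i + j + k:
% A[i][k] flows along the j axis, B[k][j] along the i axis, and each PE
% accumulates its own C[i][j] locally.
```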
While systolic array architectures have the potential to deliver tremendous performance, it is notoriously challenging to customize an efficient systolic array processor for a target application. Designing systolic arrays requires knowledge of both the high-level characteristics of the application and the low-level hardware details, making it a demanding and inefficient process. To relieve users from this manual, iterative trial-and-error process, we present AutoSA, an end-to-end compilation framework for generating systolic arrays on FPGA. AutoSA is based on the polyhedral...
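As a rough sketch of the kind of input such a polyhedral flow consumes (the scop markers below follow the common PPCG-style convention; treating them as AutoSA's exact interface is an assumption here, and the kernel size is made up):

```cpp
// Hypothetical annotated C kernel handed to a polyhedral systolic-array
// compiler; sizes and pragma markers are illustrative only.
void mm(float A[64][64], float B[64][64], float C[64][64]) {
#pragma scop
  for (int i = 0; i < 64; ++i)
    for (int j = 0; j < 64; ++j) {
      C[i][j] = 0.f;
      for (int k = 0; k < 64; ++k)
        C[i][j] += A[i][k] * B[k][j];
    }
#pragma endscop
}
```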
Despite the increasing adoption of high-level synthesis (HLS) for its design productivity advantages, there remains a significant gap in achievable clock frequency between HLS-generated designs and handcrafted RTL ones. A key factor that limits the timing quality of HLS outputs is the difficulty of accurately estimating interconnect delay at the HLS level. Unfortunately, this problem becomes even worse when large designs are implemented on the latest multi-die FPGAs, where die-crossing interconnects incur a high delay penalty.
With reduced data reuse and parallelism, recent convolutional neural networks (CNNs) create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable architectures for convolutional layers, but without proper optimizations, their efficiency drops dramatically for the following reasons: (1) the different dimensions within same-type layers, (2) the different convolution layers, especially transposed and dilated convolutions, and (3) the CNN's complex dataflow graph. Furthermore, significant overheads arise when integrating FPGAs into...
FPGAs require a much longer compilation cycle than conventional computing platforms like CPUs. In this paper, we shorten the overall compilation time by co-optimizing the HLS compilation (C-to-RTL) and the back-end physical implementation (RTL-to-bitstream). We propose a split compilation approach based on the pipelining flexibility at the HLS level, which allows us to partition designs for parallel placement and routing and then stitch the separate partitions together. We outline a number of technical challenges and address them by breaking the boundaries between different...
The irregularity of recent Convolutional Neural Network (CNN) models, such as less data reuse and parallelism due to extensive network pruning and simplification, creates new challenges for FPGA acceleration. Furthermore, without proper optimization, there could be significant overheads when integrating FPGAs into existing machine learning frameworks like TensorFlow. Such a problem is mostly overlooked by previous studies. However, our study shows that a naive integration of FPGAs into TensorFlow could lead to up...
Systolic algorithms are one of the killer applications on spatial architectures such as FPGAs and CGRAs. However, it requires a tremendous amount of human effort to design and implement a high-performance systolic array for a given algorithm using the traditional RTL-based methodology. On the other hand, existing high-level synthesis (HLS) tools either (1) force programmers to do "micro-coding", where too many optimizations must be carried out through tedious code restructuring and the insertion of vendor-specific pragmas,...
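To make the "micro-coding" burden concrete, below is a small hypothetical kernel written in the usual vendor-HLS style, where performance hinges on manual code restructuring and pragma insertion; the pragmas and factors are illustrative, not a design from the paper:

```cpp
// Illustrative only: the kind of manual "micro-coding" a programmer does
// in vendor HLS tools. The pragmas and factors below are typical of such
// tuning, not taken from the paper.
constexpr int N = 128;
constexpr int UF = 8;  // hypothetical unroll / partition factor

void matvec(const float A[N][N], const float x[N], float y[N]) {
#pragma HLS array_partition variable=A cyclic factor=8 dim=2
#pragma HLS array_partition variable=x cyclic factor=8
  for (int i = 0; i < N; ++i) {
    // Restructure the reduction so UF partial sums accumulate in parallel.
    float partial[UF] = {0.f};
#pragma HLS array_partition variable=partial complete
    for (int j = 0; j < N; j += UF) {
#pragma HLS pipeline
      for (int u = 0; u < UF; ++u) {
#pragma HLS unroll
        partial[u] += A[i][j + u] * x[j + u];
      }
    }
    // Final reduction of the partial sums.
    float acc = 0.f;
    for (int u = 0; u < UF; ++u) {
#pragma HLS unroll
      acc += partial[u];
    }
    y[i] = acc;
  }
}
```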
In this article, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of convenient APIs that allow users to easily express flexible and complex inter-task communication structures. Second, TAPA adopts a coarse-grained floorplanning step during HLS compilation for accurate pipelining of potential critical paths. In addition, TAPA implements several...
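A minimal sketch of what such a task-parallel program looks like, assuming TAPA's publicly documented C++ APIs (tapa::stream, tapa::istream/ostream, tapa::mmap, tapa::task); the stream depths and task bodies are illustrative:

```cpp
// Sketch of a TAPA-style task-parallel dataflow program.
#include <cstdint>
#include <tapa.h>

// Read from off-chip memory and push into a FIFO.
void Producer(tapa::mmap<const float> in, uint64_t n, tapa::ostream<float>& q) {
  for (uint64_t i = 0; i < n; ++i) q.write(in[i]);
}

// Simple streaming computation between two FIFOs.
void Scale(tapa::istream<float>& in_q, tapa::ostream<float>& out_q, uint64_t n) {
  for (uint64_t i = 0; i < n; ++i) out_q.write(in_q.read() * 2.f);
}

// Drain the FIFO back to off-chip memory.
void Consumer(tapa::istream<float>& q, tapa::mmap<float> out, uint64_t n) {
  for (uint64_t i = 0; i < n; ++i) out[i] = q.read();
}

// Top-level task: declares the FIFOs and launches the three tasks in parallel.
void Top(tapa::mmap<const float> in, tapa::mmap<float> out, uint64_t n) {
  tapa::stream<float, 8> q0("q0");
  tapa::stream<float, 8> q1("q1");
  tapa::task()
      .invoke(Producer, in, n, q0)
      .invoke(Scale, q0, q1, n)
      .invoke(Consumer, q1, out, n);
}
```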
C/C++/OpenCL-based high-level synthesis (HLS) has become more and more popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of results (QoR) and short development cycles compared with the traditional register-transfer level design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly productive programming approach in other application domains, where coarse-grained tasks run in parallel and communicate...
Using a sample of 58 million $J/\psi$ events collected with the BESII detector at BEPC, more than 100 000 $J/\psi \rightarrow p\bar{p}\pi^{0}$ events are selected, and a detailed partial wave analysis is performed. The branching fraction is determined to be...
With the recent release of High Bandwidth Memory (HBM) based FPGA boards, developers can now exploit unprecedented external memory bandwidth. This allows more memory-bound applications to benefit from FPGA acceleration. However, we found that it is not easy to fully utilize the available bandwidth when developing some applications with high-level synthesis (HLS) tools. This is due to the limitations of existing HLS tools in accessing the HBM board's large number of independent external memory channels. In this paper, we measure the performance of three representative...
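For context, a common HLS-level workaround is to give each HBM pseudo-channel its own AXI master bundle. A hypothetical Vitis-HLS-style sketch is shown below; the port names, bundle count, and exact pragma set are assumptions, not taken from the paper:

```cpp
// Sketch: one AXI master bundle per HBM pseudo-channel so that the three
// arrays can be served by independent channels in parallel.
extern "C" void vadd(const float* a, const float* b, float* c, int n) {
#pragma HLS interface m_axi port=a offset=slave bundle=gmem0
#pragma HLS interface m_axi port=b offset=slave bundle=gmem1
#pragma HLS interface m_axi port=c offset=slave bundle=gmem2
#pragma HLS interface s_axilite port=n
#pragma HLS interface s_axilite port=return
  for (int i = 0; i < n; ++i) {
#pragma HLS pipeline II=1
    c[i] = a[i] + b[i];
  }
}
```

At link time, each bundle would then be bound to a distinct pseudo-channel, e.g. with a connectivity entry along the lines of `sp=vadd_1.a:HBM[0]` in the linker configuration.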
Designs generated by high-level synthesis (HLS) tools typically achieve a lower frequency compared to manual RTL designs. In this work, we study the timing issues in a diverse set of realistic and complex FPGA HLS designs. (1) We observe that in almost all cases the degradation is caused by broadcast structures generated by the HLS compiler. (2) We classify three major types of broadcasts in HLS-generated designs, including high-fanout data signals, pipeline flow control signals, and synchronization signals for concurrent modules. (3) We reveal a number...
Designs generated by high-level synthesis (HLS) tools typically achieve a lower frequency compared to manual RTL designs. We study the timing issues in a diverse set of nine realistic HLS designs and observe that in most cases the degradation is related to signal broadcast structures. In this work, we classify the common types of broadcasts in HLS-generated designs, including data broadcast and two types of control broadcast: pipeline flow control broadcast and synchronization broadcast. We further identify several limitations of current HLS tools, which lead to improper handling of broadcasts. First,...
C/C++/OpenCL-based high-level synthesis (HLS) has become more and more popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of results (QoR) and short development cycle compared with the traditional register-transfer level (RTL) design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly productive programming approach in other application domains, where coarse-grained tasks run in parallel...
Many researchers studying the performance tuning of systolic arrays have based their work on oversimplified assumptions, like considering only divisors of the problem size for loop tiling or pruning off-chip data communication to reduce the design space. In this paper, we present a comprehensive design space exploration tool named Odyssey for systolic array optimization. Our results show that limiting the tiling factors to divisors of the problem size can cause up to a 39% performance loss, and that pruning off-chip data movement can miss optimal designs. We tested Odyssey using various matrix multiplication and convolution...
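As a toy illustration of that point (not the Odyssey tool itself), the sketch below sweeps all tile sizes for a tiled matrix multiplication, rather than only divisors of the problem size, and ranks them with a simple off-chip traffic model; the problem size and on-chip buffer budget are made up:

```cpp
// Toy tile-size sweep with a crude off-chip traffic model (illustrative only).
#include <cstdint>
#include <iostream>

int main() {
  const int64_t dim_i = 1000, dim_j = 1000, dim_k = 1000;  // hypothetical problem size
  const int64_t on_chip_words = 512 * 1024;                // hypothetical buffer budget
  int64_t best_ti = 1, best_tj = 1, best_traffic = -1;

  for (int64_t ti = 1; ti <= dim_i; ++ti) {    // NOT restricted to divisors of dim_i
    for (int64_t tj = 1; tj <= dim_j; ++tj) {  // NOT restricted to divisors of dim_j
      // On-chip: a ti x k tile of A, a k x tj tile of B, a ti x tj tile of C.
      if (ti * dim_k + dim_k * tj + ti * tj > on_chip_words) continue;
      // Off-chip words moved under output-stationary tiling:
      // A is re-read once per column tile, B once per row tile, C written once.
      const int64_t num_ti = (dim_i + ti - 1) / ti;
      const int64_t num_tj = (dim_j + tj - 1) / tj;
      const int64_t traffic =
          num_tj * (dim_i * dim_k) + num_ti * (dim_k * dim_j) + dim_i * dim_j;
      if (best_traffic < 0 || traffic < best_traffic) {
        best_traffic = traffic;
        best_ti = ti;
        best_tj = tj;
      }
    }
  }
  std::cout << "best tile " << best_ti << " x " << best_tj
            << ", est. off-chip words = " << best_traffic << "\n";
  return 0;
}
```

Even this crude model suggests why a divisor-only search can exclude better points when the problem size has few useful divisors.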
This paper introduces a whole project of an augmented reality interaction mode based on the Netty communication method, determines the main technical difficulties in the implementation process, and finds a solution by means of system architecture design and protocol development. In this paper, the Netty framework is adopted to handle the network IO and business logic; long connections are established between each terminal device, the virtual environment, and the server, so as to realize fast and natural...
As deep learning becomes pervasive in modern applications, many frameworks have been presented for practitioners to develop and train DNN models rapidly. Meanwhile, as training large models has become a trend in recent years, training throughput and memory footprint are getting crucial. Accordingly, optimizing training workloads with compiler optimizations is inevitable and is drawing more attention. However, existing deep learning compilers (DLCs) mainly target inference and do not incorporate holistic optimizations, such as automatic differentiation and mixed precision,...