Yasong Cao

ORCID: 0009-0003-7683-989X
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Error Correcting Code Techniques
  • Interconnection Networks and Systems
  • Advanced Memory and Neural Computing
  • Cellular Automata and Applications
  • Ferroelectric and Negative Capacitance Devices
  • Analytical Chemistry and Sensors
  • Advanced Neural Network Applications
  • Graph Theory and Algorithms
  • E-commerce and Technology Innovations
  • VLSI and Analog Circuit Testing
  • Conducting polymers and applications
  • Luminescence and Fluorescent Materials
  • Porphyrin and Phthalocyanine Chemistry
  • Algorithms and Data Compression
  • Molecular Sensors and Ion Detection
  • Tensor decomposition and applications

National University of Defense Technology
2021-2024

Northwest Normal University
2024

Guilin University of Electronic Technology
2023

Sparse matrix-vector multiplication (SpMV) computes the product of a sparse matrix and a dense vector; the sparsity of the matrix is often greater than 90%. The matrix is usually compressed to save storage resources, but compression causes irregular vector accesses in the algorithm, which take a lot of time and degrade SpMV performance on the system. In this study, we design a dedicated-channel DMA to implement an indirect memory access process and speed up the operation. On this basis, we propose six algorithm schemes and map them to optimize SpMV. The results show that the M processor's...

10.3390/electronics11223699 article EN Electronics 2022-11-11
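The irregular vector accesses the abstract mentions come from the column indices of the compressed matrix. A minimal plain-Python sketch of SpMV over the common CSR (Compressed Sparse Row) format illustrates this; it is not the paper's DMA-assisted implementation, and all names here are our own:

```python
# Minimal CSR sparse matrix-vector multiply (illustrative sketch only,
# not the paper's DMA-assisted design).
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i+1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # col_idx[k] produces the irregular (indirect) access into x
            # that a dedicated DMA channel can prefetch.
            y[i] += values[k] * x[col_idx[k]]
    return y

# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
values  = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 2, 0, 1]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The gather through `col_idx` is exactly the indirect memory process that is hard to keep regular in hardware.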

State-of-the-art systolic array-based accelerators adopt the traditional im2col algorithm to accelerate the inference of convolutional layers. However, they cannot efficiently support backpropagation in AI training. Backpropagation in convolutional layers involves performing transposed convolution and dilated convolution, which usually introduce plenty of zero-spaces into the feature map or kernel. The zero-space data reorganization interferes with the continuity of training and incurs additional non-negligible overhead in terms of off- and on-chip...

10.1109/iccd56317.2022.00068 article EN 2022 IEEE 40th International Conference on Computer Design (ICCD) 2022-10-01
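The zero-spaces in question come from the dilation step of transposed convolution: to backpropagate through a stride-s convolution, the gradient map is expanded by inserting s-1 zeros between elements before an ordinary convolution is applied. A small sketch (our own illustration, not the paper's mechanism) shows how quickly these zeros dominate:

```python
# Zero-insertion at the heart of transposed convolution (illustrative).
# For stride s, s-1 zeros are inserted between neighboring gradient
# elements; these are the "zero-spaces" that waste systolic-array work.
def dilate_with_zeros(grad, stride):
    H, W = len(grad), len(grad[0])
    out_h = (H - 1) * stride + 1
    out_w = (W - 1) * stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(H):
        for c in range(W):
            out[r * stride][c * stride] = grad[r][c]
    return out

g = [[1, 2],
     [3, 4]]
print(dilate_with_zeros(g, 2))
# [[1, 0, 2], [0, 0, 0], [3, 0, 4]] -- already over half the entries are zeros
```

With stride 2, only 4 of 9 output entries carry data, and the fraction of zeros grows with stride.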

Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) is a critical kernel in domains like graph analytics and scientific computation. As a kind of classical special-purpose architecture, systolic arrays were first used for complex computing problems, e.g., matrix multiplication. However, they are not efficient enough when handling sparse matrices, due to the fact that PEs containing zero-valued entries perform unnecessary operations that do not contribute to the result. Accordingly, in this paper, we propose...

10.1145/3545008.3545053 article EN 2022-08-29
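The wasted work on zero entries can be seen by contrast with a row-wise (Gustavson-style) SpGEMM formulation, which touches only nonzeros. This is a generic textbook sketch, not the architecture the paper proposes; the dict-of-rows representation is our own simplification:

```python
# Row-wise SpGEMM sketch: C = A @ B, skipping zero entries entirely --
# the operations a dense systolic array would waste. Illustrative only.
def spgemm(A, B):
    """A, B: {row: {col: val}} sparse maps. Returns C in the same form."""
    C = {}
    for i, a_row in A.items():
        acc = {}
        for k, a_ik in a_row.items():             # nonzeros of A's row i
            for j, b_kj in B.get(k, {}).items():  # nonzeros of B's row k
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        C[i] = acc
    return C

A = {0: {1: 2}, 1: {0: 3, 2: 1}}
B = {0: {0: 4}, 1: {2: 5}, 2: {1: 6}}
print(spgemm(A, B))  # {0: {2: 10}, 1: {0: 12, 1: 6}}
```

Every multiply here contributes to the result, which is precisely what zero-holding PEs in a dense array fail to guarantee.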

Cationic fluorophores (CFs) with highly twisted conformations are a very important kind of functional material in the field of optical sensing and imaging. In this paper, isomeric pyridinium-type CFs...

10.1039/d4qm00578c article EN Materials Chemistry Frontiers 2024-01-01

On-chip memory is one of the core components of deep learning accelerators. In general, the area used by on-chip memory accounts for around 30% of the total chip area. With the increasing complexity of algorithms, it will become a challenge for accelerators to integrate the much larger memory that algorithms need, whereas multiprecision computation requires different-precision (such as FP32, FP16) computations in training and inference. To solve this, this paper explores the use of single-port memory (SPM) in systolic-array-based accelerators. We propose...

10.3390/electronics11101587 article EN Electronics 2022-05-16

The systolic array provides extremely high efficiency for running matrix multiplication, and is one of the mainstream architectures in today's deep learning accelerators. In order to develop efficient accelerators, people usually employ simulators to make design trade-offs. However, current simulators suffer from coarse-grained modeling methods and ideal assumptions, which limit their ability to describe the structural characteristics of systolic arrays. In addition, they do not support exploration of the microarchitecture. This paper...

10.1109/ispass55109.2022.00016 article EN 2022-05-01

The systolic array provides extremely high efficiency for running matrix multiplication and is one of the mainstream architectures in today's deep learning accelerators. In order to develop efficient accelerators, people usually employ simulators to make design trade-offs. However, current simulators suffer from coarse-grained modeling methods and ideal assumptions, which limit their ability to describe the structural characteristics of systolic arrays. In addition, they do not support exploration of the microarchitecture. This paper...

10.3390/electronics11182928 article EN Electronics 2022-09-15

The convergence of High-Performance Computing (HPC) and Artificial Intelligence (AI) has become a promising trend. Due to the different computation patterns of HPC and AI applications, it is challenging to design an appropriate architecture that balances their demands. To address this, we propose Matrix Zone (MZ), an enhanced systolic array-based matrix engine that accelerates General Matrix Multiplication (GEMM) for both kinds of applications. We develop a semi-memory hierarchy to reduce on-chip area consumption and data stitching...

10.1109/hpcc-dss-smartcity-dependsys57074.2022.00050 article EN 2022-12-01

On-chip memory is one of the core components of deep learning accelerators. In general, the area overhead of on-chip memory accounts for over 25% of the total chip area. With the increasing complexity of algorithms, it will become a challenge for accelerators to integrate the much larger memory that algorithms need. To solve this, this paper explores the use of single-port memory (SPM) in systolic array-based accelerators. We propose an efficient address transformation method to avoid conflicts between simultaneous read and write requests on the SPM. In addition,...

10.1109/hpcc-dss-smartcity-dependsys53884.2021.00044 article EN 2021-12-01
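The abstract's address transformation is not spelled out here, but one common way to keep simultaneous reads and writes from colliding on single-port memory is to split a logical memory into two single-port banks by address parity: if reads and writes sweep through addresses in lockstep but offset by one, they always land in different banks in any cycle. This is a generic illustration of the conflict-avoidance idea, not necessarily the paper's method:

```python
# Bank-by-parity sketch (NOT necessarily the paper's transformation):
# even addresses map to bank 0, odd to bank 1, so a read at address t
# and a write at address t+1 never hit the same single-port bank.
def bank_of(addr):
    return addr & 1

# In every cycle t, the read (addr t) and write (addr t+1) use
# different banks, so each single-port bank sees at most one request.
for t in range(8):
    assert bank_of(t) != bank_of(t + 1)
print("no read/write bank conflict across 8 cycles")
```

The general principle is the same: transform addresses so concurrent requests are guaranteed to target disjoint ports.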

With the development of the social economy, consumers' demand for quality vegetables is increasing, and the sales of vegetable goods change over time. In this paper, we address the issue of stocking volume and pricing strategy for vegetables in superstores. We adopt statistical methods and a programming language for data preprocessing, including data searching, cleaning, transformation, integration, and reduction, aiming to optimize vegetable stocking. First, the study analyzes the relationship between category-level and single-product sales over time through the Pearson coefficient to reveal...

10.25236/ajcis.2023.061315 article EN Academic Journal of Computing & Information Science 2023-01-01

State-of-the-art systolic array-based accelerators adopt the traditional im2col algorithm to accelerate the inference of convolutional layers. However, they cannot efficiently support backpropagation in AI training. Backpropagation in convolutional layers involves performing transposed convolution and dilated convolution, which usually introduce plenty of zero-spaces into the feature map or kernel. The zero-space data reorganization interferes with the continuity of training and incurs additional non-negligible overhead in terms of off- and on-chip...

10.48550/arxiv.2209.09434 preprint EN other-oa arXiv (Cornell University) 2022-01-01

As NN accelerators emerge, many analytical models have been presented to help designers carry out hardware design space exploration. However, these models cannot accurately simulate systolic array-based accelerators due to their pervasiveness or abstraction. In this paper, we propose a compute-centric simulator driven by the execution of events from the tile-mapping matrix, which can accurately model the accelerator. The simulator focuses on the conflicts that arise when a tile is used for data access, and on the various interruptions caused by resource...

10.1109/iscas48785.2022.9937624 article EN 2022 IEEE International Symposium on Circuits and Systems (ISCAS) 2022-05-28