- Ferroelectric and Negative Capacitance Devices
- Advanced Graph Neural Networks
- Embedded Systems Design Techniques
- Graph Theory and Algorithms
- Multimodal Machine Learning Applications
- Parallel Computing and Optimization Techniques
- Topic Modeling
- Natural Language Processing Techniques
- VLSI and FPGA Design Techniques
- Interconnection Networks and Systems
- Advanced Memory and Neural Computing
University of California, Los Angeles
2022-2025
Carnegie Mellon University
2021
Graph convolutional networks (GCNs) have been introduced to effectively process non-Euclidean graph data. However, GCNs incur large amounts of irregularity in computation and memory access, which prevents the efficient use of traditional neural network accelerators. Moreover, existing dedicated GCN accelerators demand high memory volumes and are difficult to implement onto resource-limited edge devices. In this work, we propose LW-GCN, a lightweight FPGA-based accelerator with a software-hardware co-designed...
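The irregularity the abstract above refers to can be seen in a minimal NumPy sketch of one GCN layer (hypothetical sizes and graph, not from the LW-GCN paper): the feature transform is dense and regular, while neighbor aggregation follows the sparse edge list and produces data-dependent gather/scatter memory accesses.

```python
import numpy as np

# Minimal sketch of one GCN layer forward pass. Two phases:
#   1) dense feature transform X @ W  -> regular compute, accelerator-friendly
#   2) sparse neighbor aggregation    -> irregular, edge-driven memory access
rng = np.random.default_rng(0)
num_nodes, in_feats, out_feats = 8, 4, 3
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (6, 7)]  # sparse graph

X = rng.standard_normal((num_nodes, in_feats))   # node features
W = rng.standard_normal((in_feats, out_feats))   # layer weights

XW = X @ W                                       # phase 1: dense, regular
H = np.zeros((num_nodes, out_feats))
for src, dst in edges:                           # phase 2: sparse, irregular
    H[dst] += XW[src]                            # per-edge gather/scatter
H = np.maximum(H, 0.0)                           # ReLU activation
print(H.shape)
```

In a real workload the adjacency structure is far larger and stored in a compressed sparse format, but the split between a regular dense phase and an irregular sparse phase is the same.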
Linear algebra computations can be greatly accelerated using spatial accelerators on FPGAs. As a standard building block of linear algebra applications, BLAS covers a wide range of compute patterns that vary vastly in data reuse, bottleneck resources, matrix storage layouts, and data types. However, existing implementations of BLAS routines on FPGAs are stuck in the dilemma of productivity versus performance. They either require extensive human effort or fail to leverage the properties of the routines for acceleration. We introduce Lasa, a framework...
Graph Convolutional Networks (GCNs) have shown great results but come with large computation costs and memory overhead. Recently, sampling-based approaches have been proposed to alter input sizes, which allows GCN workloads to align with hardware constraints. Motivated by this flexibility, we propose an FPGA-based accelerator, named SkeletonGCN, along with multiple software-hardware co-optimizations to improve training efficiency. We first quantize all feature and adjacency matrices of the GCN from FP32 to SINT16. We then...
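The FP32-to-SINT16 step mentioned above can be sketched with a common symmetric per-tensor quantization rule; this is an illustrative scheme, not necessarily SkeletonGCN's exact one.

```python
import numpy as np

# Symmetric per-tensor quantization of an FP32 array to signed 16-bit
# integers (SINT16). The scale maps the largest magnitude to the int16 range.
def quantize_sint16(x: np.ndarray):
    max_abs = float(np.abs(x).max())
    scale = max_abs / 32767.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -32768, 32767).astype(np.int16)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original FP32 values.
    return q.astype(np.float32) * scale

x = np.array([[0.5, -1.25], [2.0, 0.0]], dtype=np.float32)
q, s = quantize_sint16(x)
x_hat = dequantize(q, s)
print(np.abs(x - x_hat).max())  # quantization error, bounded by scale / 2
```

Halving the operand width this way roughly doubles the number of multiply-accumulate units and on-chip buffers a fixed FPGA budget can hold, which is the motivation for quantizing before training on the accelerator.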
We investigate the use of multimodal information contained in images as an effective method for enhancing the commonsense of Transformer models for text generation. We perform experiments using BART and T5 on concept-to-text generation, specifically the task of generative commonsense reasoning, or CommonGen. We call our approach VisCTG: Visually Grounded Concept-to-Text Generation. VisCTG involves captioning images representing appropriate everyday scenarios, and using these captions to enrich and steer the generation process. Comprehensive evaluation...
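The enrichment step described above can be sketched as simple input construction: captions of retrieved images are combined with the concept set before the string is fed to a seq2seq model such as BART or T5. The caption source and the separator token are illustrative assumptions, not the paper's exact format.

```python
# Hypothetical sketch of VisCTG-style input enrichment: grounding captions
# are prepended to the concept set to steer the seq2seq generation.
def build_visctg_input(concepts: list[str], captions: list[str]) -> str:
    # Captions first (visual grounding), then the concepts to cover.
    return " ".join(captions) + " | " + " ".join(concepts)

concepts = ["dog", "frisbee", "catch", "throw"]
captions = ["a dog catches a frisbee in a park."]  # stub for an image captioner
model_input = build_visctg_input(concepts, captions)
print(model_input)
```

The resulting string would then be tokenized and passed to the generation model in place of the bare concept set.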
Linear algebra can often be significantly expedited by spatial accelerators on FPGAs. As a broadly adopted linear algebra library, BLAS requires extensive optimizations for routines that vary vastly in data reuse, bottleneck resources, matrix storage layouts, and data types. Existing solutions are stuck in the dilemma of productivity versus performance. We introduce Lasa, a framework composed of a programming model and a compiler, which addresses this dilemma by abstracting (for productivity) and specializing (for performance) the architecture of the accelerator. Lasa...
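The claim that BLAS routines "vary vastly in data reuse" can be made concrete with a back-of-envelope arithmetic-intensity comparison between a Level-2 routine (GEMV) and a Level-3 routine (GEMM); the sizes below are arbitrary, not from the Lasa paper.

```python
# Arithmetic intensity = flops per matrix/vector element touched.
# Level-2 GEMV reuses nothing in A; Level-3 GEMM reuses every operand
# many times, so the two routines stress different accelerator resources.
def gemv_intensity(m: int, n: int) -> float:
    flops = 2 * m * n              # y = A @ x: one mul + one add per element
    elems = m * n + n + m          # A, x, y each touched once
    return flops / elems

def gemm_intensity(m: int, n: int, k: int) -> float:
    flops = 2 * m * n * k          # C = A @ B
    elems = m * k + k * n + m * n  # unique elements of A, B, C
    return flops / elems

print(gemv_intensity(1024, 1024))        # ~2: no reuse, memory-bound
print(gemm_intensity(1024, 1024, 1024))  # ~683: heavy reuse, compute-bound
```

A memory-bound routine like GEMV wants wide memory ports and streaming buffers, while a compute-bound routine like GEMM wants large systolic arrays with operand reuse, which is why a one-size-fits-all FPGA design underperforms across the BLAS suite.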