NFDI4DS | UHH-SEMS - Publication Details

Xitian Fan

ORCID: 0000-0001-8698-6584

About

Contact & Profiles

A5066019284

Research Areas

Advanced Neural Network Applications
CCD and CMOS Imaging Sensors
Advanced Image and Video Retrieval Techniques
Advanced Memory and Neural Computing
Brain Tumor Detection and Classification
Algorithms and Data Compression
Embedded Systems Design Techniques
Network Security and Intrusion Detection
Parallel Computing and Optimization Techniques
Industrial Vision Systems and Defect Detection
Network Packet Processing and Optimization
Domain Adaptation and Few-Shot Learning
Advanced Vision and Imaging
Robotics and Sensor-Based Localization
VLSI and Analog Circuit Testing
Advanced Computing and Algorithms

Shanghai Institute of Computing Technology
2022-2023

Fudan University
2013-2022

State Key Laboratory of ASIC and System
2013

A high performance FPGA-based accelerator for large-scale convolutional neural networks

OPENALEX - Publications

Huimin Li, Xitian Fan, Jiao Li, Wei Cao, Xuegong Zhou and 1 more

In recent years, convolutional neural networks (CNNs) based machine learning algorithms have been widely applied in computer vision applications. However, for large-scale CNNs, the computation-intensive, memory-intensive and resource-consuming features have brought many challenges to CNN implementations. This work proposes an end-to-end FPGA-based CNN accelerator with all the layers mapped on one chip so that different layers can work concurrently in a pipelined structure to increase the throughput. A methodology which can find...

10.1109/fpl.2016.7577308 article EN 2016-08-01

FPGA-Based Large-Scale Sorting with Optimized Bandwidth Utilization

OPENALEX - Publications

Mingqian Sun, Guangwei Xie, Fan Zhang, Wei Guo, Xitian Fan and 2 more

Fast sorting of large-scale data is an essential task for data centers. In previous works, the existing computational model of the kernel still results in lower bandwidth utilization on the external memory bus. And the execution of merge operations in the sort circuit on FPGAs depends on control commands from the host CPU. In this case, the task is not fully offloaded to the hardware layer for acceleration, resulting in a performance loss. We design an on-chip controller to efficiently control the command process. The proposed controller has the ability to schedule multiple computing kernels...

10.1145/3716392 article EN ACM Transactions on Reconfigurable Technology and Systems 2025-02-05

PassRecover: A Multi-FPGA System for End-to-End Offline Password Recovery Acceleration

OPENALEX - Publications

Guangwei Xie, Xitian Fan, Zhongchen Huang, Wei Cao, Fan Zhang

In the domain of password recovery, deep learning has emerged as a pivotal technology for enhancing recovery efficiency. Despite its effectiveness, the inherent computation complexity of deep learning-based password generation algorithms poses substantial challenges, particularly in achieving synergistic acceleration between deep learning inference, and the plaintext encryption process. In this paper, we introduce PassRecover, a multi-FPGA-based computing system that can simultaneously accelerate deep learning-driven password generation and plaintext encryption in an end-to-end manner....

10.3390/electronics14071415 article EN Electronics 2025-03-31

MRI-based brain tumor segmentation using FPGA-accelerated neural network

OPENALEX - Publications

Siyu Xiong, Guoqing Wu, Xitian Fan, Xuan Feng, Zhongcheng Huang and 6 more

Background: Brain tumor segmentation is a challenging problem in medical image processing and analysis. It is a very time-consuming and error-prone task. In order to reduce the burden on physicians and improve accuracy, computer-aided detection (CAD) systems need to be developed. Due to the powerful feature learning ability of deep learning technology, many deep learning-based methods have been applied to brain tumor CAD systems and achieved satisfactory accuracy. However, deep neural networks have high computational complexity, and the process consumes...

10.1186/s12859-021-04347-6 article EN cc-by BMC Bioinformatics 2021-09-07

SWM: A High-Performance Sparse-Winograd Matrix Multiplication CNN Accelerator

OPENALEX - Publications

Di Wu, Xitian Fan, Wei Cao, Lingli Wang

Many convolutional neural network (CNN) accelerators are proposed to exploit the sparsity of the networks recently to enjoy the benefits of both computation and memory reduction. However, most accelerators cannot exploit the sparsity of both activations and weights. For those works that exploit both sparsity opportunities, they cannot achieve stable load balance through a static scheduling (SS) strategy, which is vulnerable to the sparsity distribution. In this work, a balanced compressed sparse row format and a dynamic scheduling strategy are proposed to improve the load balance. A set-associate structure is also presented to tradeoff...

10.1109/tvlsi.2021.3060041 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2021-03-09

Stream Processing Dual-Track CGRA for Object Inference

OPENALEX - Publications

Xitian Fan, Di Wu, Wei Cao, Wayne Luk, Lingli Wang

With the development of machine learning technology, the exploration of energy-efficient and flexible architectures for object inference algorithms is of growing interest in recent years. However, not many publications concentrate on a coarse-grained reconfigurable architecture (CGRA) for object inference algorithms. This paper provides a stream processing, dual-track programming CGRA-based approach to address the inherent computing characteristics of algorithms in object inference. Based on the proposed approach, an architecture called stream dual-track CGRA (SDT-CGRA) is presented as...

10.1109/tvlsi.2018.2797600 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2018-02-12

A high speed reconfigurable architecture for softmax and GELU in vision transformer

OPENALEX - Publications

Tianyang Li, Fan Zhang, Guangwei Xie, Xitian Fan, Yanzhao Gao and 1 more

Transformers have been widely used in various computer vision applications. Compared to traditional convolutional neural networks (CNNs), transformer's inference includes plenty of non-linear operations, such as softmax and Gaussian error linear units (GELU). As the scale of transformers grows, an efficient hardware implementation of these operations is significant. However, current works on neural network accelerators focus on CNN, and less attention is paid to the transformer. In addition, most FPGA-based or...

10.1049/ell2.12751 article EN cc-by-nc-nd Electronics Letters 2023-03-01

Implementation of high performance hardware architecture of OpenSURF algorithm on FPGA

OPENALEX - Publications

Xitian Fan, Chenlu Wu, Wei Cao, Xuegong Zhou, Shengye Wang and 1 more

This paper proposes a high performance hardware architecture of Speeded Up Robust Features (SURF) algorithm based on OpenSURF. In order to achieve a high processing frame rate, the architecture is designed with several characteristics. Firstly, a sliding window method is proposed to extract feature points in parallel at selected scale levels. As a result, the time cost of feature extraction can be greatly reduced. Secondly, a data reuse strategy is proposed in orientation generation and descriptor generation to reduce memory access times. In this way, 3.87x and 2.25x...

10.1109/fpt.2013.6718346 article EN 2013-12-01

PTME: A Regular Expression Matching Engine Based on Speculation and Enumerative Computation on FPGA

OPENALEX - Publications

Mingqian Sun, Guangwei Xie, Fan Zhang, Wei Guo, Xitian Fan and 3 more

Fast regular expression matching is an essential task for deep packet inspection. In previous works, the regular expression matching engine on FPGA struggled to achieve an ideal balance between resource consumption and throughput. Speculation and enumerative computation exploits the statistical properties of deterministic finite automata, allowing for more efficient pattern matching. Existing related designs mostly revolve around vector instructions and multiple processors/cores or SIMD instruction sets, with a lack of implementation...

10.1145/3655626 article EN ACM Transactions on Reconfigurable Technology and Systems 2024-04-01

Unified Accelerator for Attention and Convolution in Inference Based on FPGA

OPENALEX - Publications

Tianyang Li, Fan Zhang, Xitian Fan, Jianliang Shen, Wei Guo and 1 more

Many models combining Transformers with convolutional neural networks (CNNs) for computer vision tasks have achieved state-of-the-art results. However, due to the different computation patterns between attention and convolution, using a dedicated Transformer or CNN accelerator will inevitably reduce the computing efficiency of the other. To overcome this problem, we propose a unified architecture for attention and convolution on FPGA. We reduce the runtime overhead by offloading part of the self-attention computations offline before...

10.1109/iscas46773.2023.10182145 article EN 2023 IEEE International Symposium on Circuits and Systems (ISCAS) 2023-05-21

High Throughput CNN Accelerator Design Based on FPGA

OPENALEX - Publications

Liang Xie, Xitian Fan, Wei Cao, Lingli Wang

Due to the fact that FPGA on-chip memory capacity increases significantly, feature maps and weights of convolutional layers can be stored on chip, which can reduce the data movement between on-chip and off-chip memory. Hence, the bottleneck can shift from memory bandwidth to computing resources in convolutional layers, which will improve the performance dramatically. Under this circumstance, this paper quantitatively analyzes how to design the hardware architecture based on the roofline model to optimize performance under the constraints of available resources, and proposes an efficient architecture. Our...

10.1109/fpt.2018.00052 article EN 2018-12-01

DT-CGRA: Dual-track coarse-grained reconfigurable architecture for stream applications

OPENALEX - Publications

Xitian Fan, Huimin Li, Wei Cao, Lingli Wang

This paper presents a new type of coarse-grained reconfigurable architecture (CGRA) for the object inference domain in machine learning. The proposed CGRA is optimized for stream processing, and a correspondent programming model called dual-track is proposed. The CGRA is realized in Verilog HDL and implemented in SMIC 55 nm process, with a footprint of 3.79 mm² and consuming 1.79 W at 500 MHz. To evaluate the performance, eight...

10.1109/fpl.2016.7577309 article EN 2016-08-01

Acceleration of Rotated Object Detection on FPGA

OPENALEX - Publications

Xitian Fan, Guangwei Xie, Zhongchen Huang, Wei Cao, Lingli Wang

In this brief, an FPGA-based solution is proposed to show the computing efficiency on rotated object detection based on the R³Det algorithm. The key idea of our approach is to firstly design reconfigurable neural processing units (NPU) for convolutional neural networks (CNN) and a specific architecture for spatial operations, and then adopt a novel scheduling scheme to deal with the data dependency of these modules. When...

10.1109/tcsii.2022.3142807 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2022-01-13

LETA: A lightweight exchangeable-track accelerator for efficientnet based on FPGA

OPENALEX - Publications

Jingbo Gao, Yu Qian, Yihan Hu, Xitian Fan, Wai-Shing Luk and 2 more

Lightweight convolutional neural networks (CNNs) have become increasingly popular due to their lower computational complexity and fewer memory accesses with equivalent accuracy compared to previous CNN models. However, the newly proposed networks bring new challenges to efficient hardware design, such as, in EfficientNet, the depthwise convolution, the squeeze-and-excitation (SE) module, and the swish/sigmoid functions. Although the individual engine architecture could achieve a high computing efficiency for standard...

10.1109/icfpt52863.2021.9609919 article EN 2021-11-23

A hardware implementation of Bag of Words and Simhash for image recognition

OPENALEX - Publications

Shengye Wang, Liang Chen, Xuegong Zhou, Wei Cao, Chenlu Wu and 2 more

Algorithms such as Bag of Words and Simhash have been widely used in image recognition. To achieve better performance as well as energy-efficiency, a hardware implementation of these two algorithms is proposed in this paper. To the best of our knowledge, it is the first time that these algorithms have been implemented on hardware for image recognition purpose. The implementation is able to generate a fingerprint of an image and find the closest match in the database accurately. It is implemented on Xilinx's Virtex-6 SX475T FPGA. Tradeoffs between high performance and low hardware overhead are obtained through proper parallelization....

10.1109/fpt.2013.6718403 article EN 2013-12-01
