Xitian Fan

ORCID: 0000-0001-8698-6584
Research Areas
  • Advanced Neural Network Applications
  • CCD and CMOS Imaging Sensors
  • Advanced Image and Video Retrieval Techniques
  • Advanced Memory and Neural Computing
  • Brain Tumor Detection and Classification
  • Algorithms and Data Compression
  • Embedded Systems Design Techniques
  • Network Security and Intrusion Detection
  • Parallel Computing and Optimization Techniques
  • Industrial Vision Systems and Defect Detection
  • Network Packet Processing and Optimization
  • Domain Adaptation and Few-Shot Learning
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • VLSI and Analog Circuit Testing
  • Advanced Computing and Algorithms

Shanghai Institute of Computing Technology
2022-2023

Fudan University
2013-2022

State Key Laboratory of ASIC and System
2013

In recent years, convolutional neural network (CNN) based machine learning algorithms have been widely applied in computer vision applications. However, for large-scale CNNs, the computation-intensive, memory-intensive and resource-consuming features have brought many challenges to CNN implementations. This work proposes an end-to-end FPGA-based accelerator with all layers mapped on one chip so that different layers can work concurrently in a pipelined structure to increase throughput. A methodology which find...

10.1109/fpl.2016.7577308 article EN 2016-08-01

Fast sorting of large-scale data is an essential task for data centers. In previous works, the existing computational model of the kernel still results in lower bandwidth utilization on the external memory bus. In addition, the execution of merge operations in the sort circuit on FPGAs depends on control commands from the host CPU. In this case, the merge sort is not fully offloaded to the hardware layer for acceleration, resulting in a performance loss. We design an on-chip controller to efficiently control the command process. The proposed controller has the ability to schedule multiple computing kernels...

10.1145/3716392 article EN ACM Transactions on Reconfigurable Technology and Systems 2025-02-05

In the domain of password recovery, deep learning has emerged as a pivotal technology for enhancing recovery efficiency. Despite its effectiveness, the inherent computation complexity of deep learning-based password generation algorithms poses substantial challenges, particularly in achieving synergistic acceleration between deep learning inference and the plaintext encryption process. In this paper, we introduce PassRecover, a multi-FPGA-based computing system that can simultaneously accelerate deep learning-driven password generation and plaintext encryption in an end-to-end manner....

10.3390/electronics14071415 article EN Electronics 2025-03-31

Background: Brain tumor segmentation is a challenging problem in medical image processing and analysis. It is a very time-consuming and error-prone task. In order to reduce the burden on physicians and improve the segmentation accuracy, computer-aided detection (CAD) systems need to be developed. Due to the powerful feature learning ability of deep learning technology, many deep learning-based methods have been applied to brain tumor CAD systems and achieved satisfactory accuracy. However, deep neural networks have high computational complexity, and the segmentation process consumes...

10.1186/s12859-021-04347-6 article EN cc-by BMC Bioinformatics 2021-09-07

Many convolutional neural network (CNN) accelerators have recently been proposed to exploit the sparsity of the networks and enjoy the benefits of both computation and memory reduction. However, most accelerators cannot exploit the sparsity of both activations and weights. Those works that exploit both sparsity opportunities cannot achieve stable load balance through a static scheduling (SS) strategy, which is vulnerable to the sparsity distribution. In this work, a balanced compressed sparse row format and a dynamic scheduling strategy are proposed to improve the load balance. A set-associate structure is also presented to tradeoff...

10.1109/tvlsi.2021.3060041 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2021-03-09

With the development of machine learning technology, the exploration of energy-efficient and flexible architectures for object inference algorithms is of growing interest in recent years. However, not many publications concentrate on a coarse-grained reconfigurable architecture (CGRA) for object inference algorithms. This paper provides a stream processing, dual-track programming CGRA-based approach to address the inherent computing characteristics of object inference. Based on the proposed approach, an architecture called stream dual-track CGRA (SDT-CGRA) is presented as...

10.1109/tvlsi.2018.2797600 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2018-02-12

Transformers have been widely used in various computer vision applications. Compared to traditional convolutional neural networks (CNNs), transformer's inference includes plenty of non‐linear operations, such as softmax and Gaussian error linear units (GELU). As the scale of transformers grows, an efficient hardware implementation of these non‐linear operations is significant. However, current works of neural network accelerators focus on CNN and less attention is paid to the transformer. In addition, most FPGA‐based or...

10.1049/ell2.12751 article EN cc-by-nc-nd Electronics Letters 2023-03-01

This paper proposes a high performance hardware architecture of Speeded Up Robust Features (SURF) algorithm based on OpenSURF. In order to achieve a high processing frame rate, the architecture is designed with several characteristics. Firstly, a sliding window method is proposed to extract feature points in parallel at selected scale levels. As a result, the time cost of feature extraction can be greatly reduced. Secondly, a data reuse strategy is proposed in orientation generation and descriptor generation to reduce memory access times. In this way, 3.87x and 2.25X...

10.1109/fpt.2013.6718346 article EN 2013-12-01

Fast regular expression matching is an essential task for deep packet inspection. In previous works, the regular expression matching engine on FPGA struggled to achieve an ideal balance between resource consumption and throughput. Speculation and enumerative computation exploits the statistical properties of deterministic finite automata, allowing more efficient pattern matching. Existing related designs mostly revolve around vector instructions and multiple processors/cores or SIMD instruction sets, with a lack of implementation...

10.1145/3655626 article EN ACM Transactions on Reconfigurable Technology and Systems 2024-04-01

Many models combining Transformers with convolutional neural networks (CNNs) for computer vision tasks have achieved state-of-the-art results. However, due to the different computation patterns between attention and convolution, using a dedicated Transformer or CNN accelerator will inevitably reduce the computing efficiency of the other. To overcome this problem, we propose a unified architecture for attention and convolution on FPGA. We reduce the runtime overhead by offloading part of the self-attention computations offline before...

10.1109/iscas46773.2023.10182145 article EN 2023 IEEE International Symposium on Circuits and Systems (ISCAS) 2023-05-21

Due to the fact that FPGA on-chip memory capacity increases significantly, feature maps and weights of convolutional layers can be stored on chip, which can reduce the data movement between on-chip and off-chip memory. Hence, the bottleneck can shift from memory bandwidth to computing resources in convolutional layers, which will improve the performance dramatically. Under this circumstance, this paper quantitatively analyzes how to design the hardware architecture based on the roofline model to optimize the performance under the constraints of available resources and propose an efficient architecture. Our...

10.1109/fpt.2018.00052 article EN 2018-12-01

This paper presents a new type of coarse-grained reconfigurable architecture (CGRA) for the object inference domain in machine learning. The proposed CGRA is optimized for stream processing and a correspondent programming model called dual-track is proposed. The CGRA is realized in Verilog HDL and implemented in SMIC 55 nm process, with a footprint of 3.79 mm² and consuming 1.79 W at 500 MHz. To evaluate the performance, eight...

10.1109/fpl.2016.7577309 article EN 2016-08-01

In this brief, an FPGA-based solution is proposed to show the computing efficiency on rotated object detection based on the R³Det algorithm. The key idea of our approach is firstly to design reconfigurable neural processing units (NPU) for the convolutional neural networks (CNN) and a specific architecture for the spatial operations, then to adopt a novel scheduling scheme to deal with the data dependency of these modules. When...

10.1109/tcsii.2022.3142807 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2022-01-13

Lightweight convolutional neural networks (CNNs) have become increasingly popular due to their lower computational complexity and fewer memory accesses with equivalent accuracy compared to previous CNN models. However, the newly proposed models bring new challenges to efficient hardware design, such as, in EfficientNet, depthwise convolution, the squeeze-and-excitation (SE) module, and swish/sigmoid functions. Although individual engine architecture could achieve a high computing efficiency for standard...

10.1109/icfpt52863.2021.9609919 article EN 2021-11-23

Algorithms such as Bag of Words and Simhash have been widely used in image recognition. To achieve better performance as well as energy-efficiency, a hardware implementation of these two algorithms is proposed in this paper. To the best of our knowledge, it is the first time that these two algorithms have been implemented on hardware for image recognition purpose. The implementation is able to generate a fingerprint of an image and find the closest match in the database accurately. It is implemented on Xilinx's Virtex-6 SX475T FPGA. Tradeoffs between high performance and low hardware overhead are obtained through proper parallelization....

10.1109/fpt.2013.6718403 article EN 2013-12-01