Xuefei Ning

ORCID: 0000-0003-2209-8312
Research Areas
  • Advanced Neural Network Applications
  • Adversarial Robustness in Machine Learning
  • Advanced Memory and Neural Computing
  • Topic Modeling
  • Advanced Image and Video Retrieval Techniques
  • Anomaly Detection Techniques and Applications
  • Generative Adversarial Networks and Image Synthesis
  • Natural Language Processing Techniques
  • CCD and CMOS Imaging Sensors
  • Ferroelectric and Negative Capacitance Devices
  • Autonomous Vehicle Technology and Safety
  • Machine Learning and Data Classification
  • Machine Learning in Materials Science
  • Neural Networks and Applications
  • Advanced Vision and Imaging
  • Software System Performance and Reliability
  • Image Retrieval and Classification Techniques
  • Domain Adaptation and Few-Shot Learning
  • Robotics and Sensor-Based Localization
  • Software Engineering Research
  • Tensor decomposition and applications
  • Privacy-Preserving Technologies in Data
  • Speech Recognition and Synthesis
  • Advanced Malware Detection Techniques
  • Model Reduction and Neural Networks

Tsinghua University
2017-2025

Huawei Technologies (China)
2022

Center for Information Technology
2018

With the down-scaling of CMOS technology, the design complexity of very large-scale integrated (VLSI) circuits is increasing. Although the application of machine learning (ML) techniques in electronic design automation (EDA) can trace its history back to the 1990s, the recent breakthroughs in ML and the increasing complexity of EDA tasks have aroused more interest in incorporating ML to solve EDA tasks. In this article, we present a comprehensive review of existing ML-for-EDA studies, organized following the EDA hierarchy.

10.1145/3451179 article EN ACM Transactions on Design Automation of Electronic Systems 2021-06-05

An RRAM-based computing system (RCS) is an attractive hardware platform for implementing neural computing algorithms. On-line training for RCS enables hardware-based learning for a given application and reduces the additional error caused by device parameter variations. However, the high occurrence rate of hard faults due to immature fabrication processes and limited write endurance restricts the applicability of on-line training for RCS. We propose a fault-tolerant on-line training method that alternates between a fault-detection phase and a training phase. In the fault-detection phase,...

10.1145/3061639.3062248 article EN 2017-06-13
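As a rough illustration of the alternating scheme described in this abstract, the Python sketch below interleaves a fault-detection pass with masked weight updates. The helper names and the random "stuck-cell" model are assumptions for illustration, not the paper's actual detection circuit or training algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def detect_faults(weights, p_fault=0.01):
    # Hypothetical fault-detection phase: mark crossbar cells diagnosed as
    # stuck-at faults (True = faulty). A real RCS would use read/verify tests.
    return rng.random(weights.shape) < p_fault

def fault_tolerant_step(weights, grad, fault_mask, lr=0.01):
    # Training phase: update only healthy cells; skip programming faulty devices.
    update = lr * grad
    update[fault_mask] = 0.0
    return weights - update

weights = rng.normal(size=(64, 32))
fault_mask = np.zeros_like(weights, dtype=bool)
for epoch in range(10):
    if epoch % 2 == 0:                         # fault-detection phase
        fault_mask = detect_faults(weights)
    grad = rng.normal(size=weights.shape)      # placeholder gradient
    weights = fault_tolerant_step(weights, grad, fault_mask)   # training phase
```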

Client-wise data heterogeneity is one of the major issues that hinder effective training in federated learning (FL). Since the data distribution on each client may vary dramatically, the client selection strategy can significantly influence the convergence rate of the FL process. Active client selection strategies are popularly proposed in recent studies. However, they neglect the loss correlations between the clients and achieve only marginal improvement compared to the uniform selection strategy. In this work, we propose FedCor, an FL framework built on a...

10.1109/cvpr52688.2022.00986 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
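The abstract is truncated above; as a rough illustration of correlation-aware client selection, the Python sketch below greedily picks clients using a toy loss-correlation matrix. All names are placeholders and this is not the paper's actual algorithm, which models the correlations explicitly during training.

```python
import numpy as np

def select_clients(corr, num_select):
    # Greedy, correlation-aware selection: each round, pick the client whose
    # estimated loss correlation with the still-unselected clients is largest,
    # so training it is expected to also reduce the other clients' losses.
    remaining = set(range(corr.shape[0]))
    selected = []
    for _ in range(num_select):
        best = max(remaining,
                   key=lambda c: sum(corr[c, j] for j in remaining if j != c))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 20))
corr = A @ A.T                 # toy positive semi-definite "correlation" matrix
print(select_clients(corr, num_select=5))
```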

Transformer-based Large Language Models (LLMs) have made a significant impact on various domains. However, LLMs' efficiency suffers from both heavy computation and memory overheads. Compression techniques like sparsification and quantization are commonly used to mitigate the gap between LLMs' computation/memory overheads and hardware capacity. However, existing GPUs and transformer-based accelerators cannot efficiently process compressed LLMs, due to the following unresolved challenges: low computational efficiency,...

10.1145/3626202.3637562 article EN cc-by 2024-04-01

Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This paper presents a comprehensive survey of the existing literature on efficient LLM inference. We start by analyzing the primary causes...

10.48550/arxiv.2404.14294 preprint EN arXiv (Cornell University) 2024-04-22

Recently, Deep Learning (DL), especially Convolutional Neural Networks (CNNs), has developed rapidly and is applied to many tasks, such as image classification, face recognition, segmentation, and human detection. Due to its superior performance, DL-based models have a wide range of applications, some of which are extremely safety-critical, e.g., intelligent surveillance and autonomous driving. Due to the latency and privacy problems of cloud computing, embedded accelerators are popular in these safety-critical areas. However,...

10.1109/isvlsi.2018.00093 article EN 2018-07-01

A resistive random-access memory (RRAM)-based computing system (RCS) is an attractive hardware platform for implementing neural computing algorithms. On-line training for RCS enables hardware-based learning for a given application and reduces the additional error caused by device parameter variations. However, the high occurrence rate of hard faults due to immature fabrication processes and limited write endurance restricts the applicability of on-line training for RCS. We propose a fault-tolerant on-line training method that alternates between...

10.1109/tcad.2018.2855145 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2018-07-12

In this paper, we tackle the issue of physical adversarial examples for object detectors in the wild. Specifically, we propose to generate patterns to be applied on the vehicle surface so that it is not recognizable by detectors in the photo-realistic Carla simulator. Our approach contains two main techniques, an Enlarge-and-Repeat process and a Discrete Searching method, to craft mosaic-like textures without access to either the model weights of the detector or a differentiable rendering procedure. The experimental results demonstrate...

10.48550/arxiv.2007.16118 preprint EN other-oa arXiv (Cornell University) 2020-01-01

For large language model (LLM) acceleration, FPGAs face two challenges: insufficient peak computing performance and unacceptable accuracy loss from compression. This paper proposes FMC-LLM to enable efficient batched decoding of 70B+ LLMs on FPGAs.

10.1145/3706628.3708863 article EN 2025-02-26

The excellent performance of diffusion models in image generation is always accompanied by excessive computation costs, which has prevented their application on edge devices and in interactive applications. Previous works mainly focus on using fewer sampling steps and compressing the denoising network, while this paper proposes to accelerate diffusion models by introducing SiTo, a similarity-based token pruning method that adaptively prunes redundant tokens from the input data. SiTo is designed to maximize the similarity between the model...

10.1609/aaai.v39i9.33071 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11
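The Python sketch below gives a simplified picture of similarity-based token pruning in this spirit: keep the most distinctive tokens, remember for each position its most similar kept token, and copy that token's output back afterwards. The selection and recovery rules here are illustrative assumptions, not SiTo's exact design.

```python
import numpy as np

def prune_by_similarity(tokens, keep_ratio=0.5):
    # tokens: (N, D). Keep the tokens with the lowest average cosine similarity
    # to the others (the most "distinctive" ones) and map every position to its
    # most similar kept token so outputs can be recovered after the network runs.
    n = tokens.shape[0]
    normed = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = normed @ normed.T
    kept = np.sort(np.argsort(sim.mean(axis=1))[: int(n * keep_ratio)])
    recover_idx = kept[np.argmax(sim[:, kept], axis=1)]
    return kept, recover_idx

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))                       # 16 tokens, dim 8
kept, recover = prune_by_similarity(x)
y_kept = x[kept] * 2.0                             # stand-in for running the network
y_full = y_kept[np.searchsorted(kept, recover)]    # copy kept outputs to all positions
```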

With the fast evolvement of deep-learning-specific embedded computing systems, applications powered by deep learning are moving from the cloud to the edge. When deploying NNs onto edge devices under complex environments, there are various types of possible faults: soft errors caused by atmospheric neutrons and radioactive impurities, voltage instability, aging, temperature variations, and malicious attackers. Thus the safety risk of neural networks at the edge in safety-critical applications is now drawing much attention. In this paper, we...

10.1109/asp-dac47756.2020.9045324 article EN Asia and South Pacific Design Automation Conference (ASP-DAC) 2020-01-01

Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving. However, their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles. One reason for this high resource consumption is the presence of a large number of redundant background points in LiDAR point clouds, which results in spatial redundancy in both the voxel and BEV map representations. To address this issue, we propose an adaptive inference framework called Ada3D,...

10.1109/iccv51070.2023.01625 article EN IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
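The Python sketch below shows the basic spatial-redundancy idea in a much simplified form: score each voxel and drop the least important ones before running the heavy 3D backbone. Ada3D uses a learned importance predictor and also reduces redundancy in the BEV features; the scoring here is a random placeholder.

```python
import numpy as np

def drop_redundant_voxels(voxel_coords, importance, keep_ratio=0.3):
    # Keep only the highest-scoring voxels (likely foreground) and discard the
    # redundant background ones before the expensive 3D backbone.
    k = max(1, int(len(importance) * keep_ratio))
    keep = np.argsort(importance)[-k:]
    return voxel_coords[keep]

rng = np.random.default_rng(0)
coords = rng.integers(0, 400, size=(1000, 3))   # (x, y, z) voxel indices
scores = rng.random(1000)                       # placeholder for predictor outputs
kept = drop_redundant_voxels(coords, scores)
print(kept.shape)                               # (300, 3)
```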

In recent years, Convolutional Neural Networks (CNNs) have been widely applied in computer vision tasks. FPGAs have been explored to accelerate CNNs due to their high performance, energy efficiency, and flexibility. By fusing multiple layers of a CNN, the intermediate data transfer can be reduced. With a faster algorithm using the Winograd transformation, the computation of convolution can be further accelerated. However, previous accelerators with cross-layer fusion or the Winograd algorithm are designed for a particular CNN model. The FPGA should...

10.1109/fpt.2017.8280147 article EN 2017-12-01
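For readers unfamiliar with the Winograd transformation mentioned above, the Python sketch below shows the 1D F(2,3) case, which produces two outputs of a 3-tap correlation with 4 multiplications instead of 6; 3x3 convolutions in CNN accelerators typically use its 2D analogue.

```python
import numpy as np

def winograd_f23(d, g):
    # Winograd minimal filtering F(2,3): 4 multiplications for 2 outputs.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])    # 4 input samples
g = np.array([0.5, -1.0, 2.0])        # 3 filter taps
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```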

Conducting efficient performance estimations of neural architectures is a major challenge in neural architecture search (NAS). To reduce the training costs of NAS, one-shot estimators (OSEs) amortize them by sharing the parameters of one "supernet" between all architectures. Recently, zero-shot estimators (ZSEs) that involve no training are proposed to further reduce the evaluation cost. Despite the high efficiency of these estimators, the quality of such estimations has not been thoroughly studied. In this paper, we conduct an extensive and organized assessment of OSEs...

10.48550/arxiv.2008.03064 preprint EN other-oa arXiv (Cornell University) 2020-01-01
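As a concrete (and deliberately simple) example of the zero-shot estimators discussed above, the Python sketch below ranks two randomly initialized candidates by a gradient-norm-at-initialization proxy; this is just one of many possible ZSE proxies and is not taken from the paper.

```python
import torch
import torch.nn as nn

def zero_shot_score(model, inputs, targets):
    # A simple training-free proxy: norm of the loss gradient at initialization.
    loss = nn.functional.cross_entropy(model(inputs), targets)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return sum(g.norm().item() for g in grads)

candidates = {
    "narrow": nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10)),
    "wide":   nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)),
}
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
scores = {name: zero_shot_score(m, x, y) for name, m in candidates.items()}
print(sorted(scores, key=scores.get, reverse=True))   # proxy-based ranking
```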

Surgical removal is the primary treatment for liver cancer, but the frequent recurrence caused by residual malignant tissue remains an important challenge, as it leads to high mortality. It is unreliable to distinguish tumors from normal tissues merely under visual inspection. Hyperspectral imaging (HSI) has been proved to be a promising technology for intra-operative use, capturing the spatial and spectral information of tissue in a fast, non-contact and label-free manner. In this work, we investigated the feasibility of HSI for tumor...

10.1364/boe.432654 article EN cc-by Biomedical Optics Express 2021-06-18

In recent years, Convolutional Neural Networks (CNNs) have been widely applied in computer vision tasks and achieved significant improvements in image object detection. The CNN methods consume more computation as well as storage, so GPUs are introduced for real-time object detection. However, due to the high power consumption of GPUs, it is difficult to adopt them in mobile applications like automatic driving. Previous work proposes some optimizing techniques to lower the power consumption of detection on GPU or FPGA. The first Low-Power Image Recognition Challenge...

10.23919/date.2018.8342100 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2018-03-01

Training a convolutional neural network (CNN) usually requires a large amount of computation resources, time and power. Researchers and cloud service providers in this area need a fast and efficient training system. GPUs are currently the best candidate for CNN training. But FPGAs have already shown good performance and energy efficiency as inference accelerators. In this work, we design a compressed training process together with an FPGA-based accelerator. We adopt two widely used model compression methods, quantization...

10.1145/3289602.3293977 article EN 2019-02-20
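Below is a minimal Python sketch of uniform symmetric quantization, one common instance of the "quantization" compression mentioned above; the exact bit-widths and rounding scheme used in the paper's training flow may differ.

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    # Map weights to signed num_bits integers and back (simulated fixed point).
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 16, 16))          # a conv kernel
w_q = quantize_uniform(w)
print(float(np.max(np.abs(w - w_q))))        # error is bounded by ~scale / 2
```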

This work proposes a novel Graph-based neural ArchiTecture Encoding Scheme, a.k.a. GATES, to improve predictor-based neural architecture search. Specifically, different from existing graph-based schemes, GATES models the operations as the transformation of the propagating information, which mimics the actual data processing of the architecture. It is a more reasonable modeling of neural architectures, and can encode architectures in both the "operation on node" and "operation on edge" cell search spaces consistently. Experimental results on various search spaces confirm...

10.48550/arxiv.2004.01899 preprint EN other-oa arXiv (Cornell University) 2020-01-01
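The Python sketch below is a much-simplified illustration of the encoding idea: "virtual information" starts at the input node and flows along the architecture DAG, and each operation transforms (here, soft-gates) the information passing through it. The gating function and embeddings are assumptions for illustration; GATES' actual encoder is learned.

```python
import numpy as np

def encode_architecture(adj, op_ids, op_emb):
    # adj[i][j] = 1 means an edge i -> j; op_ids[j] is the operation on node j
    # ("operation on node" search space). Nodes are assumed topologically ordered.
    n, dim = len(op_ids), op_emb.shape[1]
    info = np.zeros((n, dim))
    info[0] = np.ones(dim)                               # information source (input node)
    for j in range(1, n):
        incoming = sum(info[i] for i in range(j) if adj[i][j])
        gate = 1.0 / (1.0 + np.exp(-op_emb[op_ids[j]]))  # operation as a soft gate
        info[j] = gate * incoming
    return info[-1]                                      # embedding of the output node

rng = np.random.default_rng(0)
op_emb = rng.normal(size=(5, 16))                        # embeddings for 5 operations
adj = [[0, 1, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
print(encode_architecture(adj, op_ids=[0, 2, 3, 1], op_emb=op_emb).shape)
```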

With the fast evolvement of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When deploying neural networks (NNs) onto edge devices under complex environments, there are various types of possible faults: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, malicious attackers, and so on. Thus, the safety risk of deploying NNs is now drawing much attention. In this article, after an analysis of the faults in NN...

10.1145/3460288 article EN ACM Transactions on Design Automation of Electronic Systems 2021-08-12
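To make the soft-error fault model concrete, here is a small Python sketch that flips random bits in the float32 representation of a weight tensor. The error rate and the single-bit-flip model are illustrative assumptions; the article's own fault models and analysis are more detailed.

```python
import numpy as np

def inject_bit_flips(weights, bit_error_rate=1e-5, seed=0):
    # Flip random bits in the IEEE-754 float32 encoding of the weights: a simple
    # soft-error model for studying how resilient an NN is to memory faults.
    rng = np.random.default_rng(seed)
    flat = weights.astype(np.float32).ravel().view(np.uint32).copy()
    n_flips = rng.binomial(flat.size * 32, bit_error_rate)
    for _ in range(n_flips):
        idx, bit = rng.integers(flat.size), rng.integers(32)
        flat[idx] ^= np.uint32(1) << np.uint32(bit)
    return flat.view(np.float32).reshape(weights.shape)

w = np.random.default_rng(1).normal(size=(128, 128)).astype(np.float32)
w_faulty = inject_bit_flips(w)
print(int(np.sum(w != w_faulty)), "weights corrupted")
```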

In recent years, Convolutional Neural Networks (CNNs) have been widely applied in computer vision and achieved significant improvements in object detection tasks. Although there are many optimizing methods to speed up CNN-based algorithms, it is still difficult to deploy these algorithms on real-time low-power systems. The Field-Programmable Gate Array (FPGA) has been explored as a platform for accelerating CNNs due to its promising performance, high energy efficiency, and flexibility. Previous works show that the...

10.1145/3283452 article EN ACM Transactions on Reconfigurable Technology and Systems 2018-09-30

Diffusion probabilistic models (DPMs) are a new class of generative models that have achieved state-of-the-art generation quality in various domains. Despite the promise, one major drawback of DPMs is the slow generation speed due to the large number of neural network evaluations required in the generation process. In this paper, we reveal an overlooked dimension -- model schedule -- for optimizing the trade-off between generation quality and speed. More specifically, we observe that small models, though having worse generation quality when used alone, could outperform larger models in certain generation steps....

10.48550/arxiv.2306.08860 preprint EN other-oa arXiv (Cornell University) 2023-01-01
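The Python sketch below illustrates the "model schedule" dimension in a toy reverse-diffusion loop: a per-step table decides which denoiser (cheap or expensive) is evaluated at each step. The denoisers and the example schedule are placeholders; the paper searches for the schedule rather than fixing it by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

def small_denoiser(x, t):     # cheap, less accurate (placeholder for a real network)
    return 0.95 * x

def large_denoiser(x, t):     # expensive, more accurate (placeholder)
    return 0.90 * x

def sample_with_model_schedule(schedule, num_steps=10, shape=(4, 4)):
    # schedule[t] picks which denoiser to evaluate at step t: the model schedule.
    x = rng.normal(size=shape)
    for t in reversed(range(num_steps)):
        x = schedule[t](x, t)
    return x

# Example schedule: use the large model only for the final (low-noise) steps.
schedule = {t: (large_denoiser if t < 3 else small_denoiser) for t in range(10)}
sample = sample_with_model_schedule(schedule)
print(sample.shape)
```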