Xuefei Ning

ORCID: 0000-0003-2209-8312
Research Areas
  • Advanced Neural Network Applications
  • Adversarial Robustness in Machine Learning
  • Advanced Memory and Neural Computing
  • Topic Modeling
  • Advanced Image and Video Retrieval Techniques
  • Anomaly Detection Techniques and Applications
  • Generative Adversarial Networks and Image Synthesis
  • Natural Language Processing Techniques
  • CCD and CMOS Imaging Sensors
  • Ferroelectric and Negative Capacitance Devices
  • Autonomous Vehicle Technology and Safety
  • Machine Learning and Data Classification
  • Machine Learning in Materials Science
  • Neural Networks and Applications
  • Advanced Vision and Imaging
  • Software System Performance and Reliability
  • Image Retrieval and Classification Techniques
  • Domain Adaptation and Few-Shot Learning
  • Robotics and Sensor-Based Localization
  • Software Engineering Research
  • Tensor decomposition and applications
  • Privacy-Preserving Technologies in Data
  • Speech Recognition and Synthesis
  • Advanced Malware Detection Techniques
  • Model Reduction and Neural Networks

Tsinghua University
2017-2025

Huawei Technologies (China)
2022

Center for Information Technology
2018

With the down-scaling of CMOS technology, the design complexity of very large-scale integrated (VLSI) circuits is increasing. Although the application of machine learning (ML) techniques in electronic design automation (EDA) can trace its history back to the 1990s, the recent breakthroughs in ML and the increasing complexity of EDA tasks have aroused more interest in incorporating ML to solve EDA tasks. In this article, we present a comprehensive review of existing ML-for-EDA studies, organized following the EDA hierarchy.

10.1145/3451179 article EN ACM Transactions on Design Automation of Electronic Systems 2021-06-05

An RRAM-based computing system (RCS) is an attractive hardware platform for implementing neural computing algorithms. On-line training for RCS enables hardware-based learning for a given application and reduces the additional error caused by device parameter variations. However, the high occurrence rate of hard faults due to immature fabrication processes and limited write endurance restricts the applicability of on-line training for RCS. We propose a fault-tolerant on-line training method that alternates between a fault-detection phase and a training phase. In the fault-detection phase,...

10.1145/3061639.3062248 article EN 2017-06-13
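As a rough illustration of the alternating scheme described in this abstract, the Python sketch below interleaves a fault-detection pass with masked weight updates. The helper names and the random "stuck-cell" model are assumptions for illustration, not the paper's actual detection circuit or training algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def detect_faults(weights, p_fault=0.01):
    # Hypothetical fault-detection phase: mark crossbar cells diagnosed as
    # stuck-at faults (True = faulty). A real RCS would use read/verify tests.
    return rng.random(weights.shape) < p_fault

def fault_tolerant_step(weights, grad, fault_mask, lr=0.01):
    # Training phase: update only healthy cells; skip programming faulty devices.
    update = lr * grad
    update[fault_mask] = 0.0
    return weights - update

weights = rng.normal(size=(64, 32))
fault_mask = np.zeros_like(weights, dtype=bool)
for epoch in range(10):
    if epoch % 2 == 0:                         # fault-detection phase
        fault_mask = detect_faults(weights)
    grad = rng.normal(size=weights.shape)      # placeholder gradient
    weights = fault_tolerant_step(weights, grad, fault_mask)   # training phase
```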

Client-wise data heterogeneity is one of the major issues that hinder effective training in federated learning (FL). Since the data distribution on each client may vary dramatically, the client selection strategy can significantly influence the convergence rate of the FL process. Active client selection strategies are popularly proposed in recent studies. However, they neglect the loss correlations between the clients and achieve only marginal improvement compared to the uniform selection strategy. In this work, we propose FedCor, an FL framework built on a...

10.1109/cvpr52688.2022.00986 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
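The abstract is truncated above; as a rough illustration of correlation-aware client selection, the Python sketch below greedily picks clients using a toy loss-correlation matrix. All names are placeholders and this is not the paper's actual algorithm, which models the correlations explicitly during training.

```python
import numpy as np

def select_clients(corr, num_select):
    # Greedy, correlation-aware selection: each round, pick the client whose
    # estimated loss correlation with the still-unselected clients is largest,
    # so training it is expected to also reduce the other clients' losses.
    remaining = set(range(corr.shape[0]))
    selected = []
    for _ in range(num_select):
        best = max(remaining,
                   key=lambda c: sum(corr[c, j] for j in remaining if j != c))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 20))
corr = A @ A.T                 # toy positive semi-definite "correlation" matrix
print(select_clients(corr, num_select=5))
```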

Transformer-based Large Language Models (LLMs) have made a significant impact on various domains. However, LLMs' efficiency suffers from both heavy computation and memory overheads. Compression techniques like sparsification and quantization are commonly used to mitigate the gap between LLMs' computation/memory overheads and hardware capacity. However, existing GPUs and transformer-based accelerators cannot efficiently process compressed LLMs, due to the following unresolved challenges: low computational efficiency,...

10.1145/3626202.3637562 article EN cc-by 2024-04-01

Large Language Models (LLMs) have attracted extensive attention due to their remarkable performance across various tasks. However, the substantial computational and memory requirements of LLM inference pose challenges for deployment in resource-constrained scenarios. Efforts within the field have been directed towards developing techniques aimed at enhancing the efficiency of LLM inference. This paper presents a comprehensive survey of the existing literature on efficient LLM inference. We start by analyzing the primary causes...

10.48550/arxiv.2404.14294 preprint EN arXiv (Cornell University) 2024-04-22

Recently, Deep Learning (DL), especially Convolutional Neural Networks (CNNs), has developed rapidly and is applied to many tasks, such as image classification, face recognition, segmentation, and human detection. Due to its superior performance, DL-based models have a wide range of applications, some of which are extremely safety-critical, e.g., intelligent surveillance and autonomous driving. Due to the latency and privacy problems of cloud computing, embedded accelerators are popular in these safety-critical areas. However,...

10.1109/isvlsi.2018.00093 article EN 2018-07-01

A resistive random-access memory (RRAM)-based computing system (RCS) is an attractive hardware platform for implementing neural computing algorithms. On-line training for RCS enables hardware-based learning for a given application and reduces the additional error caused by device parameter variations. However, the high occurrence rate of hard faults due to immature fabrication processes and limited write endurance restricts the applicability of on-line training for RCS. We propose a fault-tolerant on-line training method that alternates between...

10.1109/tcad.2018.2855145 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2018-07-12

In this paper, we tackle the issue of physical adversarial examples for object detectors in the wild. Specifically, we propose to generate patterns to be applied on the vehicle surface so that it is not recognizable by detectors in the photo-realistic Carla simulator. Our approach contains two main techniques, an Enlarge-and-Repeat process and a Discrete Searching method, to craft mosaic-like textures without access to either the model weights of the detector or a differentiable rendering procedure. The experimental results demonstrate...

10.48550/arxiv.2007.16118 preprint EN other-oa arXiv (Cornell University) 2020-01-01

For large language model (LLM) acceleration, FPGAs face two challenges: insufficient peak computing performance and unacceptable accuracy loss from compression. This paper proposes FMC-LLM to enable efficient batched decoding of 70B+ LLMs on FPGAs.

10.1145/3706628.3708863 article EN 2025-02-26

The excellent performance of diffusion models in image generation is always accompanied by excessive computation costs, which has prevented their application on edge devices and in interactive applications. Previous works mainly focus on using fewer sampling steps and compressing the denoising network, while this paper proposes to accelerate diffusion models by introducing SiTo, a similarity-based token pruning method that adaptively prunes redundant tokens from the input data. SiTo is designed to maximize the similarity between the model...

10.1609/aaai.v39i9.33071 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11
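The Python sketch below gives a simplified picture of similarity-based token pruning in this spirit: keep the most distinctive tokens, remember for each position its most similar kept token, and copy that token's output back afterwards. The selection and recovery rules here are illustrative assumptions, not SiTo's exact design.

```python
import numpy as np

def prune_by_similarity(tokens, keep_ratio=0.5):
    # tokens: (N, D). Keep the tokens with the lowest average cosine similarity
    # to the others (the most "distinctive" ones) and map every position to its
    # most similar kept token so outputs can be recovered after the network runs.
    n = tokens.shape[0]
    normed = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = normed @ normed.T
    kept = np.sort(np.argsort(sim.mean(axis=1))[: int(n * keep_ratio)])
    recover_idx = kept[np.argmax(sim[:, kept], axis=1)]
    return kept, recover_idx

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))                       # 16 tokens, dim 8
kept, recover = prune_by_similarity(x)
y_kept = x[kept] * 2.0                             # stand-in for running the network
y_full = y_kept[np.searchsorted(kept, recover)]    # copy kept outputs to all positions
```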

With the fast evolvement of deep-learning-specific embedded computing systems, applications powered by deep learning are moving from the cloud to the edge. When deploying NNs onto edge devices under complex environments, there are various types of possible faults: soft errors caused by atmospheric neutrons and radioactive impurities, voltage instability, aging, temperature variations, and malicious attackers. Thus the safety risk of neural networks at the edge in safety-critical applications is now drawing much attention. In this paper, we...

10.1109/asp-dac47756.2020.9045324 article EN Asia and South Pacific Design Automation Conference (ASP-DAC) 2020-01-01

Voxel-based methods have achieved state-of-the-art performance for 3D object detection in autonomous driving. However, their significant computational and memory costs pose a challenge for their application to resource-constrained vehicles. One reason for this high resource consumption is the presence of a large number of redundant background points in LiDAR point clouds, which results in spatial redundancy in both the voxel and BEV map representations. To address this issue, we propose an adaptive inference framework called Ada3D,...

10.1109/iccv51070.2023.01625 article EN IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
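The Python sketch below shows the basic spatial-redundancy idea in a much simplified form: score each voxel and drop the least important ones before running the heavy 3D backbone. Ada3D uses a learned importance predictor and also reduces redundancy in the BEV features; the scoring here is a random placeholder.

```python
import numpy as np

def drop_redundant_voxels(voxel_coords, importance, keep_ratio=0.3):
    # Keep only the highest-scoring voxels (likely foreground) and discard the
    # redundant background ones before the expensive 3D backbone.
    k = max(1, int(len(importance) * keep_ratio))
    keep = np.argsort(importance)[-k:]
    return voxel_coords[keep]

rng = np.random.default_rng(0)
coords = rng.integers(0, 400, size=(1000, 3))   # (x, y, z) voxel indices
scores = rng.random(1000)                       # placeholder for predictor outputs
kept = drop_redundant_voxels(coords, scores)
print(kept.shape)                               # (300, 3)
```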

In recent years, Convolutional Neural Networks (CNNs) have been widely applied in computer vision tasks. FPGAs have been explored to accelerate CNNs due to their high performance, energy efficiency, and flexibility. By fusing multiple layers of a CNN, the intermediate data transfer can be reduced. With a faster algorithm using the Winograd transformation, the computation of convolution can be further accelerated. However, previous accelerators with cross-layer fusion or the Winograd algorithm are designed for a particular CNN model. The FPGA should...

10.1109/fpt.2017.8280147 article EN 2017-12-01
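For readers unfamiliar with the Winograd transformation mentioned above, the Python sketch below shows the 1D F(2,3) case, which produces two outputs of a 3-tap correlation with 4 multiplications instead of 6; 3x3 convolutions in CNN accelerators typically use its 2D analogue.

```python
import numpy as np

def winograd_f23(d, g):
    # Winograd minimal filtering F(2,3): 4 multiplications for 2 outputs.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])    # 4 input samples
g = np.array([0.5, -1.0, 2.0])        # 3 filter taps
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```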

Conducting efficient performance estimations of neural architectures is a major challenge in neural architecture search (NAS). To reduce the training costs of NAS, one-shot estimators (OSEs) amortize them by sharing the parameters of one "supernet" between all architectures. Recently, zero-shot estimators (ZSEs) that involve no training are proposed to further reduce the evaluation cost. Despite the high efficiency of these estimators, the quality of such estimations has not been thoroughly studied. In this paper, we conduct an extensive and organized assessment of OSEs...

10.48550/arxiv.2008.03064 preprint EN other-oa arXiv (Cornell University) 2020-01-01
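As a concrete (and deliberately simple) example of the zero-shot estimators discussed above, the Python sketch below ranks two randomly initialized candidates by a gradient-norm-at-initialization proxy; this is just one of many possible ZSE proxies and is not taken from the paper.

```python
import torch
import torch.nn as nn

def zero_shot_score(model, inputs, targets):
    # A simple training-free proxy: norm of the loss gradient at initialization.
    loss = nn.functional.cross_entropy(model(inputs), targets)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return sum(g.norm().item() for g in grads)

candidates = {
    "narrow": nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10)),
    "wide":   nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)),
}
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
scores = {name: zero_shot_score(m, x, y) for name, m in candidates.items()}
print(sorted(scores, key=scores.get, reverse=True))   # proxy-based ranking
```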

Surgical removal is the primary treatment for liver cancer, but the frequent recurrence caused by residual malignant tissue remains an important challenge, as it leads to high mortality. It is unreliable to distinguish tumors from normal tissues merely under visual inspection. Hyperspectral imaging (HSI) has been proved to be a promising technology for intra-operative use, capturing the spatial and spectral information of tissue in a fast, non-contact and label-free manner. In this work, we investigated the feasibility of HSI for tumor...

10.1364/boe.432654 article EN cc-by Biomedical Optics Express 2021-06-18

In recent years, Convolutional Neural Networks (CNNs) have been widely applied in computer vision tasks and achieved significant improvements in image object detection. The CNN methods consume more computation as well as storage, so GPUs are introduced for real-time object detection. However, due to the high power consumption of GPUs, it is difficult to adopt them in mobile applications like automatic driving. Previous work proposes some optimizing techniques to lower the power consumption of detection on GPU or FPGA. The first Low-Power Image Recognition Challenge...

10.23919/date.2018.8342100 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2018-03-01

Training a convolutional neural network (CNN) usually requires a large amount of computation resources, time and power. Researchers and cloud service providers in this area need a fast and efficient training system. GPUs are currently the best candidate for CNN training. But FPGAs have already shown good performance and energy efficiency as inference accelerators. In this work, we design a compressed training process together with an FPGA-based accelerator. We adopt two widely used model compression methods, quantization...

10.1145/3289602.3293977 article EN 2019-02-20
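Below is a minimal Python sketch of uniform symmetric quantization, one common instance of the "quantization" compression mentioned above; the exact bit-widths and rounding scheme used in the paper's training flow may differ.

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    # Map weights to signed num_bits integers and back (simulated fixed point).
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 16, 16))          # a conv kernel
w_q = quantize_uniform(w)
print(float(np.max(np.abs(w - w_q))))        # error is bounded by ~scale / 2
```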

This work proposes a novel Graph-based neural ArchiTecture Encoding Scheme, a.k.a. GATES, to improve predictor-based neural architecture search. Specifically, different from existing graph-based schemes, GATES models the operations as the transformation of the propagating information, which mimics the actual data processing of the architecture. It is a more reasonable modeling of neural architectures, and can encode architectures in both the "operation on node" and "operation on edge" cell search spaces consistently. Experimental results on various search spaces confirm...

10.48550/arxiv.2004.01899 preprint EN other-oa arXiv (Cornell University) 2020-01-01
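The Python sketch below is a much-simplified illustration of the encoding idea: "virtual information" starts at the input node and flows along the architecture DAG, and each operation transforms (here, soft-gates) the information passing through it. The gating function and embeddings are assumptions for illustration; GATES' actual encoder is learned.

```python
import numpy as np

def encode_architecture(adj, op_ids, op_emb):
    # adj[i][j] = 1 means an edge i -> j; op_ids[j] is the operation on node j
    # ("operation on node" search space). Nodes are assumed topologically ordered.
    n, dim = len(op_ids), op_emb.shape[1]
    info = np.zeros((n, dim))
    info[0] = np.ones(dim)                               # information source (input node)
    for j in range(1, n):
        incoming = sum(info[i] for i in range(j) if adj[i][j])
        gate = 1.0 / (1.0 + np.exp(-op_emb[op_ids[j]]))  # operation as a soft gate
        info[j] = gate * incoming
    return info[-1]                                      # embedding of the output node

rng = np.random.default_rng(0)
op_emb = rng.normal(size=(5, 16))                        # embeddings for 5 operations
adj = [[0, 1, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
print(encode_architecture(adj, op_ids=[0, 2, 3, 1], op_emb=op_emb).shape)
```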

With the fast evolvement of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When deploying neural networks (NNs) onto edge devices under complex environments, there are various types of possible faults: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, malicious attackers, and so on. Thus, the safety risk of deploying NNs is now drawing much attention. In this article, after an analysis of the faults in NN...

10.1145/3460288 article EN ACM Transactions on Design Automation of Electronic Systems 2021-08-12
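To make the soft-error fault model concrete, here is a small Python sketch that flips random bits in the float32 representation of a weight tensor. The error rate and the single-bit-flip model are illustrative assumptions; the article's own fault models and analysis are more detailed.

```python
import numpy as np

def inject_bit_flips(weights, bit_error_rate=1e-5, seed=0):
    # Flip random bits in the IEEE-754 float32 encoding of the weights: a simple
    # soft-error model for studying how resilient an NN is to memory faults.
    rng = np.random.default_rng(seed)
    flat = weights.astype(np.float32).ravel().view(np.uint32).copy()
    n_flips = rng.binomial(flat.size * 32, bit_error_rate)
    for _ in range(n_flips):
        idx, bit = rng.integers(flat.size), rng.integers(32)
        flat[idx] ^= np.uint32(1) << np.uint32(bit)
    return flat.view(np.float32).reshape(weights.shape)

w = np.random.default_rng(1).normal(size=(128, 128)).astype(np.float32)
w_faulty = inject_bit_flips(w)
print(int(np.sum(w != w_faulty)), "weights corrupted")
```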

In recent years, Convolutional Neural Networks (CNNs) have been widely applied in computer vision and achieved significant improvements in object detection tasks. Although there are many optimizing methods to speed up CNN-based algorithms, it is still difficult to deploy these algorithms on real-time low-power systems. The Field-Programmable Gate Array (FPGA) has been explored as a platform for accelerating CNNs due to its promising performance, high energy efficiency, and flexibility. Previous works show that the...

10.1145/3283452 article EN ACM Transactions on Reconfigurable Technology and Systems 2018-09-30

Diffusion probabilistic models (DPMs) are a new class of generative models that have achieved state-of-the-art generation quality in various domains. Despite the promise, one major drawback of DPMs is the slow generation speed due to the large number of neural network evaluations required in the generation process. In this paper, we reveal an overlooked dimension -- model schedule -- for optimizing the trade-off between generation quality and speed. More specifically, we observe that small models, though having worse generation quality when used alone, could outperform larger models in certain generation steps....

10.48550/arxiv.2306.08860 preprint EN other-oa arXiv (Cornell University) 2023-01-01
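The Python sketch below illustrates the "model schedule" dimension in a toy reverse-diffusion loop: a per-step table decides which denoiser (cheap or expensive) is evaluated at each step. The denoisers and the example schedule are placeholders; the paper searches for the schedule rather than fixing it by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

def small_denoiser(x, t):     # cheap, less accurate (placeholder for a real network)
    return 0.95 * x

def large_denoiser(x, t):     # expensive, more accurate (placeholder)
    return 0.90 * x

def sample_with_model_schedule(schedule, num_steps=10, shape=(4, 4)):
    # schedule[t] picks which denoiser to evaluate at step t: the model schedule.
    x = rng.normal(size=shape)
    for t in reversed(range(num_steps)):
        x = schedule[t](x, t)
    return x

# Example schedule: use the large model only for the final (low-noise) steps.
schedule = {t: (large_denoiser if t < 3 else small_denoiser) for t in range(10)}
sample = sample_with_model_schedule(schedule)
print(sample.shape)
```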