Yuxian Qiu

ORCID: 0000-0003-4040-0159
Research Areas
  • Advanced Neural Network Applications
  • Adversarial Robustness in Machine Learning
  • Integrated Circuits and Semiconductor Failure Analysis
  • Domain Adaptation and Few-Shot Learning
  • CCD and CMOS Imaging Sensors
  • Visual Attention and Saliency Detection
  • Stochastic Gradient Optimization Techniques
  • Brain Tumor Detection and Classification
  • Physical Unclonable Functions (PUFs) and Hardware Security
  • Neural Networks and Applications
  • Anomaly Detection Techniques and Applications
  • Advanced Neuroimaging Techniques and Applications
  • Cloud Computing and Resource Management
  • Distributed and Parallel Computing Systems
  • Tensor decomposition and applications
  • Multimodal Machine Learning Applications
  • Advanced Optical Sensing Technologies
  • Handwritten Text Recognition Techniques
  • Bacillus and Francisella bacterial research
  • Geophysical Methods and Applications
  • Lattice Boltzmann Simulation Studies
  • Machine Learning and Data Classification
  • Parallel Computing and Optimization Techniques
  • Advanced Memory and Neural Computing
  • Ferroelectric and Negative Capacitance Devices

Shanghai Jiao Tong University
2018-2024

Shanghai JiAi Genetics & IVF Institute
2022-2024

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracies, sparse models often carry randomly-distributed weights, leading to irregular computations. Consequently, they cannot achieve meaningful speedup on commodity hardware (e.g., GPU) built for dense matrix computations. As such, prior works usually modify or design completely new sparsity-optimized architectures for exploiting sparsity. We propose an algorithm-software co-designed method that...

10.1109/sc41405.2020.00020 article EN 2020-11-01
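
A minimal sketch (assumed PyTorch, not the paper's co-designed method) of magnitude-based unstructured pruning, to make "randomly-distributed weights" concrete: the surviving non-zeros land in an irregular pattern that dense GEMM kernels on commodity GPUs cannot exploit.

```python
# Hedged illustration: magnitude-based unstructured pruning.
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries of `weight` (illustrative only)."""
    k = int(weight.numel() * sparsity)          # number of weights to drop
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)  # survivors are irregularly scattered

w = torch.randn(8, 8)
w_sparse = magnitude_prune(w, sparsity=0.75)
print((w_sparse != 0).int())                    # 1s appear with no exploitable structure
```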

Recently, researchers have started decomposing deep neural network models according to their semantics or functions. Recent work has shown the effectiveness of decomposed functional blocks for defending against adversarial attacks, which add small input perturbations to an image to fool DNN models. This work proposes a profiling-based method to decompose DNN models into different functional blocks, which leads to the effective path as a new approach to exploring DNNs' internal organization. Specifically, the per-image effective path can be aggregated into a class-level path, through which we...

10.1109/cvpr.2019.00491 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
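
The sketch below illustrates the general idea of a per-image effective path and its class-level aggregation, using a toy fully-connected model and a top-k activation criterion; both are illustrative assumptions rather than the paper's exact decomposition rule.

```python
# Hedged illustration: per-image "effective path" extraction and class-level aggregation.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 10))       # assumed toy model

def effective_path(x: torch.Tensor, keep_ratio: float = 0.1):
    """Return one boolean mask of the most active neurons per linear layer."""
    masks, h = [], x
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.Linear):
            k = max(1, int(h.shape[1] * keep_ratio))
            top = h.abs().topk(k, dim=1).indices[0]
            mask = torch.zeros(h.shape[1], dtype=torch.bool)
            mask[top] = True
            masks.append(mask)
    return masks

# Aggregate per-image paths of one class into a class-level path (set union).
images = torch.randn(32, 1, 28, 28)
class_path = [torch.zeros(256, dtype=torch.bool), torch.zeros(10, dtype=torch.bool)]
for img in images:
    for level, mask in enumerate(effective_path(img.unsqueeze(0))):
        class_path[level] |= mask
```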

Network pruning can reduce the computation cost of deep neural network (DNN) models. However, sparse models often produce randomly-distributed weights to maintain accuracy, leading to irregular computations. Consequently, unstructured sparse models cannot achieve meaningful speedup on commodity hardware built for dense matrix computations. Accelerators are usually modified or designed with structured sparsity-optimized architectures for exploiting sparsity. For example, the Ampere architecture introduces a sparse tensor core, which...

10.1109/tc.2024.3365942 article EN IEEE Transactions on Computers 2024-02-14
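
To make the structured-sparsity constraint concrete, the sketch below prunes a weight matrix to the 2:4 pattern that Ampere-class sparse tensor cores accelerate (keep the two largest-magnitude weights in every group of four); it shows the pattern only, not the paper's pruning algorithm.

```python
# Hedged illustration: enforce a 2:4 fine-grained structured sparsity pattern.
import torch

def prune_2_of_4(weight: torch.Tensor) -> torch.Tensor:
    assert weight.shape[-1] % 4 == 0
    groups = weight.reshape(-1, 4)                    # groups of 4 along the last dim
    keep = groups.abs().topk(2, dim=1).indices        # the 2 largest per group
    mask = torch.zeros_like(groups).scatter_(1, keep, 1.0)
    return (groups * mask).reshape(weight.shape)

w = torch.randn(4, 8)
print(prune_2_of_4(w))                                # exactly two non-zeros per group of 4
```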

Deep learning is vulnerable to adversarial attacks, where carefully-crafted input perturbations could mislead a well-trained Deep Neural Network (DNN) to produce incorrect results. Adversarial attacks jeopardize the safety, security, and privacy of DNN-enabled systems. Today's countermeasures either do not have the capability to detect adversarial samples at inference time, or introduce prohibitively high overhead to be practical at inference time. We propose Ptolemy, an algorithm-architecture co-designed system that detects...

10.1109/micro50266.2020.00031 article EN 2020-10-01
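
The sketch below shows only the detection intuition: compare an input's effective path against the class-level path of its predicted class and flag the input when the overlap is unusually low. The Jaccard similarity, the random stand-in paths, and the threshold are illustrative assumptions, not Ptolemy's actual path metric or hardware design.

```python
# Hedged illustration: path-similarity-based adversarial detection.
import torch

def jaccard(a: torch.Tensor, b: torch.Tensor) -> float:
    """Overlap of two boolean neuron masks."""
    union = (a | b).sum().item()
    return (a & b).sum().item() / union if union else 1.0

class_path = torch.rand(256) < 0.2      # stand-in: class-level effective path
input_path = torch.rand(256) < 0.2      # stand-in: this input's effective path

THRESHOLD = 0.3                         # assumption: calibrated on benign inputs
print("flag as adversarial?", jaccard(input_path, class_path) < THRESHOLD)
```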

Quantization of deep neural networks (DNN) has been proven effective for compressing and accelerating DNN models. Data-free quantization (DFQ) is a promising approach that works without the original datasets under privacy-sensitive and confidential scenarios. However, current DFQ solutions degrade accuracy, need synthetic data to calibrate networks, and are time-consuming and costly. This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low...

10.48550/arxiv.2202.07471 preprint EN cc-by arXiv (Cornell University) 2022-01-01
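
As a point of reference for what "data-free" means, the sketch below quantizes weights with plain per-tensor symmetric rounding and no calibration inputs at all; SQuant's actual flip-based objective and sub-second on-device algorithm are different.

```python
# Hedged illustration: weight quantization that needs no input data.
import torch

def quantize_weight(w: torch.Tensor, bits: int = 8):
    """Per-tensor symmetric round-to-nearest quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale               # dequantize later as q * scale

w = torch.randn(64, 64)
q, s = quantize_weight(w)
print("mean abs quantization error:", (w - q.float() * s).abs().mean().item())
```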

The success of deep neural networks (DNNs) has sparked efforts to analyze (e.g., tracing) and optimize (e.g., pruning) them. These tasks have specific requirements and ad-hoc implementations in current execution backends like TensorFlow/PyTorch, which require developers to manage fragmented interfaces and adapt their code to diverse models. In this study, we propose a new framework called Amanda to streamline the development of these tasks. We formalize their implementation as network instrumentation, which involves introducing...

10.1145/3617232.3624864 article EN cc-by-nc-nd 2024-04-17
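
A plain-PyTorch sketch of the network-instrumentation idea: attach forward hooks to trace per-layer outputs without editing the model's code, then remove them cleanly. Amanda's real API and backend coverage differ; the torchvision model is just an assumed demo target.

```python
# Hedged illustration: tracing a model via hooks, without touching its definition.
import torch
import torch.nn as nn
import torchvision  # assumption: torchvision is available for a demo model

model = torchvision.models.resnet18(weights=None)
trace = {}

def make_hook(name):
    def hook(module, inputs, output):
        trace[name] = tuple(output.shape)        # the analysis payload: output shapes
    return hook

handles = [m.register_forward_hook(make_hook(n))
           for n, m in model.named_modules() if isinstance(m, nn.Conv2d)]

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
for h in handles:
    h.remove()                                   # instrumentation is fully removable
print(list(trace.items())[:3])
```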

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracies, sparse models often carry randomly-distributed weights, leading to irregular computations. Consequently, they cannot achieve meaningful speedup on commodity hardware (e.g., GPU) built for dense matrix computations. As such, prior works usually modify or design completely new sparsity-optimized architectures for exploiting sparsity. We propose an algorithm-software co-designed method that...

10.48550/arxiv.2008.13006 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Parallel computers now start to adopt the Bandwidth-Asymmetric Memory architecture, which consists of traditional DRAM memory and new High Bandwidth Memory (HBM) for high bandwidth. However, existing task schedulers suffer from low bandwidth usage and poor data locality problems in bandwidth-asymmetric memory architectures. To solve the two problems, we propose a Bandwidth and Locality Aware Task-stealing (BATS) system, which consists of an HBM-aware memory allocator, a bandwidth-aware traffic balancer, and a hierarchical task-stealing scheduler....

10.1145/3291058 article EN ACM Transactions on Architecture and Code Optimization 2018-12-08
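
A single-process sketch of the hierarchical-stealing intuition: a worker steals from workers on its own memory node before reaching across nodes. The two-level hierarchy, worker-to-node map, and task strings are illustrative assumptions, not BATS itself.

```python
# Hedged illustration: locality-aware (hierarchical) task stealing.
from collections import deque

NODE_OF = {0: 0, 1: 0, 2: 1, 3: 1}            # worker id -> memory node (assumed topology)
queues = {w: deque() for w in NODE_OF}

def steal(thief: int):
    """Fetch a task for `thief`, preferring victims on the same node."""
    local = [w for w in queues if w != thief and NODE_OF[w] == NODE_OF[thief]]
    remote = [w for w in queues if NODE_OF[w] != NODE_OF[thief]]
    for victim in local + remote:             # hierarchical order: local node first
        if queues[victim]:
            return queues[victim].popleft()
    return None

queues[2].extend(["task-a", "task-b"])        # pending work lives only on node 1
print(steal(3))                               # worker 3 (node 1) steals locally   -> task-a
print(steal(0))                               # worker 0 (node 0) steals remotely  -> task-b
```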

Continuous vision is the cornerstone of a diverse range of intelligent applications found on emerging computing platforms such as autonomous machines and Augmented Reality glasses. A critical issue in today's continuous vision systems is their long end-to-end frame latency, which significantly impacts system agility and user experience. We find that the long latency is fundamentally caused by the serialized execution model of the vision pipeline, whose key stages, including sensing, imaging, and vision computations, execute sequentially, leading to long latency.

10.1145/3410463.3414650 article EN 2020-09-30
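
A toy latency calculation (with assumed per-stage times, not numbers from the paper) showing how the serialized pipeline adds stage times into the end-to-end frame latency; overlapping stages across consecutive frames raises throughput but leaves each frame's latency unchanged.

```python
# Hedged illustration: latency of a serialized sensing -> imaging -> vision pipeline.
t_sense, t_image, t_compute = 8.0, 6.0, 16.0           # assumed per-stage times in ms

serialized_latency = t_sense + t_image + t_compute      # stages run back-to-back per frame
pipelined_interval = max(t_sense, t_image, t_compute)   # frame interval if stages overlap

print(f"serialized per-frame latency: {serialized_latency:.1f} ms")
print(f"pipelined frame interval    : {pipelined_interval:.1f} ms "
      f"(throughput improves, per-frame latency does not)")
```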

An activation function is an element-wise mathematical function and plays a crucial role in deep neural networks (DNN). Many novel and sophisticated activation functions have been proposed to improve the DNN accuracy, but they also consume massive memory in the training process with back-propagation. In this study, we propose nested forward automatic differentiation (Forward-AD), specifically for memory-efficient activation function training. We deploy nested Forward-AD in two widely-used deep learning frameworks, TensorFlow and PyTorch, which support static and dynamic...

10.1109/iccd56317.2022.00113 article EN 2022 IEEE 40th International Conference on Computer Design (ICCD) 2022-10-01
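
The sketch below illustrates only the core trick: a single forward-mode AD pass yields both f(x) and the element-wise derivative f'(x), so the backward step for an activation reduces to one multiply. The SiLU activation and the plain forward_ad usage are assumptions for illustration; the paper's nested Forward-AD integration into TensorFlow/PyTorch is more involved.

```python
# Hedged illustration: forward-mode AD for an element-wise activation.
import torch
import torch.autograd.forward_ad as fwAD

def silu(x):                                   # element-wise activation f(x) = x * sigmoid(x)
    return x * torch.sigmoid(x)

x = torch.randn(4, 4)

# One forward pass produces both f(x) and f'(x).
with fwAD.dual_level():
    dual_x = fwAD.make_dual(x, torch.ones_like(x))   # unit tangent -> element-wise derivative
    y, dydx = fwAD.unpack_dual(silu(dual_x))
    dydx = dydx.clone()                              # keep the derivative after the context

# Backward for the activation is then a single multiply with the saved derivative.
grad_out = torch.ones_like(x)                        # pretend upstream gradient
grad_in = grad_out * dydx

# Sanity check against reverse-mode autograd.
x_ref = x.clone().requires_grad_(True)
ref = torch.autograd.grad(silu(x_ref).sum(), x_ref)[0]
print(torch.allclose(grad_in, ref))
```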

Network pruning can reduce the computation cost of deep neural network (DNN) models. However, sparse models often produce randomly-distributed weights to maintain accuracy, leading to irregular computations. Consequently, unstructured sparse models cannot achieve meaningful speedup on commodity hardware built for dense matrix computations. Accelerators are usually modified or designed with structured sparsity-optimized architectures for exploiting sparsity. For example, the Ampere architecture introduces a sparse tensor core, which...

10.48550/arxiv.2402.10876 preprint EN arXiv (Cornell University) 2024-02-16

Recently, researchers have started decomposing deep neural network models according to their semantics or functions. Recent work has shown the effectiveness of decomposed functional blocks for defending against adversarial attacks, which add small input perturbations to an image to fool DNN models. This work proposes a profiling-based method to decompose DNN models into different functional blocks, which leads to the effective path as a new approach to exploring DNNs' internal organization. Specifically, the per-image effective path can be aggregated into a class-level path, through which we...

10.48550/arxiv.1904.08089 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Deep learning is vulnerable to adversarial attacks, where carefully-crafted input perturbations could mislead a well-trained Deep Neural Network to produce incorrect results. Today's countermeasures to adversarial attacks either do not have the capability to detect adversarial samples at inference time, or introduce prohibitively high overhead to be practical at inference time. We propose Ptolemy, an algorithm-architecture co-designed system that detects adversarial samples at inference time with low overhead and high accuracy. We exploit the synergies between DNN inference and imperative program execution:...

10.48550/arxiv.2008.09954 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Post-training quantization attracts increasing attention due to its convenience in deploying quantized neural networks. Although rounding-to-nearest remains the prevailing method for DNN quantization, prior research has demonstrated its suboptimal nature when applied to weight quantization. That work proposes optimizing rounding schemes by leveraging the output error rather than the traditional weight-rounding error. Our study reveals that similar challenges also extend to activation quantization. Despite the easy generalization, the challenges lie in the dynamic nature of...

10.48550/arxiv.2208.11945 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01
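
The sketch below reproduces the prior observation in miniature for weights: starting from round-to-nearest, greedily flipping individual weights up or down can shrink a toy layer's output error, which is why rounding by output error can beat rounding by weight error. The layer sizes, calibration activations, and greedy search are illustrative assumptions, not this paper's activation-rounding method.

```python
# Hedged illustration: output-error-driven rounding vs. round-to-nearest.
import torch

torch.manual_seed(0)
w = torch.randn(8, 8)                          # toy linear layer weights
x = torch.randn(128, 8)                        # assumed calibration activations
scale = w.abs().max() / 127                    # 8-bit symmetric scale

q_nearest = torch.round(w / scale)
ref_out = x @ w.T

def out_err(q):
    return ((x @ (q * scale).T - ref_out) ** 2).mean().item()

best, best_err = q_nearest.clone(), out_err(q_nearest)
for i in range(w.shape[0]):                    # greedy single-coordinate search
    for j in range(w.shape[1]):
        for delta in (-1.0, 1.0):
            cand = best.clone()
            cand[i, j] += delta
            if out_err(cand) < best_err:
                best, best_err = cand, out_err(cand)

print(f"output MSE  nearest: {out_err(q_nearest):.6f}  optimized: {best_err:.6f}")
```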

Frame latency in continuous vision significantly impacts the agility of intelligent machines that interact with the environment via cameras. However, today's systems suffer from long frame latency due to their fundamentally sequential execution model. We propose a speculative execution model along with two mechanisms that enable practical speculation. We present SVSoC, a new mobile Systems-on-a-chip (SoC) architecture that augments conventional SoCs with speculation capability. Under the same energy budget, SVSoC achieves a 14.3 to 35.4 percent reduction...

10.1109/lca.2019.2903241 article EN IEEE Computer Architecture Letters 2019-01-01

An activation function is an element-wise mathematical function and plays a crucial role in deep neural networks (DNN). Many novel and sophisticated activation functions have been proposed to improve the DNN accuracy, but they also consume massive memory in the training process with back-propagation. In this study, we propose nested forward automatic differentiation (Forward-AD), specifically for memory-efficient activation function training. We deploy nested Forward-AD in two widely-used deep learning frameworks, TensorFlow and PyTorch, which support static and dynamic...

10.48550/arxiv.2209.10778 preprint EN cc-by arXiv (Cornell University) 2022-01-01