- Advanced Neural Network Applications
- Adversarial Robustness in Machine Learning
- Integrated Circuits and Semiconductor Failure Analysis
- Domain Adaptation and Few-Shot Learning
- CCD and CMOS Imaging Sensors
- Visual Attention and Saliency Detection
- Stochastic Gradient Optimization Techniques
- Brain Tumor Detection and Classification
- Physical Unclonable Functions (PUFs) and Hardware Security
- Neural Networks and Applications
- Anomaly Detection Techniques and Applications
- Advanced Neuroimaging Techniques and Applications
- Cloud Computing and Resource Management
- Distributed and Parallel Computing Systems
- Tensor Decomposition and Applications
- Multimodal Machine Learning Applications
- Advanced Optical Sensing Technologies
- Handwritten Text Recognition Techniques
- Bacillus and Francisella bacterial research
- Geophysical Methods and Applications
- Lattice Boltzmann Simulation Studies
- Machine Learning and Data Classification
- Parallel Computing and Optimization Techniques
- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
Shanghai Jiao Tong University
2018-2024
ShangHai JiAi Genetics & IVF Institute
2022-2024
Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracy, sparse models often carry randomly distributed weights, leading to irregular computations. Consequently, they cannot achieve meaningful speedup on commodity hardware (e.g., GPUs) built for dense matrix operations. As such, prior works usually modify existing architectures or design completely new sparsity-optimized architectures to exploit sparsity. We propose an algorithm-software co-designed method that...
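The abstract is truncated above, but the irregularity it refers to is easy to see in a minimal sketch. The example below is an illustration of plain unstructured magnitude pruning (not the paper's co-designed method): the surviving non-zeros land at arbitrary positions, so dense-matrix hardware still processes full tiles.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude weights (unstructured pruning)."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(8, 8)
w_sparse = magnitude_prune(w, sparsity=0.75)
# The non-zero positions are scattered irregularly, so a GPU built for dense
# matrix operations still has to fetch and multiply the whole 8x8 tile.
print((w_sparse != 0).int())
```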
Recently, researchers have started decomposing deep neural network (DNN) models according to their semantics or functions. Recent work has shown the effectiveness of decomposed functional blocks for defending against adversarial attacks, which add small perturbations to an input image to fool DNN models. This work proposes a profiling-based method to decompose DNN models into different functional blocks, which leads to the effective path as a new approach for exploring DNNs' internal organization. Specifically, per-image effective paths can be aggregated into a class-level path, through which we...
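The abstract above is truncated; as a rough sketch of the profiling idea it describes, the code below marks the most strongly activated neurons of one image per layer and aggregates those per-image masks into a class-level path. The top-k criterion and the union-based aggregation are simplifying assumptions, not the paper's exact extraction rule.

```python
import torch

def per_image_path(activations: dict, keep_ratio: float = 0.1) -> dict:
    """Profile one image: mark the largest-magnitude activations in each layer."""
    path = {}
    for layer, act in activations.items():
        flat = act.flatten().abs()
        k = max(1, int(flat.numel() * keep_ratio))
        mask = torch.zeros(flat.numel(), dtype=torch.bool)
        mask[flat.topk(k).indices] = True
        path[layer] = mask
    return path

def class_level_path(paths: list) -> dict:
    """Aggregate per-image paths of one class by taking the union of their masks."""
    agg = {}
    for p in paths:
        for layer, mask in p.items():
            agg[layer] = agg.get(layer, torch.zeros_like(mask)) | mask
    return agg
```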
Network pruning can reduce the computation cost of deep neural network (DNN) models. However, sparse models often produce randomly distributed weights to maintain accuracy, leading to irregular computations. Consequently, unstructured sparse models cannot achieve meaningful speedup on commodity hardware built for dense matrix operations. Accelerators are therefore usually modified or designed with structured sparsity-optimized architectures to exploit sparsity. For example, the Ampere architecture introduces a sparse tensor core, which...
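For context on the structured-sparsity example, NVIDIA's Ampere sparse tensor cores accelerate the 2:4 pattern (at most two non-zeros in every aligned group of four weights). The sketch below is only a minimal illustration of pruning a weight matrix to that pattern; it is not the paper's method.

```python
import torch

def prune_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Keep the two largest-magnitude weights in every aligned group of four."""
    assert weight.shape[-1] % 4 == 0
    groups = weight.reshape(-1, 4)
    drop = groups.abs().topk(2, dim=1, largest=False).indices  # two smallest per group
    mask = torch.ones_like(groups, dtype=torch.bool)
    mask.scatter_(1, drop, False)
    return (groups * mask).reshape(weight.shape)

w = torch.randn(4, 8)
print(prune_2_to_4(w))  # every aligned group of four now holds exactly two zeros
```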
Deep learning is vulnerable to adversarial attacks, where carefully crafted input perturbations can mislead a well-trained Deep Neural Network (DNN) into producing incorrect results. Adversarial attacks jeopardize the safety, security, and privacy of DNN-enabled systems. Today's countermeasures either lack the capability to detect adversarial samples at inference time, or introduce prohibitively high overhead to be practical at inference time. We propose Ptolemy, an algorithm-architecture co-designed system that detects...
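The truncated abstract does not spell out Ptolemy's detection rule. As a simplified illustration only, one can compare an input's per-image path (see the sketch after the effective-path abstract above) against the aggregated path of the predicted class and flag low overlap as suspicious; the 0.6 threshold is an arbitrary assumption, not a value from the paper.

```python
import torch

def path_similarity(img_path: dict, class_path: dict) -> float:
    """Fraction of the image's active neurons that also appear in the class-level path."""
    hit = total = 0
    for layer, mask in img_path.items():
        hit += (mask & class_path[layer]).sum().item()
        total += mask.sum().item()
    return hit / max(total, 1)

def looks_adversarial(img_path: dict, class_path: dict, threshold: float = 0.6) -> bool:
    # Low overlap with the predicted class's canonical path is treated as suspicious.
    return path_similarity(img_path, class_path) < threshold
```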
Quantization of deep neural networks (DNNs) has been proven effective for compressing and accelerating DNN models. Data-free quantization (DFQ) is a promising approach when the original datasets are unavailable under privacy-sensitive or confidential scenarios. However, current DFQ solutions degrade accuracy, need synthetic data to calibrate networks, and are time-consuming and costly. This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low...
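SQuant's sub-second, synthesis-free algorithm is not reproduced here; the sketch below only shows the data-free baseline such work builds on: symmetric per-channel weight quantization that needs neither real nor synthetic calibration data.

```python
import torch

def quantize_weight_per_channel(w: torch.Tensor, n_bits: int = 8):
    """Symmetric per-output-channel weight quantization using no input data at all."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().amax(dim=tuple(range(1, w.dim())), keepdim=True) / qmax
    scale = scale.clamp(min=1e-8)
    q = torch.round(w / scale).clamp(-qmax - 1, qmax).to(torch.int8)
    return q, scale

w = torch.randn(16, 3, 3, 3)            # conv weight: [out_ch, in_ch, kh, kw]
q, scale = quantize_weight_per_channel(w)
w_hat = q.float() * scale               # dequantized approximation of w
```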
The success of deep neural networks (DNNs) has sparked efforts to analyze them (e.g., tracing) and optimize them (e.g., pruning). These tasks have specific requirements and ad-hoc implementations in current execution backends like TensorFlow/PyTorch, which require developers to manage fragmented interfaces and adapt their code to diverse models. In this study, we propose a new framework called Amanda to streamline the development of these tasks. We formalize their implementation as network instrumentation, which involves introducing...
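Amanda's own instrumentation interface is not shown in the truncated abstract. The sketch below uses plain PyTorch forward hooks to illustrate the kind of cross-cutting tracing task such a framework would unify; it is not Amanda's API.

```python
import torch
import torch.nn as nn

def trace_output_shapes(model: nn.Module):
    """Attach forward hooks that record each leaf module's output shape."""
    records, handles = {}, []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:        # leaf modules only
            def hook(mod, inputs, output, name=name):
                records[name] = tuple(output.shape)
            handles.append(module.register_forward_hook(hook))
    return records, handles

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8 * 30 * 30, 10))
records, handles = trace_output_shapes(model)
model(torch.randn(1, 3, 32, 32))
for h in handles:
    h.remove()
print(records)
```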
Parallel computers now start to adopt bandwidth-asymmetric memory architectures that combine traditional DRAM with new High Bandwidth Memory (HBM) for higher bandwidth. However, existing task schedulers suffer from low bandwidth usage and poor data locality on bandwidth-asymmetric architectures. To solve these two problems, we propose a Bandwidth and Locality Aware Task-Stealing (BATS) system, which consists of an HBM-aware memory allocator, a bandwidth-aware traffic balancer, and a hierarchical task-stealing scheduler....
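The BATS scheduler itself is not described beyond the component list above; the sketch below is a simplified, hypothetical illustration of locality-aware stealing, where a thief prefers victims in its own memory domain before reaching across domains.

```python
import collections
import random

class Worker:
    """A worker thread's state: its memory domain and its local task deque."""
    def __init__(self, wid: int, domain: int):
        self.wid, self.domain = wid, domain
        self.deque = collections.deque()

def steal(thief: Worker, workers: list):
    # Prefer victims in the thief's own memory domain (better data locality),
    # then fall back to victims attached to other domains.
    local = [w for w in workers if w is not thief and w.domain == thief.domain and w.deque]
    remote = [w for w in workers if w is not thief and w.domain != thief.domain and w.deque]
    for pool in (local, remote):
        if pool:
            return random.choice(pool).deque.popleft()   # steal from the victim's head
    return None
```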
Continuous vision is the cornerstone of a diverse range of intelligent applications found on emerging computing platforms such as autonomous machines and Augmented Reality glasses. A critical issue in today's continuous vision systems is their long end-to-end frame latency, which significantly impacts system agility and user experience. We find that the long latency is fundamentally caused by the serialized execution model of the pipeline, whose key stages, including sensing, imaging, and vision computations, execute sequentially, leading to long latency.
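A small worked example of the serialization effect, with hypothetical stage times rather than measurements from the paper:

```python
# Hypothetical per-frame stage times in milliseconds.
stages = {"sensing": 10.0, "imaging": 8.0, "vision": 20.0}

# Serialized execution: one frame's end-to-end latency is the sum of all stages.
serialized_latency = sum(stages.values())     # 38.0 ms

# Pipelining across frames only bounds throughput by the slowest stage;
# a single frame's latency stays at the serialized sum unless stages of the
# *same* frame can be overlapped (e.g., via the speculation proposed below).
throughput_bound = max(stages.values())       # 20.0 ms between frame completions
print(serialized_latency, throughput_bound)
```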
An activation function is an element-wise mathematical function that plays a crucial role in deep neural networks (DNNs). Many novel and sophisticated activation functions have been proposed to improve DNN accuracy, but they also consume massive memory during the training process with back-propagation. In this study, we propose nested forward automatic differentiation (Forward-AD), specifically for activation functions, to enable memory-efficient training. We deploy Forward-AD in two widely-used deep learning frameworks, TensorFlow and PyTorch, which support static and dynamic...
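The truncated abstract does not include code; the sketch below is a PyTorch illustration of the underlying idea: evaluate the activation's derivative during the forward pass and save only that single tensor, instead of the intermediate tensors a composite activation would otherwise record for back-propagation. The SiLU choice is an example, not the paper's implementation.

```python
import torch

class ForwardADSiLU(torch.autograd.Function):
    """SiLU (x * sigmoid(x)) whose derivative is computed during the forward pass.

    A plain composition `x * torch.sigmoid(x)` saves both x and sigmoid(x) for
    backward; here only one tensor, the derivative, is saved.
    """

    @staticmethod
    def forward(ctx, x):
        s = torch.sigmoid(x)
        y = x * s
        grad = s + y * (1.0 - s)        # d/dx [x * sigmoid(x)]
        ctx.save_for_backward(grad)
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (grad,) = ctx.saved_tensors
        return grad_output * grad

x = torch.randn(4, requires_grad=True)
ForwardADSiLU.apply(x).sum().backward()
reference = torch.autograd.grad((x * torch.sigmoid(x)).sum(), x)[0]
print(torch.allclose(x.grad, reference))    # True: gradients match autograd
```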
Deep learning is vulnerable to adversarial attacks, where carefully crafted input perturbations can mislead a well-trained Deep Neural Network (DNN) into producing incorrect results. Today's countermeasures to adversarial attacks either lack the capability to detect adversarial samples at inference time, or introduce prohibitively high overhead to be practical at inference time. We propose Ptolemy, an algorithm-architecture co-designed system that detects adversarial samples at inference time with low overhead and high accuracy. We exploit the synergies between DNN inference and imperative program execution:...
Post-training quantization attracts increasing attention due to its convenience in deploying quantized neural networks. Although rounding-to-nearest remains the prevailing method for DNN quantization, prior research has demonstrated its suboptimal nature when applied to weight quantization. These works propose optimizing rounding schemes by leveraging the output error rather than the traditional rounding error. Our study reveals that similar challenges also extend to activation quantization. Despite the easy generalization, the challenges lie in the dynamic...
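As a toy illustration of the weight-rounding idea the abstract refers to (per-weight floor/ceil choices driven by output error rather than rounding error), the sketch below uses a greedy coordinate-wise rule of my own; it is neither the cited rounding scheme nor the paper's activation-quantization method.

```python
import torch

def round_to_nearest(w: torch.Tensor, scale: float) -> torch.Tensor:
    return torch.round(w / scale) * scale

def output_aware_round(w: torch.Tensor, x: torch.Tensor, scale: float) -> torch.Tensor:
    """Pick floor or ceil per weight to reduce the output error ||x @ Wq.T - x @ W.T||,
    rather than the per-weight rounding error minimized by round-to-nearest."""
    q = torch.floor(w / scale)
    ref = x @ w.T
    for i in range(w.shape[0]):
        for j in range(w.shape[1]):
            errs = []
            for cand in (q[i, j], q[i, j] + 1):
                trial = q.clone()
                trial[i, j] = cand
                errs.append(torch.norm(x @ (trial * scale).T - ref))
            if errs[1] < errs[0]:
                q[i, j] += 1
    return q * scale

w, x = torch.randn(4, 4), torch.randn(32, 4)
w_rtn = round_to_nearest(w, scale=0.1)
w_oar = output_aware_round(w, x, scale=0.1)  # typically lower output error on x than w_rtn
```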
Frame latency in continuous vision significantly impacts the agility of intelligent machines that interact with the environment via cameras. However, today's systems are limited by long frame latency due to their fundamentally sequential execution model. We propose a speculative execution model along with two mechanisms that enable practical speculation. We present SVSoC, a new mobile Systems-on-a-Chip (SoC) architecture that augments conventional SoCs with speculation capability. Under the same energy budget, SVSoC achieves a 14.3-35.4 percent reduction...