- Advanced Neural Network Applications
- Advanced Memory and Neural Computing
- Generative Adversarial Networks and Image Synthesis
- Stochastic Gradient Optimization Techniques
- Ferroelectric and Negative Capacitance Devices
- Recommender Systems and Techniques
- Adversarial Robustness in Machine Learning
- Machine Learning and Data Classification
- Parallel Computing and Optimization Techniques
- Scientific Computing and Data Management
- Network Packet Processing and Optimization
- Robotic Path Planning Algorithms
- Energy Load and Power Forecasting
- Distributed and Parallel Computing Systems
- Neural Networks and Applications
- Robotics and Sensor-Based Localization
- Video Analysis and Summarization
- Web Data Mining and Analysis
Meta (United States), 2021-2023
University of Michigan, 2017-2021
Menlo School, 2021
As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network sparsity caused by weight pruning will actually hurt the overall performance despite large reductions in the required multiply-accumulate operations....
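As a minimal sketch of the tension this abstract describes (illustrative numpy code, not the paper's benchmark setup), unstructured magnitude pruning slashes the nominal multiply-accumulate count, yet a dense kernel still touches every stored zero:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)  # cutoff below which weights are removed
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512))
x = rng.standard_normal(512)

pruned, mask = magnitude_prune(w, sparsity=0.9)

# MACs drop roughly 10x on paper ...
print("dense MACs :", w.size)
print("sparse MACs:", int(mask.sum()))

# ... but a dense matvec still multiplies through every stored zero, so on
# hardware without sparsity support the runtime is essentially unchanged.
y_dense  = w @ x
y_pruned = pruned @ x   # same dense matvec cost despite ~90% zeros
```

Whether the sparse model actually runs faster then hinges on how well the irregular nonzero pattern maps to the platform, which is the effect the abstract reports.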
We propose Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks - an in-SRAM architecture for accelerating Convolutional Neural Network (CNN) inference by leveraging network redundancy and massive parallelism. The redundancy is exploited in two ways. First, we prune and fine-tune the trained model and develop two distinct methods - coalescing and overlapping - to run inferences efficiently with sparse models. Second, we support models with a reduced bit width via bit-serial computation. Our proposed architecture achieves a 17.7×/3.7× speedup over a server...
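The bit-serial computation mentioned here can be illustrated with a short hedged sketch (a software model only, not the Bit Prudent microarchitecture): an n-bit weight is consumed one bit-plane per pass, so each pass needs only AND and add operations of the kind SRAM bitline logic can supply, and a shift replaces a full multiplier:

```python
import numpy as np

def bit_serial_dot(acts, weights, bits=8):
    """Dot product with bit-serial weights: one weight bit-plane per pass.

    Each pass only ANDs out a bit-plane and accumulates, weighted by the
    bit position. Illustrative model of bit-serial arithmetic in general.
    """
    acc = 0
    for b in range(bits):
        plane = (weights >> b) & 1          # 0/1 bit-plane of every weight
        acc += int((acts * plane).sum()) << b  # add selected activations, shifted
    return acc

rng = np.random.default_rng(1)
acts = rng.integers(0, 128, size=64)
weights = rng.integers(0, 256, size=64)

assert bit_serial_dot(acts, weights) == int(np.dot(acts, weights))
```

A reduced weight bit width means fewer planes, so latency scales down roughly in proportion, which is why bit-serial designs pair naturally with quantized models.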
In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory, and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements, and we describe the ecosystem developed and deployed at Facebook: both hardware, through the Open Compute Platform (OCP), and software framework and tooling, through Pytorch/Caffe2/Glow. A...
The density of FPGA on-chip memory has been continuously increasing, with modern FPGAs having thousands of block RAMs (BRAMs) distributed across their reconfigurable fabric. These BRAMs can provide a tremendous amount of bandwidth for the efficient acceleration of data-intensive applications. In this work, we propose enhancing the ubiquitous BRAMs with in-memory compute capabilities. As a result, the BRAMs can act as normal storage units, or their bitlines can be re-purposed as SIMD lanes executing bit-serial arithmetic operations. Our proposed...
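To make the "bitlines as SIMD lanes" idea concrete, here is a hedged software emulation (a toy model, not the proposed hardware): each vector element stands in for one lane, and every loop iteration performs one full-adder step across all lanes simultaneously, which is exactly the bit-serial style of computation such re-purposed bitlines execute:

```python
import numpy as np

def simd_bitserial_add(a, b, bits=9):
    """Add two vectors one bit-plane per 'cycle', all lanes in parallel.

    Emulates bitlines re-purposed as SIMD lanes: each cycle computes a
    per-lane sum bit and carry bit (a full-adder step). Illustrative only.
    """
    carry = np.zeros_like(a)
    result = np.zeros_like(a)
    for i in range(bits):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        result |= (abit ^ bbit ^ carry) << i                  # per-lane sum bit
        carry = (abit & bbit) | (carry & (abit ^ bbit))       # per-lane carry out
    return result

rng = np.random.default_rng(2)
a = rng.integers(0, 256, size=1024)   # 1024 lanes, one per bitline pair
b = rng.integers(0, 256, size=1024)
assert np.array_equal(simd_bitserial_add(a, b), a + b)
```

The appeal is throughput: a single operation is slow (one bit per cycle), but thousands of BRAM bitlines can run such lanes in parallel across the fabric.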
Deep Neural Networks (DNNs) have become an essential component of various applications. While today's DNNs are mainly restricted to cloud services, network connectivity, energy, and data privacy problems make it important to support efficient DNN computation on low-cost, low-power processors like microcontrollers. However, due to the constrained resources, it is challenging to execute large models on such devices. Using sub-byte low-precision input activations and weights is a typical method to reduce computation. But...
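The storage side of the sub-byte idea is easy to sketch (hedged numpy illustration; the helper names are ours, and this says nothing about the paper's actual compute scheme): packing two 4-bit operands per byte halves the memory footprint, but a byte-oriented ISA still has to unpack before multiplying:

```python
import numpy as np

def pack_int4(vals):
    """Pack pairs of unsigned 4-bit values into single bytes (low nibble first)."""
    vals = vals.astype(np.uint8)
    return (vals[0::2] & 0xF) | ((vals[1::2] & 0xF) << 4)

def unpack_int4(packed):
    """Recover the two 4-bit values stored in each byte."""
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0xF
    out[1::2] = (packed >> 4) & 0xF
    return out

rng = np.random.default_rng(3)
w = rng.integers(0, 16, size=128, dtype=np.uint8)   # 4-bit weights
x = rng.integers(0, 16, size=128, dtype=np.uint8)   # 4-bit activations

packed_w = pack_int4(w)                             # 64 bytes instead of 128
y = int(np.dot(unpack_int4(packed_w).astype(np.int32), x.astype(np.int32)))
assert y == int(np.dot(w.astype(np.int32), x.astype(np.int32)))
```

That unpack-multiply gap is the kind of overhead the truncated "But..." is presumably about: sub-byte formats save memory for free, but saving cycles takes extra care on commodity microcontrollers.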
Convolutional Neural Networks (CNNs) have demonstrated remarkable performance across a wide range of machine learning tasks. However, the high accuracy usually comes at the cost of substantial computation and energy consumption, making CNNs difficult to deploy on mobile and embedded devices. In CNNs, compute-intensive convolutional layers are typically followed by a ReLU activation layer, which clamps negative outputs to zeros, resulting in large sparsity. By exploiting such sparsity in CNN models, we propose...
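A minimal numpy sketch of the effect (illustrative, not the paper's method): ReLU zeroes roughly half of zero-mean activations, and the next layer's matrix product can skip every zero activation together with the matching weight row instead of multiplying through the zeros:

```python
import numpy as np

rng = np.random.default_rng(4)
acts = np.maximum(rng.standard_normal((1, 4096)), 0.0)   # ReLU clamps negatives to zero
print(f"activation sparsity after ReLU: {(acts == 0).mean():.0%}")  # ~50% here

# The following layer only needs the rows of w selected by nonzero activations.
w = rng.standard_normal((4096, 1024))
nz = np.flatnonzero(acts[0])
y_sparse = acts[0, nz] @ w[nz, :]          # visits only nonzero activations
assert np.allclose(y_sparse, acts[0] @ w)
```

Unlike weight sparsity, this activation sparsity is input-dependent and only known at runtime, which is why exploiting it usually calls for dedicated architectural support.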
Deep convolutional neural networks (CNNs) are deployed in various applications but demand immense computational requirements. Pruning techniques and Winograd convolution are two typical methods to reduce the CNN computation. However, they cannot be directly combined because the Winograd transformation fills in the sparsity resulting from pruning. Li et al. (2017) propose sparse Winograd convolution, in which weights are pruned in the Winograd domain, but this technique is not very practical because Winograd-domain retraining requires low learning rates and hence...
Large scale deep learning provides a tremendous opportunity to improve the quality of content recommendation systems by employing both wider and deeper models, but this comes at great infrastructural cost and carbon footprint in modern data centers. Pruning is an effective technique that reduces both memory and compute demand for model inference. However, pruning an online model is challenging due to continuous distribution shift (a.k.a. non-stationary data). Although incremental training on the full model is able to adapt to the data,...
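A toy sketch of why the shift matters (entirely illustrative; the stream, model, and schedule here are ours, not the paper's): if a mask is chosen once from magnitudes, continued incremental training under a drifting data distribution leaves the model stuck with whichever weights happened to look important at pruning time:

```python
import numpy as np

rng = np.random.default_rng(5)

def stream_batch(t, n=256, d=32):
    """Toy non-stationary stream: the true regression weights drift over time."""
    w_true = np.sin(np.arange(d) + 0.01 * t)        # slowly shifting target
    X = rng.standard_normal((n, d))
    return X, X @ w_true

d = 32
w = np.zeros(d)
mask = np.ones(d)
for t in range(2000):
    X, y = stream_batch(t)
    grad = X.T @ (X @ (w * mask) - y) / len(y)      # incremental (online) update
    w -= 0.01 * grad
    w *= mask                                        # pruned weights stay zero
    if t == 500:                                     # prune once; under drift this
        mask = (np.abs(w) > np.quantile(np.abs(w), 0.5)).astype(float)
        w *= mask                                    # mask goes stale as w_true moves
print("live weights:", int(mask.sum()), "of", d)
```

Weights that are unimportant at t = 500 can become important later, so a one-shot mask degrades accuracy over time; handling that tension is the problem the abstract sets up.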
Deep learning recommendation systems at scale have provided remarkable gains through increasing model capacity (i.e. wider and deeper neural networks), but it comes at significant training and infrastructure cost. Model pruning is an effective technique to reduce computation overhead for deep neural networks by removing redundant parameters. However, modern recommendation systems are still thirsty for model capacity due to the demand for handling big data. Thus, pruning a model results in smaller capacity and consequently lower accuracy. To reduce cost without sacrificing capacity, we...
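One way to read "without sacrificing capacity" is an alternating grow/prune schedule. The sketch below is a hedged toy version of that general idea (our own simplification, not the paper's recipe): pruned parameter slots are periodically re-opened with fresh values, so the live model stays small while the total capacity explored over training stays large:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1024                                   # full (budgeted) parameter count
w = rng.standard_normal(n) * 0.01
active = np.ones(n, dtype=bool)            # which parameters are currently live

for phase in range(4):
    # --- prune phase: drop the smallest-magnitude live weights ---
    cutoff = np.quantile(np.abs(w[active]), 0.3)
    active &= np.abs(w) > cutoff           # ~30% of live weights pruned
    w[~active] = 0.0
    # --- grow phase: re-open some pruned slots with fresh small weights ---
    revive = (~active) & (rng.random(n) < 0.5)
    w[revive] = rng.standard_normal(int(revive.sum())) * 0.01
    active |= revive
    print(f"phase {phase}: live params = {int(active.sum())}/{n}")
    # (a real schedule would interleave many training steps between phases)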