Jiecao Yu

ORCID: 0000-0003-2085-0312
Research Areas
  • Advanced Neural Network Applications
  • Advanced Memory and Neural Computing
  • Generative Adversarial Networks and Image Synthesis
  • Stochastic Gradient Optimization Techniques
  • Ferroelectric and Negative Capacitance Devices
  • Recommender Systems and Techniques
  • Adversarial Robustness in Machine Learning
  • Machine Learning and Data Classification
  • Parallel Computing and Optimization Techniques
  • Scientific Computing and Data Management
  • Network Packet Processing and Optimization
  • Robotic Path Planning Algorithms
  • Energy Load and Power Forecasting
  • Distributed and Parallel Computing Systems
  • Neural Networks and Applications
  • Robotics and Sensor-Based Localization
  • Video Analysis and Summarization
  • Web Data Mining and Analysis

Meta (United States)
2021-2023

University of Michigan
2017-2021

Menlo School
2021

As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network sparsity caused by weight pruning will actually hurt the overall performance despite large reductions in the required multiply-accumulate operations....

10.1145/3079856.3080215 article EN 2017-06-24
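
The performance paradox described above is largely a storage-format effect: once weights are pruned, the survivors must be kept in a sparse format whose per-element indices add memory traffic and irregular accesses that can outweigh the saved multiply-accumulates. A minimal sketch (illustrative names and thresholds, not the paper's implementation) of magnitude-based pruning and the CSR-style overhead it introduces:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out roughly the smallest-magnitude fraction `sparsity` of weights."""
    k = int(weights.size * sparsity)
    threshold = np.sort(np.abs(weights), axis=None)[k]
    return np.where(np.abs(weights) < threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.7)

# MACs drop with sparsity, but a CSR-style format must also store a column
# index per surviving weight plus row pointers, so the bytes moved per useful
# MAC rise -- one reason sparse execution can lose to dense on real hardware.
nnz = np.count_nonzero(w_pruned)
dense_bytes = w.size * 4                                  # fp32 values only
sparse_bytes = nnz * 4 + nnz * 4 + (w.shape[0] + 1) * 4   # values + col idx + row ptr
print(f"MACs kept: {nnz / w.size:.0%}, bytes kept: {sparse_bytes / dense_bytes:.0%}")
```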

As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network sparsity caused by weight pruning will actually hurt the overall performance despite large reductions in the required multiply-accumulate operations....

10.1145/3140659.3080215 article EN ACM SIGARCH Computer Architecture News 2017-06-24

We propose Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks - an in-SRAM architecture for accelerating Convolutional Neural Network (CNN) inference by leveraging network redundancy and massive parallelism. The redundancy is exploited in two ways. First, we prune and fine-tune the trained model and develop two distinct methods - coalescing and overlapping - to run inferences efficiently with sparse models. Second, we support models with a reduced bit width via bit-serial computation. Our proposed architecture achieves a 17.7×/3.7× speedup over server...

10.1109/hpca.2019.00029 article EN 2019-02-01
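
The bit-serial computation mentioned in the abstract decomposes a multiplication into one pass per operand bit, so cycle count scales with bit width and reduced-precision models run proportionally faster. A toy software analogue (my sketch; the paper's design does this with in-SRAM logic, not Python):

```python
def bit_serial_mul(x: int, w: int, bits: int) -> int:
    """Multiply x by an unsigned `bits`-bit weight w, one weight bit per pass.

    Each pass inspects one bit of w and conditionally accumulates a shifted
    copy of x -- the software analogue of bit-serial in-SRAM arithmetic,
    where latency scales linearly with the operand bit width.
    """
    acc = 0
    for i in range(bits):
        if (w >> i) & 1:
            acc += x << i
    return acc

assert bit_serial_mul(13, 11, bits=4) == 13 * 11
# A 4-bit weight finishes in 4 passes where an 8-bit weight needs 8, which
# is why reduced-bit-width models map well onto bit-serial hardware.
```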

In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses and large model sizes, as well as high compute, memory, and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements, and we describe the ecosystem developed and deployed at Facebook: both hardware, through the Open Compute Platform (OCP), and software framework and tooling, through Pytorch/Caffe2/Glow. A...

10.48550/arxiv.2107.04140 preprint EN cc-by arXiv (Cornell University) 2021-01-01

The density of FPGA on-chip memory has been continuously increasing, with modern FPGAs having thousands of block RAMs (BRAMs) distributed across their reconfigurable fabric. These BRAMs can provide a tremendous amount of bandwidth for efficient acceleration of data-intensive applications. In this work, we propose enhancing the ubiquitous BRAMs with in-memory compute capabilities. As a result, BRAMs can act as normal storage units, or their bitlines can be re-purposed as SIMD lanes executing bit-serial arithmetic operations. Our proposed...

10.1109/fccm51124.2021.00018 article EN 2021-05-01
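
Re-purposing bitlines as SIMD lanes relies on a bit-sliced (transposed) data layout: row i of memory holds bit i of every lane's operand, so one bit-serial step processes all lanes in parallel. A hypothetical software emulation, using a Python integer as one bit-parallel row across lanes:

```python
def bit_serial_add(a_slices, b_slices):
    """Bit-serial ripple-carry addition over bit-sliced operands.

    a_slices[i] is an integer whose bit j holds bit i of lane j's operand,
    mirroring how re-purposed BRAM bitlines act as SIMD lanes: each loop
    iteration handles one bit position across every lane at once.
    """
    carry, out = 0, []
    for a, b in zip(a_slices, b_slices):
        out.append(a ^ b ^ carry)              # per-lane sum bit
        carry = (a & b) | (carry & (a ^ b))    # per-lane carry bit
    return out

def to_slices(vals, bits):
    """Transpose lane values into bit-slices (bit i of every lane per row)."""
    return [sum(((v >> i) & 1) << lane for lane, v in enumerate(vals))
            for i in range(bits)]

def from_slices(slices, lanes):
    return [sum(((s >> lane) & 1) << i for i, s in enumerate(slices))
            for lane in range(lanes)]

a, b = [3, 5, 7, 2], [1, 6, 7, 4]
s = bit_serial_add(to_slices(a, bits=4), to_slices(b, bits=4))
print(from_slices(s, lanes=4))   # -> [4, 11, 14, 6]; a full design would
                                 # also keep the final carry-out slice
```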

Deep Neural Networks (DNNs) have become an essential component of various applications. While today’s DNNs are mainly restricted to cloud services, network connectivity, energy, and data privacy problems make it important to support efficient DNN computation on low-cost, low-power processors like microcontrollers. However, due to the constrained resources, it is challenging to execute large DNN models on microcontrollers. Using sub-byte low-precision input activations and weights is a typical method to reduce the computation cost. But...

10.1145/3358189 article EN ACM Transactions on Embedded Computing Systems 2019-10-07
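
Since microcontrollers have no sub-byte load instructions, sub-byte weights must be packed several-per-byte in flash and unpacked in the inner loop. A hedged sketch of the packing arithmetic for signed 4-bit weights (function names are mine, not from the paper):

```python
import numpy as np

def pack_int4(w: np.ndarray) -> np.ndarray:
    """Pack signed 4-bit weights (range [-8, 7]) two per byte."""
    assert w.size % 2 == 0 and w.min() >= -8 and w.max() <= 7
    nibbles = (w.astype(np.int16) & 0xF).astype(np.uint8)  # two's-complement nibbles
    return nibbles[0::2] | (nibbles[1::2] << 4)

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Recover signed 4-bit values, sign-extending each nibble."""
    out = np.empty(packed.size * 2, dtype=np.int16)
    out[0::2] = packed & 0xF
    out[1::2] = packed >> 4
    return np.where(out >= 8, out - 16, out).astype(np.int8)

w = np.array([-3, 5, 7, -8], dtype=np.int8)
assert np.array_equal(unpack_int4(pack_int4(w)), w)
# 4-bit packing halves the weight footprint, but every inner loop pays the
# unpack instructions -- the compute/memory trade-off such methods target.
```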

Convolutional Neural Networks (CNNs) have demonstrated remarkable performance across a wide range of machine learning tasks. However, the high accuracy usually comes at the cost of substantial computation and energy consumption, making it difficult to deploy CNNs on mobile and embedded devices. In CNNs, compute-intensive convolutional layers are typically followed by a ReLU activation layer, which clamps negative outputs to zeros, resulting in large activation sparsity. By exploiting such sparsity in CNN models, we propose...

10.1145/3609093 article EN ACM Transactions on Embedded Computing Systems 2023-09-09
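
The sparsity the abstract refers to is exact: ReLU outputs zero for every negative pre-activation, typically around half of them, and a zero activation contributes nothing to the next layer's sums. A minimal sketch (mine, not the paper's accelerator logic) of skipping those multiply-accumulates:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def sparse_dot(activations: np.ndarray, weights: np.ndarray) -> float:
    """Dot product that touches only nonzero activations.

    Because ReLU produces exact zeros (not merely small values), skipping
    them changes no results -- the multiply-accumulates are simply free.
    """
    nz = np.flatnonzero(activations)
    return float(activations[nz] @ weights[nz])

rng = np.random.default_rng(1)
a = relu(rng.standard_normal(1024))   # roughly half the entries are exact zeros
w = rng.standard_normal(1024)
assert np.isclose(sparse_dot(a, w), a @ w)
print(f"MACs skipped: {1 - np.count_nonzero(a) / a.size:.0%}")
```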

Deep convolutional neural networks (CNNs) are deployed in various applications but demand immense computational requirements. Pruning techniques and Winograd convolution are two typical methods to reduce CNN computation. However, they cannot be directly combined, because the Winograd transformation fills in the sparsity resulting from pruning. Li et al. (2017) propose sparse Winograd convolution, in which weights are pruned in the Winograd domain, but this technique is not very practical, as Winograd-domain retraining requires low learning rates and hence...

10.48550/arxiv.1901.02132 preprint EN other-oa arXiv (Cornell University) 2019-01-01
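
The incompatibility is easy to verify numerically: transforming a spatially pruned 3×3 filter into the Winograd domain repopulates the zeros. A worked check using the standard filter-transform matrix G for F(2×2, 3×3):

```python
import numpy as np

# Standard filter-transform matrix for Winograd F(2x2, 3x3).
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])

# A spatially pruned 3x3 filter: 7 of its 9 weights are zero.
g = np.zeros((3, 3))
g[0, 0], g[2, 1] = 1.5, -2.0

wino_g = G @ g @ G.T   # 4x4 Winograd-domain filter actually used at inference
print(np.count_nonzero(g), "nonzeros spatially")                # 2
print(np.count_nonzero(wino_g), "nonzeros in Winograd domain")  # 11 of 16

# Each surviving spatial weight is smeared across many Winograd-domain
# entries, so spatial sparsity saves no Winograd-domain multiplies -- the
# motivation for pruning directly in the Winograd domain (Li et al., 2017).
```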

Large-scale deep learning provides a tremendous opportunity to improve the quality of content recommendation systems by employing both wider and deeper models, but this comes at great infrastructural cost and carbon footprint in modern data centers. Pruning is an effective technique that reduces both memory and compute demand for model inference. However, pruning for online recommendation systems is challenging due to the continuous distribution shift (a.k.a. non-stationary data). Although incremental training on the full model is able to adapt to the data,...

10.48550/arxiv.2010.08655 preprint EN other-oa arXiv (Cornell University) 2020-01-01
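
One way to make this concrete: keep training a dense copy on the streaming data, but periodically re-select which weights survive, so the sparse model served for inference tracks the distribution shift. A schematic sketch under those assumptions (not the paper's exact algorithm; gradients are faked for self-containment):

```python
import numpy as np

def keep_mask(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    thresh = np.sort(np.abs(w), axis=None)[int(w.size * sparsity)]
    return (np.abs(w) >= thresh).astype(w.dtype)

rng = np.random.default_rng(2)
w = rng.standard_normal(10_000).astype(np.float32)   # dense copy, kept training
mask = keep_mask(w, sparsity=0.8)

for step in range(1, 1001):
    grad = rng.standard_normal(w.size).astype(np.float32) * 0.01  # stand-in gradient
    w -= grad                     # incremental training adapts the dense model
    if step % 250 == 0:           # periodic mask refresh lets the sparse
        mask = keep_mask(w, 0.8)  # pattern follow the non-stationary data
    w_served = w * mask           # the pruned model actually used for inference

print(f"serving density: {np.count_nonzero(w_served) / w.size:.0%}")
```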

Deep learning recommendation systems at scale have provided remarkable gains through increasing model capacity (i.e., wider and deeper neural networks), but this comes with significant training and infrastructure cost. Model pruning is an effective technique to reduce the computation overhead of deep neural networks by removing redundant parameters. However, modern recommendation systems are still thirsty for model capacity due to the demand of handling big data. Thus, pruning a model results in a smaller model and consequently lower accuracy. To reduce computation without sacrificing model capacity, we...

10.1109/icmla52953.2021.00229 article EN 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA) 2021-12-01
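
The truncated sentence points to a dynamic training scheme; one family of techniques in this space alternates pruning the smallest weights with re-growing pruned slots, so the average training cost drops while full capacity remains reachable. A purely illustrative sketch, not necessarily the authors' procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

def prune_smallest(w: np.ndarray, frac: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `frac` of weights."""
    thresh = np.quantile(np.abs(w), frac)
    return np.where(np.abs(w) < thresh, 0.0, w)

def grow(w: np.ndarray, frac: float, scale: float = 0.01) -> np.ndarray:
    """Re-initialize a fraction `frac` of zeroed slots with small random values."""
    zeros = np.flatnonzero(w == 0.0)
    revived = rng.choice(zeros, size=int(zeros.size * frac), replace=False)
    w = w.copy()
    w[revived] = rng.standard_normal(revived.size) * scale
    return w

w = rng.standard_normal(100_000).astype(np.float32)
for phase in range(4):
    w = prune_smallest(w, frac=0.5)   # cheap phase: train the small sparse model
    # ... incremental training on the sparse model would happen here ...
    w = grow(w, frac=1.0)             # capacity phase: restore pruned slots
    # ... training on the grown model would happen here ...
print(f"density after final growth phase: {np.count_nonzero(w) / w.size:.0%}")
```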

Deep learning recommendation systems at scale have provided remarkable gains through increasing model capacity (i.e., wider and deeper neural networks), but this comes with significant training and infrastructure cost. Model pruning is an effective technique to reduce the computation overhead of deep neural networks by removing redundant parameters. However, modern recommendation systems are still thirsty for model capacity due to the demand of handling big data. Thus, pruning a model results in a smaller model and consequently lower accuracy. To reduce computation without sacrificing model capacity, we...

10.48550/arxiv.2105.01064 preprint EN other-oa arXiv (Cornell University) 2021-01-01