Yaohui Cai

ORCID: 0000-0003-3785-3413
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • CCD and CMOS Imaging Sensors
  • Domain Adaptation and Few-Shot Learning
  • Topic Modeling
  • Anomaly Detection Techniques and Applications
  • Natural Language Processing Techniques
  • Adversarial Robustness in Machine Learning
  • Speech Recognition and Synthesis
  • Web Data Mining and Analysis
  • Ferroelectric and Negative Capacitance Devices
  • Advanced Database Systems and Queries
  • Machine Learning and Algorithms
  • Graph Theory and Algorithms
  • Embedded Systems Design Techniques
  • VLSI and FPGA Design Techniques
  • Real-time simulation and control systems
  • Industrial Vision Systems and Defect Detection

Cornell University
2021-2025

Tsinghua University
2023

Peking University
2019-2021

Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization. This is often not possible for applications with sensitive or proprietary data, e.g., due to privacy and security concerns. Existing zero-shot quantization methods use different heuristics to address this, but they result in poor performance, especially when quantizing to ultra-low precision. Here, we...

10.1109/cvpr42600.2020.01318 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
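
The entry above describes a zero-shot (data-free) quantization method. As a rough illustration of the kind of operation such methods build on, here is a minimal sketch of symmetric uniform weight quantization in PyTorch; the function name `quantize_tensor` and the 4-bit setting are illustrative choices, not taken from the paper, and the paper's key contribution (synthesizing calibration data without the original training set) is only noted in a comment.

```python
import torch

def quantize_tensor(w: torch.Tensor, num_bits: int = 4):
    """Symmetric uniform quantization of a weight tensor.

    Illustrative only: zero-shot methods such as the one above additionally
    synthesize calibration data (e.g., from BatchNorm statistics) to choose
    activation ranges without touching the original training set.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g., 7 for 4-bit signed
    scale = w.abs().max() / qmax            # per-tensor scale
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale, scale                 # dequantized weights + scale

# Example: quantize a random layer's weights to 4 bits
w = torch.randn(64, 128)
w_q, s = quantize_tensor(w, num_bits=4)
print(f"max abs error: {(w - w_q).abs().max().item():.4f}, scale: {s.item():.4f}")
```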

Deploying deep learning models on embedded systems for computer vision tasks has been challenging due to limited compute resources and strict energy budgets. The majority of existing work focuses on accelerating image classification, while other fundamental problems, such as object detection, have not been adequately addressed. Compared with classification, detection problems are more sensitive to the spatial variance of objects and, therefore, require specialized convolutions to aggregate spatial information. To address this need,...

10.1145/3431920.3439295 preprint EN 2021-02-17
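
The abstract above centers on specialized (deformable) convolutions for object detection on resource-constrained hardware. Below is a small, hedged PyTorch sketch of a deformable 3x3 convolution in which an auxiliary convolution predicts per-location sampling offsets; the module name `DeformBlock` and all layer sizes are illustrative assumptions, not the accelerator design described in the paper.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """Deformable 3x3 convolution: a small conv predicts per-location
    sampling offsets, letting the kernel adapt to object geometry."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 2 offsets (dy, dx) per kernel position -> 2 * 3 * 3 channels
        self.offset_conv = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset_conv(x)        # (N, 18, H, W)
        return self.deform_conv(x, offsets)  # (N, out_ch, H, W)

# Example: a 1x64x32x32 feature map
feat = torch.randn(1, 64, 32, 32)
out = DeformBlock(64, 128)(feat)
print(out.shape)  # torch.Size([1, 128, 32, 32])
```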

Recent advancements in large language models (LLMs) have generated significant demand for efficient deployment of inference workloads. Most existing approaches rely on temporal architectures that reuse hardware units across different network layers and operators. However, these methods often encounter challenges in achieving low latency due to considerable memory access overhead. This paper investigates the feasibility and potential of model-specific spatial acceleration for LLM inference on FPGAs. Our approach involves...

10.1145/3626202.3637600 article EN 2024-04-01
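
The paper above contrasts temporal architectures (hardware units reused across layers, with heavy off-chip weight traffic) against model-specific spatial dataflows on FPGAs. The sketch below is a back-of-envelope, roofline-style latency estimate meant only to convey why keeping weights on-chip can change which bound dominates; every number (model size, bandwidth, peak throughput) is an assumed placeholder, not a figure from the paper.

```python
# Back-of-envelope latency model (illustrative assumptions, not paper numbers).
# A temporal design streams weights from off-chip memory every layer, so it is
# often memory-bound; a spatial design pins weights on-chip and pipelines
# layers, so it is closer to compute-bound.

def per_token_latency_s(params: float, bytes_per_weight: float,
                        flops_per_param: float, peak_flops: float,
                        mem_bw_bytes_s: float, weights_on_chip: bool) -> float:
    compute_s = params * flops_per_param / peak_flops
    memory_s = 0.0 if weights_on_chip else params * bytes_per_weight / mem_bw_bytes_s
    return max(compute_s, memory_s)  # roofline-style bound

MODEL_PARAMS = 7e9          # assumed 7B-parameter LLM, 4-bit weights below
temporal = per_token_latency_s(MODEL_PARAMS, 0.5, 2, 1e12, 50e9, weights_on_chip=False)
spatial  = per_token_latency_s(MODEL_PARAMS, 0.5, 2, 1e12, 50e9, weights_on_chip=True)
print(f"temporal ~{temporal*1e3:.1f} ms/token, spatial ~{spatial*1e3:.1f} ms/token")
```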

Recent advancements in large language models (LLMs) boasting billions of parameters have generated significant demand for efficient deployment of inference workloads. While hardware accelerators for Transformer-based models have been extensively studied, the majority of existing approaches rely on temporal architectures that reuse hardware units across different network layers and operators. However, these methods often encounter challenges in achieving low latency due to considerable memory access overhead. This paper...

10.1145/3656177 article EN ACM Transactions on Reconfigurable Technology and Systems 2024-04-04

This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from $\textit{incoherent}$ weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing...

10.48550/arxiv.2307.13304 preprint EN other-oa arXiv (Cornell University) 2023-01-01
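
QuIP's central idea, per the abstract, is that quantization is easier when the weight and Hessian matrices are incoherent, which is encouraged by multiplying the weights with random orthogonal matrices before rounding. The toy sketch below applies a random rotation, rounds to a uniform grid, and rotates back; plain round-to-nearest stands in for the paper's adaptive (Hessian-aware) rounding, and the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n: int) -> np.ndarray:
    # Random rotation via QR of a Gaussian matrix (a stand-in for the
    # structured orthogonal transforms used in practice).
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def quip_like_round(W: np.ndarray, bits: int = 2) -> np.ndarray:
    """Toy incoherence processing: rotate, round to a uniform grid, rotate back.
    The paper's adaptive rounding (minimizing a Hessian-weighted proxy loss)
    is replaced here by plain round-to-nearest for brevity."""
    m, n = W.shape
    U, V = random_orthogonal(m), random_orthogonal(n)
    Wt = U @ W @ V.T                       # incoherent representation
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(Wt).max() / qmax
    Wq = np.clip(np.round(Wt / scale), -qmax - 1, qmax) * scale
    return U.T @ Wq @ V                    # back to the original basis

W = rng.standard_normal((64, 64))
W_hat = quip_like_round(W, bits=2)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```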

Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization. This is often not possible for applications with sensitive or proprietary data, e.g., due to privacy and security concerns. Existing zero-shot quantization methods use different heuristics to address this, but they result in poor performance, especially when quantizing to ultra-low precision. Here, we...

10.48550/arxiv.2001.00281 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Recent advancements in large language models (LLMs) boasting billions of parameters have generated significant demand for efficient deployment of inference workloads. The majority of existing approaches rely on temporal architectures that reuse hardware units across different network layers and operators. However, these methods often encounter challenges in achieving low latency due to considerable memory access overhead. This paper investigates the feasibility and potential of model-specific spatial...

10.48550/arxiv.2312.15159 preprint EN other-oa arXiv (Cornell University) 2023-01-01

FPGAs provide a flexible and efficient platform to accelerate rapidly-changing algorithms for computer vision. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, including object detection and instance segmentation, have not been adequately addressed. Compared with classification, these problems are more sensitive to the spatial variance of objects and, therefore, require specialized convolutions to aggregate spatial information. To address this, recent work proposes dynamic...

10.1109/emc2-nips53020.2019.00019 article EN 2019-12-01

FPGAs provide a flexible and efficient platform to accelerate rapidly-changing algorithms for computer vision. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, including object detection and instance segmentation, have not been adequately addressed. Compared with classification, these problems are more sensitive to the spatial variance of objects and, therefore, require specialized convolutions to aggregate spatial information. To address this, recent work proposes dynamic...

10.48550/arxiv.2002.08357 preprint EN other-oa arXiv (Cornell University) 2020-01-01

A black-box spectral method is introduced for evaluating the adversarial robustness of a given machine learning (ML) model. Our approach, named SPADE, exploits bijective distance mapping between the input/output graphs constructed by approximating the manifolds corresponding to the input/output data. By leveraging the generalized Courant-Fischer theorem, we propose a SPADE score for the model, which is proved to be an upper bound of the best Lipschitz constant under the manifold setting. To reveal the most non-robust data samples highly vulnerable...

10.48550/arxiv.2102.03716 preprint EN other-oa arXiv (Cornell University) 2021-01-01
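
SPADE, as described above, compares graphs built on a model's inputs and outputs and derives a robustness score via the generalized Courant-Fischer theorem. The sketch below is a loose approximation of that idea: it builds kNN graph Laplacians for input and output embeddings and reports the largest generalized eigenvalue as a stretch-factor proxy. The helper names, the ridge term, and the dense eigensolver are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(X: np.ndarray, k: int = 10) -> np.ndarray:
    """Unnormalized graph Laplacian of a symmetrized kNN graph."""
    A = kneighbors_graph(X, k, mode="connectivity").toarray()
    A = np.maximum(A, A.T)                      # symmetrize adjacency
    return np.diag(A.sum(axis=1)) - A

def spade_like_score(X_in: np.ndarray, X_out: np.ndarray,
                     k: int = 10, eps: float = 1e-6) -> float:
    """Largest generalized eigenvalue of (L_out, L_in): a rough proxy for how
    much the model stretches input-manifold distances (larger = less robust).
    The ridge term eps keeps the input Laplacian positive definite."""
    L_in = knn_laplacian(X_in, k) + eps * np.eye(len(X_in))
    L_out = knn_laplacian(X_out, k)
    eigvals = eigh(L_out, L_in, eigvals_only=True)
    return float(eigvals[-1])

# Toy example: a random "model" mapping 64-d inputs to 10-d outputs
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))
Y = np.tanh(X @ rng.standard_normal((64, 10)))
print("SPADE-like score:", spade_like_score(X, Y))
```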

Deploying deep learning models on embedded systems has been challenging due to limited computing resources. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with classification, detection problems are more sensitive to the spatial variance of objects, and therefore require specialized convolutions to aggregate spatial information. To address this need, recent work introduces dynamic deformable...

10.48550/arxiv.2006.08357 preprint EN other-oa arXiv (Cornell University) 2020-01-01