- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- CCD and CMOS Imaging Sensors
- Domain Adaptation and Few-Shot Learning
- Topic Modeling
- Anomaly Detection Techniques and Applications
- Natural Language Processing Techniques
- Adversarial Robustness in Machine Learning
- Speech Recognition and Synthesis
- Web Data Mining and Analysis
- Ferroelectric and Negative Capacitance Devices
- Advanced Database Systems and Queries
- Machine Learning and Algorithms
- Graph Theory and Algorithms
- Embedded Systems Design Techniques
- VLSI and FPGA Design Techniques
- Real-Time Simulation and Control Systems
- Industrial Vision Systems and Defect Detection
Cornell University
2021-2025
Tsinghua University
2023
Peking University
2019-2021
Quantization is a promising approach for reducing the inference time and memory footprint of neural networks. However, most existing quantization methods require access to the original training dataset for retraining during quantization. This is often not possible for applications with sensitive or proprietary data, e.g., due to privacy and security concerns. Existing zero-shot quantization methods use different heuristics to address this, but they result in poor performance, especially when quantizing to ultra-low precision. Here, we...
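For context, the sketch below illustrates the plain uniform post-training quantization that such methods build on; it is a minimal NumPy baseline, not the zero-shot technique this abstract introduces, and the function names are illustrative.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, num_bits: int = 8):
    """Symmetric uniform quantization of a weight tensor.

    Returns integer codes and the scale needed to dequantize.
    """
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8-bit
    scale = np.abs(w).max() / qmax            # per-tensor scale, no training data needed
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Example: quantize a random "layer" and measure reconstruction error.
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_uniform(w, num_bits=4)
err = np.abs(dequantize(q, s) - w).mean()
print(f"mean abs error at 4 bits: {err:.4f}")
```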
Deploying deep learning models on embedded systems for computer vision tasks has been challenging due to limited compute resources and strict energy budgets. The majority of existing work focuses on accelerating image classification, while other fundamental problems, such as object detection, have not been adequately addressed. Compared with classification, detection problems are more sensitive to the spatial variance of objects and therefore require specialized convolutions to aggregate spatial information. To address this need,...
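The specialized convolutions referred to here are deformable convolutions, which add learned sampling offsets to the regular convolution grid. Below is a minimal sketch using torchvision's off-the-shelf operator, not the hardware-friendly variant developed in this line of work.

```python
import torch
from torchvision.ops import deform_conv2d

# A regular 3x3 convolution samples a fixed grid; deformable convolution adds
# per-location (dy, dx) offsets so the kernel can follow object geometry.
N, C_in, H, W = 1, 8, 32, 32
C_out, kH, kW = 16, 3, 3

x = torch.randn(N, C_in, H, W)
weight = torch.randn(C_out, C_in, kH, kW)

# Offsets are usually predicted by a small conv layer; all-zero offsets
# reduce the operator to an ordinary convolution.
offset = torch.zeros(N, 2 * kH * kW, H, W)

y = deform_conv2d(x, offset, weight, padding=1)
print(y.shape)  # torch.Size([1, 16, 32, 32])
```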
Recent advancements in large language models (LLMs) have generated significant demand for the efficient deployment of inference workloads. Most existing approaches rely on temporal architectures that reuse hardware units across different network layers and operators. However, these methods often encounter challenges in achieving low latency due to considerable memory access overhead. This paper investigates the feasibility and potential of model-specific spatial acceleration for LLM inference on FPGAs. Our approach involves...
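To make the memory-access bottleneck concrete, here is a back-of-the-envelope roofline estimate for single-token decoding; the hardware figures are illustrative assumptions, not numbers from the paper.

```python
# Roofline-style estimate for generating one token with a decoder-only LLM.
# All numbers below are illustrative assumptions, not measurements.
def per_token_latency_ms(n_params: float, bytes_per_param: float,
                         mem_bw_gbs: float, peak_tflops: float) -> float:
    flops = 2.0 * n_params                     # ~2 FLOPs per parameter per token
    t_compute = flops / (peak_tflops * 1e12)
    t_memory = n_params * bytes_per_param / (mem_bw_gbs * 1e9)
    return 1e3 * max(t_compute, t_memory)      # bound by the slower of the two

# A 7B-parameter model in 8-bit weights on a device with 460 GB/s memory
# bandwidth and 100 TFLOP/s peak compute (hypothetical figures): the result
# is dominated by weight traffic, i.e. the decode stage is memory bound.
print(per_token_latency_ms(7e9, 1.0, 460, 100))
```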
Recent advancements in large language models (LLMs) boasting billions of parameters have generated a significant demand for the efficient deployment of inference workloads. While hardware accelerators for Transformer-based models have been extensively studied, the majority of existing approaches rely on temporal architectures that reuse hardware units across different network layers and operators. However, these methods often encounter challenges in achieving low latency due to considerable memory access overhead. This paper...
This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from $\textit{incoherent}$ weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure that minimizes a quadratic proxy objective; (2) efficient pre- and post-processing...
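A greatly simplified sketch of the incoherence idea: rotate the weights with random orthogonal matrices so that outliers are spread evenly before rounding. QuIP's actual adaptive (LDLQ) rounding and structured transforms are more involved; the code below uses plain nearest rounding and Haar-random rotations purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(n: int) -> np.ndarray:
    # QR of a Gaussian matrix yields a Haar-random orthogonal matrix.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def quantize_nearest(w: np.ndarray, num_bits: int = 4) -> np.ndarray:
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

# Heavy-tailed weights mimic the outliers that hurt per-tensor quantization.
d_out, d_in = 64, 64
W = rng.standard_t(2, size=(d_out, d_in))
U, V = random_orthogonal(d_out), random_orthogonal(d_in)

# Quantize in the rotated (incoherent) basis, then rotate back.
W_rot = U @ W @ V.T
W_hat = U.T @ quantize_nearest(W_rot) @ V

print("direct 4-bit error:", np.linalg.norm(quantize_nearest(W) - W))
print("rotated 4-bit error:", np.linalg.norm(W_hat - W))
```

With heavy-tailed weights, the rotated error is typically much lower because no single outlier dominates the per-tensor scale after rotation.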
Recent advancements in large language models (LLMs) boasting billions of parameters have generated a significant demand for the efficient deployment of inference workloads. The majority of existing approaches rely on temporal architectures that reuse hardware units for different network layers and operators. However, these methods often encounter challenges in achieving low latency due to considerable memory access overhead. This paper investigates the feasibility and potential of model-specific spatial...
FPGAs provide a flexible and efficient platform to accelerate rapidly-changing algorithms for computer vision. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, including object detection and instance segmentation, have not been adequately addressed. Compared with image classification, these problems are more sensitive to the spatial variance of objects and therefore require specialized convolutions to aggregate spatial information. To address this, recent work proposes dynamic...
A black-box spectral method is introduced for evaluating the adversarial robustness of a given machine learning (ML) model. Our approach, named SPADE, exploits bijective distance mapping between the input/output graphs constructed for approximating the manifolds corresponding to the input/output data. By leveraging the generalized Courant-Fischer theorem, we propose a SPADE score for the model, which is proved to be an upper bound of the best Lipschitz constant under the manifold setting. To reveal the most non-robust data samples that are highly vulnerable...
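As a rough analogue of the graph-based construction described here (a simplified interpretation for illustration, not the published SPADE score), one can build kNN graphs on the inputs and the model outputs and compare their Laplacians through a generalized eigenvalue problem:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(points: np.ndarray, k: int = 10) -> np.ndarray:
    """Unnormalized graph Laplacian of a symmetrized kNN graph."""
    A = kneighbors_graph(points, k, mode="connectivity").toarray()
    A = np.maximum(A, A.T)                  # make the graph undirected
    return np.diag(A.sum(axis=1)) - A

def graph_stretch_score(X: np.ndarray, Y: np.ndarray, k: int = 10,
                        eps: float = 1e-6) -> float:
    """Largest generalized eigenvalue of the (output, input) Laplacian pair.

    A toy analogue of a spectral robustness score: it compares how the
    output-graph structure stretches relative to the input-graph structure.
    """
    L_in = knn_laplacian(X, k) + eps * np.eye(len(X))    # regularize: Laplacians are singular
    L_out = knn_laplacian(Y, k) + eps * np.eye(len(Y))
    return eigh(L_out, L_in, eigvals_only=True)[-1]

# Toy usage: inputs X and model outputs Y = f(X), one sample per row.
X = np.random.randn(200, 32)
Y = np.tanh(X @ np.random.randn(32, 16))
print(graph_stretch_score(X, Y))
```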
Deploying deep learning models on embedded systems has been challenging due to limited computing resources. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with classification, detection problems are more sensitive to the spatial variance of objects and therefore require specialized convolutions to aggregate spatial information. To address this need, recent work introduces dynamic deformable...