Sai Qian Zhang

ORCID: 0000-0002-4815-9235

Research Areas
  • Advanced Neural Network Applications
  • Advanced Memory and Neural Computing
  • Software-Defined Networks and 5G
  • Caching and Content Delivery
  • Ferroelectric and Negative Capacitance Devices
  • Privacy-Preserving Technologies in Data
  • Adversarial Robustness in Machine Learning
  • Domain Adaptation and Few-Shot Learning
  • CCD and CMOS Imaging Sensors
  • Advanced Image and Video Retrieval Techniques
  • Parallel Computing and Optimization Techniques
  • Network Traffic and Congestion Control
  • Advanced Data Storage Technologies
  • Natural Language Processing Techniques
  • IoT and Edge/Fog Computing
  • Gaze Tracking and Assistive Technology
  • Low-power high-performance VLSI design
  • Neural Networks and Applications
  • Topic Modeling
  • Network Packet Processing and Optimization
  • Virtual Reality Applications and Impacts
  • Distributed Control Multi-Agent Systems
  • VLSI and FPGA Design Techniques
  • Advanced Malware Detection Techniques
  • Numerical Methods and Algorithms

New York University
2024-2025

Courant Institute of Mathematical Sciences
2025

Harvard University Press
2018-2024

META Health
2024

Meta (United States)
2024

Harvard University
2019-2020

University of Toronto
2015-2017

Chalmers University of Technology
2016

This paper describes a novel approach for packing sparse convolutional neural networks into a denser format for efficient implementations using systolic arrays. By combining multiple sparse columns of a filter matrix into a single dense column stored in the array, the utilization efficiency of the systolic array can be substantially increased (e.g., 8x) due to the higher density of nonzero weights in the resulting packed filter matrix. Within each group of combined columns, for each row, all weights but the one with the largest magnitude are pruned. The remaining weights are retrained to preserve high accuracy. We study...

10.1145/3297858.3304028 article EN 2019-04-04
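
The column-combining rule summarized above (keep only the largest-magnitude weight per row within each group of combined columns) can be illustrated with a short sketch. This is a simplified illustration assuming a 2D filter matrix and fixed, contiguous column groups; the paper's actual scheme selects which columns to combine and retrains the surviving weights, neither of which is shown here.

```python
# Illustrative sketch only: pack groups of sparse columns into dense columns
# by keeping, per row, the weight with the largest magnitude in each group.
import numpy as np

def combine_columns(filter_matrix: np.ndarray, group_size: int) -> np.ndarray:
    rows, cols = filter_matrix.shape
    assert cols % group_size == 0, "sketch assumes evenly divisible grouping"
    packed = np.zeros((rows, cols // group_size))
    for g in range(cols // group_size):
        group = filter_matrix[:, g * group_size:(g + 1) * group_size]
        winners = np.abs(group).argmax(axis=1)          # largest-magnitude weight per row
        packed[:, g] = group[np.arange(rows), winners]  # survivors form one dense column
    return packed

# Example: an 8-column sparse filter matrix packed 4x into 2 dense columns.
w = np.random.randn(6, 8) * (np.random.rand(6, 8) > 0.7)
print(combine_columns(w, group_size=4))
```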

Federated learning (FL) is a distributed training technique that enables client devices to jointly learn a shared model by aggregating locally computed models without exposing their raw data. While most of the existing work focuses on improving FL accuracy, in this paper we focus on FL efficiency, which is often a hurdle for adopting FL in real-world applications. Specifically, we design an efficient FL framework that jointly optimizes processing latency and communication overhead, both of which are primary considerations in a practical implementation of FL. Inspired by recent...

10.1609/aaai.v36i8.20894 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28
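
As context for the aggregation step the abstract refers to, the sketch below shows plain FedAvg-style weighted averaging of locally computed models. It is only a minimal illustration of the FL baseline, not the latency- and communication-optimized framework proposed in the paper; all names and numbers are illustrative.

```python
# Minimal sketch of FL aggregation: the server averages locally trained
# models, weighted by each client's data size, without seeing raw data.
import numpy as np

def aggregate(client_weights, client_sizes):
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three clients with different amounts of local data (illustrative).
clients = [np.random.randn(10) for _ in range(3)]
sizes = [120, 300, 80]
print(aggregate(clients, sizes))
```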

Many multicast services such as live multimedia distribution and real-time event monitoring require mechanisms that involve network functions (e.g., firewall, video transcoding). Network function virtualization (NFV) is a concept that proposes using virtualization to implement network functions on infrastructure building blocks (such as high-volume servers or virtual machines), where software provides the functionality of existing purpose-built equipment. We present an approach for a multicast mechanism whereby flows are processed by NFV-based network functions before...

10.1109/tnsm.2015.2465371 article EN IEEE Transactions on Network and Service Management 2015-08-13

In cooperative multi-agent reinforcement learning (c-MARL), agents learn to cooperatively take actions as a team to maximize a total team reward. We analyze the robustness of c-MARL to adversaries capable of attacking one of the agents on the team. Through the ability to manipulate this agent's observations, the adversary seeks to decrease the total team reward. Attacking c-MARL is challenging for three reasons: first, it is difficult to estimate team rewards or how they are impacted by an agent mispredicting; second, models are non-differentiable; and third, the feature space...

10.1109/spw50608.2020.00027 article EN 2020-05-01

Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values. In this paper, we propose the Fast First, Accurate Second Training (FAST) system for DNNs, where the weights, activations, and gradients are represented in BFP. FAST supports matrix multiplication with variable-precision BFP input operands, enabling incremental increases in DNN precision throughout training. By increasing both...

10.1109/hpca53966.2022.00067 preprint EN 2022-04-01
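
The BFP format the abstract builds on can be illustrated in a few lines: a group of values shares one exponent and each value keeps a low-precision mantissa. This is a minimal sketch assuming a single group and a fixed mantissa width; FAST's variable-precision schedule and hardware mapping are not represented.

```python
# Illustrative sketch of Block Floating Point: one shared exponent per group
# of values, with each value reduced to a small integer mantissa.
import numpy as np

def to_bfp(values: np.ndarray, mantissa_bits: int = 4):
    shared_exp = int(np.floor(np.log2(np.abs(values).max() + 1e-30)))
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    mantissas = np.clip(np.round(values / scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(int)
    return shared_exp, mantissas

def from_bfp(shared_exp: int, mantissas: np.ndarray, mantissa_bits: int = 4):
    return mantissas * 2.0 ** (shared_exp - (mantissa_bits - 1))

group = np.array([0.11, -0.52, 0.03, 0.47])
exp, m = to_bfp(group)
print(from_bfp(exp, m))   # low-precision reconstruction of the group
```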

Leveraging real-time eye tracking, foveated rendering optimizes hardware efficiency and enhances visual quality in virtual reality (VR). This approach leverages eye-tracking techniques to determine where the user is looking, allowing the system to render high-resolution graphics only in the foveal region, the small area of the retina where visual acuity is highest, while the peripheral view is rendered at lower resolution. However, modern deep learning-based gaze-tracking solutions often exhibit a long-tail distribution of tracking...

10.1109/tvcg.2025.3549577 article EN IEEE Transactions on Visualization and Computer Graphics 2025-01-01
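
A toy sketch of the foveated-rendering idea follows: shading resolution is chosen per pixel from its distance to the tracked gaze point. The radii and shading rates are invented for illustration and are not the paper's parameters or its gaze-tracking method.

```python
# Illustrative sketch: full resolution near the gaze point, coarser shading
# in the periphery. Radii and rates are placeholder values.
import numpy as np

def shading_rate(pixel_xy, gaze_xy, fovea_radius=100.0, mid_radius=300.0):
    eccentricity = np.hypot(pixel_xy[0] - gaze_xy[0], pixel_xy[1] - gaze_xy[1])
    if eccentricity <= fovea_radius:
        return 1.0          # full resolution where the user is looking
    if eccentricity <= mid_radius:
        return 0.5          # half-rate shading in the near periphery
    return 0.25             # quarter-rate shading in the far periphery

print(shading_rate((960, 540), gaze_xy=(900, 520)))  # near the gaze -> 1.0
print(shading_rate((100, 100), gaze_xy=(900, 520)))  # periphery -> 0.25
```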

Network function virtualization (NFV) has emerged as a promising paradigm in networking, where hardware-based middleboxes are replaced with software-based virtualized entities, typically running in the cloud, to provide specific functionalities. By deploying NFV, network services become more adaptive and cost-effective. Many multicast services such as real-time multimedia streaming and intrusion detection require appropriate service chaining; however, the NFV placement as well as the traffic routing strategy must guarantee that flows...

10.1109/noms.2016.7502829 article EN NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium 2016-04-01

Multi-agent reinforcement learning (MARL) has recently received considerable attention due to its applicability to a wide range of real-world applications. However, achieving efficient communication among agents has always been an overarching problem in MARL. In this work, we propose Variance Based Control (VBC), a simple yet efficient technique to improve communication efficiency in MARL. By limiting the variance of the messages exchanged between agents during the training phase, the noisy components can be eliminated effectively, while the useful part...

10.48550/arxiv.1909.02682 preprint EN other-oa arXiv (Cornell University) 2019-01-01
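
A simplified reading of the variance-based gating idea is sketched below: a message is only worth transmitting when its components vary enough to carry information. This is an illustrative approximation; VBC's actual criterion and the extra loss term used during training are not reproduced here.

```python
# Illustrative sketch: suppress low-variance (near-constant, uninformative)
# messages to reduce communication between agents.
import numpy as np

def should_broadcast(message: np.ndarray, threshold: float = 0.1) -> bool:
    return float(np.var(message)) > threshold

informative = np.array([0.9, -0.7, 0.1, 0.8])
uninformative = np.array([0.21, 0.20, 0.22, 0.21])
print(should_broadcast(informative))    # True: worth sending
print(should_broadcast(uninformative))  # False: suppressed to save bandwidth
```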

We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with a field-programmable gate array (FPGA) implementation. By jointly optimizing CNN models, computing architectures, and hardware implementations, our approach achieves unprecedented performance in the trade-off space characterized by inference latency, energy efficiency, hardware utilization, and accuracy. An FPGA implementation is used as the validation vehicle for our design, achieving 2.28 ms latency...

10.1145/3330345.3330385 article EN 2019-06-18

To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which achieve tremendous speed-up and memory savings with quantized activations and weights. We first bring up three omitted issues in extremely low-bit networks: the squashing range of quantized values; gradient vanishing during backpropagation; and unexploited hardware acceleration of ternary networks. By reparameterizing the quantized activation and weight vectors with a full-precision scale and offset for a fixed ternary vector, we decouple...

10.1609/aaai.v34i04.5912 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03
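
The reparameterization mentioned above can be sketched as representing a weight vector by a ternary code together with a full-precision scale and offset. In the sketch below the scale and offset are fitted analytically for illustration, whereas the paper learns them during training; the thresholding heuristic is likewise an assumption.

```python
# Illustrative sketch: approximate a weight vector as scale * t + offset,
# where t is a ternary code in {-1, 0, +1}.
import numpy as np

def ternarize(weights: np.ndarray, zero_band: float = 0.5):
    offset = weights.mean()
    centered = weights - offset
    thresh = zero_band * np.abs(centered).mean()
    t = np.sign(centered) * (np.abs(centered) > thresh)   # ternary codes
    scale = np.abs(centered[t != 0]).mean() if np.any(t != 0) else 0.0
    return scale, offset, t.astype(int)

w = np.random.randn(8) * 0.2 + 0.05
scale, offset, t = ternarize(w)
print(w)
print(scale * t + offset)   # low-bit reconstruction
```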

The emergence of the Internet of Things (IoT) has led to a remarkable increase in the volume of data generated at the network edge. In order to support real-time smart IoT applications, massive amounts of data from edge devices need to be processed with low latency using methods such as deep neural networks (DNNs). To improve application performance and minimize resource cost, enterprises have begun to adopt edge computing, a computation paradigm that advocates processing input data locally. However, edge nodes are often...

10.1145/3404397.3404473 article EN 2020-08-09

Recent studies have shown that introducing communication between agents can significantly improve overall performance in cooperative multi-agent reinforcement learning (MARL). However, existing communication schemes often require agents to exchange an excessive number of messages at run-time under a reliable channel, which hinders practicality in many real-world situations. In this paper, we present Temporal Message Control (TMC), a simple yet effective approach for achieving succinct and robust communication in MARL....

10.48550/arxiv.2010.14391 preprint EN other-oa arXiv (Cornell University) 2020-01-01
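
A minimal sketch of the temporal gating idea follows: an agent transmits a new message only when it differs sufficiently from the last one it sent, so receivers can otherwise reuse a cached copy. The distance measure and threshold are illustrative assumptions, not TMC's exact mechanism.

```python
# Illustrative sketch: only transmit a message when it changed enough since
# the last transmission; otherwise the receiver keeps its cached copy.
import numpy as np

class MessageGate:
    def __init__(self, threshold: float = 0.2):
        self.threshold = threshold
        self.last_sent = None

    def maybe_send(self, message: np.ndarray):
        if self.last_sent is None or np.linalg.norm(message - self.last_sent) > self.threshold:
            self.last_sent = message.copy()
            return message          # transmit
        return None                 # skip; receiver reuses cached message

gate = MessageGate()
print(gate.maybe_send(np.array([0.5, 0.1])))   # first message: transmitted
print(gate.maybe_send(np.array([0.52, 0.11]))) # nearly identical: suppressed
```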

Many multicast services such as live multimedia distribution and real-time event monitoring require constructing a multicast mechanism that involves network functions (e.g., firewall, video transcoding). Network Function Virtualization (NFV) is a concept that proposes using virtualization to implement network functions on infrastructure building blocks (such as high-volume servers and virtual machines), where software provides the functionality of existing purpose-built equipment. We present an approach for a multicast mechanism whereby flows are...

10.1109/icc.2015.7249214 article EN 2015-06-01

This paper describes a novel approach for packing sparse convolutional neural networks for their efficient systolic array implementations. By combining subsets of columns in the original filter matrix associated with a convolutional layer, we increase the utilization efficiency of the systolic array substantially (e.g., ~4x) due to the increased density of nonzeros in the resulting packed filter matrix. Within each group of combined columns, for each row, all weights but the one with the largest magnitude are pruned. We retrain the remaining weights to preserve high accuracy. We demonstrate that mitigating data...

10.48550/arxiv.1811.04770 preprint EN other-oa arXiv (Cornell University) 2018-01-01

In recent years, numerous designs have used systolic arrays to accelerate convolutional neural network (CNN) inference. In this work, we demonstrate that we can further speed up CNN inference and lower its power consumption by mapping systolic arrays onto 3D circuit structures as opposed to conventional 2D structures. Specifically, by operating in 3D space, a wide systolic array consisting of a number of subarrays can efficiently implement the wide layers prevalent in state-of-the-art CNNs. Additionally, by accumulating intermediate results along the third...

10.1109/sips.2018.8598454 article EN 2018-10-01

We present the Maestro memory-on-logic 3D-IC architecture for the coordinated parallel use of a plurality of systolic arrays (SAs) in performing deep neural network (DNN) inference. Maestro reduces the under-utilization common to a single large SA by allowing many smaller SAs to operate in parallel on DNN weight matrices of varying shapes and sizes. In order to buffer intermediate results in memory blocks (MBs) and provide high-bandwidth communication between SAs and MBs when transferring weights, Maestro employs three innovations. (1) An SA on the logic die can access its...

10.1109/asap.2019.00-31 article EN 2019-07-01

We introduce adaptive tiling, a method of partitioning the layers of a sparse convolutional neural network (CNN) into blocks of filters and channels, called tiles, each implementable with a fixed-size systolic array. By allowing a tile to adapt its size so that it can cover a large sparse area, we minimize the total number of tiles, or equivalently, the number of systolic array calls required to perform CNN inference. The proposed scheme resolves the challenge of applying systolic array architectures, traditionally designed for dense matrices, to sparse CNNs. To validate...

10.1109/icpr.2018.8545462 article EN 2018 24th International Conference on Pattern Recognition (ICPR) 2018-08-01
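
The tiling objective described above (cover a sparse layer with as few fixed-size systolic-array calls as possible) can be illustrated with a greedy 2D sketch: each tile grows over consecutive columns while its nonzero rows still fit the array. The real scheme tiles over filters and channels jointly and chooses tiles more carefully; this is only a simplified illustration.

```python
# Illustrative sketch: greedily grow column-wise tiles of a sparse matrix so
# that each tile's active rows and columns fit a fixed-size systolic array.
import numpy as np

def greedy_tiles(weight: np.ndarray, array_rows: int, array_cols: int):
    tiles, start, n_cols = [], 0, weight.shape[1]
    while start < n_cols:
        end = start + 1
        while end < n_cols and end - start < array_cols:
            active_rows = np.any(weight[:, start:end + 1] != 0, axis=1).sum()
            if active_rows > array_rows:
                break
            end += 1
        tiles.append((start, end))   # columns [start, end) form one array call
        start = end
    return tiles

w = (np.random.rand(32, 24) > 0.85) * np.random.randn(32, 24)   # sparse layer
tiles = greedy_tiles(w, array_rows=16, array_cols=8)
print(len(tiles), "array calls:", tiles)
```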

Distributed file systems such as the Google File System and Hadoop have been used to store large volumes of data in cloud data centers. These systems divide data sets into blocks of fixed size and replicate them over multiple machines to achieve both reliability and efficiency. Recent studies have shown that data blocks tend to exhibit a wide disparity in popularity. In this context, the naive block replication schemes used by these systems often cause an uneven load distribution across machines, which reduces the overall I/O throughput of the system. While many algorithms have been proposed,...

10.1109/icdcs.2015.52 article EN 2015-06-01
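
In the spirit of the load-balancing problem described above, the sketch below assigns block replicas greedily to the machines with the least accumulated expected load, weighting by block popularity. The greedy heuristic, replica count, and popularity values are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative sketch: popularity-aware replica placement that spreads the
# expected read load of hot blocks across the least-loaded machines.
import heapq

def place_replicas(block_popularity, n_machines, replicas=3):
    heap = [(0.0, m) for m in range(n_machines)]      # (expected load, machine id)
    heapq.heapify(heap)
    placement = {}
    # Place the most popular blocks first.
    for block, pop in sorted(block_popularity.items(), key=lambda kv: -kv[1]):
        chosen = [heapq.heappop(heap) for _ in range(replicas)]
        placement[block] = [m for _, m in chosen]
        for load, m in chosen:                        # each replica serves a share of reads
            heapq.heappush(heap, (load + pop / replicas, m))
    return placement

popularity = {"b1": 100, "b2": 10, "b3": 55, "b4": 5}
print(place_replicas(popularity, n_machines=5))
```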

On-device learning allows AI models to adapt to user data, thereby enhancing service quality on edge platforms. However, training on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs). To address these issues, we propose utilizing embedded dynamic random-access memory (eDRAM) as the primary storage medium for transient training data. In comparison with static random-access memory (SRAM), eDRAM provides higher density...

10.1109/hpca57654.2024.00071 article EN 2024-03-02