Sai Qian Zhang

ORCID: 0000-0002-4815-9235

Research Areas
  • Advanced Neural Network Applications
  • Advanced Memory and Neural Computing
  • Software-Defined Networks and 5G
  • Caching and Content Delivery
  • Ferroelectric and Negative Capacitance Devices
  • Privacy-Preserving Technologies in Data
  • Adversarial Robustness in Machine Learning
  • Domain Adaptation and Few-Shot Learning
  • CCD and CMOS Imaging Sensors
  • Advanced Image and Video Retrieval Techniques
  • Parallel Computing and Optimization Techniques
  • Network Traffic and Congestion Control
  • Advanced Data Storage Technologies
  • Natural Language Processing Techniques
  • IoT and Edge/Fog Computing
  • Gaze Tracking and Assistive Technology
  • Low-power high-performance VLSI design
  • Neural Networks and Applications
  • Topic Modeling
  • Network Packet Processing and Optimization
  • Virtual Reality Applications and Impacts
  • Distributed Control Multi-Agent Systems
  • VLSI and FPGA Design Techniques
  • Advanced Malware Detection Techniques
  • Numerical Methods and Algorithms

New York University
2024-2025

Courant Institute of Mathematical Sciences
2025

Harvard University Press
2018-2024

META Health
2024

Meta (United States)
2024

Harvard University
2019-2020

University of Toronto
2015-2017

Chalmers University of Technology
2016

This paper describes a novel approach for packing sparse convolutional neural networks into a denser format for efficient implementations using systolic arrays. By combining multiple sparse columns of a filter matrix into a single dense column stored in the array, the utilization efficiency of the systolic array can be substantially increased (e.g., 8x) due to the higher density of nonzero weights in the resulting packed filter matrix. Within each group of combined columns, for each row, all weights but the one with the largest magnitude are pruned. The remaining weights are retrained to preserve high accuracy. We study...

10.1145/3297858.3304028 article EN 2019-04-04
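
The column-combining rule summarized above (keep only the largest-magnitude weight per row within each group of combined columns) can be illustrated with a short sketch. This is a simplified illustration assuming a 2D filter matrix and fixed, contiguous column groups; the paper's actual scheme selects which columns to combine and retrains the surviving weights, neither of which is shown here.

```python
# Illustrative sketch only: pack groups of sparse columns into dense columns
# by keeping, per row, the weight with the largest magnitude in each group.
import numpy as np

def combine_columns(filter_matrix: np.ndarray, group_size: int) -> np.ndarray:
    rows, cols = filter_matrix.shape
    assert cols % group_size == 0, "sketch assumes evenly divisible grouping"
    packed = np.zeros((rows, cols // group_size))
    for g in range(cols // group_size):
        group = filter_matrix[:, g * group_size:(g + 1) * group_size]
        winners = np.abs(group).argmax(axis=1)          # largest-magnitude weight per row
        packed[:, g] = group[np.arange(rows), winners]  # survivors form one dense column
    return packed

# Example: an 8-column sparse filter matrix packed 4x into 2 dense columns.
w = np.random.randn(6, 8) * (np.random.rand(6, 8) > 0.7)
print(combine_columns(w, group_size=4))
```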

Federated learning (FL) is a distributed training technique that enables client devices to jointly learn a shared model by aggregating locally computed models without exposing their raw data. While most of the existing work focuses on improving FL accuracy, in this paper we focus on FL efficiency, which is often a hurdle for adopting FL in real-world applications. Specifically, we design an efficient FL framework that jointly optimizes processing latency and communication overhead, both of which are primary considerations in a practical implementation of FL. Inspired by recent...

10.1609/aaai.v36i8.20894 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28
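
As context for the aggregation step the abstract refers to, the sketch below shows plain FedAvg-style weighted averaging of locally computed models. It is only a minimal illustration of the FL baseline, not the latency- and communication-optimized framework proposed in the paper; all names and numbers are illustrative.

```python
# Minimal sketch of FL aggregation: the server averages locally trained
# models, weighted by each client's data size, without seeing raw data.
import numpy as np

def aggregate(client_weights, client_sizes):
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three clients with different amounts of local data (illustrative).
clients = [np.random.randn(10) for _ in range(3)]
sizes = [120, 300, 80]
print(aggregate(clients, sizes))
```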

Many multicast services such as live multimedia distribution and real-time event monitoring require mechanisms that involve network functions (e.g., firewall, video transcoding). Network function virtualization (NFV) is a concept that proposes using virtualization to implement network functions on infrastructure building blocks (such as high-volume servers or virtual machines), where software provides the functionality of existing purpose-built equipment. We present an approach for a multicast mechanism whereby flows are processed by NFV-based network functions before...

10.1109/tnsm.2015.2465371 article EN IEEE Transactions on Network and Service Management 2015-08-13

In cooperative multi-agent reinforcement learning (c-MARL), agents learn to cooperatively take actions as a team to maximize a total team reward. We analyze the robustness of c-MARL to adversaries capable of attacking one of the agents on the team. Through the ability to manipulate this agent's observations, the adversary seeks to decrease the total team reward. Attacking c-MARL is challenging for three reasons: first, it is difficult to estimate team rewards or how they are impacted by an agent mispredicting; second, models are non-differentiable; and third, the feature space...

10.1109/spw50608.2020.00027 article EN 2020-05-01

Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values. In this paper, we propose the Fast First, Accurate Second Training (FAST) system for DNNs, where the weights, activations, and gradients are represented in BFP. FAST supports matrix multiplication with variable-precision BFP input operands, enabling incremental increases in DNN precision throughout training. By increasing both...

10.1109/hpca53966.2022.00067 preprint EN 2022-04-01
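
The BFP format the abstract builds on can be illustrated in a few lines: a group of values shares one exponent and each value keeps a low-precision mantissa. This is a minimal sketch assuming a single group and a fixed mantissa width; FAST's variable-precision schedule and hardware mapping are not represented.

```python
# Illustrative sketch of Block Floating Point: one shared exponent per group
# of values, with each value reduced to a small integer mantissa.
import numpy as np

def to_bfp(values: np.ndarray, mantissa_bits: int = 4):
    shared_exp = int(np.floor(np.log2(np.abs(values).max() + 1e-30)))
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    mantissas = np.clip(np.round(values / scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(int)
    return shared_exp, mantissas

def from_bfp(shared_exp: int, mantissas: np.ndarray, mantissa_bits: int = 4):
    return mantissas * 2.0 ** (shared_exp - (mantissa_bits - 1))

group = np.array([0.11, -0.52, 0.03, 0.47])
exp, m = to_bfp(group)
print(from_bfp(exp, m))   # low-precision reconstruction of the group
```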

Leveraging real-time eye tracking, foveated rendering optimizes hardware efficiency and enhances visual quality in virtual reality (VR). This approach leverages eye-tracking techniques to determine where the user is looking, allowing the system to render high-resolution graphics only in the foveal region, the small area of the retina where visual acuity is highest, while the peripheral view is rendered at lower resolution. However, modern deep learning-based gaze-tracking solutions often exhibit a long-tail distribution of tracking...

10.1109/tvcg.2025.3549577 article EN IEEE Transactions on Visualization and Computer Graphics 2025-01-01
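
A toy sketch of the foveated-rendering idea follows: shading resolution is chosen per pixel from its distance to the tracked gaze point. The radii and shading rates are invented for illustration and are not the paper's parameters or its gaze-tracking method.

```python
# Illustrative sketch: full resolution near the gaze point, coarser shading
# in the periphery. Radii and rates are placeholder values.
import numpy as np

def shading_rate(pixel_xy, gaze_xy, fovea_radius=100.0, mid_radius=300.0):
    eccentricity = np.hypot(pixel_xy[0] - gaze_xy[0], pixel_xy[1] - gaze_xy[1])
    if eccentricity <= fovea_radius:
        return 1.0          # full resolution where the user is looking
    if eccentricity <= mid_radius:
        return 0.5          # half-rate shading in the near periphery
    return 0.25             # quarter-rate shading in the far periphery

print(shading_rate((960, 540), gaze_xy=(900, 520)))  # near the gaze -> 1.0
print(shading_rate((100, 100), gaze_xy=(900, 520)))  # periphery -> 0.25
```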

Network function virtualization (NFV) has emerged as a promising paradigm in networking, where hardware-based middleboxes are replaced with software-based virtualized entities, typically running in the cloud, to provide specific functionalities. By deploying NFV, network services become more adaptive and cost-effective. Many multicast services such as real-time multimedia streaming and intrusion detection require appropriate service chaining; however, the NFV placement as well as the traffic routing strategy must guarantee that flows...

10.1109/noms.2016.7502829 article EN NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium 2016-04-01

Multi-agent reinforcement learning (MARL) has recently received considerable attention due to its applicability to a wide range of real-world applications. However, achieving efficient communication among agents has always been an overarching problem in MARL. In this work, we propose Variance Based Control (VBC), a simple yet efficient technique to improve communication efficiency in MARL. By limiting the variance of the messages exchanged between agents during the training phase, the noisy components can be eliminated effectively, while the useful part...

10.48550/arxiv.1909.02682 preprint EN other-oa arXiv (Cornell University) 2019-01-01
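
A simplified reading of the variance-based gating idea is sketched below: a message is only worth transmitting when its components vary enough to carry information. This is an illustrative approximation; VBC's actual criterion and the extra loss term used during training are not reproduced here.

```python
# Illustrative sketch: suppress low-variance (near-constant, uninformative)
# messages to reduce communication between agents.
import numpy as np

def should_broadcast(message: np.ndarray, threshold: float = 0.1) -> bool:
    return float(np.var(message)) > threshold

informative = np.array([0.9, -0.7, 0.1, 0.8])
uninformative = np.array([0.21, 0.20, 0.22, 0.21])
print(should_broadcast(informative))    # True: worth sending
print(should_broadcast(uninformative))  # False: suppressed to save bandwidth
```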

We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with a field-programmable gate array (FPGA) implementation. By jointly optimizing CNN models, computing architectures, and hardware implementations, our approach achieves unprecedented performance in the trade-off space characterized by inference latency, energy efficiency, hardware utilization, and accuracy. An FPGA implementation is used as the validation vehicle for our design, achieving 2.28 ms latency...

10.1145/3330345.3330385 article EN 2019-06-18

To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which achieve tremendous speed-up and memory savings with quantized activations and weights. We first bring up three omitted issues in extremely low-bit networks: the squashing range of quantized values; gradient vanishing during backpropagation; and unexploited hardware acceleration of ternary networks. By reparameterizing the quantized activation and weight vectors with a full-precision scale and offset for a fixed ternary vector, we decouple...

10.1609/aaai.v34i04.5912 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03
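
The reparameterization mentioned above can be sketched as representing a weight vector by a ternary code together with a full-precision scale and offset. In the sketch below the scale and offset are fitted analytically for illustration, whereas the paper learns them during training; the thresholding heuristic is likewise an assumption.

```python
# Illustrative sketch: approximate a weight vector as scale * t + offset,
# where t is a ternary code in {-1, 0, +1}.
import numpy as np

def ternarize(weights: np.ndarray, zero_band: float = 0.5):
    offset = weights.mean()
    centered = weights - offset
    thresh = zero_band * np.abs(centered).mean()
    t = np.sign(centered) * (np.abs(centered) > thresh)   # ternary codes
    scale = np.abs(centered[t != 0]).mean() if np.any(t != 0) else 0.0
    return scale, offset, t.astype(int)

w = np.random.randn(8) * 0.2 + 0.05
scale, offset, t = ternarize(w)
print(w)
print(scale * t + offset)   # low-bit reconstruction
```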

The emergence of the Internet of Things (IoT) has led to a remarkable increase in the volume of data generated at the network edge. In order to support real-time smart IoT applications, massive amounts of data from edge devices need to be processed with low latency using methods such as deep neural networks (DNNs). To improve application performance and minimize resource cost, enterprises have begun to adopt edge computing, a computation paradigm that advocates processing input data locally. However, edge nodes are often...

10.1145/3404397.3404473 article EN 2020-08-09

Recent studies have shown that introducing communication between agents can significantly improve overall performance in cooperative multi-agent reinforcement learning (MARL). However, existing communication schemes often require agents to exchange an excessive number of messages at run-time under a reliable channel, which hinders practicality in many real-world situations. In this paper, we present Temporal Message Control (TMC), a simple yet effective approach for achieving succinct and robust communication in MARL....

10.48550/arxiv.2010.14391 preprint EN other-oa arXiv (Cornell University) 2020-01-01
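
A minimal sketch of the temporal gating idea follows: an agent transmits a new message only when it differs sufficiently from the last one it sent, so receivers can otherwise reuse a cached copy. The distance measure and threshold are illustrative assumptions, not TMC's exact mechanism.

```python
# Illustrative sketch: only transmit a message when it changed enough since
# the last transmission; otherwise the receiver keeps its cached copy.
import numpy as np

class MessageGate:
    def __init__(self, threshold: float = 0.2):
        self.threshold = threshold
        self.last_sent = None

    def maybe_send(self, message: np.ndarray):
        if self.last_sent is None or np.linalg.norm(message - self.last_sent) > self.threshold:
            self.last_sent = message.copy()
            return message          # transmit
        return None                 # skip; receiver reuses cached message

gate = MessageGate()
print(gate.maybe_send(np.array([0.5, 0.1])))   # first message: transmitted
print(gate.maybe_send(np.array([0.52, 0.11]))) # nearly identical: suppressed
```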

Many multicast services such as live multimedia distribution and real-time event monitoring require constructing a multicast mechanism that involves network functions (e.g., firewall, video transcoding). Network Function Virtualization (NFV) is a concept that proposes using virtualization to implement network functions on infrastructure building blocks (such as high-volume servers and virtual machines), where software provides the functionality of existing purpose-built equipment. We present an approach for a multicast mechanism whereby flows are...

10.1109/icc.2015.7249214 article EN 2015-06-01

This paper describes a novel approach for packing sparse convolutional neural networks for their efficient systolic array implementations. By combining subsets of columns in the original filter matrix associated with a convolutional layer, we increase the utilization efficiency of the systolic array substantially (e.g., ~4x) due to the increased density of nonzeros in the resulting packed filter matrix. Within each group of combined columns, for each row, all weights but the one with the largest magnitude are pruned. We retrain the remaining weights to preserve high accuracy. We demonstrate that mitigating data...

10.48550/arxiv.1811.04770 preprint EN other-oa arXiv (Cornell University) 2018-01-01

In recent years, numerous designs have used systolic arrays to accelerate convolutional neural network (CNN) inference. In this work, we demonstrate that we can further speed up CNN inference and lower its power consumption by mapping systolic arrays onto 3D circuit structures as opposed to conventional 2D structures. Specifically, by operating in 3D space, a wide systolic array consisting of a number of subarrays can efficiently implement the wide layers prevalent in state-of-the-art CNNs. Additionally, by accumulating intermediate results along the third...

10.1109/sips.2018.8598454 article EN 2018-10-01

We present the Maestro memory-on-logic 3D-IC architecture for the coordinated parallel use of a plurality of systolic arrays (SAs) in performing deep neural network (DNN) inference. Maestro reduces the under-utilization common to a single large SA by allowing many smaller SAs to operate in parallel on DNN weight matrices of varying shapes and sizes. In order to buffer intermediate results in memory blocks (MBs) and provide high-bandwidth communication between SAs and MBs when transferring weights, Maestro employs three innovations. (1) An SA on the logic die can access its...

10.1109/asap.2019.00-31 article EN 2019-07-01

We introduce adaptive tiling, a method of partitioning the layers of a sparse convolutional neural network (CNN) into blocks of filters and channels, called tiles, each implementable with a fixed-size systolic array. By allowing a tile to adapt its size so that it can cover a large sparse area, we minimize the total number of tiles, or equivalently, the number of systolic array calls required to perform CNN inference. The proposed scheme resolves the challenge of applying systolic array architectures, traditionally designed for dense matrices, to sparse CNNs. To validate...

10.1109/icpr.2018.8545462 article EN 2018 24th International Conference on Pattern Recognition (ICPR) 2018-08-01
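
The tiling objective described above (cover a sparse layer with as few fixed-size systolic-array calls as possible) can be illustrated with a greedy 2D sketch: each tile grows over consecutive columns while its nonzero rows still fit the array. The real scheme tiles over filters and channels jointly and chooses tiles more carefully; this is only a simplified illustration.

```python
# Illustrative sketch: greedily grow column-wise tiles of a sparse matrix so
# that each tile's active rows and columns fit a fixed-size systolic array.
import numpy as np

def greedy_tiles(weight: np.ndarray, array_rows: int, array_cols: int):
    tiles, start, n_cols = [], 0, weight.shape[1]
    while start < n_cols:
        end = start + 1
        while end < n_cols and end - start < array_cols:
            active_rows = np.any(weight[:, start:end + 1] != 0, axis=1).sum()
            if active_rows > array_rows:
                break
            end += 1
        tiles.append((start, end))   # columns [start, end) form one array call
        start = end
    return tiles

w = (np.random.rand(32, 24) > 0.85) * np.random.randn(32, 24)   # sparse layer
tiles = greedy_tiles(w, array_rows=16, array_cols=8)
print(len(tiles), "array calls:", tiles)
```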

Distributed file systems such as the Google File System and Hadoop have been used to store large volumes of data in cloud data centers. These systems divide data sets into blocks of fixed size and replicate them over multiple machines to achieve both reliability and efficiency. Recent studies have shown that data blocks tend to exhibit a wide disparity in popularity. In this context, the naive block replication schemes used by these systems often cause an uneven load distribution across machines, which reduces the overall I/O throughput of the system. While many algorithms have been proposed,...

10.1109/icdcs.2015.52 article EN 2015-06-01
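
In the spirit of the load-balancing problem described above, the sketch below assigns block replicas greedily to the machines with the least accumulated expected load, weighting by block popularity. The greedy heuristic, replica count, and popularity values are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative sketch: popularity-aware replica placement that spreads the
# expected read load of hot blocks across the least-loaded machines.
import heapq

def place_replicas(block_popularity, n_machines, replicas=3):
    heap = [(0.0, m) for m in range(n_machines)]      # (expected load, machine id)
    heapq.heapify(heap)
    placement = {}
    # Place the most popular blocks first.
    for block, pop in sorted(block_popularity.items(), key=lambda kv: -kv[1]):
        chosen = [heapq.heappop(heap) for _ in range(replicas)]
        placement[block] = [m for _, m in chosen]
        for load, m in chosen:                        # each replica serves a share of reads
            heapq.heappush(heap, (load + pop / replicas, m))
    return placement

popularity = {"b1": 100, "b2": 10, "b3": 55, "b4": 5}
print(place_replicas(popularity, n_machines=5))
```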

On-device learning allows AI models to adapt to user data, thereby enhancing service quality on edge platforms. However, training on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs). To address these issues, we propose utilizing embedded dynamic random-access memory (eDRAM) as the primary storage medium for transient training data. In comparison with static random-access memory (SRAM), eDRAM provides higher density...

10.1109/hpca57654.2024.00071 article EN 2024-03-02