Song Han

ORCID: 0000-0001-7758-3679
Research Areas
  • Cryptography and Data Security
  • Advanced Authentication Protocols Security
  • Security in Wireless Sensor Networks
  • Privacy-Preserving Technologies in Data
  • Chaos-based Image/Signal Encryption
  • User Authentication and Security Systems
  • Mobile Ad Hoc Networks
  • Cryptography and Residue Arithmetic
  • Network Security and Intrusion Detection
  • Advanced Neural Network Applications
  • Mobile Agent-Based Network Management
  • Internet Traffic Analysis and Secure E-voting
  • Topic Modeling
  • Energy Efficient Wireless Sensor Networks
  • Privacy, Security, and Data Protection
  • Quantum Computing Algorithms and Architecture
  • Cloud Data Security Solutions
  • Advanced Steganography and Watermarking Techniques
  • RFID technology advancements
  • Natural Language Processing Techniques
  • Access Control and Trust
  • Cooperative Communication and Network Coding
  • Cognitive Radio Networks and Spectrum Sensing
  • Opportunistic and Delay-Tolerant Networks
  • Wireless Communication Security Techniques

IIT@MIT
2025

Bayi Children's Hospital
2025

Zhejiang University
2008-2024

City University of Macau
2023-2024

Hangzhou City University
2023-2024

Zhejiang Gongshang University
2013-2023

Yunnan University
2023

University of Jinan
2023

China University of Geosciences
2021-2023

State Nuclear Power Technology Company (China)
2023

Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three-stage pipeline: pruning, trained quantization, and Huffman coding, which work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce...

10.48550/arxiv.1510.00149 preprint EN other-oa arXiv (Cornell University) 2015-01-01
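
As a rough illustration of the three-stage pipeline this abstract describes, here is a minimal numpy sketch. The toy layer size, 75% pruning ratio, 2-bit codebook, and plain k-means loop are illustrative assumptions, not the paper's settings, and the last stage only reports the entropy bound that Huffman coding would approach.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)    # toy layer weights

# Stage 1: magnitude pruning -- drop small-magnitude connections.
threshold = np.quantile(np.abs(W), 0.75)          # keep the top 25%
mask = np.abs(W) > threshold
W_pruned = W * mask

# Stage 2: trained quantization via weight sharing -- cluster the
# surviving weights into 2^b centroids and store only cluster indices.
b = 2                                             # 2-bit codebook
nonzero = W_pruned[mask]
centroids = np.linspace(nonzero.min(), nonzero.max(), 2 ** b)
for _ in range(10):                               # a few k-means steps
    assign = np.argmin(np.abs(nonzero[:, None] - centroids[None, :]), axis=1)
    for k in range(len(centroids)):
        if np.any(assign == k):
            centroids[k] = nonzero[assign == k].mean()

# Stage 3: entropy coding -- Huffman coding of the index stream shrinks
# storage further; here we just report the entropy lower bound per index.
counts = np.bincount(assign, minlength=2 ** b)
p = counts[counts > 0] / counts.sum()
print("bits/index entropy bound:", -(p * np.log2(p)).sum())
```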

Watermarking, which belongs to the information hiding field, has seen a lot of research interest, and work is being conducted in the field's different branches. Steganography is used for secret communication, whereas watermarking is used for content protection, copyright management, authentication, and tamper detection. In this paper we present a detailed survey of existing and newly proposed steganographic techniques. We classify the techniques based on the domains in which the data is embedded, and we limit the survey to images only.

10.1109/indin.2005.1560462 article EN 2005-12-22
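
Since the survey classifies techniques by embedding domain, a small spatial-domain example may help: the classic least-significant-bit (LSB) scheme hides message bits in pixel LSBs. This is a generic textbook sketch (the function names and toy cover image are my own), not a technique proposed in the paper.

```python
import numpy as np

def embed_lsb(cover: np.ndarray, message: bytes) -> np.ndarray:
    """Hide message bits in the least significant bit of each pixel."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    flat = cover.flatten()
    assert bits.size <= flat.size, "cover image too small"
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(cover.shape)

def extract_lsb(stego: np.ndarray, n_bytes: int) -> bytes:
    bits = stego.flatten()[: n_bytes * 8] & 1
    return np.packbits(bits).tobytes()

cover = np.random.default_rng(1).integers(0, 256, (64, 64), dtype=np.uint8)
stego = embed_lsb(cover, b"hidden")
assert extract_lsb(stego, 6) == b"hidden"
```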

10.1016/j.jnca.2011.03.004 article EN Journal of Network and Computer Applications 2011-04-01

Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train...

10.48550/arxiv.1506.02626 preprint EN other-oa arXiv (Cornell University) 2015-01-01
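
The train-prune-retrain loop the abstract outlines can be sketched on a toy sparse regression task; the masking trick (zeroing pruned weights after every update) is the key detail. The sizes, learning rate, and 80% pruning threshold below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))
true_w = np.zeros(32)
true_w[:4] = [3.0, -2.0, 1.5, 0.5]            # the task is truly sparse
y = X @ true_w + 0.01 * rng.normal(size=256)

w = rng.normal(size=32) * 0.1
mask = np.ones(32, dtype=bool)

def train(steps, lr=0.05):
    global w
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
        w *= mask                              # pruned connections stay zero

train(200)                                     # step 1: train dense
mask = np.abs(w) > np.quantile(np.abs(w), 0.8) # step 2: prune small weights
w *= mask
train(200)                                     # step 3: retrain the survivors
print("kept:", mask.sum(), "weights; error:", np.mean((X @ w - y) ** 2))
```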

With the increasingly powerful and extensive deployment of edge devices, edge/fog computing enables customers to manage and analyze data locally, and extends the power of analysis applications to network edges. Meanwhile, as the next-generation grid, the smart grid can achieve the goals of efficiency, economy, security, reliability, safety of use, and environmental friendliness. However, privacy and security issues in fog-based communications are challenging. Without proper protection, customers' privacy will be readily violated. This...

10.1109/tifs.2020.3014487 article EN IEEE Transactions on Information Forensics and Security 2020-08-05

Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the hardware barrier for serving (memory size) and slows down token generation (memory bandwidth). In this paper, we propose Activation-aware Weight Quantization (AWQ), a hardware-friendly approach for LLM low-bit weight-only quantization. Our method is based on the observation that weights are not equally important: protecting only 1% of salient weights can greatly reduce quantization error. We then...

10.48550/arxiv.2306.00978 preprint EN other-oa arXiv (Cornell University) 2023-01-01
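
A simplified numpy rendering of the activation-aware idea: scale salient weight channels up by a power of the average activation magnitude before low-bit rounding, then fold the inverse scale into the activations so the full-precision product is unchanged. The fixed alpha = 0.5 below stands in for the scale search described in the paper; the sizes and per-channel round-to-nearest quantizer are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)       # out x in weights
X = (rng.normal(size=(128, 64)) * rng.uniform(0.1, 8, 64)).astype(np.float32)

def quantize_rtn(w, bits=4):
    """Per-output-channel symmetric round-to-nearest quantization."""
    s = np.abs(w).max(axis=1, keepdims=True) / (2 ** (bits - 1) - 1)
    return np.clip(np.round(w / s), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * s

# Plain low-bit weight quantization.
err_rtn = np.mean((X @ W.T - X @ quantize_rtn(W).T) ** 2)

# AWQ-style: scale weight input channels by activation magnitude^alpha,
# quantize, then fold the inverse scale into the activations.
act_scale = np.abs(X).mean(axis=0)
s = act_scale ** 0.5                                    # alpha = 0.5
Wq = quantize_rtn(W * s[None, :])
err_awq = np.mean((X @ W.T - (X / s[None, :]) @ Wq.T) ** 2)

print(f"RTN error {err_rtn:.4f} vs activation-aware error {err_awq:.4f}")
```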

Large language models (LLMs) have transformed numerous AI applications. On-device LLM is becoming increasingly important: running LLMs locally on edge devices can reduce cloud computing costs and protect users' privacy. However, the astronomical model size and limited hardware resources pose significant deployment challenges. To solve these issues, we propose Activation-aware Weight Quantization (AWQ) and TinyChat, an algorithm-system full-stack solution for efficient on-device LLM deployment. AWQ is a...

10.1145/3714983.3714987 article EN GetMobile Mobile Computing and Communications 2025-01-20

Cyber-physical systems (CPSs) integrate computation with physical processes. Embedded computers and networks monitor and control the physical processes, usually with feedback loops in which the physical processes affect the computations and vice versa. CPS was identified as one of eight research priority areas in the August 2007 report of the President's Council of Advisors on Science and Technology, and it will be a core component of many critical infrastructures and industrial systems in the near future. However, a variety of random failures and cyber attacks exist in CPS, which greatly...

10.1109/jsyst.2013.2257594 article EN IEEE Systems Journal 2014-10-31

State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware helps the computation, fetching weights from DRAM is two orders of magnitude more expensive than ALU operations and dominates the required power. The previously proposed 'Deep Compression' makes it possible to fit large DNNs (AlexNet and VGGNet) fully in on-chip SRAM. This...

10.48550/arxiv.1602.01528 preprint EN other-oa arXiv (Cornell University) 2016-01-01
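
The "two orders of magnitude" claim can be sanity-checked with a back-of-envelope energy model. The per-operation energies below are approximate 45 nm figures of the kind this line of work cites (roughly 640 pJ per 32-bit DRAM read, 5 pJ per SRAM read, 3.7 pJ per 32-bit multiply); they are illustrative assumptions, not measurements from the paper.

```python
# Illustrative per-operation energies in picojoules (assumed, ~45 nm class).
DRAM_PJ, SRAM_PJ, MUL_PJ = 640.0, 5.0, 3.7

n_weights = 60e6                  # e.g. an AlexNet-scale model
mac_energy = n_weights * MUL_PJ
from_dram = n_weights * DRAM_PJ   # every weight fetched from off-chip DRAM
from_sram = n_weights * SRAM_PJ   # possible once compression fits on-chip

print(f"compute:           {mac_energy / 1e6:.0f} uJ")
print(f"weights from DRAM: {from_dram / 1e6:.0f} uJ "
      f"(~{DRAM_PJ / MUL_PJ:.0f}x the cost of a multiply)")
print(f"weights from SRAM: {from_sram / 1e6:.0f} uJ")
```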

Online anomaly detection (AD) is an important technique for monitoring wireless sensor networks (WSNs), as it protects WSNs from cyberattacks and random faults. As a scalable and parameter-free unsupervised AD technique, the k-nearest neighbor (kNN) algorithm has attracted a lot of attention for its applications in computer networks and WSNs. However, its lazy-learning nature makes kNN-based AD schemes difficult to use in an online manner, especially when the communication cost is constrained. In this paper, a new scheme based...

10.1109/tpds.2012.261 article EN IEEE Transactions on Parallel and Distributed Systems 2013-06-26
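
A minimal version of the underlying detector: score each reading by its distance to its k-th nearest neighbor and flag the largest scores. This naive O(n^2) batch form is exactly what the paper's online, communication-constrained setting has to avoid, but it shows the principle; the toy data and injected outlier are assumptions.

```python
import numpy as np

def knn_anomaly_scores(data: np.ndarray, k: int = 5) -> np.ndarray:
    """Score each reading by its distance to the k-th nearest neighbor;
    large scores flag likely anomalies (faulty or attacked readings)."""
    d = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)
    return d_sorted[:, k]          # column 0 is the zero self-distance

rng = np.random.default_rng(0)
normal = rng.normal(25.0, 0.5, size=(100, 2))   # e.g. temperature/humidity
faulty = np.array([[40.0, 80.0]])               # injected outlier reading
scores = knn_anomaly_scores(np.vstack([normal, faulty]))
print("outlier flagged:", scores.argmax() == 100)
```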

Wireless body area networks (WBANs), as a promising health-care system, can provide tremendous benefits for timely and continuous patient care and remote health monitoring. Owing to the restricted communication and computation power in WBANs, cloud-assisted WBANs, which offer more reliable, intelligent, and timely services to mobile users and patients, are receiving increasing attention. However, how to aggregate data multifunctionally and efficiently is still an open issue for the cloud server (CS). In this paper, we propose...

10.1109/tifs.2015.2472369 article EN IEEE Transactions on Information Forensics and Security 2015-08-24

On-device training enables the model to adapt to new data collected from sensors by fine-tuning a pre-trained model. Users can benefit from customized AI models without having to transfer the data to the cloud, protecting privacy. However, the training memory consumption is prohibitive for IoT devices that have tiny memory resources. We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory. On-device training faces two unique challenges: (1) quantized graphs of neural networks are hard to optimize due to low bit-precision...

10.48550/arxiv.2206.15472 preprint EN other-oa arXiv (Cornell University) 2022-01-01
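
One ingredient of such a co-design is sparse update: fine-tune only the layers whose accuracy gain per KB of training memory is highest, and keep the rest frozen so their activations never need to be stored. The layer table and greedy selection below are hypothetical stand-ins for the paper's contribution analysis; all numbers are made up for illustration.

```python
# Hypothetical layer table: (name, update-memory cost in KB, accuracy gain).
layers = [("conv1", 40, 0.2), ("block2", 80, 0.5), ("block3", 120, 0.9),
          ("block4", 160, 1.1), ("classifier", 8, 0.6)]

budget_kb = 200          # assumed training-memory budget on the MCU
chosen, used = [], 0
# Greedy sparse update: pick layers with the best gain per KB until the
# budget is exhausted; unselected layers stay frozen (no saved activations).
for name, cost, gain in sorted(layers, key=lambda t: t[2] / t[1], reverse=True):
    if used + cost <= budget_kb:
        chosen.append(name)
        used += cost
print("update only:", chosen, f"({used} KB of {budget_kb} KB)")
```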

Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, existing methods cannot maintain accuracy and hardware efficiency at the same time. We propose SmoothQuant, a training-free, accuracy-preserving, general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs. Based on the fact that weights are easy to quantize while activations are not, SmoothQuant smooths...

10.48550/arxiv.2211.10438 preprint EN cc-by arXiv (Cornell University) 2022-01-01
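
The smoothing trick lends itself to a compact numpy sketch: divide each activation channel by a per-channel factor s and multiply the matching weight row by s, so the product is unchanged in full precision but both tensors become easier to quantize. The formula for s with alpha = 0.5 follows the migration-strength idea; the sizes and injected outlier channels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 64)).astype(np.float32)
X[:, :4] *= 50.0                                   # activation outlier channels
W = rng.normal(size=(64, 64)).astype(np.float32)   # in x out weights

def q8(a):
    """Symmetric per-tensor INT8 round-to-nearest quantization."""
    s = np.abs(a).max() / 127.0
    return np.round(a / s).clip(-127, 127) * s

# Naive W8A8: activation outliers blow up the quantization step size.
err_naive = np.mean((X @ W - q8(X) @ q8(W)) ** 2)

# SmoothQuant-style: divide activations by per-channel s, multiply the
# matching weight rows by s, migrating difficulty from X to W.
alpha = 0.5
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)
err_smooth = np.mean((X @ W - q8(X / s) @ q8(W * s[:, None])) ** 2)

print(f"naive W8A8 error {err_naive:.3f} vs smoothed {err_smooth:.3f}")
```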

Privacy-preserving federated learning, as one of the privacy-preserving computation techniques, is a promising distributed machine learning (ML) approach for the Internet of Medical Things (IoMT), due to its ability to train a regression model without collecting the raw data of data owners (DOs). However, traditional interactive federated regression training (IFRT) schemes rely on multiple rounds of communication to train a global model and are still exposed to various privacy and security threats. To overcome these problems, several noninteractive federated regression training (NFRT) schemes have...

10.1109/tnnls.2023.3271859 article EN cc-by IEEE Transactions on Neural Networks and Learning Systems 2023-06-08
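
For flavor, here is a generic one-round secure-aggregation idea often used in this space: data owners add pairwise random masks that cancel in the sum, so the aggregator learns the aggregate but no individual update. This is a textbook sketch, not the specific NFRT construction of the paper; the shared-seed scheme below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
models = [rng.normal(size=8) for _ in range(n)]   # each DO's local update

# Pairwise masking: DOs i < j derive a mask from a shared seed; i adds
# it, j subtracts it, so all masks cancel in the aggregate.
masked = [m.copy() for m in models]
for i in range(n):
    for j in range(i + 1, n):
        mask = np.random.default_rng(1000 * i + j).normal(size=8)
        masked[i] += mask
        masked[j] -= mask

# The aggregator sees only masked updates, yet their sum is exact.
assert np.allclose(sum(masked), sum(models))
```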

We present LongLoRA, an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs) with limited computation cost. Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. For example, training on a context length of 8192 incurs 16x the computational cost in self-attention layers compared with a length of 2048. In this paper, we speed up the context extension in two aspects. On the one hand, although dense global attention is needed during inference, the model can be...

10.48550/arxiv.2309.12307 preprint EN other-oa arXiv (Cornell University) 2023-01-01
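
The training-time trick can be sketched as block-local attention computed twice, once with the sequence rolled by half a group, so tokens near group borders still exchange information; averaging the two passes below is a simplification of running shifted and unshifted patterns in different heads. All sizes and the group length are toy assumptions.

```python
import numpy as np

def shifted_block_attention(q, k, v, group, shift):
    """Attention restricted to fixed-size token groups; optionally roll
    the sequence by half a group so information crosses group borders
    (the shifted-sparse-attention idea, greatly simplified)."""
    if shift:
        q, k, v = (np.roll(t, -group // 2, axis=0) for t in (q, k, v))
    n, d = q.shape
    out = np.empty_like(v)
    for s in range(0, n, group):               # attend only inside a group
        qs, ks, vs = q[s:s + group], k[s:s + group], v[s:s + group]
        a = qs @ ks.T / np.sqrt(d)
        a = np.exp(a - a.max(axis=1, keepdims=True))
        out[s:s + group] = (a / a.sum(axis=1, keepdims=True)) @ vs
    return np.roll(out, group // 2, axis=0) if shift else out

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(32, 8)) for _ in range(3))
y = 0.5 * (shifted_block_attention(q, k, v, 8, False)
           + shifted_block_attention(q, k, v, 8, True))
print(y.shape)   # (32, 8); cost grows linearly with sequence length
```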

10.1109/cvpr52733.2024.02520 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Long Short-Term Memory (LSTM) is widely used in speech recognition. In order to achieve higher prediction accuracy, machine learning scientists have built larger and larger models. Such a large model is both computation intensive and memory intensive. Deploying such bulky models results in high power consumption and raises the total cost of ownership (TCO) of a data center. To speed up prediction and make it energy efficient, we first propose a load-balance-aware pruning method that can compress the LSTM model size by 20x (10x from pruning and 2x from quantization) with...

10.48550/arxiv.1612.00694 preprint EN other-oa arXiv (Cornell University) 2016-01-01
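
The load-balance-aware part can be illustrated directly: instead of pruning globally, which can leave some hardware partitions much denser than others, prune to the same sparsity within each processing element's share of rows. A numpy sketch, with toy sizes and a quantile threshold standing in for the paper's method:

```python
import numpy as np

def load_balanced_prune(W: np.ndarray, sparsity: float, n_pe: int):
    """Prune to the same sparsity inside each processing element's row
    partition, so no PE waits on a denser partition at inference time."""
    W = W.copy()
    for rows in np.array_split(np.arange(W.shape[0]), n_pe):
        block = W[rows]
        thr = np.quantile(np.abs(block), sparsity)
        block[np.abs(block) <= thr] = 0.0
        W[rows] = block
    return W

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64))
Wp = load_balanced_prune(W, sparsity=0.9, n_pe=4)
per_pe = [np.count_nonzero(Wp[r]) for r in np.array_split(np.arange(16), 4)]
print("nonzeros per PE:", per_pe)   # balanced work across PEs
```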

Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide a real-time user experience and ensure safety. Unlike conventional dense workloads, the sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently on general-purpose hardware. Furthermore, existing sparse acceleration techniques for 2D images do not translate to 3D point clouds. In this paper, we introduce TorchSparse, a...

10.48550/arxiv.2204.10319 preprint EN other-oa arXiv (Cornell University) 2022-01-01
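
The gather-GEMM-scatter pattern that sparse point-cloud convolution reduces to can be shown in a few lines: a hash map from voxel coordinates to feature rows, then one gather and matmul per kernel offset. This is a naive Python rendering of the computation such libraries optimize, not TorchSparse's implementation; all sizes are toy assumptions.

```python
import numpy as np

def sparse_conv3d(coords, feats, kernel, offsets):
    """Minimal submanifold-style sparse convolution: features exist only
    at active voxels; for each kernel offset, look up neighbors through a
    coordinate hash map and accumulate their contribution."""
    table = {tuple(c): i for i, c in enumerate(coords)}
    out = np.zeros((len(coords), kernel.shape[2]))
    for o, off in enumerate(offsets):
        for i, c in enumerate(coords):
            j = table.get(tuple(c + off))
            if j is not None:                  # neighbor voxel is active
                out[i] += feats[j] @ kernel[o]
    return out

rng = np.random.default_rng(0)
coords = rng.integers(0, 8, size=(50, 3))      # active voxel coordinates
feats = rng.normal(size=(50, 4))               # 4 input channels per voxel
offsets = np.array([[dx, dy, dz] for dx in (-1, 0, 1)
                    for dy in (-1, 0, 1) for dz in (-1, 0, 1)])
kernel = rng.normal(size=(27, 4, 8)) * 0.1     # 27 offsets, 4 -> 8 channels
print(sparse_conv3d(coords, feats, kernel, offsets).shape)   # (50, 8)
```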

Transfer learning is important for foundation models to adapt to downstream tasks. However, many foundation models are proprietary, so users must share their data with the model owners to fine-tune the models, which is costly and raises privacy concerns. Moreover, fine-tuning large models is computation-intensive and impractical for most downstream users. In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream tasks without access to the full model. In offsite-tuning, the model owner sends a light-weight adapter...

10.48550/arxiv.2302.04870 preprint EN cc-by arXiv (Cornell University) 2023-01-01
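
Structurally, the protocol can be sketched as: the owner keeps a deep frozen middle, sends the user a compressed emulator of it plus small trainable adapters, and later plugs the fine-tuned adapters back into the full model. The code below is a toy shape-level sketch; subsampling layers stands in for the paper's distilled emulator, and the user's training loop is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
# Owner's full model: trainable adapters at the ends, a deep frozen middle.
middle = [rng.normal(size=(16, 16)) * 0.1 for _ in range(8)]
a_in = rng.normal(size=(16, 16)) * 0.1        # trainable input adapter
a_out = rng.normal(size=(16, 16)) * 0.1       # trainable output adapter

# Owner compresses the middle into a lightweight emulator to send out
# (here simply every other layer; the paper distills a small emulator).
emulator = middle[::2]

def forward(x, backbone, ai, ao):
    h = np.tanh(x @ ai)
    for W in backbone:                        # frozen layers: never updated
        h = np.tanh(h @ W)
    return h @ ao

# User side: fine-tune only (a_in, a_out) against the emulator on local
# data (loop omitted), then return the adapters to the owner, who plugs
# them into the full frozen middle for inference.
x = rng.normal(size=(4, 16))
assert forward(x, emulator, a_in, a_out).shape == \
       forward(x, middle, a_in, a_out).shape
```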

10.1016/j.chaos.2007.01.017 article EN Chaos Solitons & Fractals 2007-02-22

10.1016/j.chaos.2007.06.030 article EN Chaos Solitons & Fractals 2007-08-01