- Advanced Memory and Neural Computing
- Advanced Neural Network Applications
- Ferroelectric and Negative Capacitance Devices
- Advanced Vision and Imaging
- Advanced Multi-Objective Optimization Algorithms
- Advanced Image Processing Techniques
- Image and Signal Denoising Methods
- Sensory Analysis and Statistical Methods
- Machine Learning and ELM
- Optimal Experimental Design Methods
- Cryptographic Implementations and Security
- Advanced Image and Video Retrieval Techniques
- Parallel Computing and Optimization Techniques
- Genomics and Chromatin Dynamics
- Biometric Identification and Security
- Neuroscience and Neural Engineering
- Metaheuristic Optimization Algorithms Research
- Energy Harvesting in Wireless Networks
- Chaos-based Image/Signal Encryption
- Physical Unclonable Functions (PUFs) and Hardware Security
- Advanced Battery Technologies Research
- EEG and Brain-Computer Interfaces
- Low-power high-performance VLSI design
- Evolutionary Algorithms and Applications
Hong Kong University of Science and Technology
2017-2024
University of Hong Kong
2018-2024
Hong Kong Science and Technology Parks Corporation
2022
Philadelphia University
2018-2020
University of Pennsylvania
2018-2020
Contemporary deep neural networks (DNNs) contain millions of synaptic connections across tens to hundreds of layers. This large computational complexity poses a challenge to hardware design. In this work, we leverage the intrinsic activation sparsity of DNNs to substantially reduce execution cycles and energy consumption. An end-to-end training algorithm is proposed to develop a lightweight (less than 5% overhead) run-time predictor for the output activation sparsity on the fly. Furthermore, an energy-efficient architecture, SparseNN,...
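The idea of predicting which outputs a ReLU will zero out, and then skipping their full computation, can be illustrated with a small sketch. This is not SparseNN's predictor or its training algorithm: here the predictor is assumed to be a rank-r factorization U @ V of the weight matrix (taken from a truncated SVD purely for illustration), and the function name `relu_layer_with_predictor` is hypothetical.

```python
import numpy as np

def relu_layer_with_predictor(W, U, V, x):
    """Compute relu(W @ x), evaluating only the rows a cheap predictor marks active.

    U @ V is a hypothetical low-rank stand-in for W, used only to guess which
    outputs will be zeroed by ReLU; rows predicted inactive are skipped.
    """
    predicted = U @ (V @ x)                      # rank-r estimate: O(r * (m + n)) multiplies
    active = predicted > 0                       # outputs expected to survive ReLU
    y = np.zeros(W.shape[0])
    y[active] = np.maximum(W[active] @ x, 0.0)   # full computation only where needed
    return y, active

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
x = rng.standard_normal(128)

# Rank-8 predictor from a truncated SVD (an assumption, chosen for illustration)
Uf, s, Vt = np.linalg.svd(W, full_matrices=False)
r = 8
U, V = Uf[:, :r] * s[:r], Vt[:r]
y, active = relu_layer_with_predictor(W, U, V, x)
print(f"computed {active.sum()} of {W.shape[0]} output rows")
```

A row wrongly predicted inactive silently yields 0 where the true output was positive, so predictor accuracy trades off directly against the work skipped.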
The unstructured sparsity after pruning poses a challenge to the efficient implementation of deep learning models in existing regular architectures like systolic arrays. Coarse-grained structured pruning, on the other hand, tends to have higher accuracy loss than unstructured pruning when the pruned models are of the same size. In this work, we propose a compression method based on a novel weight permutation scheme. Through permutation, the sparse matrix is further compressed to a small dense format to make full use of the hardware resources. Compared...
The unstructured sparsity after pruning poses a challenge to the efficient implementation of deep learning models in existing regular architectures like systolic arrays. On the other hand, coarse-grained structured pruning is suitable for such architectures but tends to have higher accuracy loss than unstructured pruning when the pruned models are of the same size. In this work, we propose a model compression method based on a novel weight permutation scheme to fully exploit fine-grained sparsity in the hardware design. Through permutation, the optimal arrangement of the weight matrix is obtained, and...
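The general principle of permuting a sparse weight matrix so that it packs into small dense tiles for a regular array can be shown with a toy example. This is not the authors' permutation scheme: the row-ordering heuristic (sorting rows by their zero/non-zero mask) and the names `row_order`, `compress`, and `compressed_matvec` are hypothetical. Within each row block, columns that are zero across the whole block are dropped, leaving a dense tile plus the column indices needed to gather the matching inputs.

```python
import numpy as np

def row_order(W):
    """Hypothetical heuristic: sort rows by their zero/non-zero mask so rows with
    similar sparsity patterns end up in the same block."""
    masks = (W != 0).astype(np.int8)
    return sorted(range(W.shape[0]), key=lambda i: tuple(masks[i].tolist()))

def compress(W, block_rows):
    """Permute rows, then drop the columns that are zero across each row block.

    Returns the row order and, per block, a small dense tile plus the original
    column indices of its kept columns.
    """
    order = row_order(W)
    Wp = W[order]
    blocks = []
    for start in range(0, Wp.shape[0], block_rows):
        tile = Wp[start:start + block_rows]
        cols = np.flatnonzero(np.any(tile != 0, axis=0))
        blocks.append((tile[:, cols], cols))
    return order, blocks

def compressed_matvec(order, blocks, x):
    """y = W @ x using only the dense tiles, then undo the row permutation."""
    y_perm = np.concatenate([tile @ x[cols] for tile, cols in blocks])
    y = np.empty_like(y_perm)
    y[order] = y_perm
    return y

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 32)) * (rng.random((16, 32)) < 0.2)   # roughly 80% zeros
x = rng.standard_normal(32)
order, blocks = compress(W, block_rows=4)
stored = sum(tile.size for tile, _ in blocks)
print(f"dense tile entries: {stored} of {W.size}")
```

The better the permutation groups rows with overlapping non-zero columns, the more all-zero columns each block has and the smaller its dense tile becomes.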
To solve the scaling, memory wall, and high power density issues, RRAM-based accelerators, which show better energy and area efficiency than their CMOS-based counterparts, have recently been proposed for convolutional neural networks. However, these architectures still face several design challenges, including the timing overhead of the analog/digital (A/D) conversion interfacing circuits. To address these challenges, we propose novel optimization schemes in this work. First, an encoding scheme for synaptic weights and inputs...
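Why A/D conversion is costly in RRAM crossbars can be made concrete with a generic bit-sliced, bit-serial dot product. This is background, not the encoding scheme from the abstract (which is cut off): the 4-bit weights and inputs, the 2 bits per cell, and the names `bit_sliced_mac` and `adc_bits_needed` are assumptions for illustration. Each column's analog sum must be digitized, and the worst-case column sum sets the ADC resolution.

```python
import math
import numpy as np

def bit_sliced_mac(w, x, w_bits=4, x_bits=4, cell_bits=2):
    """Unsigned dot product w . x, evaluated the way a bit-sliced crossbar would.

    Each weight is split into cell_bits-wide slices (one crossbar column per
    slice), inputs are applied one bit per cycle, and every column sum is
    digitized and shift-added. Also returns the largest column sum seen,
    i.e. the value range the A/D converter had to cover.
    """
    total, peak = 0, 0
    mask = (1 << cell_bits) - 1
    for xb in range(x_bits):                       # one input bit plane per cycle
        x_bit = (x >> xb) & 1
        for ws in range(0, w_bits, cell_bits):     # one column per weight slice
            w_slice = (w >> ws) & mask
            col_sum = int(np.dot(x_bit, w_slice))  # analog column current -> ADC code
            peak = max(peak, col_sum)
            total += col_sum << (xb + ws)          # digital shift-and-add
    return total, peak

def adc_bits_needed(rows, cell_bits):
    """ADC resolution covering the worst-case column sum with 1-bit inputs."""
    return math.ceil(math.log2(rows * ((1 << cell_bits) - 1) + 1))

rng = np.random.default_rng(2)
w = rng.integers(0, 16, size=128)   # 4-bit unsigned weights
x = rng.integers(0, 16, size=128)   # 4-bit unsigned inputs
result, peak = bit_sliced_mac(w, x)
print(result == int(np.dot(w, x)), peak, adc_bits_needed(128, 2))
```

Activating fewer rows per cycle or storing fewer bits per cell shrinks the worst-case column sum and therefore the ADC resolution; an encoding of weights and inputs can target exactly this range.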
Recently, Resistive RAM (RRAM) crossbars have been used in the design of accelerators for convolutional neural networks (CNNs) to solve the memory wall issue. However, the intensive multiply-accumulate computations (MACs) executed at the crossbars during the inference phase are still the bottleneck for further improvement of energy efficiency and throughput. In this work, we explore several methods to reduce the computations of RRAM-based CNN accelerators. First, the output sparsity resulting from the widely employed Rectified Linear Unit is exploited,...
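One generic way to exploit ReLU output sparsity in a bit-serial datapath is early termination: feed input bits most-significant first and stop once the output is guaranteed to be clipped to zero. The abstract is truncated before describing its own method, so this is an illustrative sketch under stated assumptions (signed integer weights, unsigned 8-bit post-ReLU inputs, hypothetical name `relu_dot_early_exit`), not the authors' design.

```python
import numpy as np

def relu_dot_early_exit(w, x, x_bits=8):
    """relu(w . x) for signed integer weights and unsigned x_bits-bit inputs.

    Input bit planes are applied MSB first. After processing bit b, the most
    the unprocessed low bits could still add is (2**b - 1) * sum(positive w);
    if even that cannot lift the partial sum above zero, the ReLU output is
    zero and the remaining cycles are skipped. Returns (output, cycles used).
    """
    pos_weight_sum = int(np.maximum(w, 0).sum())
    partial = 0
    for cycle, b in enumerate(range(x_bits - 1, -1, -1), start=1):
        plane = (x >> b) & 1
        partial += int(np.dot(w, plane)) << b
        if partial + ((1 << b) - 1) * pos_weight_sum <= 0:
            return 0, cycle                      # output certainly clipped by ReLU
    return max(partial, 0), x_bits

rng = np.random.default_rng(3)
cycles_used = []
for _ in range(1000):
    w = rng.integers(-8, 8, size=64)
    x = rng.integers(0, 256, size=64)
    y, cycles = relu_dot_early_exit(w, x)
    cycles_used.append(cycles)
print(f"mean cycles per output: {np.mean(cycles_used):.2f} of 8")
```

The bound is conservative, so the result is always exact; how many cycles are saved depends on how negative the pre-activations tend to be.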
Conversion rate optimization (CRO) means designing an e-commerce web interface so that as many users as possible take a desired action, such as registering for an account, requesting contact, or making a purchase. Such design is usually done by hand, evaluating one change at a time through A/B testing, or evaluating all combinations of two or three variables through multivariate testing, or multiple variables independently. Traditional CRO is thus limited to a small fraction of the design space only, and often misses important interactions between the design variables. This...
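How one-change-at-a-time testing can miss an interaction between design variables is easy to see with a made-up example. The conversion rates below are hypothetical: each of two changes hurts slightly on its own, but together they help.

```python
from itertools import product

# Hypothetical conversion rates for two on/off design changes A and B:
# each change alone hurts slightly, but together they help (an interaction).
rate = {(0, 0): 0.10, (1, 0): 0.09, (0, 1): 0.09, (1, 1): 0.14}

def one_change_at_a_time(rate, start=(0, 0)):
    """Greedy A/B-style search: try each variable in turn, keep any improvement."""
    best = list(start)
    for i in range(len(best)):
        trial = best.copy()
        trial[i] = 1 - trial[i]                  # flip one design choice
        if rate[tuple(trial)] > rate[tuple(best)]:
            best = trial
    return tuple(best)

def all_combinations(rate, n_vars=2):
    """Multivariate-style search: evaluate every combination."""
    return max(product((0, 1), repeat=n_vars), key=lambda d: rate[d])

print(one_change_at_a_time(rate))   # → (0, 0)
print(all_combinations(rate))       # → (1, 1)
```

Evaluating every combination finds the better design here, but the number of combinations is the product of the number of values per variable, which is why exhaustive multivariate testing stays limited to a few variables.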
Recent literature has shown that convolutional neural networks (CNNs) with large kernels outperform vision transformers (ViTs) and CNNs with stacked small kernels in many computer vision tasks, such as object detection and image restoration. The Winograd transformation helps reduce the number of repetitive multiplications in convolution and is widely supported by commercial AI processors. Researchers have proposed accelerating large kernel convolutions by linearly decomposing them into small kernel convolutions and then sequentially accelerating each with the Winograd algorithm. This work...
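The decomposition described above can be checked numerically in one dimension: a long kernel is split into 3-tap pieces, each piece is applied to a shifted input with the Winograd F(2,3) algorithm (two outputs from four inputs using four data multiplies instead of six), and the partial results are summed. This is a 1-D sketch of the baseline approach the abstract describes, not this work's method; CNN layers are 2-D, where F(2x2,3x3) nests the same 1-D transforms. Function names are hypothetical.

```python
import numpy as np

def transform_kernel(g):
    """Kernel-side Winograd F(2,3) terms; computed once per 3-tap kernel."""
    return np.array([g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]])

def winograd_f23(d, G):
    """Two outputs of a 3-tap correlation from 4 inputs with 4 multiplies (direct: 6)."""
    m1 = (d[0] - d[2]) * G[0]
    m2 = (d[1] + d[2]) * G[1]
    m3 = (d[2] - d[1]) * G[2]
    m4 = (d[1] - d[3]) * G[3]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

def corr3(d, g):
    """Valid 1-D correlation with a 3-tap kernel, tiled into F(2,3) blocks."""
    G = transform_kernel(g)
    out_len = len(d) - 2
    d = np.concatenate([d, np.zeros(out_len % 2)])      # pad to an even output count
    tiles = [winograd_f23(d[i:i + 4], G) for i in range(0, out_len, 2)]
    return np.concatenate(tiles)[:out_len]

def corr_large_kernel(d, k):
    """Split a long kernel into 3-tap pieces and sum the shifted 3-tap results."""
    out_len = len(d) - len(k) + 1
    extra = -len(k) % 3
    k = np.concatenate([k, np.zeros(extra)])   # zero taps contribute nothing
    d = np.concatenate([d, np.zeros(extra)])   # keeps every shifted slice in range
    y = np.zeros(out_len)
    for s in range(len(k) // 3):
        y += corr3(d[3 * s:3 * s + out_len + 2], k[3 * s:3 * s + 3])
    return y

rng = np.random.default_rng(5)
d = rng.standard_normal(40)
k = rng.standard_normal(13)        # a "large" 13-tap kernel, split into 5 pieces
y = corr_large_kernel(d, k)
print(np.allclose(y, np.correlate(d, k, mode="valid")))   # → True
```

Each piece reuses the same small Winograd tile, which is what makes the approach attractive on processors that already support F(2,3)-style transforms; the cost is one extra accumulation pass per piece.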
A 16-bit on-chip embedded encryption system built upon eFUSE, a cipher, hash functions, and EDCs for optical nerve stimulation is presented. The foundry-provided eFUSE IP is modified with a one-shot block to support wireless power transfer operation by mitigating the supply voltage drop during sensing, which avoids subsequent resetting. A novel logic-gate-based auxiliary circuit facilitates different programming modes in the eFUSE. The 128-bit cipher is reduced to 16 bits through a cascade structure using the proposed...
Multivariate testing has recently emerged as a promising technique in web interface design. In contrast to standard A/B testing, the multivariate approach aims at systematically evaluating a large number of values for a few key variables. The Taguchi method is a practical implementation of this idea, focusing on orthogonal combinations of values. It is the current state of the art in applications such as Adobe Target. This paper evaluates an alternative method: population-based search, i.e., evolutionary optimization. Its...
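What "orthogonal combinations of values" means can be shown with the standard L4(2^3) orthogonal array, which covers three two-level variables in 4 runs instead of 8 and still lets each main effect be estimated. The response model below is hypothetical and purely additive; the function names are illustrative, not from any specific tool.

```python
from itertools import combinations, product
from statistics import mean

# Standard L4(2^3) orthogonal array: 4 runs instead of 2**3 = 8 for three two-level variables
L4 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def is_orthogonal(array, levels=(0, 1)):
    """Every pair of columns contains each level combination equally often."""
    n_cols = len(array[0])
    for a, b in combinations(range(n_cols), 2):
        pairs = [(row[a], row[b]) for row in array]
        counts = [pairs.count(p) for p in product(levels, repeat=2)]
        if len(set(counts)) != 1:
            return False
    return True

def main_effects(array, responses):
    """Mean response at level 1 minus mean response at level 0, per variable."""
    effects = []
    for col in range(len(array[0])):
        hi = [r for row, r in zip(array, responses) if row[col] == 1]
        lo = [r for row, r in zip(array, responses) if row[col] == 0]
        effects.append(mean(hi) - mean(lo))
    return effects

# Hypothetical additive conversion model: rate = 0.10 + 0.02*A + 0.01*B - 0.005*C
responses = [0.10 + 0.02 * a + 0.01 * b - 0.005 * c for a, b, c in L4]
print(is_orthogonal(L4), [round(e, 4) for e in main_effects(L4, responses)])
```

The saving has a price: in L4 the third column is the XOR of the first two, so an A×B interaction cannot be distinguished from the main effect of C.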
Contemporary deep neural networks (DNNs) contain millions of synaptic connections across tens to hundreds of layers. The large computation and memory requirements pose a challenge to hardware design. In this work, we leverage the intrinsic activation sparsity of DNNs to substantially reduce execution cycles and energy consumption. An end-to-end training algorithm is proposed to develop a lightweight run-time predictor for the output activation sparsity on the fly. From our experimental results, the overhead of the prediction phase can be reduced to less...
Multivariate testing has recently emerged as a promising technique in web interface design. In contrast to standard A/B testing, the multivariate approach aims at systematically evaluating a large number of values for a few key variables. The Taguchi method is a practical implementation of this idea, focusing on orthogonal combinations of values. This paper evaluates an alternative method: population-based search, i.e., evolutionary optimization. Its performance is compared to that of the Taguchi method in several simulated conditions,...
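Population-based search over web design variables can be sketched as a minimal genetic algorithm. This is not the paper's algorithm: the operators (truncation selection, uniform crossover, random-reset mutation, elitism), all parameter values, and the simulated conversion-rate function with a single interaction term are assumptions for illustration.

```python
import random

def evolve(fitness, n_values, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Minimal genetic algorithm over discrete design choices.

    Each design is a tuple with one value index per variable (n_values[i]
    options for variable i). Uses truncation selection, uniform crossover,
    per-gene random-reset mutation, and elitism. Returns the best design and
    the best fitness seen in each generation.
    """
    rng = random.Random(seed)
    pop = [tuple(rng.randrange(n) for n in n_values) for _ in range(pop_size)]
    history = [max(map(fitness, pop))]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]
        children = [ranked[0]]                                    # elitism: best design survives
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(p1, p2)]  # uniform crossover
            child = tuple(rng.randrange(n) if rng.random() < mutation_rate else g
                          for g, n in zip(child, n_values))       # random-reset mutation
            children.append(child)
        pop = children
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history

# Hypothetical simulated page: four variables with 3, 4, 2, and 5 options,
# additive effects plus one interaction between the first two variables.
N_VALUES = (3, 4, 2, 5)
EFFECTS = [[0.0, 0.004, -0.002], [0.0, 0.001, 0.003, -0.001],
           [0.0, 0.002], [0.0, -0.003, 0.001, 0.002, 0.0]]

def conversion_rate(design):
    rate = 0.05 + sum(EFFECTS[i][v] for i, v in enumerate(design))
    if design[0] == 2 and design[1] == 3:
        rate += 0.02                                              # interaction bonus
    return rate

best, history = evolve(conversion_rate, N_VALUES)
print(best, round(conversion_rate(best), 4))
```

The toy space (120 designs) is small enough to enumerate and the simulated rate is noise-free; population-based search is aimed at spaces far larger than the evaluation budget, where real traffic would also make each fitness estimate noisy.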