- Advanced Memory and Neural Computing
- Advanced Neural Network Applications
- Semiconductor materials and devices
- Ferroelectric and Negative Capacitance Devices
- Parallel Computing and Optimization Techniques
- Advanced Image and Video Retrieval Techniques
- Low-power high-performance VLSI design
- CCD and CMOS Imaging Sensors
- Domain Adaptation and Few-Shot Learning
- Adversarial Robustness in Machine Learning
- VLSI and FPGA Design Techniques
- Anomaly Detection Techniques and Applications
- Analog and Mixed-Signal Circuit Design
- Advanced Data Storage Technologies
- Neural Networks and Applications
- Sparse and Compressive Sensing Techniques
- Image and Signal Denoising Methods
- Advanced Data Compression Techniques
- 3D IC and TSV technologies
- Interconnection Networks and Systems
- Spacecraft Design and Technology
- Physical Unclonable Functions (PUFs) and Hardware Security
- Error Correcting Code Techniques
- Quantum-Dot Cellular Automata
- Advanced Image Processing Techniques
Nanyang Technological University
2018-2024
University of Sharjah
2023
Agency for Science, Technology and Research
2021
Polytechnique Montréal
2019
University of Nebraska–Lincoln
2019
Stanford University
2015-2018
Next-generation information technologies will process unprecedented amounts of loosely structured data that overwhelm existing computing systems. N3XT improves the energy efficiency of abundant-data applications 1,000-fold by using new logic and memory technologies, 3D integration with fine-grained connectivity, and architectures for computation immersed in memory.
The world's appetite for analyzing massive amounts of structured and unstructured data has grown dramatically. The computational demands of these abundant-data applications, such as deep learning, far exceed the capabilities of today's computing systems and are unlikely to be met with isolated improvements in transistor or memory technologies, or integrated circuit architectures alone. To achieve unprecedented functionality, speed, and energy efficiency, one must create transformative nanosystems based on...
As Deep Neural Networks (DNNs) usually are overparameterized and have millions of weight parameters, it is challenging to deploy these large DNN models on resource-constrained hardware platforms, e.g., smartphones. Numerous network compression methods such as pruning and quantization have been proposed to reduce the model size significantly, in which the key is to find a suitable compression allocation (e.g., sparsity, codebook) for each layer. Existing solutions obtain this in an iterative/manual fashion while finetuning the compressed model, thus...
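The per-layer allocation idea above can be made concrete with a minimal magnitude-pruning sketch. This is an illustrative toy, not the paper's method: the layer names, shapes, and the sparsity budget dictionary are all hypothetical, and the "allocation" here is simply given rather than searched for.

```python
import numpy as np

def prune_layer(weights, sparsity):
    """Zero out the smallest-magnitude entries so that the given
    fraction of the layer's weights (its sparsity allocation) is zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value.
    flat = np.abs(weights).ravel()
    thresh = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= thresh] = 0.0
    return pruned

# Hypothetical two-layer model with a hand-picked per-layer allocation;
# the compression methods discussed above search for this automatically.
rng = np.random.default_rng(0)
layers = {"conv1": rng.normal(size=(16, 9)), "fc1": rng.normal(size=(10, 16))}
allocation = {"conv1": 0.5, "fc1": 0.9}
compressed = {name: prune_layer(w, allocation[name]) for name, w in layers.items()}
for name, w in compressed.items():
    print(name, "sparsity:", float(np.mean(w == 0.0)))
```

A real pipeline would finetune the model after (or while) applying such masks; the point here is only that the allocation is a per-layer knob.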
Non-volatility is emerging as an essential on-chip memory characteristic across a wide range of application domains, from edge nodes for the Internet of Things (IoT) to large computing clusters. On-chip non-volatile memory (NVM) is critical for low-energy operation, real-time responses, privacy and security, operation in unpredictable environments, and fault tolerance [1]. Existing NVMs (e.g., Flash, FRAM, EEPROM) suffer from high read/write energy/latency, low density, and integration challenges. For example, an ideal IoT...
Fast Fourier Transform (FFT) is an essential algorithm for numerous scientific and engineering applications. It is key to implement FFT in a high-performance, energy-efficient manner. In this paper, we leverage the properties of ultrasonic wave propagation in silicon for computation. We introduce SonicFFT, a system architecture for ultrasonic-based FFT acceleration. To evaluate the benefits of SonicFFT, we develop a compact-model-based simulation framework that quantifies the performance and energy of an integrated system comprising digital computing...
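For readers unfamiliar with the algorithm being accelerated, the radix-2 Cooley-Tukey recursion at the heart of most FFT implementations can be sketched in a few lines. This is a textbook reference sketch, unrelated to the ultrasonic implementation itself; the test signal is a made-up example.

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Combine halves with the twiddle factor e^{-2πik/n}.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

# A pure complex tone at bin 3 puts all its energy in spectrum[3].
signal = [cmath.exp(2j * cmath.pi * 3 * t / 8) for t in range(8)]
spectrum = fft(signal)
```

The recursion performs O(n log n) butterfly operations; hardware accelerators, whether digital or (as here) ultrasonic, target exactly this dataflow.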
Embedded deep learning platforms have witnessed two simultaneous improvements. First, the accuracy of convolutional neural networks (CNNs) has been significantly improved through the use of automated neural-architecture search (NAS) algorithms to determine CNN structure. Second, there is increasing interest in developing hardware accelerators for CNNs that provide improved inference performance and energy consumption compared to GPUs. Such embedded accelerators differ in the amount of compute resources and memory-access bandwidth, which...
Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) tasks. However, deploying them brings significant challenges due to the huge cost of memory, energy, and computation. To address these challenges, researchers have developed various model compression techniques such as quantization and pruning. Recently, there has been a surge of research on methods that achieve efficiency while retaining performance. Furthermore, more works focus on customizing DNN hardware accelerators to better...
The world's appetite for abundant-data computing, where a massive amount of structured and unstructured data is analyzed, has increased dramatically. The computational demands of these applications, such as deep learning, far exceed the capabilities of today's systems, especially energy-constrained embedded systems (e.g., mobile devices with limited battery capacity). These demands are unlikely to be met by isolated improvements in transistor or memory technologies, or integrated circuit (IC) architectures alone....
Wireless body sensor nodes (WBSNs) are miniaturized devices that are able to acquire, process, and transmit bio-signals (such as electrocardiograms, respiration, or human-body kinetics). WBSNs face major design challenges due to extremely limited power budgets and very small form factors. We demonstrate, for the first time in the literature, the use of disruptive nanotechnologies to create new nano-engineered ultra-low-power (ULP) WBSN architectures. Compared to state-of-the-art multi-core designs, our architectures...
For the first time, we investigated ultra-short-channel ZnO thin-film FETs with L_ch = 8 nm and an extremely scaled channel thickness t_ZnO of 3 nm. The device exhibits ultra-low sub-pA/µm off-state leakage (1.2 pA/µm), high electron mobility (µ_eff of 84 cm²/V·s), and a record peak transconductance (G_m) of 254 μS/μm at V...
This paper builds a novel Machine Learning based classification framework that performs multi-class classification optimally when the features are scarce. It uses the context of undergraduate STEM courses for making periodic predictions of student performance at a fine-grained level. It shows that neither a vanilla classifier nor an ensemble performs well when the number of features is small (during early predictions). The ML hybrid predicts four classes at equally spaced intervals during the semester: at-risk (grade below C), prone to risk (C), ok (B), and good (A)....
Multiplication is an important fundamental operation that is critical in most signal and image processing applications. It is also essential for all types of wireless communications. We compare general multipliers from an architectural point of view: maximum clock frequency, latency, throughput, and resource usage, as well as dynamic power consumption. We use a flopped combinational multiplier as the baseline for our comparison, and we use the same FPGA platform for a fair analysis. We conclude that the regular approach employing DSP elements in HDL...
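The combinational multipliers being compared are, at bottom, variations on the shift-and-add structure. A behavioral sketch of that structure (here in Python rather than HDL, purely for illustration) shows the partial products a hardware array multiplier sums in parallel:

```python
def shift_add_multiply(a, b, width=8):
    """Unsigned shift-and-add multiplication: one shifted partial
    product per set bit of b, the textbook structure behind a
    combinational array multiplier."""
    product = 0
    for bit in range(width):
        if (b >> bit) & 1:
            product += a << bit  # add the partial product a * 2^bit
    return product

print(shift_add_multiply(13, 11))  # prints 143
```

In an FPGA, the same dataflow is either mapped onto LUT/carry-chain logic (the HDL approach) or absorbed into hardened DSP blocks; the comparison in the abstract is between those two mappings.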
Spin Transfer Torque Random Access Memory (STT-RAM) has garnered interest due to its various characteristics such as non-volatility, low leakage power, and high density. Its magnetic properties play a vital role in STT switching operations through thermal effectiveness. A key challenge for STT-RAM's industrial adoption is its high write energy and latency. In this paper, we overcome this by exploiting the stochastic behavior of cells in tandem with circuit-level approximation. We enforce the robustness of our technique...
The proliferation of advanced analytics and artificial intelligence has been driven by the huge volumes of data that are mostly generated at the edge. Simultaneously, there is a rising demand to perform analytics on edge platforms (i.e., near-sensor analytics). However, conventional architectures may not execute the targeted applications in an energy-efficient manner. Emerging near- and in-memory computing paradigms can increase energy efficiency by relying on emerging logic and memory devices. More importantly, these...
Current data-centric workloads, such as deep learning, expose the memory-access inefficiencies of current computing systems. Monolithic 3D integration can overcome this limitation by leveraging fine-grained and dense vertical connectivity to enable massively-concurrent accesses between compute and memory units. Thin-Film Transistors (TFTs) and Resistive RAM (RRAM) naturally enable monolithic 3D integration, as they are fabricated at low temperature (a crucial requirement). In this paper, we explore ZnO-based TFTs and HfO₂...
Non-maximum Suppression (NMS) is an essential post-processing step in modern convolutional neural networks for object detection. Unlike convolutions, which are inherently parallel, the de-facto standard NMS, namely GreedyNMS, cannot be easily parallelized and thus could become the performance bottleneck in detection pipelines. MaxpoolNMS was introduced as a parallelizable alternative, which in turn enables faster speed than GreedyNMS at comparable accuracy. However, it is only capable of replacing GreedyNMS at the first stage of two-stage...
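The sequential nature of GreedyNMS is easiest to see in code. The sketch below is a standard textbook formulation (not taken from the paper); the example boxes and scores are made up.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, thresh=0.5):
    """Keep the highest-scoring box, suppress its overlaps, repeat.
    Each iteration depends on which boxes survived the previous one,
    which is why this loop resists parallelization."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep

# Two heavily-overlapping detections plus one distant box.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # prints [0, 2]: the near-duplicate is suppressed
```

MaxpoolNMS sidesteps this data dependence by approximating suppression with max-pooling over score maps, which is embarrassingly parallel.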
Non-maximum Suppression (NMS) in one- and two-stage object detection deep neural networks (e.g., SSD and Faster-RCNN) is becoming the computation bottleneck. In this paper, we introduce a hardware acceleration architecture for the scalable PSRR-MaxpoolNMS algorithm. Our architecture achieves 75.0× and 305× speedups compared to software implementations of PSRR-MaxpoolNMS and GreedyNMS, respectively, while simultaneously achieving Mean Average Precision (mAP) comparable to software-based floating-point implementations....
This paper addresses a challenging problem: how to reduce energy consumption without incurring a performance drop when deploying deep neural networks (DNNs) at the inference stage. In order to alleviate computation and storage burdens, we propose a novel dataflow-based joint quantization approach with the hypothesis that fewer operations would incur less information loss and thus improve the final performance. It first introduces a scheme of efficient bit-shifting and rounding to represent network parameters...
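One common way to make multiplication reducible to bit-shifting, in the spirit of the scheme mentioned above, is to constrain weights to signed powers of two. The sketch below is a generic illustration of that idea, not the paper's actual quantizer; the rounding is done in the log domain and the example weights are invented.

```python
import math

def quantize_pow2(w):
    """Round a weight to the nearest signed power of two in the log
    domain, so that multiplying by it reduces to a bit shift."""
    if w == 0.0:
        return 0.0
    sign = 1.0 if w > 0 else -1.0
    exp = round(math.log2(abs(w)))  # integer exponent
    return sign * (2.0 ** exp)

weights = [0.37, -1.6, 0.05]
print([quantize_pow2(w) for w in weights])  # prints [0.5, -2.0, 0.0625]
```

With such weights, a MAC unit replaces each multiplier with a barrel shifter, which is the hardware saving joint quantization schemes exploit.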