- Parallel Computing and Optimization Techniques
- Advanced Memory and Neural Computing
- Interconnection Networks and Systems
- Ferroelectric and Negative Capacitance Devices
- Advanced Data Storage Technologies
- Semiconductor Materials and Devices
- 3D IC and TSV Technologies
- Advanced Neural Network Applications
- Low-Power High-Performance VLSI Design
- Advancements in Semiconductor Devices and Circuit Design
- Embedded Systems Design Techniques
- VLSI and FPGA Design Techniques
- Advanced Graph Neural Networks
- Graph Theory and Algorithms
- Radiation Effects in Electronics
- Adversarial Robustness in Machine Learning
- Quantum Computing Algorithms and Architecture
- VLSI and Analog Circuit Testing
- Machine Learning and ELM
- Domain Adaptation and Few-Shot Learning
- Energy Harvesting in Wireless Networks
- Integrated Circuits and Semiconductor Failure Analysis
- Advanced Image and Video Retrieval Techniques
- CCD and CMOS Imaging Sensors
- Quantum Information and Cryptography
Hong Kong University of Science and Technology
2023-2025
University of Hong Kong
2023-2025
Xinjiang Agricultural University
2024-2025
Hohai University
2022-2025
Westlake University
2025
Tsinghua University
2014-2024
Alibaba Group (China)
2020-2024
Hunan University
2022-2024
East China Normal University
2021-2024
Tumor Hospital of Guangxi Medical University
2024
Processing-in-memory (PIM) is a promising solution to address the "memory wall" challenges for future computer systems. Prior proposed PIM architectures put additional computation logic in or near memory. The emerging metal-oxide resistive random access memory (ReRAM) has shown its potential to be used as main memory. Moreover, with its crossbar array structure, ReRAM can perform matrix-vector multiplication efficiently, and it has been widely studied to accelerate neural network (NN) applications. In this work, we...
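The crossbar matrix-vector multiplication mentioned above can be illustrated with a minimal functional model (an idealized sketch, not the paper's implementation): weights are stored as cell conductances, inputs are applied as word-line voltages, and each bit line sums currents by Kirchhoff's current law. All names and values here are illustrative.

```python
import numpy as np

# Idealized analog MVM in a ReRAM crossbar: weights live as cell
# conductances G (siemens), inputs arrive as word-line voltages V
# (volts), and each bit line accumulates current I = G^T @ V.

def crossbar_mvm(conductance, voltages):
    """Ideal crossbar: output currents on each bit line."""
    return conductance.T @ voltages  # I_j = sum_i G[i, j] * V[i]

rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(4, 3))  # 4x3 array of cell conductances
V = rng.uniform(0.0, 0.2, size=4)         # read voltages on word lines
I = crossbar_mvm(G, V)

# The analog result matches the digital dot product it replaces.
assert np.allclose(I, G.T @ V)
```

Real crossbars add non-idealities (wire resistance, ADC quantization, device variation) that this sketch omits.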
Various new nonvolatile memory (NVM) technologies have emerged recently. Among all the investigated NVM candidates, spin-torque-transfer memory (STT-RAM, or MRAM), phase-change random-access memory (PCRAM), and resistive memory (ReRAM) are regarded as the most promising. As the ultimate goal of this research is to deploy them into multiple levels of the memory hierarchy, it is necessary to explore a wide design space to find the proper implementation at different hierarchy levels, from highly latency-optimized caches to density-...
Domain-specific hardware is becoming a promising topic against the backdrop of the improvement slowdown for general-purpose processors due to the foreseeable end of Moore's Law. Machine learning, especially deep neural networks (DNNs), has become the most dazzling such domain, witnessing successful applications across a wide spectrum of artificial intelligence (AI) tasks. The incomparable accuracy of DNNs is achieved by paying the cost of hungry memory consumption and high computational complexity, which greatly impedes their deployment...
Spiking neural networks (SNNs), which enable energy-efficient implementation on emerging neuromorphic hardware, are gaining more attention. Yet so far, SNNs have not shown performance competitive with artificial neural networks (ANNs), due to the lack of effective learning algorithms and programming frameworks. We address this issue from two aspects: (1) We propose a neuron normalization technique to adjust neural selectivity and develop a direct learning algorithm for deep SNNs. (2) Via narrowing the rate coding window and converting...
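To make the rate-coding idea concrete, here is a toy leaky-integrate-and-fire (LIF) neuron, the standard building block of SNNs. This is a generic textbook model, not the paper's specific neuron or normalization technique; all constants are illustrative.

```python
import numpy as np

def lif_forward(inputs, v_th=1.0, decay=0.9):
    """Simulate one LIF neuron over T time steps; returns the spike train."""
    v, spikes = 0.0, []
    for x in inputs:
        v = decay * v + x                 # leaky integration of input current
        s = 1.0 if v >= v_th else 0.0     # fire when membrane crosses threshold
        spikes.append(s)
        v = v * (1.0 - s)                 # hard reset after a spike
    return np.array(spikes)

# A constant input of 0.3 makes the neuron fire every 4th step.
spikes = lif_forward(np.full(20, 0.3))
rate = spikes.mean()  # rate coding: the firing rate encodes the activation
```

Stronger inputs shorten the inter-spike interval, which is why a narrower rate-coding window (fewer time steps) trades accuracy for latency.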
Magnetic random access memory (MRAM) is a promising memory technology, which has fast read access, high density, and non-volatility. Using 3D heterogeneous integration, it becomes feasible and cost-efficient to stack MRAM atop conventional chip multiprocessors (CMPs). However, one disadvantage of MRAM is its long write latency and high write energy. In this paper, we first stack MRAM-based L2 caches directly atop CMPs and compare them against SRAM counterparts in terms of performance. We observe that the direct stacking might harm performance due...
Processing-in-memory (PIM) provides high bandwidth, massive parallelism, and high energy efficiency by implementing computations in main memory, thereby eliminating the overhead of data movement between the CPU and memory. While most recent work has focused on PIM in DRAM memory with 3D die-stacking technology, we propose to leverage the unique features of emerging non-volatile memory (NVM), such as resistance-based storage and current sensing, to enable efficient PIM design in NVM. We present Pinatubo, a <u>P</u>rocessing <u>I</u>n...
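The bulk bitwise operations that Pinatubo targets can be sketched functionally: instead of reading two memory rows out to the CPU, modified sense amplifiers combine them in place. The model below represents rows as `uint8` arrays and the multi-row activation as an element-wise reduction; it is a behavioral illustration only, not the circuit design.

```python
import numpy as np

# Behavioral model of in-memory bulk bitwise operations: the op is
# applied element-wise across entire rows, the way modified sense
# amplifiers would combine multiple activated rows in one access.

def pim_bitwise(op, *rows):
    """Compute a bulk bitwise op over whole memory rows."""
    out = rows[0].copy()
    for r in rows[1:]:
        out = op(out, r)
    return out

row_a = np.array([0b1100, 0b1010], dtype=np.uint8)
row_b = np.array([0b1010, 0b0110], dtype=np.uint8)

assert np.array_equal(pim_bitwise(np.bitwise_or, row_a, row_b),
                      np.array([0b1110, 0b1110], dtype=np.uint8))
```

The energy win comes from the access pattern, not the arithmetic: one in-memory row activation replaces two full row reads plus a CPU-side loop.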
Due to little consideration of hardware constraints, e.g., the limited connections between physical qubits that enable two-qubit gates, most quantum algorithms cannot be directly executed on Noisy Intermediate-Scale Quantum (NISQ) devices. Dynamically remapping logical qubits to physical qubits in the compiler is needed to execute all gates in the algorithm, which introduces additional SWAP operations and inevitably reduces the fidelity of the algorithm. Previous solutions for finding such remappings suffer from high complexity, poor initial mapping quality, and poor flexibility...
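A toy example of why remapping is needed (illustrative only, not the paper's algorithm): on a line-coupled device 0-1-2-3, a two-qubit gate between qubits mapped to physical 0 and 3 requires SWAPs to bring them adjacent. Each SWAP adds gates and lowers fidelity, which is the cost a good mapper minimizes.

```python
from collections import deque

coupling = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # linear topology 0-1-2-3

def shortest_path(src, dst):
    """BFS over the coupling graph."""
    prev, q = {src: None}, deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            break
        for v in coupling[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]

def swaps_for_gate(q0, q1):
    """SWAPs needed to make q0 adjacent to q1 (naively route q0 toward q1)."""
    path = shortest_path(q0, q1)
    return [(path[i], path[i + 1]) for i in range(len(path) - 2)]

# CNOT(0, 3) on a line needs two SWAPs before the gate can execute.
assert swaps_for_gate(0, 3) == [(0, 1), (1, 2)]
```

Real mappers consider the whole gate schedule at once, since routing one gate greedily can strand qubits needed by the next gate.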
Long interconnects are becoming an increasingly important problem from both power and performance perspectives. This motivates designers to adopt on-chip network-based communication infrastructures and three-dimensional (3D) designs where multiple device layers are stacked together. Considering the current trend towards increasing use of chip multiprocessing, it is timely to consider 3D multiprocessor design and memory networking issues, especially in the context of data management for large L2 caches. The overall...
Caching techniques have been an efficient mechanism for mitigating the effects of the processor-memory speed gap. Traditional multi-level SRAM-based cache hierarchies, especially in the context of chip multiprocessors (CMPs), present many challenges in area requirements, core-to-cache balance, power consumption, and design complexity. New advancements in technology enable caches to be built from other technologies, such as embedded DRAM (EDRAM), magnetic RAM (MRAM), and phase-change RAM (PRAM), in both 2D chips and 3D...
Magnetic Random Access Memory (MRAM) has been considered a promising memory technology due to many attractive properties. Integrating MRAM with CMOS logic may incur extra manufacturing cost due to its hybrid magnetic-CMOS fabrication process. Stacking MRAM on top of CMOS logic using 3D integration is a way to minimize this cost overhead. In this paper, we discuss the circuit design issues for MRAM and present a cache model. Based on this model, we compare MRAM caches against SRAM and DRAM caches in terms of area, performance, and energy. Finally, we conduct...
The scalability of DRAM faces challenges from increasing power consumption and the difficulty of building high-aspect-ratio capacitors. Consequently, emerging memory technologies including Phase Change Memory (PCM), Spin-Transfer Torque RAM (STT-RAM), and Resistive RAM (ReRAM) are being actively pursued as replacements for DRAM main memory. Among these candidates, ReRAM has superior characteristics such as high density, low write energy, and high endurance, making it a very attractive and cost-efficient alternative to DRAM. In this...
As an emerging field of machine learning, deep learning shows excellent ability in solving complex problems. However, the size of the networks becomes increasingly large due to the demands of practical applications, which poses a significant challenge to constructing high-performance implementations of deep neural networks. In order to improve performance as well as maintain low power cost, in this paper we design a deep learning accelerator unit (DLAU), which is a scalable accelerator architecture for large-scale networks using field-programmable gate array (FPGA) hardware...
Data movement between the processing units and the memory in the traditional von Neumann architecture is creating the "memory wall" problem. To bridge the gap, two approaches, the memory-rich processor (more on-chip memory) and the compute-capable memory (processing-in-memory), have been studied. However, the first one has strong computing capability but limited memory capacity/bandwidth, whereas the second is the exact opposite.
Recently, due to the availability of big data and the rapid growth of computing power, artificial intelligence (AI) has regained tremendous attention and investment. Machine learning (ML) approaches have been successfully applied to solve many problems in academia and industry. Although the explosion of applications is driving the development of ML, it also imposes severe challenges of processing speed and scalability on conventional computer systems. Computing platforms that are dedicatedly designed for AI have been considered, ranging...
Inspired by the great success of neural networks, graph convolutional networks (GCNs) have been proposed to analyze graph data. GCNs mainly include two phases with distinct execution patterns. The Aggregation phase behaves like graph processing, showing a dynamic and irregular pattern. The Combination phase acts more like a neural network, presenting a static and regular pattern. The hybrid patterns require a design that alleviates irregularity and exploits regularity. Moreover, to achieve higher performance and energy efficiency, the design needs to leverage the high intra-vertex...
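The two phases can be seen side by side in a minimal GCN layer on a toy graph (a sketch using simple row normalization rather than the symmetric normalization of the original GCN formulation; shapes and weights are illustrative):

```python
import numpy as np

# Aggregation: a sparse, irregular gather over neighbors (A_norm @ X).
# Combination: a dense, regular matrix multiply with weights (... @ W).

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)      # adjacency of a 3-node path graph
A_hat = A + np.eye(3)                        # add self-loops
deg = A_hat.sum(axis=1)
A_norm = A_hat / deg[:, None]                # row-normalize (illustrative)

X = np.arange(6, dtype=float).reshape(3, 2)  # node feature matrix
W = np.ones((2, 4))                          # layer weight matrix

H = np.maximum(A_norm @ X @ W, 0.0)          # aggregate, then combine (ReLU)
assert H.shape == (3, 4)
```

The asymmetry is visible in the memory behavior: `A_norm @ X` follows graph structure (data-dependent accesses), while `@ W` is a fixed-shape dense GEMM, which is why a hybrid accelerator treats the two phases differently.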
High density, low leakage, and non-volatility are the attractive features of Spin-Transfer-Torque RAM (STT-RAM), which have made it a strong competitor against SRAM as a universal memory replacement in multi-core systems. However, STT-RAM suffers from high write latency and write energy, which has impeded its widespread adoption. To this end, we look at trading off STT-RAM's non-volatility property (data retention time) to overcome these problems. We formulate the relationship between retention time and write latency, and find the optimal retention time for...
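A hedged first-order model of the trade-off (the constants and the exact relation here are illustrative, not the paper's formulation): retention time grows roughly exponentially with the thermal stability factor Delta, so relaxing Delta shortens retention from years to seconds while making writes faster and cheaper.

```python
import math

# Illustrative retention model: t_ret ≈ tau0 * exp(Delta), where tau0 is
# the attempt time (~1 ns) and Delta is the thermal stability factor.
# Lowering Delta relaxes retention but reduces the current/time needed
# to flip the magnetic cell, i.e., faster and lower-energy writes.

TAU0 = 1e-9  # attempt time in seconds (assumed typical value)

def retention_seconds(delta):
    return TAU0 * math.exp(delta)

assert retention_seconds(40) > 365 * 24 * 3600   # years-scale retention
assert retention_seconds(20) < 1.0               # sub-second retention
```

A cache-oriented design can therefore pick a Delta whose retention just covers the expected cache-line lifetime, refreshing or flushing the rare lines that live longer.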
Persistent memory is an emerging technology which allows in-memory persistent data objects to be updated at much higher throughput than when using disks as storage. Previous designs use logging or copy-on-write mechanisms to update persistent data, which unfortunately reduces system performance to roughly half that of a native system with no persistence support. One of the great challenges in this application class is therefore how to efficiently enable atomic, consistent, and durable updates that ensure data survives power outages and/or system failures. Our...
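The logging mechanism mentioned above can be sketched as minimal undo logging (a generic illustration, not the paper's design): the old value is recorded before each in-place update, so recovery can roll back a partially applied transaction. Real systems also need cache-line flushes and fences (e.g., clwb/sfence) to order the log ahead of the data; this sketch models only the logic.

```python
# Undo logging for atomic persistent updates: record old values before
# mutating, replay the log backwards on abort/crash to restore a
# consistent state, discard the log on commit.

class UndoLog:
    def __init__(self):
        self.entries = []                  # (obj, key, old_value), oldest first

    def record(self, obj, key):
        self.entries.append((obj, key, obj[key]))  # log BEFORE the update

    def commit(self):
        self.entries.clear()               # updates durable; log discarded

    def abort(self):
        for obj, key, old in reversed(self.entries):
            obj[key] = old                 # roll back, newest first
        self.entries.clear()

store = {"balance": 100}
log = UndoLog()
log.record(store, "balance")
store["balance"] = 40                      # in-place update after logging
log.abort()                                # simulate crash recovery
assert store["balance"] == 100
```

The performance cost the abstract refers to comes from writing every datum twice (log plus data) and from the ordering fences between the two writes.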
Energy harvesting has been widely investigated as a promising method of providing power for ultra-low-power applications. Such energy sources include solar energy, radio-frequency (RF) radiation, piezoelectricity, thermal gradients, etc. However, the power supplied by these sources is highly unreliable and dependent upon ambient environmental factors. Hence, it is necessary to develop specialized systems that are tolerant to this variation and also capable of making forward progress on their computation tasks. The simulation...
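"Forward progress" under unreliable power is typically achieved by checkpointing to non-volatile state, so a power failure only loses work done since the last checkpoint. The toy model below (illustrative names and failure rate; not any specific system's scheme) shows a loop that survives repeated simulated outages:

```python
import random

# "Non-volatile" checkpointed state: survives simulated power failures.
nv = {"i": 0, "acc": 0}

def run_with_power_failures(n, seed=1):
    """Sum range(n) despite random power losses of volatile state."""
    rng = random.Random(seed)
    while nv["i"] < n:
        i, acc = nv["i"], nv["acc"]        # reboot: restore from checkpoint
        while i < n:
            if rng.random() < 0.1:         # power failure: volatile state lost
                break
            acc += i                        # one unit of useful work
            i += 1
            nv["i"], nv["acc"] = i, acc     # checkpoint after each step
    return nv["acc"]

assert run_with_power_failures(10) == sum(range(10))
```

Checkpointing every step maximizes tolerance but maximizes overhead; real intermittent-computing systems tune checkpoint frequency against the expected outage rate.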
In this letter, a flexible memory simulator, NVMain 2.0, is introduced to help the research community model not only commodity DRAMs but also emerging memory technologies, such as die-stacked DRAM caches, non-volatile memories (e.g., STT-RAM, PCRAM, and ReRAM) including multi-level cells (MLC), and hybrid non-volatile plus DRAM memory systems. Compared to existing simulators, NVMain 2.0 features a friendly user interface with compelling simulation speed and the capability of providing sub-array-level parallelism, fine-grained refresh, and an MLC data encoder...
As technology scales, interconnects have become a major performance bottleneck and a major source of power consumption for microprocessors. Increasing interconnect costs make it necessary to consider alternative ways of building modern processors. One promising option is the 3D architecture, in which multiple device layers, with direct vertical interconnects tunneling through them, are put together on the same chip. As the fabrication of 3D integrated circuits has become viable, developing CAD tools and architectural techniques is imperative to explore the design...