- Ferroelectric and Negative Capacitance Devices
- Advanced Memory and Neural Computing
- Machine Learning and ELM
- Semiconductor Materials and Devices
- Network Packet Processing and Optimization
- Domain Adaptation and Few-Shot Learning
- Music and Audio Processing
- MXene and MAX Phase Materials
- Advanced Bandit Algorithms Research
- Caching and Content Delivery
- Speech and Audio Processing
- Speech Recognition and Synthesis
- Advanced Neural Network Applications
- Cocoa and Sweet Potato Agronomy
- Natural Language Processing Techniques
- Mind Wandering and Attention
- Music Technology and Sound Studies
- Video Surveillance and Tracking Methods
- Multimodal Machine Learning Applications
- Food Chemistry and Fat Analysis
- Image Enhancement Techniques
- 2D Materials and Applications
- Automated Road and Building Extraction
- Reinforcement Learning in Robotics
- Electronic and Structural Properties of Oxides
De La Salle University
2016-2024
University of Notre Dame
2019-2022
University of the Philippines Diliman
2012-2014
University of the Philippines System
2014
Pattern searches, a key operation in many data analytic applications, often deal with data represented by multiple states per dimension. However, hash tables, a common software-based pattern search approach, require a large amount of additional memory and are thus limited by the memory wall. A hardware-based solution is to use content-addressable memories (CAMs) that support fast associative searches in parallel. Ternary CAMs (TCAMs) support bit-wise Hamming distance (HD) based searches. Detecting HD between vectors...
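For a point of reference in software (not the TCAM implementation studied in this work), a Hamming-distance pattern search can be sketched as follows; the multi-state encodings and the match threshold are illustrative assumptions:

```python
# Minimal software sketch of Hamming-distance (HD) pattern search.
# A TCAM evaluates all stored rows against the query in parallel;
# here we simply scan the rows sequentially.
def hamming_distance(a, b):
    """Count mismatching positions between two equal-length patterns."""
    return sum(x != y for x, y in zip(a, b))

def hd_search(stored_patterns, query, threshold):
    """Return indices of stored patterns within `threshold` HD of the query."""
    return [i for i, p in enumerate(stored_patterns)
            if hamming_distance(p, query) <= threshold]

# Example: patterns with multiple discrete states per dimension (0, 1, 2).
patterns = [(0, 1, 2, 1), (2, 2, 0, 1), (0, 1, 2, 2)]
print(hd_search(patterns, query=(0, 1, 2, 1), threshold=1))  # -> [0, 2]
```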
Nearest neighbor (NN) search is an essential operation in many applications, such as one/few-shot learning and image classification. As such, fast, low-energy hardware support for accurate NN search is highly desirable. Ternary content-addressable memories (TCAMs) have been proposed to accelerate few-shot learning tasks by implementing $L_\infty$ and Hamming distance...
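For intuition only (not the TCAM circuit described in this work), a nearest-neighbor lookup under the $L_\infty$ metric can be sketched in software; the stored vectors and query below are made-up examples:

```python
# Software sketch of NN search under the L-infinity metric,
# i.e. the maximum absolute per-dimension difference.
def l_inf(a, b):
    """Maximum absolute per-dimension difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

def nn_search(stored, query):
    """Return (index, distance) of the stored vector closest to `query`."""
    return min(((i, l_inf(v, query)) for i, v in enumerate(stored)),
               key=lambda t: t[1])

support = [(0.1, 0.9, 0.4), (0.8, 0.2, 0.5)]      # e.g. few-shot support features
print(nn_search(support, query=(0.2, 0.8, 0.4)))  # -> index 0 is the nearest neighbor
```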
Lifelong on-device learning is a key challenge for machine intelligence, and this requires learning from few, often single, samples. Memory-augmented neural networks have been proposed to achieve the goal, but the memory module must be stored in off-chip memory, heavily limiting practical use. In this work, we experimentally validated that all different structures of a memory-augmented neural network can be implemented in a fully integrated memristive crossbar platform with an accuracy that closely matches digital hardware...
Nearest neighbor (NN) search computations are at the core of many applications such as few-shot learning, classification, and hyperdimensional computing. As such, efficient hardware support for NN search is highly desired. In-memory computing using emerging devices offers attractive solutions for such searches. Solutions based on ternary content-addressable memories (TCAMs) offer high energy and latency improvements at the expense of accuracy. In this work, we propose a novel distance function that can be natively...
Associative memories (AMs), which efficiently "associate" input queries with appropriate data words/locations in the memory, are powerful in-memory-computing cores. Harnessing the benefits of AMs requires cross-layer design efforts that span from devices and circuits to architectures and applications. This paper showcases representative AM designs based on different non-volatile memory technologies (resistive RAM (RRAM), ferroelectric FETs (FeFETs), and Flash). End-to-end evaluations for machine...
As CMOS technology advances, the performance gap between the CPU and main memory has not improved. Furthermore, hardware deployed for Internet of Things (IoT) applications needs to process ever-growing volumes of data, which can further exacerbate the "memory wall". Computing-in-memory (CiM) architectures, where logic and arithmetic operations are performed in memory, can significantly reduce the energy and latency overheads associated with data transfer, and potentially alleviate processor-memory bottlenecks. In this...
Deep random forest (DRF), which combines deep learning and random forest, exhibits comparable accuracy, interpretability, and low memory and computational overhead relative to deep neural networks (DNNs) in edge intelligence tasks. However, efficient DRF accelerator design is lagging behind its DNN counterparts. The key to acceleration lies in realizing the branch-split operation at decision nodes. In this work, we propose implementing it through associative searches realized with ferroelectric analog content addressable memories (ACAMs)...
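To illustrate the branch-split idea in software terms (an ACAM evaluates the analogous range matches in parallel in the analog domain; the rows, ranges, and query here are illustrative, not this work's design):

```python
# Software sketch of a decision-tree "branch-split" expressed as range matching,
# the operation an analog CAM (ACAM) row can evaluate across all features at once.
def row_matches(ranges, query):
    """A row matches if every feature lies inside its stored [low, high] range."""
    return all(lo <= q <= hi for (lo, hi), q in zip(ranges, query))

# Each row encodes one root-to-leaf path of a tree as per-feature ranges.
rows = {
    "leaf_A": [(0.0, 0.5), (0.0, 1.0)],
    "leaf_B": [(0.5, 1.0), (0.0, 0.3)],
}
query = (0.7, 0.2)
print([leaf for leaf, r in rows.items() if row_matches(r, query)])  # -> ['leaf_B']
```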
In a number of machine learning models, an input query is searched across the trained class vectors to find the closest feature vector under a cosine similarity metric. However, computing cosine similarities on von Neumann machines involves many multiplications, Euclidean normalizations, and division operations, thus incurring heavy hardware, energy, and latency overheads. Moreover, due to the memory wall problem present in conventional architectures, frequent cosine similarity-based searches (CSSs) over the stored vectors require a lot...
Neural networks with external memories have been proven to minimize catastrophic forgetting, a major problem in applications such as lifelong and few-shot learning. However, memory-enhanced neural networks (MENNs) often require a large number of floating-point cosine distance metric calculations to perform the necessary attentional operations, which greatly increases energy consumption and hardware cost. This paper investigates other distance metrics in order to achieve more efficient implementations of MENNs. We propose...
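For background on the attentional lookup these two abstracts refer to (a generic sketch, not the papers' hardware or their proposed alternative metrics), a cosine-similarity memory read can be written as:

```python
import math

# Software sketch of a cosine-similarity attentional read: the query is compared
# against every memory row and the most similar row is selected.
# Memory contents and the query are illustrative.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def attention_read(memory, query):
    """Return the index of the memory row most similar to the query."""
    return max(range(len(memory)), key=lambda i: cosine_similarity(memory[i], query))

memory = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.3)]
print(attention_read(memory, query=(1.0, 0.2, 0.1)))  # -> 0
```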
Genome analysis is becoming more important in the fields of forensic science, medicine, and history. Sequencing technologies such as High Throughput Sequencing (HTS) and Third Generation Sequencing (TGS) have greatly accelerated genome sequencing. However, read mapping remains significantly slower than sequencing. Because of the enormous amount of data needed, the speed of transfer between memory and the processing unit limits execution speed. In-memory computing can help address the memory-bandwidth bottleneck by minimizing data transfers. Ternary Content...
This article proposes a novel magnetoelectric (ME) effect-based ternary content addressable memory (TCAM). The potential array-level write and search performance of the proposed ME-TCAM is studied using experimentally calibrated compact physical models and SPICE simulations. The voltage-controlled operation of ME devices eliminates the large Joule heating present in current-controlled magnetic devices, and their low-voltage operation makes them more energy-efficient compared to static random access memory-based TCAMs...
Editor's note: The semiconductor industry is steadily on the quest for emerging devices and device technologies that lead to higher performance and efficiency of computing over CMOS technology. This tutorial introduces the potential of integrating ferroelectric materials into digital as well as analog circuits. With a focus on FeFET technology, the authors first present its characteristics and advantages in comparison to CMOS, but also to other emerging devices such as RRAM. The article comprehensively demonstrates the use of the technology in circuits, architectures,...
Transformer networks have outperformed recurrent and convolutional neural networks in various sequential tasks. However, scaling transformers for long sequences has been challenging because of memory and compute bottlenecks. They are impeded by memory bandwidth limitations and their low operation-per-byte ratio, resulting in low utilization of the GPU's computing resources. In-memory processing can mitigate these bottlenecks by eliminating the transfer time between memory and compute units. Furthermore, transformers use attention mechanisms to characterize relationships...
Recommendation systems (RecSys) suggest items to users by predicting their preferences based on historical data. Typical RecSys handle large embedding tables and many table-related operations. The memory size and bandwidth of conventional computer architectures restrict the performance of RecSys. This work proposes an in-memory-computing (IMC) architecture (iMARS) for accelerating the filtering and ranking stages of deep neural network-based RecSys. iMARS leverages IMC-friendly embedding tables implemented inside a ferroelectric FET IMC fabric...
This article presents the design of a novel and compact spin-orbit torque (SOT)-based ternary content addressable memory (TCAM). Experimentally validated/calibrated micromagnetic and macrospin simulations have been used to quantify various tradeoffs regarding the write operation, such as energy, error rate, and retention time. SPICE simulations incorporating sources of variability are used to evaluate search operations, optimize the proposed TCAM cell based on SOT magnetic random access memory (SOT-MRAM), and benchmark it against static...
Among few-shot learning methods, prototypical networks (PNs) are one of the most popular approaches due to their excellent classification accuracies and network simplicity. Test examples are classified based on their distances from class prototypes. Despite the application-level advantages of PNs, the latency of transferring data between memory and compute units is much higher than the PN computation time. Thus, PN performance is limited by memory bandwidth. Computing-in-memory addresses this bandwidth-bottleneck problem by bringing a subset...
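As a reminder of the classification rule this abstract builds on (a generic prototypical-network sketch, not the accelerator itself; the embeddings below are placeholders):

```python
# Prototypical-network classification: a test embedding is assigned to the class
# whose prototype (mean of its support embeddings) is nearest in Euclidean distance.
def prototype(support_embeddings):
    n = len(support_embeddings)
    return tuple(sum(col) / n for col in zip(*support_embeddings))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(prototypes, query):
    """Return the label of the nearest class prototype."""
    return min(prototypes, key=lambda lbl: squared_euclidean(prototypes[lbl], query))

protos = {
    "cat": prototype([(0.9, 0.1), (0.8, 0.2)]),
    "dog": prototype([(0.1, 0.9), (0.2, 0.8)]),
}
print(classify(protos, query=(0.7, 0.3)))  # -> 'cat'
```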
Transformer networks have outperformed recurrent and convolutional neural networks in terms of accuracy on various sequential tasks. However, memory and compute bottlenecks prevent transformers from scaling to long sequences due to their high execution time and energy consumption. Different attention mechanisms have been proposed to lower the computational load, but they still suffer from the memory bandwidth bottleneck. In-memory processing can help alleviate this by reducing the transfer overhead between memory and compute units, thus allowing transformers to scale to longer sequences. We...
The rapidly increasing volume and complexity of data demand the relentless scaling of computing power. With transistor feature sizes approaching physical limits, the benefits that CMOS technology can provide are diminishing. For future energy-efficient systems, researchers aim to exploit various emerging nanotechnologies to replace conventional technology. In particular, ferroelectric FETs (FeFETs) appear to be a promising candidate to continue improving efficiency for data-intensive applications. Advances...
Deep random forest (DRF), which incorporates the core features of deep learning and random forest (RF), exhibits comparable classification accuracy, interpretability, and low memory and computational overhead when compared with deep neural networks (DNNs) in various information processing tasks for edge intelligence. However, the development of efficient hardware to accelerate DRF is lagging behind its DNN counterparts. The key to acceleration lies in efficiently realizing the branch-split operation at decision nodes when traversing a...
Attention-in-Memory (AiM), a computing-in-memory (CiM) design, is introduced to implement the attentional layer of Memory Augmented Neural Networks (MANNs). AiM consists of a memory array based on Ferroelectric FETs (FeFETs) along with CMOS peripheral circuits implementing configurable functionalities, i.e., it can be dynamically changed from a ternary content-addressable memory (TCAM) to a general-purpose (GP) CiM. When compared to state-of-the-art accelerators, AiM achieves comparable end-to-end speed-up and energy...
Experience replay is an essential component in deep reinforcement learning (DRL), which stores experiences and generates samples for the agent to learn in real time. Recently, prioritized experience replay (PER) has been proven to be powerful and is widely deployed in DRL agents. However, implementing PER on traditional CPU or GPU architectures incurs significant latency overhead due to its frequent and irregular memory accesses. This paper proposes a hardware-software co-design approach to design an associative memory (AM) based PER, AMPER,...
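For background (not the AMPER hardware design), proportional prioritized sampling can be sketched in a few lines; the buffer contents, priorities, and exponent alpha are illustrative:

```python
import random

# Minimal sketch of proportional prioritized experience replay (PER):
# transitions are sampled with probability proportional to priority**alpha.
def sample(transitions, priorities, alpha=0.6):
    weights = [p ** alpha for p in priorities]
    return random.choices(transitions, weights=weights, k=1)[0]

buffer = ["t0", "t1", "t2"]
priorities = [0.1, 2.0, 0.5]       # e.g. TD-error magnitudes
print(sample(buffer, priorities))  # most often 't1'
```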
Cacao farms worldwide lose up to 40% of their crops annually due to several diseases. To reduce the damage, farmers and agricultural technicians regularly monitor the well-being of their crops. But at present, many still rely on visual inspection to assess the degree of infection of the crops, resulting in errors and inconsistencies due to the subjective nature of the assessment procedure. To improve the procedure, this research developed a framework for detecting and segmenting the infected parts of the fruit to measure the infection level of cacao pods, based on the k-means algorithm supplemented...
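As a rough illustration of a k-means based segmentation step (not the exact pipeline of this work), pod pixels can be clustered by color and the cluster fractions used as a proxy for the infected area; the synthetic pixels, the choice of k=2, and the infected/healthy reading of the clusters are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster pod-image pixels by color with k-means, then report the fraction of
# pixels falling in each cluster (e.g. healthy vs. discolored regions).
def segment_fraction(pixels_rgb, k=2):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels_rgb)
    counts = np.bincount(km.labels_, minlength=k)
    return counts / counts.sum()

# Synthetic stand-in for flattened pod pixels: mostly green, some brown.
pixels = np.vstack([np.tile([60, 140, 40], (80, 1)),   # green-ish "healthy"
                    np.tile([100, 60, 30], (20, 1))])  # brown-ish "infected"
print(segment_fraction(pixels))  # -> roughly [0.8, 0.2] (cluster order may vary)
```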
Over the past decades, emerging, data-driven machine learning (ML) paradigms have increased in popularity and revolutionized many application domains. To date, a substantial effort has been devoted to devising mechanisms for facilitating the deployment and near-ubiquitous use of these memory-intensive ML models. This review paper presents in-memory computing (IMC) accelerators based on emerging devices from a bottom-up perspective, through the choice of devices, the design of circuits/architectures, and application-level results.