Yaoyu Tao

ORCID: 0000-0001-7500-5250
Research Areas
  • Advanced Memory and Neural Computing
  • Error Correcting Code Techniques
  • Advanced Wireless Communication Techniques
  • Ferroelectric and Negative Capacitance Devices
  • DNA and Biological Computing
  • Cooperative Communication and Network Coding
  • Neural Networks and Applications
  • Coding theory and cryptography
  • Neural Networks and Reservoir Computing
  • CCD and CMOS Imaging Sensors
  • Photoreceptor and optogenetics research
  • Quantum Computing Algorithms and Architecture
  • Model Reduction and Neural Networks
  • Phase-change materials and chalcogenides
  • Machine Learning and ELM
  • Semiconductor materials and devices
  • Advancements in Semiconductor Devices and Circuit Design
  • Parallel Computing and Optimization Techniques
  • Transition Metal Oxide Nanomaterials
  • Radiation Effects in Electronics
  • Advanced biosensing and bioanalysis techniques
  • Embedded Systems Design Techniques
  • Stochastic Gradient Optimization Techniques
  • Data Quality and Management
  • Image and Video Stabilization

Peking University
2022-2025

Chinese Institute for Brain Research
2025

Beijing Academy of Artificial Intelligence
2023-2025

University of Michigan
2012-2023

Qualcomm (United States)
2022

Wireless internet-of-things (WIoT) with data acquisition sensors are evolving rapidly and the demand for transmission efficiency is growing rapidly. A frequency converter that synthesizes signals at different frequencies and mixes them with sensor datastreams is a key component for efficient wireless transmission. However, existing frequency converters employ separate synthesize and mix circuits with complex digital and analog circuits using complementary metal-oxide semiconductor (CMOS) devices, naturally incurring...

10.1038/s41467-024-45923-7 article EN cc-by Nature Communications 2024-02-19

A 1.48 mm² 1024-bit belief propagation polar decoder is designed in 65 nm CMOS. Unidirectional processing reduces the memory size to 45 Kb and simplifies the processing element. A double-column 1024-parallel architecture enables a 4.68 Gb/s throughput. A bit-splitting latch-based register file accommodates the logic for an 85% density. The circuit techniques reduce the power to 478 mW for an efficiency of 15.5 pJ/b/iteration at...

10.1109/vlsic.2014.6858413 article EN 2014-06-01

Humans are complex organisms made of millions of physiological systems. Therefore, physiological activities can represent physical or mental states of the human body. Physiological signal processing is essential in monitoring features. For example, non‐invasive electroencephalography (EEG) signals can be used to reconstruct brain consciousness and detect eye movements for identity verification. However, physiological signal processing requires high resolution, high sensitivity, fast responses, and low power consumption, hindering practical...

10.1002/aelm.202300021 article EN cc-by Advanced Electronic Materials 2023-04-17

Polar codes are capacity-achieving channel codes and they have recently been adopted for the fifth-generation (5G) enhanced mobile broadband (eMBB) control channels. Using successive cancellation list (SCL) decoding, the error-correction performance of polar codes can surpass that of state-of-the-art codes of a comparable length. However, sequential SC decoding incurs long latency, and list decoding requires complex tracking of candidates. We present a split-tree SCL decoder that works by dividing the polar code's decoding tree into sub-trees and decoding the sub-trees following the SCL algorithm. The...

10.1109/jssc.2020.3005763 article EN publisher-specific-oa IEEE Journal of Solid-State Circuits 2020-07-20

Computing‐in‐memory (CIM) architecture inspired by the hierarchy of the human brain is proposed to resolve the von Neumann bottleneck and boost acceleration of artificial intelligence. Whereas remarkable progress has been achieved for CIM, making further improvements in CIM performance is becoming increasingly challenging, which is mainly caused by the disparity between the rapid evolution of synaptic arrays and the relatively slow progress in building efficient neuronal devices. Specifically, dedicated efforts are required toward...

10.1002/adfm.202405618 article EN Advanced Functional Materials 2024-05-14

Compute-in-memory based on resistive random-access memory has emerged as a promising technology for accelerating neural networks on edge devices. It can reduce frequent data transfers and improve energy efficiency. However, the nonvolatile nature of resistive random-access memory raises concerns that stored weights can be easily extracted during computation. To address this challenge, we propose RePACK, a threefold protection scheme that safeguards neural network input, weight, and structural information. It utilizes bipartite-sort coding to store...

10.1038/s41467-025-56412-w article EN cc-by-nc-nd Nature Communications 2025-01-25

The rapid development of deep learning enables significant breakthroughs for intelligent edge‐terminal devices. However, neural network training for edge computing is currently overly dependent on cloud service platforms, resulting in low adaptivity in fast‐changing real‐world environments. The energy efficiency is also strictly constrained by the traditional Von‐Neumann architecture with separate memory and processing units. To improve the adaptability of edge devices, a fully parallel online scheme based...

10.1002/aisy.202401068 article EN cc-by Advanced Intelligent Systems 2025-01-28

Processing-in-memory (PIM) based on emerging devices such as memristors is more vulnerable to noise than traditional memories, due to the physical non-idealities and complex operations in analog domains. To ensure high reliability, an efficient error-correcting code (ECC) is highly desired. However, state-of-the-art ECC schemes for PIM suffer drawbacks including dataflow interruptions, low code rates, and limited error correction patterns. In this work, we propose non-binary low-density parity-check (NB-LDPC)...

10.48550/arxiv.2502.11487 preprint EN arXiv (Cornell University) 2025-02-17

Nonbinary LDPC (NB-LDPC) codes, defined over a Galois field, offer better coding gain and a lower error floor than binary LDPC codes. However, the complex decoding and large memory requirement have prevented any practical chip implementations. We present a 1.22 Gb/s fully parallel decoder of a GF(64) (160, 80) regular-(2, 4) NB-LDPC code in 65 nm CMOS. The reduced number of edges in the code's factor graph permits a low wiring overhead in the fully parallel architecture. The throughput is further improved by a one-step look-ahead check node...

10.1109/jssc.2014.2362854 article EN IEEE Journal of Solid-State Circuits 2014-11-13

Nonbinary polar codes defined over Galois field GF(q) have shown better error-correction performance than binary polar codes using successive-cancellation list (SCL) decoding. However, nonbinary operations are complex and a direct-mapped decoder results in low throughput, representing difficulties for practical adoption. In this work, we develop, to the best of our knowledge, the first hardware implementation of a nonbinary SCL polar decoder. We present a high-throughput decoder architecture using a split-tree algorithm. The sub-trees are decoded in parallel...

10.1109/iscas48785.2022.9937445 article EN 2022 IEEE International Symposium on Circuits and Systems (ISCAS) 2022-05-28

The rapid development of visual neuromorphic hardware can be attributed to its ability to capture, store and process optical signals from the environment. The main limitation of existing visual neuromorphic hardware is that the realization of complex functions is premised on the increase of manufacturing cost, volume and energy consumption. In this study, we demonstrated an optical synaptic device based on a three-terminal van der Waals (vdW) heterojunction to realize the sensing of light wavelength and intensity as well as short-term and long-term plasticity. The image recognition...

10.1002/adma.202401060 article EN Advanced Materials 2024-10-29

Nonbinary LDPC codes have shown superior performance, but decoding nonbinary LDPC codes is complex, incurring a long latency and much degraded throughput. We propose a low-latency variable processing node by a skimming algorithm, together with a low-latency extended min-sum check processing node by prefetching and relaxing redundancy control. The processing nodes are jointly designed for an optimal pipeline schedule. This low-latency, high-throughput architecture is applied to a class of high-performance (2, d <sub...

10.1109/iscas.2012.6271844 article EN 2012 IEEE International Symposium on Circuits and Systems 2012-05-01

The primary design goal of a communication or storage system is to allow the most reliable transmission of more information at the lowest signal-to-noise ratio (SNR). State-of-the-art channel codes including turbo and binary LDPC have been extensively used in recent applications [1-2] to close the gap towards the lowest possible SNR, known as the Shannon limit. The recently developed nonbinary LDPC (NB-LDPC) code, defined over a Galois field (GF), holds great promise for approaching the Shannon limit [3]. It offers better coding gain and a lower...

10.1109/isscc.2013.6487797 article EN 2013-02-01

Neural architecture search (NAS), as a subfield of automated machine learning, can design neural network models with better performance than manual design. However, the energy and time consumptions of conventional software‐based NAS are huge, hindering its development and applications. Herein, 4 Mb phase change memory (PCM) chips are first fabricated that enable two key in‐memory computing operations: in‐memory multiply‐accumulate (MAC) and in‐memory rank for efficient NAS. The impacts of the coating layer material...

10.1002/adfm.202300458 article EN Advanced Functional Materials 2023-07-28

Integrating heterogeneous chiplets in a package presents a promising and cost-effective approach to constructing scalable and flexible systems for accelerating a wide range of workloads. We introduce Arvon, which integrates a 14-nm FPGA chiplet with two efficient and densely packed 22-nm DSP chiplets using embedded multidie interconnect bridges (EMIBs). The chiplets are interconnected via a 1.536-Tb/s advanced interface bus (AIB) 1.0 interface and a 7.68-Tb/s AIB 2.0 interface. Arvon is programmable, supporting various workloads from neural network...

10.1109/jssc.2023.3343457 article EN IEEE Journal of Solid-State Circuits 2023-12-27

Neural architecture search (NAS) can design neural network models with better performance than manual design. However, the energy and time consumption of conventional software-based NAS are huge, hindering its development and applications. In article number 2300458, Zhitang Song, Yuchao Yang, Ru Huang, and co-workers report 4 Mb PCM chips that enable in-memory MAC and rank for efficient NAS. The PCM-based NAS improves efficiency by 4779× and 123× compared with a GPU.

10.1002/adfm.202470083 article EN Advanced Functional Materials 2024-04-01

Epilepsy is a prevalent neurological disorder, rendering the development of automated seizure detection systems imperative. While complex machine learning models are powerful, their training and hardware deployment remain challenging. The reservoir computing system offers a low-cost solution in terms of both hardware requirements and training. In this paper, we introduce a compact reservoir computing system for seizure detection, based on α-In2Se3 dynamic memristors. Leaky integrate-and-fire neurons are used for post-processing the output of the reservoir computing system,...

10.1063/5.0171274 article EN cc-by APL Machine Learning 2023-12-01

Neural ordinary differential equation (Neural-ODE) outperforms conventional deep neural networks (DNNs) in modeling continuous-time or dynamical systems by adopting numerical ODE integration onto a shallow embedded NN. However, Neural-ODE suffers from slow inference due to the costly iterative stepsize search in numerical integration, especially when using higher-order Runge-Kutta (RK) methods and smaller error tolerance for improved accuracy. In this work, we first present algorithmic techniques to speed up...

10.1145/3543622.3573044 article EN 2023-02-10

Memristive in-memory sorting has been proposed recently to improve hardware efficiency. Using iterative min computations, data movements between memory and external processing units can be eliminated for improved latency and energy efficiency. However, the bit-traversal algorithm for min search requires a large number of column reads on memristive memory. In this work, we propose a column-skipping algorithm with the help of a near-memory circuit. Redundant column reads can be skipped based on recorded states. To enhance scalability, we develop a multi-bank...

10.1109/iscas48785.2022.9937928 article EN 2022 IEEE International Symposium on Circuits and Systems (ISCAS) 2022-05-28

Technology scaling continues to improve density, but also reduces the critical charge to hold a logic state, causing devices to become more susceptible to accidental disruptions due to noise and soft errors. Increased process variation adds to the reliability challenge, resulting in over-designs and extra timing margins at the cost of power consumption, silicon area and performance degradation. We present efficient in situ error detection techniques that exploit datapath characteristics for monitoring circuit errors: pre-edge...

10.1109/iscas.2013.6571961 article EN 2013 IEEE International Symposium on Circuits and Systems (ISCAS) 2013-05-01

Successive-cancellation list (SCL) decoding of polar codes has been adopted for 5G wireless communications. However, the performance at moderate code length is not satisfactory. Heuristic or deep-learning-aided (DL-aided) flip algorithms have been developed to improve the performance by locating error bit positions after SCL decoding. In this work, we propose a new flip algorithm with the help of a differentiable neural computer (DNC). New state and action encoding are developed for DNC training and inference efficiency. The proposed two-phase...

10.1109/globecom46510.2021.9685277 article EN 2021 IEEE Global Communications Conference (GLOBECOM) 2021-12-01

In-memory computing based on multi-level RRAMs has shown great potential in neural network computations. In-situ training is becoming increasingly attractive, but requires accurate on-chip RRAM cell write. In this paper, we demonstrate a fully on-chip write and verify circuit and a novel methodology to improve the endurance of RRAM. Simulations show up to 120× programming speed improvement, achieving 86% reduction in conductance standard deviation and 50× longer device lifetime.

10.1109/edtm58488.2024.10512132 article EN 2024-03-03