Naoya Onizawa

ORCID: 0000-0002-4855-7081
Research Areas
  • Advanced Memory and Neural Computing
  • Low-power high-performance VLSI design
  • Error Correcting Code Techniques
  • Network Packet Processing and Optimization
  • Interconnection Networks and Systems
  • Quantum Computing Algorithms and Architecture
  • Ferroelectric and Negative Capacitance Devices
  • Neural Networks and Applications
  • Advanced Wireless Communication Techniques
  • Parallel Computing and Optimization Techniques
  • Analog and Mixed-Signal Circuit Design
  • Caching and Content Delivery
  • Semiconductor materials and devices
  • Advanced Adaptive Filtering Techniques
  • Cooperative Communication and Network Coding
  • VLSI and Analog Circuit Testing
  • Radiation Effects in Electronics
  • Advancements in Semiconductor Devices and Circuit Design
  • Quantum-Dot Cellular Automata
  • CCD and CMOS Imaging Sensors
  • Stochastic Gradient Optimization Techniques
  • Quantum and electron transport phenomena
  • Cellular Automata and Applications
  • Energy Harvesting in Wireless Networks
  • Neural dynamics and brain function

Tohoku University
2016-2025

Tohoku University Hospital
2019

University of Novi Sad
2018

NEC (Japan)
2014

McGill University
2012-2013

University of Waterloo
2012-2013

The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention: many applications in fact require high-speed operations that suit a hardware implementation. However, numerous elements and complex interconnections are usually required, leading to a large area occupation and copious power consumption. Stochastic computing (SC) has shown promising results for low-power area-efficient hardware implementations, even though existing stochastic algorithms require long streams that cause long latencies. In...

10.1109/tvlsi.2017.2654298 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2017-02-01
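
As a rough illustration of why stochastic computing is area-efficient but needs long streams, the sketch below multiplies two values in [0, 1] with a bitwise AND of random bit streams (the textbook unipolar encoding). It is not the method of the paper above; the function names and the default stream length are assumptions made for this example.

```python
import random

def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a random bit stream (unipolar format)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def from_stream(bits):
    """Decode a unipolar stream: the value is the fraction of 1s."""
    return sum(bits) / len(bits)

def sc_multiply(a, b, length=4096, seed=0):
    """Multiply a and b (both in [0, 1]) by ANDing two independent streams."""
    rng = random.Random(seed)
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)
    # One AND gate per bit pair replaces a full binary multiplier.
    return from_stream([x & y for x, y in zip(stream_a, stream_b)])
```

The estimate is only approximate, and its error shrinks as the stream gets longer, which is the latency cost the abstract refers to.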

Nonvolatile spintronic devices have potential advantages, such as fast read/write and high endurance together with back-end-of-the-line compatibility, which offers the possibility of constructing not only stand-alone RAMs and embedded RAMs that can be used in conventional VLSI circuits and systems but also standby-power-free high-performance nonvolatile CMOS logic employing logic-in-memory architecture. The advantages of these devices, especially magnetic tunnel junction (MTJ) devices combined with CMOS circuits, are discussed, and the current...

10.1109/jproc.2016.2574939 article EN Proceedings of the IEEE 2016-09-07

Invertible logic can operate in one of two modes: 1) a forward mode, in which inputs are presented and a single, correct output is produced, and 2) a reverse mode, in which the output is fixed and the inputs take on values consistent with the output. It is possible to create invertible logic using various Boltzmann machine configurations. Such systems have been shown to solve certain challenging problems quickly, such as factorization and combinatorial optimization. In this paper, we show that invertible logic can be implemented using simple spiking neural networks based on stochastic...

10.1109/tcsi.2018.2889732 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2019-01-11

Simulated annealing (SA) is a well-known algorithm for solving combinatorial optimization problems. However, the computation time of SA increases rapidly as the size of the problem grows. Recently, stochastic simulated annealing (SSA), which converges faster than conventional SA, has been reported. In this paper, we present hardware-aware SSA (HA-SSA) for memory-efficient FPGA implementations. HA-SSA can reduce the memory usage of storing intermediate results while maintaining the computing speed of SSA. For evaluation purposes,...

10.1109/jetcas.2023.3243260 article EN IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2023-02-08
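
For readers unfamiliar with the baseline, here is a minimal sketch of conventional simulated annealing on a small Ising model. It is plain software SA with single-spin flips, not SSA or HA-SSA; the geometric cooling schedule, the parameter defaults, and the function names are assumptions chosen for illustration. The coupling matrix J is assumed to be symmetric.

```python
import math
import random

def ising_energy(spins, J):
    """Energy H = -sum over i<j of J[i][j] * s_i * s_j, with spins in {-1, +1}."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))

def simulated_annealing(J, steps=2000, t_start=5.0, t_end=0.05, seed=0):
    """Return (best spin configuration, best energy) found by single-spin-flip SA."""
    rng = random.Random(seed)
    n = len(J)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(spins, J)
    best, best_energy = spins[:], energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)
        # Energy change caused by flipping spin i (J symmetric, diagonal ignored).
        local = sum(J[i][j] * spins[j] for j in range(n) if j != i)
        delta = 2 * spins[i] * local
        # Metropolis rule: always accept downhill moves, sometimes uphill ones.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            spins[i] = -spins[i]
            energy += delta
            if energy < best_energy:
                best, best_energy = spins[:], energy
    return best, best_energy
```

Each step recomputes a local field over all neighbours, so the work per sweep grows with problem size, which is the scaling issue the abstract points to.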

This letter introduces a design and proof-of-concept implementation of Gabor filters based on stochastic computation for area-efficient hardware. The Gabor filter exhibits powerful image feature extraction capability, but it requires significant computational power. Using stochastic computation, the sine function used in the Gabor filter is approximated by exploiting several stochastic tanh functions designed with a state machine. A stochastic Gabor filter realized using the stochastic sine shaper and a stochastic exponential function is simulated and compared with the original Gabor filter, and shows almost equivalent behaviour at...

10.1109/lsp.2015.2392123 article EN IEEE Signal Processing Letters 2015-01-14

This paper presents a design of a True Random Number Generator (TRNG) using a Spin Transfer Torque Magnetic Tunnel Junction (STT-MTJ) device. Since the probability of the STT-MTJ-based TRNG is locked by a digitally controlled feedback loop, the sensitivity gain can be reduced greatly, which eliminates the high-gain amplifier in the loop. It is demonstrated using a circuit simulator (NS-SPICE, where the STT-MTJ model is established based on 90nm CMOS/MTJ process technologies) and MATLAB that the random sequences generated from the TRNG become 50%,...

10.1109/newcas.2015.7182089 article EN 2015-06-01

Probabilistic computing using probabilistic bits (p-bits) presents an efficient alternative to traditional CMOS logic for complex problem-solving, including simulated annealing and machine learning. Realizing p-bits with emerging devices such as magnetic tunnel junctions introduces device variability, which was expected to negatively impact computational performance. However, this study reveals an unexpected finding: device variability can not only degrade but also enhance algorithm performance,...

10.1038/s41598-025-90520-3 article EN cc-by-nc-nd Scientific Reports 2025-02-19

This paper presents the algorithm, architecture, and fabrication results of a nonvolatile context-driven search engine that reduces energy consumption as well as computational delay compared to classical hardware and software-based approaches. The proposed architecture stores only associations between items from multiple fields in the form of binary links, and merges repeated field items to reduce memory requirements and accesses. The fabricated chip achieves...

10.1109/jetcas.2014.2361061 article EN IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2014-10-21

We present a method to design high-throughput fully parallel low-density parity-check (LDPC) decoders. With our method, a decoder's longest wires are divided into several short wires with pipeline registers. Log-likelihood ratio messages transmitted along these pipelined paths are thus sent over multiple clock cycles, and the critical path delay can be reduced while maintaining comparable bit error rate performance. The number of registers inserted is estimated by using wiring information extracted from...

10.1109/tvlsi.2008.2011360 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2009-04-15

The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention since many applications require high-speed operations. However, numerous processing elements and complex interconnections are usually required, leading to a large area occupation and high power consumption. Stochastic computing has shown promising results for area-efficient hardware implementations, even though existing stochastic algorithms require long streams that exhibit long latency. In this paper, we propose an...

10.1109/istc.2016.7593108 preprint EN 2016-09-01

Deep Neural Networks (DNNs) have recently shown state-of-the-art results on various applications, such as computer vision and recognition tasks. DNN inference engines can be implemented in hardware with high energy efficiency, as the computation can be realized using low-precision fixed-point or even binary precision with sufficient cognition accuracies. On the other hand, training DNNs using the well-known back-propagation algorithm requires high-precision floating-point computations on a CPU and/or GPU, causing significant...

10.1109/tcsi.2019.2960383 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2019-12-31

This paper introduces an analog-to-stochastic converter using a magnetic tunnel junction (MTJ) device for vision chips based on stochastic computation. Stochastic computation has been recently exploited for area-efficient hardware implementation, such as low-density parity-check decoders and image processors. However, power-and-area hungry two-step (analog-to-digital and digital-to-stochastic) converters are required for the analog-to-stochastic signal conversion. To realize a one-step conversion, an MTJ device is used as it...

10.1109/tnano.2015.2511151 article EN IEEE Transactions on Nanotechnology 2015-12-22

We propose a low-power content-addressable memory (CAM) employing a new algorithm for associativity between the input tag and the corresponding address of the output data. The proposed architecture is based on a recently developed sparse clustered network using binary connections that on average eliminates most of the parallel comparisons performed during a search. Therefore, the dynamic energy consumption of the design is significantly lower compared with that of a conventional CAM design. Given an input tag, the architecture computes a few possibilities...

10.1109/tvlsi.2014.2316733 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2014-04-30

In this paper, we present a novel logic design style, namely memristor overwrite logic (MOL), associated with an original MOL-based computational memory. MOL relies on a fully digital representation of the memristor and can operate with different memristive device technologies. Its integration in crossbar array memories allows the execution of bit vector-level primitive operations in two steps at most. Promising features and performances are demonstrated through the implementation of N-bit full addition using the proposed memory.

10.1109/tvlsi.2020.3011522 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2020-08-06

This article critically investigates the limitations of the simulated annealing algorithm using probabilistic bits (pSA) in solving large-scale combinatorial optimization problems. The study begins with an in-depth analysis of the pSA process, focusing on the issues resulting from unexpected oscillations among p-bits. These oscillations hinder the energy reduction of the Ising model and thus obstruct the successful execution of complex tasks. Through detailed simulations, we unravel the root cause of this stagnation, identifying feedback...

10.1038/s41598-024-51639-x article EN cc-by Scientific Reports 2024-01-16

Associative memories are alternatives to indexed memories that, when implemented in hardware, can benefit many applications such as data mining. The classical neural network based methodology is impractical to implement since, in order to increase the size of the memory, the number of information bits stored per memory bit (efficiency) approaches zero. In addition, the length of a message to be stored and retrieved needs to be the same as the number of nodes, causing the total number of messages the network is capable of storing (diversity) to be limited. Recently, a novel algorithm based on sparse clustered...

10.1109/iscas.2012.6271922 article EN 2012 IEEE International Symposium on Circuits and Systems 2012-05-01

A new asynchronous delay-insensitive data-transmission method based on level-encoded dual-rail (LEDR) encoding with a novel packet-structure restriction is proposed to realize a high-throughput network-on-chip (NoC) router together with compact hardware. The use of LEDR encoding halves the communication steps and the registers used in comparison with four-phase encoding, because the spacer information of the four-phase one is eliminated, which significantly improves network throughput. By using the packet structure, the phase of header and tail flits...

10.1109/tc.2013.81 article EN IEEE Transactions on Computers 2013-04-08
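
To make the spacer-free property of LEDR concrete, the sketch below encodes a bit sequence onto two rails so that exactly one rail toggles per transferred bit. It follows the general LEDR convention (phase equals the XOR of the two rails and alternates on every code word); the rail names, function names, and the (0, 0) starting state are assumptions for this example, and it does not model the paper's packet-structure restriction.

```python
def ledr_encode(bits, start=(0, 0)):
    """Encode bits as LEDR (data_rail, parity_rail) pairs.

    The phase of a code word is data_rail XOR parity_rail. It alternates on
    every transfer, so no spacer (return-to-zero) word is needed between bits.
    """
    data_rail, parity_rail = start
    phase = data_rail ^ parity_rail
    words = []
    for bit in bits:
        phase ^= 1                      # each new code word flips the phase
        data_rail, parity_rail = bit, bit ^ phase
        words.append((data_rail, parity_rail))
    return words

def ledr_decode(words):
    """The data rail carries the bit value directly."""
    return [data_rail for data_rail, _ in words]

def rails_toggled(prev, curr):
    """Count how many of the two rails changed between consecutive code words."""
    return (prev[0] != curr[0]) + (prev[1] != curr[1])
```

A four-phase dual-rail link would insert a spacer between every pair of data words, which is why the abstract reports half the communication steps for LEDR.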

Invertible logic using a probabilistic magnetoresistive device model has been recently presented that can compute functions in bidirectional ways and solve several problems quickly, such as factorization and combinatorial optimization. In this article, we present a design framework for invertible logic circuits. Our approach makes use of linear programming to create a Hamiltonian library with the minimum number of nodes for small invertible-logic functions. In addition, the device model is approximated based on stochastic computing...

10.1109/tcad.2020.3003906 article EN cc-by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2020-06-22

Probabilistic bits (p-bits) have recently been presented as a spin (basic computing element) for the simulated annealing (SA) of Ising models. In this brief, we introduce fast-converging SA based on p-bits designed using integral stochastic computing. The stochastic implementation approximates a p-bit function, which can search for a solution to a combinatorial optimization problem at lower energy than conventional p-bits. Searching around the global minimum energy can increase the probability of finding a solution. The proposed...

10.1109/tnnls.2022.3159713 article EN cc-by IEEE Transactions on Neural Networks and Learning Systems 2022-03-28
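
The p-bit function mentioned above can be sketched with the update rule commonly used in the p-bit literature, m = sgn(tanh(I) + r) with r drawn uniformly from (-1, 1). This is the conventional p-bit, not the integral-stochastic-computing approximation proposed in the paper; the function names and sample count are assumptions for illustration.

```python
import math
import random

def pbit(input_signal, rng):
    """Return +1 or -1, with P(+1) = (1 + tanh(input_signal)) / 2."""
    noise = rng.uniform(-1.0, 1.0)
    return 1 if math.tanh(input_signal) + noise > 0 else -1

def mean_output(input_signal, samples=20000, seed=0):
    """Average many p-bit samples; the mean approaches tanh(input_signal)."""
    rng = random.Random(seed)
    return sum(pbit(input_signal, rng) for _ in range(samples)) / samples
```

With zero input the p-bit behaves like a fair coin, and a strongly positive or negative input pins it to +1 or -1, which is how annealing gradually freezes the spins.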

This paper introduces a self-timed overlapped search mechanism for high-throughput content-addressable memories (CAMs) with low energy. Most mismatches can be found by searching the first few bits in a word. Consequently, if a word circuit is divided into two sections that are sequentially searched, most match lines of the second section are unused. As searching the first few bits is faster than searching an entire word, we could potentially increase throughput by initiating a second-stage search on the unused match lines as soon as the first-stage search is complete. The mechanism is realized using...

10.1109/async.2012.25 article EN 2012-05-01
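
The observation that most mismatches show up in the first few bits can be illustrated with a small software model of a two-section search. This is only a functional sketch of segmented matching on bit strings; it does not model match lines, timing, or the overlapped self-timed control, and the function name and the 4-bit first section are assumptions.

```python
def two_stage_search(words, key, first_bits=4):
    """Search stored bit strings for key in two sequential sections.

    Returns (indices of full matches, number of second-stage comparisons).
    """
    # Stage 1: compare only the first few bits of every stored word.
    survivors = [i for i, word in enumerate(words)
                 if word[:first_bits] == key[:first_bits]]
    # Stage 2: compare the remaining bits, but only for stage-1 survivors.
    matches = [i for i in survivors
               if words[i][first_bits:] == key[first_bits:]]
    return matches, len(survivors)
```

In this model every word that fails stage 1 leaves its second section idle, which corresponds to the unused match lines the abstract proposes to reuse for the next search.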

This paper introduces a reordered overlapped search mechanism for high-throughput low-energy content-addressable memories (CAMs). Most mismatches can be found by searching a few bits of a word. To lower power dissipation, a word circuit is often divided into two sections that are sequentially searched or even pipelined. Because of this process, most match lines in the second section are unused. Since searching the last few bits is very fast compared to searching the rest of the bits, we propose to increase throughput by asynchronously initiating...

10.1109/tcsi.2013.2283997 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2013-10-21

This article introduces high-throughput/low-energy true random number generators (TRNGs) based on CMOS and three-terminal magnetic tunnel junction (MTJ) devices. MTJs are fast probabilistic switching devices, which can be used as random sources for TRNGs. However, the switching probability is quite sensitive to the write current given to the MTJs, so precise closed-loop control is necessary. Thus, a high-complexity control circuit is required, such as high-precision digital-to-analog converters (DACs), occupying a large area and causing energy...

10.1109/tvlsi.2020.3005413 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2020-07-22

This paper presents a low-energy asynchronous interleaver for clockless fully parallel low-density parity-check (LDPC) decoding. The proposed data-transmission circuit based on a half-duplex single-track protocol makes it possible to realize a wire-efficient interleaver with small energy consumption. Moreover, a data-monitoring system adaptively shuts down the data transmission if not necessary, which reduces the number of data transmissions and, hence, the energy consumed. The decoder is evaluated using a (1056,528) irregular LDPC code under...

10.1109/tcsi.2011.2107271 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2011-02-16

A low-power Content-Addressable-Memory (CAM) is introduced employing a new mechanism for associativity between the input tags and the corresponding address of the output data. The proposed architecture is based on a recently developed clustered-sparse-network using binary-weighted connections that on average will eliminate most of the parallel comparisons performed during a search. Therefore, the dynamic energy consumption of the design is significantly lower compared to that of a conventional CAM design. Given an input tag, the architecture computes a few...

10.1109/asap.2013.6567594 preprint EN 2013-06-01