NFDI4DS | UHH-SEMS - Publication Details

Naoya Onizawa

ORCID: 0000-0002-4855-7081

About

Contact & Profiles

A5049740452

Research Areas

Advanced Memory and Neural Computing
Low-power high-performance VLSI design
Error Correcting Code Techniques
Network Packet Processing and Optimization
Interconnection Networks and Systems
Quantum Computing Algorithms and Architecture
Ferroelectric and Negative Capacitance Devices
Neural Networks and Applications
Advanced Wireless Communication Techniques
Parallel Computing and Optimization Techniques
Analog and Mixed-Signal Circuit Design
Caching and Content Delivery
Semiconductor materials and devices
Advanced Adaptive Filtering Techniques
Cooperative Communication and Network Coding
VLSI and Analog Circuit Testing
Radiation Effects in Electronics
Advancements in Semiconductor Devices and Circuit Design
Quantum-Dot Cellular Automata
CCD and CMOS Imaging Sensors
Stochastic Gradient Optimization Techniques
Quantum and electron transport phenomena
Cellular Automata and Applications
Energy Harvesting in Wireless Networks
Neural dynamics and brain function

Tohoku University
2016-2025

Tohoku University Hospital
2019

University of Novi Sad
2018

NEC (Japan)
2014

McGill University
2012-2013

University of Waterloo
2012-2013

VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing

OPENALEX - Publications

Arash Ardakani François Leduc-Primeau Naoya Onizawa Takahiro Hanyu Warren J. Gross

The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention: many applications in fact require high-speed operations that suit a hardware implementation. However, numerous elements and complex interconnections are usually required, leading to a large area occupation and copious power consumption. Stochastic computing (SC) has shown promising results for low-power area-efficient hardware implementations, even though existing stochastic algorithms require long streams that cause long latencies. In...

10.1109/tvlsi.2017.2654298 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2017-02-01
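
The latency concern in the abstract above can be illustrated with a minimal sketch of conventional unipolar stochastic computing (not the integral variant the paper proposes): a value in [0, 1] is encoded as the fraction of 1s in a random bit stream, and multiplication reduces to a bitwise AND of two independent streams. The function names `to_stream` and `sc_multiply`, the stream lengths, and the seed are assumptions chosen for illustration, not taken from the paper.

```python
import random


def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a unipolar stochastic bit stream."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a, b, length, seed=0):
    """Estimate a * b by ANDing two independent streams and averaging the result."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability only
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)
    product = [x & y for x, y in zip(stream_a, stream_b)]
    return sum(product) / length


if __name__ == "__main__":
    # The exact product is 0.2; the estimate depends on the stream length.
    for n in (16, 256, 4096):
        print(n, sc_multiply(0.5, 0.4, n))
```

Short streams give only a coarse estimate of 0.5 × 0.4 = 0.2, while longer streams tighten it at the cost of more clock cycles, which is the latency trade-off the abstract refers to.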

Standby-Power-Free Integrated Circuits Using MTJ-Based VLSI Computing

Takahiro Hanyu Tetsuo Endoh Daisuke Suzuki Hiroki Koike Yitao Ma and 4 more

Nonvolatile spintronic devices have potential advantages, such as fast read/write and high endurance together with back-end-of-the-line compatibility, which offers the possibility of constructing not only stand-alone RAMs and embedded RAMs that can be used in conventional VLSI circuits and systems but also standby-power-free, high-performance nonvolatile CMOS logic employing a logic-in-memory architecture. The advantages of these devices, especially magnetic tunnel junction (MTJ) circuits, are discussed, and current...

10.1109/jproc.2016.2574939 article EN Proceedings of the IEEE 2016-09-07

Efficient CMOS Invertible Logic Using Stochastic Computing

Sean C. Smithson Naoya Onizawa Brett H. Meyer Warren J. Gross Takahiro Hanyu

Invertible logic can operate in one of two modes: 1) a forward mode, in which inputs are presented and a single, correct output is produced, and 2) a reverse mode, in which the output is fixed and the inputs take on values consistent with the output. It is possible to create invertible logic using various Boltzmann machine configurations. Such systems have been shown to solve certain challenging problems quickly, such as factorization and combinatorial optimization. In this paper, we show that invertible logic can be implemented using simple spiking neural networks based on stochastic...

10.1109/tcsi.2018.2889732 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2019-01-11

Memory-Efficient FPGA Implementation of Stochastic Simulated Annealing

Duckgyu Shin Naoya Onizawa Warren J. Gross Takahiro Hanyu

Simulated annealing (SA) is a well-known algorithm for solving combinatorial optimization problems. However, the computation time of SA increases rapidly as the size of the problem grows. Recently, stochastic simulated annealing (SSA) that converges faster than conventional SA has been reported. In this paper, we present a hardware-aware SSA (HA-SSA) for memory-efficient FPGA implementations. HA-SSA can reduce the memory usage of storing intermediate results while maintaining the computing speed of SSA. For evaluation purposes,...

10.1109/jetcas.2023.3243260 article EN IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2023-02-08
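
As a point of reference for the conventional SA that the abstract above compares against, the following is a minimal textbook sketch of simulated annealing on a small Ising model. It is not SSA or HA-SSA; the function names, the geometric cooling schedule, the parameter defaults, and the 8-spin ferromagnetic ring are all assumptions made for illustration.

```python
import math
import random


def ising_energy(spins, couplings):
    """Energy H = -sum over (i, j) of J_ij * s_i * s_j, with spins in {-1, +1}."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())


def simulated_annealing(n, couplings, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Single-spin-flip Metropolis annealing; returns the best state seen and its energy."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(spins, couplings)
    best_spins, best_energy = spins[:], energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        spins[i] = -spins[i]  # propose flipping one spin
        new_energy = ising_energy(spins, couplings)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            energy = new_energy  # accept the move
            if energy < best_energy:
                best_spins, best_energy = spins[:], energy
        else:
            spins[i] = -spins[i]  # reject the move and restore the spin
    return best_spins, best_energy


if __name__ == "__main__":
    # Hypothetical test problem: ferromagnetic ring of 8 spins, ground-state energy -8.
    ring = {(i, (i + 1) % 8): 1 for i in range(8)}
    print(simulated_annealing(8, ring))
```

The sketch recomputes the full energy on every proposal for clarity; a practical implementation would update only the terms that involve the flipped spin.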

Gabor Filter Based on Stochastic Computation

Naoya Onizawa Daisaku Katagiri Kazumichi Matsumiya Warren J. Gross Takahiro Hanyu

This letter introduces a design and proof-of-concept implementation of Gabor filters based on stochastic computation for area-efficient hardware. The Gabor filter exhibits powerful image feature extraction capability, but it requires significant computational power. Using stochastic computation, a sine function used in the Gabor filter is approximated by exploiting several stochastic tanh functions designed based on a state machine. A Gabor filter realized using the shaper and an exponential function is simulated and compared with the original filter, and shows almost equivalent behaviour at...

10.1109/lsp.2015.2392123 article EN IEEE Signal Processing Letters 2015-01-14

Design of an STT-MTJ based true random number generator using digitally controlled probability-locked loop

Satoshi Oosawa Takayuki Konishi Naoya Onizawa Takahiro Hanyu

This paper presents a design of a True Random Number Generator (TRNG) using a Spin Transfer Torque Magnetic Tunnel Junction (STT-MTJ) device. Since the probability of the STT-MTJ-based TRNG is locked by a digitally controlled feedback loop, the sensitivity of the gain can be reduced greatly, which eliminates a high-gain amplifier in the loop. It is demonstrated using a circuit simulator (NS-SPICE, where the STT-MTJ model is established based on 90nm CMOS/MTJ process technologies) and MATLAB that random sequences generated from the TRNG become 50%,...

10.1109/newcas.2015.7182089 article EN 2015-06-01

GPU-accelerated simulated annealing based on p-bits with real-world device-variability modeling

Naoya Onizawa Takahiro Hanyu

Probabilistic computing using probabilistic bits (p-bits) presents an efficient alternative to traditional CMOS logic for complex problem-solving, including simulated annealing and machine learning. Realizing p-bits with emerging devices such as magnetic tunnel junctions introduces device variability, which was expected to negatively impact computational performance. However, this study reveals an unexpected finding: device variability can not only degrade but also enhance algorithm performance,...

10.1038/s41598-025-90520-3 article EN cc-by-nc-nd Scientific Reports 2025-02-19

A Nonvolatile Associative Memory-Based Context-Driven Search Engine Using 90 nm CMOS/MTJ-Hybrid Logic-in-Memory Architecture

Hooman Jarollahi Naoya Onizawa Vincent Gripon Noboru Sakimura Tadahiko Sugibayashi and 4 more

This paper presents the algorithm, architecture, and fabrication results of a nonvolatile context-driven search engine that reduces energy consumption as well as computational delay compared to classical hardware and software-based approaches. The proposed architecture stores only associations between items from multiple search fields in the form of binary links, and merges repeated field items to reduce the memory requirements and accesses. The fabricated chip achieves...

10.1109/jetcas.2014.2361061 article EN IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2014-10-21

Stochastic Simulated Quantum Annealing for Fast Solution of Combinatorial Optimization Problems

Naoya Onizawa Ryoma Sasaki Duckgyu Shin Warren J. Gross Takahiro Hanyu

10.1109/access.2024.3431540 article EN cc-by-nc-nd IEEE Access 2024-01-01

Design of High-Throughput Fully Parallel LDPC Decoders Based on Wire Partitioning

Naoya Onizawa Takahiro Hanyu Vincent Gaudet

We present a method to design high-throughput fully parallel low-density parity-check (LDPC) decoders. With our method, a decoder's longest wires are divided into several short wires with pipeline registers. Log-likelihood ratio messages transmitted along these pipelined paths are thus sent over multiple clock cycles, and the critical path delay can be reduced while maintaining comparable bit error rate performance. The number of registers inserted is estimated by using wiring information extracted from...

10.1109/tvlsi.2008.2011360 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2009-04-15

VLSI implementation of deep neural networks using integral stochastic computing

Arash Ardakani François Leduc-Primeau Naoya Onizawa Takahiro Hanyu Warren J. Gross

The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention since many applications require high-speed operations. However, numerous processing elements and complex interconnections are usually required, leading to a large area occupation and high power consumption. Stochastic computing has shown promising results for area-efficient hardware implementations, even though existing stochastic algorithms require long streams that exhibit long latency. In this paper, we propose an...

10.1109/istc.2016.7593108 preprint EN 2016-09-01

In-Hardware Training Chip Based on CMOS Invertible Logic for Machine Learning

Naoya Onizawa Sean C. Smithson Brett H. Meyer Warren J. Gross Takahiro Hanyu

Deep Neural Networks (DNNs) have recently shown state-of-the-art results on various applications, such as computer vision and recognition tasks. DNN inference engines can be implemented in hardware with high energy efficiency, as the computation can be realized using a low-precision fixed point or even binary precision with sufficient cognition accuracies. On the other hand, training DNNs using the well-known back-propagation algorithm requires high-precision floating-point computations on a CPU and/or GPU, causing significant...

10.1109/tcsi.2019.2960383 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2019-12-31

Analog-to-Stochastic Converter Using Magnetic Tunnel Junction Devices for Vision Chips

Naoya Onizawa Daisaku Katagiri Warren J. Gross Takahiro Hanyu

This paper introduces an analog-to-stochastic converter using a magnetic tunnel junction (MTJ) device for vision chips based on stochastic computation. Stochastic computation has been recently exploited for area-efficient hardware implementation, such as low-density parity-check decoders and image processors. However, power-and-area hungry two-step (analog-to-digital and digital-to-stochastic) converters are required for the analog to stochastic signal conversion. To realize a one-step conversion, an MTJ device is used as it...

10.1109/tnano.2015.2511151 article EN IEEE Transactions on Nanotechnology 2015-12-22

Algorithm and Architecture for a Low-Power Content-Addressable Memory Based on Sparse Clustered Networks

Hooman Jarollahi Vincent Gripon Naoya Onizawa Warren J. Gross

We propose a low-power content-addressable memory (CAM) employing a new algorithm for associativity between the input tag and the corresponding address of the output data. The proposed architecture is based on a recently developed sparse clustered network using binary connections that on-average eliminates most of the parallel comparisons performed during a search. Therefore, the dynamic energy consumption of the proposed design is significantly lower compared with a conventional CAM design. Given an input tag, the proposed architecture computes a few possibilities...

10.1109/tvlsi.2014.2316733 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2014-04-30

Memristive Computational Memory Using Memristor Overwrite Logic (MOL)

Khaled Alhaj Ali Mostafa Rizk Amer Baghdadi Jean-Philippe Diguet J. Jomaah and 2 more

In this paper, we present a novel logic design style, namely memristor overwrite logic (MOL), associated with an original MOL-based computational memory. MOL relies on a fully digital representation of the memristor and can operate with different memristive device technologies. Its integration in crossbar array memories allows the execution of bit vector-level primitive logic operations in two steps at most. Promising features and performances are demonstrated through the implementation of N-bit full addition using the proposed computational memory.

10.1109/tvlsi.2020.3011522 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2020-08-06

Enhanced convergence in p-bit based simulated annealing with partial deactivation for large-scale combinatorial optimization problems

Naoya Onizawa Takahiro Hanyu

This article critically investigates the limitations of the simulated annealing algorithm using probabilistic bits (pSA) in solving large-scale combinatorial optimization problems. The study begins with an in-depth analysis of the pSA process, focusing on the issues resulting from unexpected oscillations among p-bits. These oscillations hinder the energy reduction of the Ising model and thus obstruct the successful execution of pSA in complex tasks. Through detailed simulations, we unravel the root cause of this energy stagnation, identifying the feedback...

10.1038/s41598-024-51639-x article EN cc-by Scientific Reports 2024-01-16

Architecture and implementation of an associative memory using sparse clustered networks

Hooman Jarollahi Naoya Onizawa Vincent Gripon Warren J. Gross

Associative memories are alternatives to indexed memories that, when implemented in hardware, can benefit many applications such as data mining. The classical neural network based methodology is impractical to implement since, in order to increase the size of the memory, the number of information bits stored per memory bit (efficiency) approaches zero. In addition, the length of a message to be stored and retrieved needs to be the same as the number of nodes in the network, causing the total number of messages the network is capable of storing (diversity) to be limited. Recently, a novel algorithm based on sparse clustered...

10.1109/iscas.2012.6271922 article EN 1993 IEEE International Symposium on Circuits and Systems 2012-05-01

High-Throughput Compact Delay-Insensitive Asynchronous NoC Router

Naoya Onizawa Atsushi Matsumoto Tomoyoshi Funazaki Takahiro Hanyu

A new asynchronous delay-insensitive data-transmission method based on level-encoded dual-rail (LEDR) encoding with a novel packet-structure restriction is proposed to realize a high-throughput network-on-chip (NoC) router together with compact hardware. The use of LEDR encoding halves the number of communication steps and of registers being used in comparison with four-phase encoding because the spacer information of the four-phase one is eliminated, which significantly improves the network throughput. By using the packet structure, the phase of header and tail flits...

10.1109/tc.2013.81 article EN IEEE Transactions on Computers 2013-04-08

A Design Framework for Invertible Logic

Naoya Onizawa Kaito Nishino Sean C. Smithson Brett H. Meyer Warren J. Gross and 3 more

Invertible logic using a probabilistic magnetoresistive device model has been recently presented that can compute functions in bidirectional ways and solve several problems quickly, such as factorization and combinational optimization. In this article, we present a design framework for invertible logic circuits. Our approach makes use of linear programming to create a Hamiltonian library with the minimum number of nodes for small invertible-logic functions. In addition, the device model is approximated based on stochastic computing...

10.1109/tcad.2020.3003906 article EN cc-by IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2020-06-22

Fast-Converging Simulated Annealing for Ising Models Based on Integral Stochastic Computing

Naoya Onizawa Kota Katsuki Duckgyu Shin Warren J. Gross Takahiro Hanyu

Probabilistic bits (p-bits) have recently been presented as a spin (basic computing element) for the simulated annealing (SA) of Ising models. In this brief, we introduce fast-converging SA based on p-bits designed using integral stochastic computing. The stochastic implementation approximates a p-bit function, which can search for a solution to a combinatorial optimization problem at lower energy than conventional p-bits. Searching around the global minimum energy can increase the probability of finding a solution. The proposed...

10.1109/tnnls.2022.3159713 article EN cc-by IEEE Transactions on Neural Networks and Learning Systems 2022-03-28

High-Throughput Low-Energy Content-Addressable Memory Based on Self-Timed Overlapped Search Mechanism

Naoya Onizawa Shoun Matsunaga Vincent Gaudet Takahiro Hanyu

This paper introduces a self-timed overlapped search mechanism for high-throughput content-addressable memories (CAMs) with low search energy. Most mismatches can be found by searching the first few bits in a search word. Consequently, if a word circuit is divided into two sections that are sequentially searched, most match lines in the second section are unused. As searching the first few bits is faster than searching an entire word, we could potentially increase throughput by initiating a second-stage search on the unused match lines as soon as the first-stage search is complete. The mechanism is realized using...

10.1109/async.2012.25 article EN 2012-05-01
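
The observation that most mismatches show up in the first few bits can be sketched in software as a two-stage comparison. This is only a functional illustration under stated assumptions, not the self-timed circuit the paper describes: the function name `two_stage_search`, the choice of the most significant bits as the "first" section, and the example table are all hypothetical.

```python
def two_stage_search(words, key, width, first_bits):
    """Compare the leading first_bits of each stored word, then the remaining bits.

    Returns (indices of full matches, number of words that reached stage two).
    """
    low_bits = width - first_bits
    low_mask = (1 << low_bits) - 1
    key_head = key >> low_bits
    key_tail = key & low_mask
    # Stage 1: only words whose leading bits match survive to the second stage.
    survivors = [i for i, w in enumerate(words) if (w >> low_bits) == key_head]
    # Stage 2: compare the remaining bits of the survivors only.
    matches = [i for i in survivors if (words[i] & low_mask) == key_tail]
    return matches, len(survivors)


if __name__ == "__main__":
    # Hypothetical 8-bit table; three of six words share the key's leading 4 bits.
    table = [0b10110010, 0b10111111, 0b01010010, 0b11110000, 0b00001111, 0b10110011]
    print(two_stage_search(table, 0b10110010, width=8, first_bits=4))  # ([0], 3)
```

In this example only three of the six stored words need a second-stage comparison, which mirrors the abstract's point that most second-section match lines go unused.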

High-Throughput Low-Energy Self-Timed CAM Based on Reordered Overlapped Search Mechanism

Naoya Onizawa Shoun Matsunaga Vincent Gaudet Warren J. Gross Takahiro Hanyu

This paper introduces a reordered overlapped search mechanism for high-throughput low-energy content-addressable memories (CAMs). Most mismatches can be found by searching a few bits of a search word. To lower power dissipation, a word circuit is often divided into two sections that are sequentially searched or even pipelined. Because of this process, most of the match lines in the second section are unused. Since searching the last few bits is very fast compared to searching the rest of the bits, we propose to increase throughput by asynchronously initiating...

10.1109/tcsi.2013.2283997 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2013-10-21

High-Throughput/Low-Energy MTJ-Based True Random Number Generator Using a Multi-Voltage/Current Converter

Naoya Onizawa Shogo Mukaida Akira Tamakoshi Hitoshi Yamagata Hiroyuki Fujita and 1 more

This article introduces high-throughput/low-energy true random number generators (TRNGs) based on CMOS and three-terminal magnetic tunnel junction (MTJ) devices. MTJs are fast probabilistic switching devices, which can be used as random sources for TRNGs. However, the switching probability is quite sensitive to the write current given to MTJs, and precise closed-loop control is necessary. Thus, a high-complexity control circuit is required, such as high precision digital-to-analog converters (DACs), occupying a large area and causing energy...

10.1109/tvlsi.2020.3005413 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2020-07-22

Low-Energy Asynchronous Interleaver for Clockless Fully Parallel LDPC Decoding

Naoya Onizawa Vincent Gaudet Takahiro Hanyu

This paper presents a low-energy asynchronous interleaver for clockless fully parallel low-density parity-check (LDPC) decoding. The proposed asynchronous data-transmission circuit based on a half-duplex single-track protocol makes it possible to realize a wire-efficient interleaver with small energy consumption. Moreover, a data-monitoring system adaptively shuts down the data-transmission circuit if it is not necessary, which reduces the number of data transmissions and, hence, the energy consumed. The proposed decoder is evaluated using a (1056,528) irregular LDPC code under...

10.1109/tcsi.2011.2107271 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2011-02-16

A low-power Content-Addressable Memory based on clustered-sparse networks

Hooman Jarollahi Vincent Gripon Naoya Onizawa Warren J. Gross

A low-power Content-Addressable-Memory (CAM) is introduced employing a new mechanism for associativity between the input tags and the corresponding address of the output data. The proposed architecture is based on a recently developed clustered-sparse-network using binary-weighted connections that on-average will eliminate most of the parallel comparisons performed during a search. Therefore, the dynamic energy consumption of the proposed design is significantly lower compared to a conventional CAM design. Given an input tag, the proposed architecture computes a few...

10.1109/asap.2013.6567594 preprint EN 2013-06-01
