- Advanced Memory and Neural Computing
- Low-power high-performance VLSI design
- Error Correcting Code Techniques
- Network Packet Processing and Optimization
- Interconnection Networks and Systems
- Quantum Computing Algorithms and Architecture
- Ferroelectric and Negative Capacitance Devices
- Neural Networks and Applications
- Advanced Wireless Communication Techniques
- Parallel Computing and Optimization Techniques
- Analog and Mixed-Signal Circuit Design
- Caching and Content Delivery
- Semiconductor materials and devices
- Advanced Adaptive Filtering Techniques
- Cooperative Communication and Network Coding
- VLSI and Analog Circuit Testing
- Radiation Effects in Electronics
- Advancements in Semiconductor Devices and Circuit Design
- Quantum-Dot Cellular Automata
- CCD and CMOS Imaging Sensors
- Stochastic Gradient Optimization Techniques
- Quantum and electron transport phenomena
- Cellular Automata and Applications
- Energy Harvesting in Wireless Networks
- Neural dynamics and brain function
Tohoku University, 2016-2025
Tohoku University Hospital, 2019
University of Novi Sad, 2018
NEC (Japan), 2014
McGill University, 2012-2013
University of Waterloo, 2012-2013
The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention: many applications in fact require high-speed operations that suit a hardware implementation. However, numerous elements and complex interconnections are usually required, leading to large area occupation and copious power consumption. Stochastic computing (SC) has shown promising results for low-power area-efficient implementations, even though existing stochastic algorithms require long streams that cause long latencies. In...
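The following minimal sketch shows why stochastic computing saves area and also why it costs latency. It multiplies two unipolar values (probabilities in [0, 1]) by ANDing two uncorrelated random bit streams. The function names and the stream length are illustrative assumptions. This is not the method of the paper.

```python
# Illustrative sketch; function names and parameters are hypothetical.
import random

def to_stream(p, length, rng):
    """Encode a value p in [0, 1] as a unipolar stochastic bit stream: P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(a, b, length=4096, seed=0):
    """Multiply two unipolar values with one AND operation per bit.

    Every bit of both streams comes from a fresh random draw, so the streams are
    uncorrelated and P(a_bit AND b_bit) = a * b. The result is the fraction of 1s.
    """
    rng = random.Random(seed)
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)
    product = [x & y for x, y in zip(stream_a, stream_b)]
    return sum(product) / length

if __name__ == "__main__":
    print(f"SC estimate of 0.5 * 0.4 = {sc_multiply(0.5, 0.4):.3f} (exact 0.200)")
```

The estimation error shrinks only roughly as 1/sqrt(length). Halving the error therefore needs about four times as many bits, which is the stream-length cost behind the latency mentioned above.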
Nonvolatile spintronic devices have potential advantages, such as fast read/write and high endurance together with back-end-of-the-line compatibility. These advantages offer the possibility of constructing not only stand-alone and embedded RAMs that can be used in conventional VLSI circuits and systems but also standby-power-free high-performance nonvolatile CMOS logic employing a logic-in-memory architecture. The advantages of these devices, especially magnetic tunnel junction (MTJ) circuits, are discussed, and current...
Invertible logic can operate in one of two modes:

1) a forward mode, in which the inputs are presented and a single, correct output is produced;
2) a reverse mode, in which the output is fixed and the inputs take on values consistent with that output.

It is possible to create invertible logic using various Boltzmann machine configurations. Such systems have been shown to solve certain challenging problems quickly, such as factorization and combinatorial optimization. In this paper, we show that invertible logic can be implemented with simple spiking neural networks based on stochastic...
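The sketch below makes the forward and reverse modes concrete. It models an invertible AND gate as a three-node Boltzmann machine sampled with p-bit updates. It is not the spiking-neural-network implementation proposed in the paper. The coupling matrix `J` and bias `H` are a commonly used AND-gate Hamiltonian, taken here as an assumption. `sample`, `beta`, and `sweeps` are hypothetical names and settings.

```python
# Illustrative sketch; J, H, sample(), beta and sweeps are assumptions, not the paper's design.
import math
import random
from collections import Counter

# Assumed Hamiltonian for an invertible AND gate, nodes ordered (A, B, C) with
# C = A AND B and spins in {-1, +1} (spin +1 means logic 1). With
# E(m) = -(sum_{i<j} J[i][j] m_i m_j + sum_i H[i] m_i), the four valid rows of the
# truth table have E = -3 and every invalid row has E >= 1.
J = [[0, -1, 2],
     [-1, 0, 2],
     [2, 2, 0]]
H = [1, 1, -2]

def sample(clamped, beta=2.0, sweeps=20000, seed=1):
    """Gibbs-sample the 3-node Boltzmann machine with some nodes held fixed.

    clamped: {node_index: +1 or -1}. Each free node is updated with the p-bit rule
    m_i = sgn(tanh(beta * I_i) - r), r ~ U(-1, 1), where I_i = H[i] + sum_j J[i][j] m_j.
    Returns a Counter of visited states written as logic bits (A, B, C).
    """
    rng = random.Random(seed)
    m = [clamped.get(i, rng.choice([-1, 1])) for i in range(3)]
    counts = Counter()
    for _ in range(sweeps):
        for i in range(3):
            if i in clamped:
                continue
            field = H[i] + sum(J[i][j] * m[j] for j in range(3))  # J[i][i] = 0
            m[i] = 1 if math.tanh(beta * field) > rng.uniform(-1.0, 1.0) else -1
        counts[tuple((s + 1) // 2 for s in m)] += 1
    return counts

if __name__ == "__main__":
    print("forward, A=B=1:", sample({0: 1, 1: 1}).most_common(2))
    print("reverse, C=0:  ", sample({2: -1}).most_common(4))
```

In forward mode, clamping A = B = 1 makes C = 1 dominate. In reverse mode, clamping C = 0 spreads the samples over the three input pairs (0, 0), (0, 1), and (1, 0), all of which are consistent with that output.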
Simulated annealing (SA) is a well-known algorithm for solving combinatorial optimization problems. However, the computation time of SA increases rapidly as the size of the problem grows. Recently, stochastic simulated annealing (SSA), which converges faster than conventional SA, has been reported. In this paper, we present hardware-aware SSA (HA-SSA) for memory-efficient FPGA implementations. HA-SSA can reduce the memory usage for storing intermediate results while maintaining the computing speed of SSA. For evaluation purposes,...
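For reference, the sketch below shows the conventional SA baseline as single-spin-flip Metropolis annealing on an Ising energy. This is plain SA, not SSA or HA-SSA. The geometric cooling schedule, its parameters, and the function names are illustrative assumptions.

```python
# Illustrative sketch of conventional SA; names and schedule parameters are hypothetical.
import math
import random

def ising_energy(J, s):
    """E(s) = -sum_{i<j} J[i][j] * s_i * s_j for spins s_i in {-1, +1}."""
    n = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))

def simulated_annealing(J, t_start=5.0, t_end=0.05, steps=20000, seed=0):
    """Conventional single-spin-flip Metropolis SA with geometric cooling."""
    rng = random.Random(seed)
    n = len(J)
    s = [rng.choice([-1, 1]) for _ in range(n)]
    energy = ising_energy(J, s)
    alpha = (t_end / t_start) ** (1.0 / steps)  # per-step cooling factor
    t = t_start
    for _ in range(steps):
        i = rng.randrange(n)
        # Flipping spin i changes the energy by dE = 2 * s_i * sum_{j != i} J_ij s_j.
        delta = 2 * s[i] * sum(J[i][j] * s[j] for j in range(n) if j != i)
        # Metropolis rule: always accept downhill moves, uphill ones with exp(-dE / T).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            s[i] = -s[i]
            energy += delta
        t *= alpha
    return s, energy

def antiferromagnetic_ring(n):
    """Toy instance: n spins on a ring with J = -1 between neighbours."""
    J = [[0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        J[i][j] = J[j][i] = -1
    return J

if __name__ == "__main__":
    spins, e = simulated_annealing(antiferromagnetic_ring(6))
    print(spins, e)
```

On the six-spin antiferromagnetic ring, the run should end in one of the two alternating ground states, with energy -6.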
This letter introduces a design and proof-of-concept implementation of Gabor filters based on stochastic computation for area-efficient hardware. The Gabor filter exhibits powerful image feature extraction capability, but it requires significant computational power. Using stochastic computation, the sine function used in the filter is approximated by exploiting several tanh functions designed with a state machine. A Gabor filter realized using the sine shaper and an exponential function is simulated and compared with the original filter, which shows almost equivalent behaviour at...
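The tanh building block is commonly realized with a saturating up/down counter, a finite-state machine driven by a bipolar stochastic stream. The sketch below models that well-known FSM. The state count, the stream length, and the name `stanh` are assumptions. Combining several such tanh units into a sine shaper, as the letter does, is not shown.

```python
# Illustrative sketch of the stochastic tanh FSM; stanh() and its parameters are hypothetical.
import math
import random

def stanh(x, n_states=8, length=200000, seed=0):
    """Stochastic tanh with a saturating up/down counter (finite-state machine).

    The bipolar input x in [-1, 1] is encoded as bits with P(1) = (x + 1) / 2.
    A 1 moves the counter up and a 0 moves it down, saturating at 0 and
    n_states - 1; the output bit is 1 while the counter is in the upper half.
    The decoded bipolar output approximates tanh(n_states / 2 * x).
    """
    rng = random.Random(seed)
    p_one = (x + 1) / 2
    state = n_states // 2
    ones = 0
    for _ in range(length):
        if rng.random() < p_one:
            state = min(state + 1, n_states - 1)
        else:
            state = max(state - 1, 0)
        if state >= n_states // 2:
            ones += 1
    return 2 * ones / length - 1  # bipolar decoding: value = 2 * P(1) - 1

if __name__ == "__main__":
    for x in (-0.5, 0.0, 0.25, 0.5):
        print(f"x={x:+.2f}  stanh={stanh(x):+.3f}  tanh(4x)={math.tanh(4 * x):+.3f}")
```

With 8 states and x = 0.5, the decoded output lands close to tanh(2) ≈ 0.96. The FSM therefore behaves as a steep tanh of its input.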
This paper presents the design of a True Random Number Generator (TRNG) using a Spin Transfer Torque Magnetic Tunnel Junction (STT-MTJ) device. The probability of the STT-MTJ-based TRNG is locked by a digitally controlled feedback loop, so the sensitivity gain can be greatly reduced, which eliminates the need for a high-gain amplifier in the loop. The design is evaluated with a circuit simulator (NS-SPICE, in which the STT-MTJ model is established based on 90nm CMOS/MTJ process technologies) and MATLAB, which demonstrate that the random sequences generated from the TRNG become 50%,...
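The behavioral model below illustrates the idea of locking a device's switching probability with a digital loop. It is not the paper's circuit. The sigmoid probability-versus-current curve, the window size, the step, and the 50% point `i_half` are all hypothetical. The loop is a simple up/down controller that nudges a write-current code until about half of the bits are 1.

```python
# Illustrative behavioral model; the device curve and all loop parameters are hypothetical.
import math
import random

def switching_probability(current, i_half, width):
    """Hypothetical device model: sigmoid switching probability vs. write current."""
    return 1.0 / (1.0 + math.exp(-(current - i_half) / width))

def locked_trng(n_bits=20000, window=64, step=0.5, i_half=47.0, width=2.0, seed=0):
    """Generate random bits while a digital loop steers the ratio of 1s towards 50%.

    After every `window` bits the loop compares the number of 1s with window / 2
    and moves the write-current code up or down by `step` (arbitrary units).
    i_half is the device-dependent 50% point, which the loop must find on its own.
    Returns (bits, final current code).
    """
    rng = random.Random(seed)
    current = 40.0  # start deliberately far from i_half
    bits, ones_in_window = [], 0
    for k in range(1, n_bits + 1):
        p = switching_probability(current, i_half, width)
        bit = 1 if rng.random() < p else 0
        bits.append(bit)
        ones_in_window += bit
        if k % window == 0:
            if ones_in_window < window // 2:
                current += step  # too few 1s: raise the write current
            elif ones_in_window > window // 2:
                current -= step  # too many 1s: lower it
            ones_in_window = 0
    return bits, current
```

In this model, bits produced before the loop settles are biased towards 0. A practical generator would therefore discard that initial segment.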
Probabilistic computing using probabilistic bits (p-bits) presents an efficient alternative to traditional CMOS logic for complex problem-solving, including simulated annealing and machine learning. Realizing p-bits with emerging devices such as magnetic tunnel junctions introduces device variability, which was expected to negatively impact computational performance. However, this study reveals an unexpected finding: variability can not only degrade but also enhance algorithm performance,...
This paper presents the algorithm, architecture, and fabrication results of a nonvolatile context-driven search engine. The engine reduces energy consumption as well as computational delay compared to classical hardware and software-based approaches. The proposed architecture stores only associations between items from multiple fields in the form of binary links. It also merges repeated field items to reduce memory requirements and accesses. The fabricated chip achieves...
We present a method to design high-throughput fully parallel low-density parity-check (LDPC) decoders. With our method, the decoder's longest wires are divided into several short wires with pipeline registers. Log-likelihood ratio messages transmitted along these pipelined paths are thus sent over multiple clock cycles, and the critical path delay can be reduced while maintaining comparable bit error rate performance. The number of registers inserted is estimated by using wiring information extracted from...
The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention since many applications require high-speed operations. However, numerous processing elements and complex interconnections are usually required, leading to a large area occupation and high power consumption. Stochastic computing has shown promising results for area-efficient implementations, even though existing stochastic algorithms require long streams that exhibit long latency. In this paper, we propose an...
Deep Neural Networks (DNNs) have recently shown state-of-the-art results on various applications, such as computer vision and recognition tasks. DNN inference engines can be implemented in hardware with high energy efficiency, as the computation can be realized using low-precision fixed-point or even binary precision with sufficient cognition accuracies. On the other hand, training DNNs using the well-known back-propagation algorithm requires high-precision floating-point computations on a CPU and/or GPU, causing significant...
This paper introduces an analog-to-stochastic converter using a magnetic tunnel junction (MTJ) device for vision chips based on stochastic computation. Stochastic computation has recently been exploited for area-efficient hardware implementations, such as low-density parity-check decoders and image processors. However, power- and area-hungry two-step (analog-to-digital and digital-to-stochastic) converters are required for the analog-to-stochastic signal conversion. To realize one-step conversion, an MTJ is used because it...
We propose a low-power content-addressable memory (CAM) employing a new algorithm for associativity between the input tag and the corresponding address of the output data. The proposed architecture is based on a recently developed sparse clustered network using binary connections that, on average, eliminates most of the parallel comparisons performed during a search. Therefore, the dynamic energy consumption of the proposed design is significantly lower than that of a conventional CAM design. Given an input tag, the proposed architecture computes a few possibilities...
In this paper, we present a novel logic design style, namely memristor overwrite logic (MOL), associated with an original MOL-based computational memory. MOL relies on a fully digital representation of the memristor state and can operate with different memristive device technologies. Its integration in crossbar array memories allows bit-vector-level primitive operations to be executed in at most two steps. Promising features and performances are demonstrated through the implementation of N-bit full addition using the proposed memory.
This article critically investigates the limitations of the simulated annealing algorithm using probabilistic bits (pSA) in solving large-scale combinatorial optimization problems. The study begins with an in-depth analysis of the pSA process, focusing on issues resulting from unexpected oscillations among p-bits. These oscillations hinder the energy reduction of the Ising model and thus obstruct the successful execution of complex tasks. Through detailed simulations, we unravel the root cause of this stagnation, identifying feedback...
Associative memories are alternatives to indexed memories that, when implemented in hardware, can benefit many applications such as data mining. The classical neural-network-based methodology is impractical to implement: as the size of the memory increases, the number of information bits stored per memory bit (efficiency) approaches zero. In addition, the length of a message to be stored and retrieved must equal the number of nodes, which limits the total number of messages the network can store (diversity). Recently, a novel algorithm based on sparse clustered...
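The clustered-network idea can be sketched in a few lines. The model below follows the general scheme usually associated with sparse clustered networks:

- one active neuron per cluster;
- messages stored as cliques of binary links;
- erased symbols recovered by counting links.

The class and method names, the one-pass scoring rule, and the example sizes are assumptions, since the abstract is truncated before the algorithm is described.

```python
# Illustrative sketch; ClusteredNetwork, store() and retrieve() are hypothetical names.
from itertools import combinations

class ClusteredNetwork:
    """Toy sparse clustered associative memory.

    A message is a tuple with one symbol per cluster, each symbol in range(fanals).
    Storing a message activates one neuron per cluster and sets a binary link
    between every pair of active neurons, so each message becomes a clique.
    """

    def __init__(self, clusters, fanals):
        self.clusters = clusters
        self.fanals = fanals
        self.links = set()  # binary weights as ordered pairs of (cluster, neuron)

    def store(self, message):
        if len(message) != self.clusters:
            raise ValueError("message must have one symbol per cluster")
        for a, b in combinations(enumerate(message), 2):
            self.links.add((a, b))
            self.links.add((b, a))

    def retrieve(self, partial):
        """Fill erased symbols (None) with one pass of link counting.

        Each candidate neuron in an erased cluster scores one point per link to
        a known neuron; the highest-scoring candidate is selected.
        """
        known = [(k, v) for k, v in enumerate(partial) if v is not None]
        result = list(partial)
        for k, v in enumerate(partial):
            if v is None:
                scores = [sum(((k, f), node) in self.links for node in known)
                          for f in range(self.fanals)]
                result[k] = max(range(self.fanals), key=lambda f: scores[f])
        return tuple(result)
```

For example, store (1, 2, 3, 4), (5, 6, 7, 8), and (1, 9, 10, 11) in a network with 4 clusters of 16 neurons. Then `retrieve((1, 2, 3, None))` returns (1, 2, 3, 4), and `retrieve((1, None, None, 11))` returns (1, 9, 10, 11).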
A new asynchronous delay-insensitive data-transmission method is proposed to realize a high-throughput network-on-chip (NoC) router together with compact hardware. It is based on level-encoded dual-rail (LEDR) encoding with a novel packet-structure restriction. LEDR encoding halves the number of communication steps and registers compared with four-phase dual-rail encoding, because the spacer required by the four-phase scheme is eliminated. This significantly improves network throughput. By using the packet structure, the phase of the header and tail flits...
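The sketch below shows textbook LEDR encoding and why no spacer is needed. Each token drives a data rail and a repeat rail whose XOR gives an alternating phase, so exactly one wire changes per transmitted bit. The four-phase dual-rail function is included only for comparison. Neither function models the paper's packet-structure restriction or router, and all names are hypothetical.

```python
# Illustrative sketch of generic LEDR encoding; all function names are hypothetical.

def ledr_encode(bits):
    """Level-encoded dual-rail: one (data, repeat) wire pair per bit, no spacers.

    The data rail carries the bit; the repeat rail is set so that
    data XOR repeat equals the token phase, which alternates 0, 1, 0, 1, ...
    """
    return [(bit, bit ^ (index % 2)) for index, bit in enumerate(bits)]

def ledr_decode(tokens):
    """The value of each token is simply its data rail."""
    return [data for data, _ in tokens]

def four_phase_dual_rail(bits):
    """Return-to-zero dual-rail for comparison: (1, 0) = 1, (0, 1) = 0, (0, 0) = spacer."""
    stream = []
    for bit in bits:
        stream.append((1, 0) if bit else (0, 1))
        stream.append((0, 0))  # spacer separating consecutive values
    return stream

def wire_changes(prev, curr):
    """Number of wires that toggle between two consecutive tokens."""
    return sum(a != b for a, b in zip(prev, curr))

if __name__ == "__main__":
    data = [1, 1, 0, 1, 0, 0, 0, 1]
    tokens = ledr_encode(data)
    print(tokens)
    print([wire_changes(a, b) for a, b in zip(tokens, tokens[1:])])  # all 1s
```

Every LEDR token is a valid codeword, so the stream for n bits has n tokens. The return-to-zero dual-rail stream needs 2n tokens: a value and a spacer for each bit.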
Invertible logic based on a probabilistic magnetoresistive device model has recently been presented. It can compute functions bidirectionally and solve several problems quickly, such as factorization and combinatorial optimization. In this article, we present a design framework for invertible-logic circuits. Our approach makes use of linear programming to create a Hamiltonian library with the minimum number of nodes for small invertible-logic functions. In addition, the device model is approximated based on stochastic computing...
Probabilistic bits (p-bits) have recently been presented as a spin (basic computing element) for the simulated annealing (SA) of Ising models. In this brief, we introduce a fast-converging SA based on p-bits designed using integral stochastic computing. The implementation approximates the p-bit function and can search for a solution to a combinatorial optimization problem at lower energy than conventional p-bits. Searching around the global minimum can increase the probability of finding a solution. The proposed...
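The p-bit update rule used in such annealers can be sketched as follows. Each spin is set to sgn(tanh(beta * I_i) - r), where r is uniform in (-1, 1) and I_i is the local field, while beta grows over time. This is a conventional p-bit annealer, not the integral-stochastic-computing p-bit of the brief. The linear beta schedule, its endpoints, and the function names are assumptions.

```python
# Illustrative sketch of a conventional p-bit annealer; names and schedule are hypothetical.
import math
import random

def pbit_anneal(J, h, beta_start=0.1, beta_end=5.0, sweeps=500, seed=0):
    """Anneal an Ising model with p-bits by raising beta linearly.

    Each p-bit is updated as m_i = sgn(tanh(beta * I_i) - r), r ~ U(-1, 1), with
    local field I_i = h_i + sum_{j != i} J_ij m_j. Returns the final spins.
    """
    rng = random.Random(seed)
    n = len(h)
    m = [rng.choice([-1, 1]) for _ in range(n)]
    for sweep in range(sweeps):
        beta = beta_start + (beta_end - beta_start) * sweep / (sweeps - 1)
        for i in range(n):
            field = h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)
            m[i] = 1 if math.tanh(beta * field) > rng.uniform(-1.0, 1.0) else -1
    return m

def ring_couplings(n, coupling=-1):
    """Toy instance: n spins on a ring; coupling = -1 is antiferromagnetic."""
    J = [[0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        J[i][j] = J[j][i] = coupling
    return J

if __name__ == "__main__":
    print(pbit_anneal(ring_couplings(6), [0.0] * 6))
```

On a six-spin antiferromagnetic ring, the final state should alternate in sign, which is one of the two ground states.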
This paper introduces a self-timed overlapped search mechanism for high-throughput content-addressable memories (CAMs) with low energy. Most mismatches can be found by searching the first few bits in a word. Consequently, if the word circuit is divided into two sections that are searched sequentially, most match lines in the second section are unused. Searching the first few bits is faster than searching an entire word. We could therefore potentially increase throughput by initiating a second-stage search on the unused match lines as soon as the first-stage search is complete. The overlapped search is realized using...
This paper introduces a reordered overlapped search mechanism for high-throughput low-energy content-addressable memories (CAMs). Most mismatches can be found by searching a few bits of a word. To lower power dissipation, the word circuit is often divided into two sections that are searched sequentially or even pipelined. Because of this process, most match lines in the second section are unused. Since searching the last few bits is very fast compared to searching the rest of the bits, we propose to increase throughput by asynchronously initiating...
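Both overlapped-search abstracts rely on the fact that a short prefix comparison already rejects most stored words. The model below runs a two-section search and reports how many words ever reach the second section. It is a software sketch with hypothetical names and random test data, not the self-timed or reordered circuits.

```python
# Illustrative software model; function names and test data are hypothetical.
import random

def two_stage_search(table, key, split):
    """Software model of a CAM search split into two sequential sections.

    Stage 1 compares the first `split` bits of every stored word; stage 2
    compares the remaining bits only for words that survived stage 1.
    Returns (indices of matching words, number of words that reached stage 2).
    """
    survivors = [i for i, word in enumerate(table) if word[:split] == key[:split]]
    matches = [i for i in survivors if table[i][split:] == key[split:]]
    return matches, len(survivors)

def random_table(entries, width, seed=0):
    """Hypothetical test data: `entries` random words of `width` bits, as strings."""
    rng = random.Random(seed)
    return ["".join(rng.choice("01") for _ in range(width)) for _ in range(entries)]

if __name__ == "__main__":
    table = random_table(256, 32)
    matches, stage2 = two_stage_search(table, table[123], split=8)
    print(f"matches={matches}, words reaching stage 2: {stage2} of {len(table)}")
```

Consider 256 random 32-bit words, an 8-bit first section, and a key equal to a stored word. Only about 1 + 255/256 ≈ 2 words are expected to survive the first stage, which is why most second-section match lines sit idle.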
This article introduces high-throughput/low-energy true random number generators (TRNGs) based on CMOS and three-terminal magnetic tunnel junction (MTJ) devices. MTJs are fast probabilistic switching devices, which can be used as sources for TRNGs. However, the switching probability is quite sensitive to the write current for given MTJs, so precise closed-loop control is necessary. Thus, high-complexity circuits are required, such as high-precision digital-to-analog converters (DACs), occupying a large area and causing energy...
This paper presents a low-energy asynchronous interleaver for clockless fully parallel low-density parity-check (LDPC) decoding. The proposed data-transmission circuit is based on a half-duplex single-track protocol, which makes it possible to realize a wire-efficient interleaver with small energy consumption. Moreover, a data-monitoring system adaptively shuts down the interleaver when it is not needed. This reduces the number of data transmissions and, hence, the energy consumed. The decoder is evaluated using a (1056,528) irregular LDPC code under...
A low-power Content-Addressable Memory (CAM) is introduced, employing a new mechanism for associativity between the input tags and the corresponding address of the output data. The proposed architecture is based on a recently developed clustered sparse network using binary-weighted connections that, on average, will eliminate most of the parallel comparisons performed during a search. Therefore, the dynamic energy consumption of the proposed design is significantly lower than that of a conventional CAM design. Given an input tag, the proposed architecture computes a few...