- Low-power high-performance VLSI design
- VLSI and FPGA Design Techniques
- Embedded Systems Design Techniques
- Radiation Effects in Electronics
- Advancements in Semiconductor Devices and Circuit Design
- Cryptographic Implementations and Security
- Semiconductor materials and devices
- VLSI and Analog Circuit Testing
- Parallel Computing and Optimization Techniques
- Quantum-Dot Cellular Automata
- Interconnection Networks and Systems
- Cryptography and Residue Arithmetic
- Non-Invasive Vital Sign Monitoring
- Quantum Computing Algorithms and Architecture
- Multilevel Inverters and Converters
- Advanced Memory and Neural Computing
- Tryptophan and brain disorders
- Photonic and Optical Devices
- Photovoltaic System Optimization Techniques
- Image and Signal Denoising Methods
- Advanced Fluorescence Microscopy Techniques
- Cryptography and Data Security
- Time Series Analysis and Forecasting
- Analog and Mixed-Signal Circuit Design
- Coding theory and cryptography
Jane Street (United States)
2024
The Jane Goodall Institute
2024
Xilinx (United States)
2016
The University of Tokyo
2009-2013
Tokyo University of Information Sciences
2010-2013
Bunkyo University
2009
Massey University
2007
Queen's University Belfast
2002
This paper presents enhancements to the Xilinx UltraScale+ clocking architecture to support fine-grain time borrowing. Time borrowing improves performance by redistributing timing slack between fast and slow paths. UltraScale+ introduces programmable hardware delays and pulse generators embedded in the clock tree, enabling time borrowing based on both clock skew scheduling and pulsed latches. This allows time to be borrowed, from a few picoseconds to multiple nanoseconds, between sequential pipeline stages without any changes to RTL, placement or routing....
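To make "redistributing timing slack" concrete, here is a minimal sketch of time borrowing in a linear pipeline. It assumes a simplified model: a stage that overruns the clock period borrows the excess from the next stage, up to a hypothetical `max_borrow_ps` limit that stands in for the range of a programmable delay. The final output is assumed to go to an ordinary flip-flop that cannot borrow. Setup and hold constraints, clock uncertainty, and how the UltraScale+ tools actually choose delays are all ignored.

```python
def borrow_schedule(stage_delays_ps, period_ps, max_borrow_ps):
    """Return the time each stage borrows from its successor, or None on failure.

    Simplified model (assumption): a slow stage may finish late by borrowing
    into the next stage, and a fast stage absorbs any time that is still owed.
    """
    borrowed = 0
    schedule = []
    for delay in stage_delays_ps:
        # Time still owed from the previous stage plus this stage's own overrun.
        borrowed = max(0, borrowed + delay - period_ps)
        if borrowed > max_borrow_ps:
            return None  # exceeds the (hypothetical) programmable borrow range
        schedule.append(borrowed)
    # The last stage feeds a normal flip-flop, so nothing may still be owed.
    return schedule if borrowed == 0 else None


# A 1200 ps stage meets a 1000 ps clock because the 800 ps stage after it has slack.
print(borrow_schedule([1200, 800, 1000], period_ps=1000, max_borrow_ps=300))
```

Without borrowing (`max_borrow_ps=0`), the same pipeline would need a 1200 ps period. With borrowing, the average stage delay sets the limit instead of the worst one.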
A 65 nm self-synchronous field programmable gate array (SSFPGA) with delay-insensitive operation and pipeline granularity at the gate level is shown to be robust to process, voltage and temperature (PVT) variations. The proposed SSFPGA employs a 38 × 38 array of four-input, three-stage configurable logic blocks, with the introduction of a new dual tree-divider three-pipeline-stage LUT to achieve 2.97 GHz throughput at 1.2 V. Correct operation was measured with 500 mVp-p, 1.12 externally introduced power supply noise on a 1.2 V supply, equivalent to 42% bounce....
This paper presents a split CPU-FPGA Multi-Scalar Multiplication (MSM) engine written in Hardcaml. Hardcaml MSM was submitted to the 2022 ZPrize cryptography competition and won 1st place in the FPGA track. It targets the BLS12-377 elliptic curve and is currently the lowest-latency published implementation utilizing FPGAs. For an MSM of order 2^26 we achieve a single-round latency of 5.518s and an average power of 52W, with our design running at 278MHz. When performing multiple rounds with the same base points but random scalars, we are able...
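An MSM computes the sum of s_i · P_i over many scalar/point pairs. The abstract does not say which algorithm the engine uses. The sketch below shows Pippenger's bucket method, a common choice for large MSMs, applied to a toy group: integers mod a hypothetical modulus `Q` under addition, standing in for BLS12-377 points. The function and parameter names are illustrative assumptions, not the Hardcaml MSM interface.

```python
def msm_pippenger(scalars, points, add, zero, scalar_bits, window_bits):
    """Bucket-method multi-scalar multiplication over any additive group.

    `add` is the group operation and `zero` its identity (assumed interface).
    """
    num_windows = (scalar_bits + window_bits - 1) // window_bits
    mask = (1 << window_bits) - 1
    total = zero
    for w in reversed(range(num_windows)):  # most significant window first
        # Shift the accumulator left by one window (repeated doubling).
        for _ in range(window_bits):
            total = add(total, total)
        # Drop each point into the bucket selected by this window of its scalar.
        buckets = [zero] * (1 << window_bits)
        for s, p in zip(scalars, points):
            idx = (s >> (w * window_bits)) & mask
            if idx:
                buckets[idx] = add(buckets[idx], p)
        # Compute sum_j j * bucket[j] with running sums: no scalar multiplies needed.
        running, window_sum = zero, zero
        for j in range(len(buckets) - 1, 0, -1):
            running = add(running, buckets[j])
            window_sum = add(window_sum, running)
        total = add(total, window_sum)
    return total


Q = 1_000_003  # hypothetical toy group: integers mod Q under addition
scalars = [12345, 65535, 0, 1, 40000]
points = [17, 99991, 5, 123456, 777]
print(msm_pippenger(scalars, points, lambda a, b: (a + b) % Q, 0, 16, 4))
```

The bucket accumulation is a long stream of independent group additions, which suits a pipelined hardware adder. The per-window aggregation is far smaller, which is one reason a CPU/FPGA split is attractive.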
Hardcaml is an embedded hardware design domain specific language (DSL) implemented in the OCaml programming language. Unlike high level synthesis (HLS), Hardcaml allows for low level control of the underlying hardware for maximum productivity, while abstracting away many of the tedious aspects of traditional hardware definition languages (HDLs) such as Verilog or VHDL. The richness of OCaml's type system, combined with Hardcaml's fast circuit elaboration checks, reduces the chances of user-introduced bugs and erroneous connections, with features like custom...
In this paper we present the design of a low-cost system that can be used to monitor physiological parameters, such as the temperature and heart rate, of a human subject. The system consists of an electronic device which is worn on the wrist and finger by an elderly or at-risk person. Using several sensors to measure different vital signs, the person can be wirelessly monitored within his own home. An impact sensor has been included to detect falls. The device detects if the person is medically distressed and sends an alarm to a receiver unit connected to a computer. This sets off...
A 65nm self synchronous field programmable gate array (SSFPGA) which uses autonomous gate-level power gating with minimal control circuitry overhead for energy minimum operation is presented. The use of self synchronous signalling allows the FPGA to operate at voltages down to 370mV without any parameter tuning. We show both a 2.6× total power reduction and a 6.4× performance improvement at the same time compared to a non-power-gated SSFPGA, whereas the latest research reports 1.8× in power-delay product (PDP) and a 2× improvement. When compared in a similar process, we...
We detail a self synchronous field programmable gate array (SSFPGA) with a dual-pipeline (DP) architecture to conceal the pre-charge time for dynamic logic, and its throughput optimization by using pipeline alignment implemented on benchmark circuits. A self synchronous LUT (SSLUT) consists of a three-input tree-type structure with 8 bits of SRAM for programming. A self synchronous switch box (SSSB) uses both pass transistors and buffers to route signals, with 12 bits of SRAM. One common block, containing one SSLUT and one SSSB, occupies 2.2Mλ² of area with 35 bits of SRAM, and the prototype SSFPGA with 34 ×...
We propose a self synchronous field programmable gate array (SSFPGA) with a dual-pipeline (DP) architecture to eliminate the pre-charge time for dynamic logic. A self synchronous LUT (SSLUT) consists of a three-input tree-type structure with 8-bit SRAM for programming. A self synchronous switch box (SSSB) uses both pass transistors and buffers. One common block, containing one SSLUT and one SSSB, occupies 2.2λ² of area, and the prototype SSFPGA with 34x30 (1020) blocks is designed...
We have designed and measured a completely self-synchronous 1024-bit RSA crypt-engine, fabricated in 40nm CMOS. We implemented two modular exponentiation algorithms, high-to-low (HTL) and the Montgomery power ladder (MPL), in order to show the performance of self-synchronous, gate-level pipelined architectures. Both implementations employ identical data-paths that take 804k transistors, the only difference being the controller. Interleaved 1024b cryptographic operations take from 6.1ms to 3.1ms for HTL and 6.0ms for MPL, at a nominal supply of 1.1V.
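One plausible reading of an HTL time that varies (6.1ms to 3.1ms) alongside a single MPL figure is that HTL's work depends on the exponent's bit pattern while MPL's does not. The sketch below illustrates that property in software by counting modular multiplications. It is a textbook model of the two algorithms, not the authors' controller, and it says nothing about the self-synchronous hardware.

```python
def htl_modexp(base, exp, mod):
    """High-to-low (left-to-right) square-and-multiply.

    Returns (base**exp % mod, number of modular multiplications performed).
    """
    result, ops = 1 % mod, 0
    for bit in bin(exp)[2:]:  # most significant bit first
        result = result * result % mod  # square on every bit
        ops += 1
        if bit == "1":
            result = result * base % mod  # extra multiply only on 1-bits
            ops += 1
    return result, ops


def mpl_modexp(base, exp, mod):
    """Montgomery power ladder: one multiply and one square on every bit.

    Invariant: r1 == r0 * base (mod mod). Returns (result, op count).
    """
    r0, r1, ops = 1 % mod, base % mod, 0
    for bit in bin(exp)[2:]:
        if bit == "1":
            r0 = r0 * r1 % mod
            r1 = r1 * r1 % mod
        else:
            r1 = r0 * r1 % mod
            r0 = r0 * r0 % mod
        ops += 2
    return r0, ops


for e in (0b1000, 0b1111):  # same bit length, different Hamming weight
    print(bin(e), htl_modexp(7, e, 1009)[1], mpl_modexp(7, e, 1009)[1])
# 0b1000 5 8
# 0b1111 8 8
```

HTL ranges from one to two operations per exponent bit, roughly a 2× spread, while MPL always does two. That regularity is also why the ladder is often preferred against timing side channels.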
We propose a Montgomery multiplier composed of gate-level self synchronous processing elements (SS-PE) that can be used to create scalable-length modular multipliers with no broadcast signals for high throughput. A 40nm test circuit shows that the SS-PE operates from 0.4V to 1.3V at 20°C without tuning, with 2.1 Gb/s data-throughput and 476ps delay at 1.1V, and an energy per operation of 322fJ/op at 1.1V and 1.40fJ/op with voltage scaling to 0.4V. An 8-bit RSA is implemented using SS-PEs, with 186Mb/s data-throughput and a 130ns time...
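For reference, here is the textbook radix-2 (bit-serial) form of Montgomery multiplication, which computes a·b·2⁻ᵏ mod n using only additions and right shifts. It assumes n is odd, a and b are less than n, and 2ᵏ is greater than n. The abstract's SS-PE array presumably distributes iterations like these across processing elements without broadcast signals, but this sketch does not model that structure.

```python
def montgomery_multiply(a, b, n, k):
    """Radix-2 Montgomery multiplication: returns a * b * 2**(-k) mod n.

    Assumes n is odd, 0 <= a, b < n and 2**k > n.
    """
    t = 0
    for i in range(k):
        t += ((a >> i) & 1) * b  # add b when bit i of a is set
        if t & 1:  # make t even by adding the odd modulus
            t += n
        t >>= 1  # exact division by 2
    # Invariant t < 2n, so a single conditional subtraction suffices.
    return t - n if t >= n else t


n, k = 239, 8  # toy modulus with R = 2**8 > n
R = 1 << k
a, b = 100, 200
a_bar, b_bar = a * R % n, b * R % n  # convert into the Montgomery domain
prod_bar = montgomery_multiply(a_bar, b_bar, n, k)  # equals a*b*R mod n
print(montgomery_multiply(prod_bar, 1, n, k))  # convert back: a*b mod n
```

Each iteration looks only at the lowest bit of the running sum to decide whether to add n, so no trial division is needed. Long operand chains can therefore be processed with simple, uniform cells.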
We have previously presented a process variation robust self synchronous FPGA that uses dual pipelines (DP) for high throughput 3GHz operation. As technology shrinks, the importance of systems that are not only robust, but also error robust, increases. In this paper, we analyze DP robustness to single-event upsets and propose several gate-level architectures to implement error detection and correction, autonomous disabling of faulty pipeline stages, and programmable time-interleaved redundancy, showing that they are a very promising candidate...
We present a low overhead technique that can be used to offset both large systematic and random process delay variation in the near-threshold voltage operation region. We present an analysis of this technique applied to a 65nm CMOS self synchronous FPGA that is capable of operation from 2.0V to 0.37V. By using dual supplies on gate-level pipeline stages, we show that delay variation can be offset, and achieve energy savings per operation of up to 102x for a 200-stage pipeline.
In this paper we present an autonomous watchdog circuit for error robustness which can detect logic errors caused by power supply noise and soft errors, with the smallest overheads compared to current research. The proposed watchdog circuit is realized in a dual-pipeline self synchronous system, without the need to duplicate logic. The watchdog prevents the propagation of errors through the pipeline chain, and errors are successfully detected. Error tolerance against power supply bounce was measured at 67% at 1.2V. Circuit size and energy-per-operation increased by 6.9% and 16% respectively in the case of a 65nm FPGA.
We have designed and measured the performance against power supply bounce and aging of a Self Synchronous FPGA (SSFPGA) in 65nm CMOS which achieves 2.97GHz throughput at 1.2V. The proposed SSFPGA employs a 38×38 array of 4-input, 3-stage Configurable Logic Blocks (SSCLB), with the introduction of a new dual tree-divider 4-input LUT to achieve a 4.5× improvement over our previous model [1]. Energy was measured at 3.23 pJ/block/cycle using a custom-built board. Results for accelerated aging degradation show that the SSFPGA has an 8% longer time margin...
Conventional methods for video motion compensation are usually based on the movement of blocks of pixels in the spatial domain. This can give rise to "block" type artifacts, which are undesirable in professional television broadcasting applications. An attractive alternative is to perform motion compensation in the frequency domain using phase correlation. This paper describes a 64-point Fourier transform chip that performs a forward or inverse, complex Fourier transform on two's complement data supplied at a rate of 13.5 MHz (one real 16 b word and one imaginary 16 b word,...
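Phase correlation estimates a displacement by keeping only the phase of the cross-power spectrum. A pure shift then transforms back into a single peak at the displacement. The sketch below shows the idea in one dimension with 64-point transforms to echo the chip's transform length. It is a simplification in several ways: real motion estimation is two-dimensional, the shift is assumed to be circular, and a naive O(N²) DFT stands in for the FFT a chip would implement.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (an FFT gives the same result faster)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def phase_correlation_shift(ref, moved):
    """Estimate the circular shift s such that moved[t] == ref[t - s]."""
    F, G = dft(ref), dft(moved)
    cross = []
    for f, g in zip(F, G):
        c = g * f.conjugate()
        # Keep only the phase: normalise each frequency bin to unit magnitude.
        cross.append(c / abs(c) if abs(c) > 1e-9 else 0j)
    corr = dft(cross, inverse=True)
    # For a pure shift the correlation surface is a spike at the displacement.
    return max(range(len(corr)), key=lambda i: corr[i].real)


ref = [math.sin(0.4 * i) + 0.3 * (i % 5) for i in range(64)]
moved = ref[-5:] + ref[:-5]  # circularly shift right by 5 samples
print(phase_correlation_shift(ref, moved))
```

Because magnitude is discarded, the peak stays sharp regardless of image content. This is what makes the method attractive compared with matching blocks of pixels.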
Unstable and unpredictable LSI operation caused by PVT (process, voltage and temperature) variations, along with aging effects such as NBTI/PBTI, is one of the serious issues in current and future scaled LSIs. In these situations, where the environments in the field are hard to predict at the circuit design and test stages, the conventional approach of margin-based testing of synchronous architectures has to pay a large speed penalty in order to guarantee safe operation. Especially from the viewpoint of delay faults, unpredictable variation...
We report on the throughput optimization of a self synchronous FPGA (SSFPGA) using benchmark circuits. We find that with the dual pipeline architecture we are able to convert designs from Verilog onto our SSFPGA, and by using pipeline alignment techniques to match pipeline depth, designs can perform at maximum throughput. We demonstrate a 0 to 56.1 times throughput improvement with these techniques. The optimization is carried out within the number of logic elements in the array and buffers in the switching matrix.
Reliable operation against PVT (process, voltage, and temperature) variation and aging effects has been measured for a Gate-Level Pipelined Self Synchronous FPGA (SSFPGA) designed in 65nm CMOS. The SSFPGA employs a 38×38 array of 4-input, 3-stage Configurable Logic blocks. Throughput is 2.97GHz at 1.2V, with correct operation from 750mV to 1.6V at 25°C. Operation with errors being inserted into the SSFPGA was compared with a conventional synchronous FPGA, showing that the SSFPGA had 4.2 times more error-free operation. The aging effect was also measured using accelerated aging cycles...
In this paper we show that self synchronous circuits can provide robust operation in both soft-error-prone and low voltage operating environments. Self synchronous circuits are shown to be self-checking, where a soft error will either cause a detectable error or a halt of the circuit. A watchdog circuit is proposed to autonomously detect dual-rail '11' errors and prevent their propagation, with measurements in 65nm CMOS showing seamless operation from 1.6V to 0.37V. Compared to a system without the watchdog, circuit size and energy-per-operation increased by 6.9% and 16% respectively, while error tolerance...
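In dual-rail signalling, each bit travels on two wires, so one code is left over as invalid. That is what makes a '11' error detectable without duplicating logic. The behavioural sketch below assumes a common convention: (1, 0) is logic 1, (0, 1) is logic 0, (0, 0) is the spacer, and (1, 1) is the error state. It models a hypothetical watchdog that flags any word containing '11' and blocks it from moving downstream. It is not a model of the authors' circuit.

```python
ONE, ZERO, SPACER, ERROR = (1, 0), (0, 1), (0, 0), (1, 1)  # assumed encoding


def decode(pair):
    """Decode one dual-rail pair: 1, 0, or None for a spacer (no data yet)."""
    if pair == ONE:
        return 1
    if pair == ZERO:
        return 0
    if pair == SPACER:
        return None
    raise ValueError("dual-rail '11' error")


def watchdog(word):
    """Hypothetical watchdog: returns (word to forward or None, error flag)."""
    if any(pair == ERROR for pair in word):
        return None, True  # block the token so the error cannot propagate
    return word, False


def run_pipeline(tokens):
    """Pass each dual-rail word through the watchdog; count blocked words."""
    delivered, errors = [], 0
    for word in tokens:
        passed, err = watchdog(word)
        errors += err
        if passed is not None:
            delivered.append([decode(p) for p in passed])
    return delivered, errors


tokens = [[ONE, ZERO], [ERROR, ZERO], [ZERO, ZERO]]  # middle word was upset
print(run_pipeline(tokens))
```

Because a single-event upset on one rail turns a valid code into either the spacer or '11', an error cannot silently become a different valid value. It either stalls the handshake or is caught by the check.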
This paper introduces Hardcaml, an embedded hardware design domain specific language (DSL) implemented in the OCaml programming language. Unlike high level synthesis (HLS), Hardcaml allows for low level control of the underlying hardware for maximum productivity, while abstracting away many of the tedious aspects of traditional hardware definition languages (HDLs) such as Verilog or VHDL. The richness of OCaml's type system, combined with Hardcaml's fast circuit elaboration checks, reduces the chance of user-introduced bugs and erroneous...