NFDI4DS | UHH-SEMS - Publication Details

Time-borrowing platform in the Xilinx UltraScale+ family of FPGAs and MPSoCs

OPENALEX - Publications

Ilya Ganusov Benjamin Devlin

This paper presents enhancements to the Xilinx UltraScale+ clocking architecture to support fine-grain time-borrowing. Time borrowing improves performance by redistributing timing slack between fast and slow paths. The UltraScale+ architecture introduces programmable hardware delays and pulse generators embedded in the clock tree to support time-borrowing based both on clock skew scheduling and pulsed latches. This allows borrowing from a few picoseconds to multiple nanoseconds between sequential pipeline stages without any changes to RTL, placement or routing....

10.1109/fpl.2016.7577343 article EN 2016-08-01
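
As a rough illustration of the slack redistribution described in the abstract above, the following Python sketch computes the clock period and the register skew for a two-stage pipeline under idealized borrowing. The stage delays, the function names, and the simplifications (no setup/hold margins, no limit on the programmable delay, input and output registers left unskewed) are assumptions for illustration, not figures or methods from the paper.

```python
def min_period_no_borrowing(d1_ps, d2_ps):
    """Without borrowing, the slower stage sets the clock period."""
    return max(d1_ps, d2_ps)


def min_period_with_borrowing(d1_ps, d2_ps):
    """Idealized two-stage case: delaying the middle register's clock by
    `skew` gives stage 1 (period + skew) and stage 2 (period - skew).
    Balancing both constraints yields the average of the two delays."""
    period = (d1_ps + d2_ps) / 2
    skew = d1_ps - period  # positive: the middle register is clocked later
    return period, skew


# Hypothetical delays: a slow 1200 ps stage followed by a fast 800 ps stage.
baseline = min_period_no_borrowing(1200, 800)
period, skew = min_period_with_borrowing(1200, 800)
```

In this hypothetical case the slow stage borrows 200 ps from the fast one, so the period drops from 1200 ps to 1000 ps without touching the logic itself.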

A 65 nm Gate-Level Pipelined Self-Synchronous FPGA for High Performance and Variation Robust Operation

Benjamin Devlin Makoto Ikeda Kunihiro Asada

A 65 nm self-synchronous field programmable gate array (SSFPGA) with delay insensitive operation and pipeline granularity at the gate level is shown to be robust to process voltage temperature (PVT) variations. The proposed SSFPGA employs a 38 × 38 array of four-input, three-stage self-synchronous configurable logic blocks, with the introduction of a new dual tree-divider three-pipeline stage LUT to achieve 2.97 GHz throughput at 1.2 V. Correct operation is measured with 500 mVp-p, 1.12 GHz externally introduced power supply noise at 1.2 V supply, equivalent to 42% bounce....

10.1109/jssc.2011.2164024 article EN IEEE Journal of Solid-State Circuits 2011-08-25

Hardcaml MSM: A High-Performance Split CPU-FPGA Multi-Scalar Multiplication Engine

A.K. Ray Benjamin Devlin Fu Yong Quah Rahul Yesantharao

This paper presents a split CPU-FPGA Multi-Scalar Multiplication (MSM) engine written in Hardcaml. Hardcaml MSM was submitted to the 2022 ZPrize cryptography competition and won 1st place in the FPGA track. Hardcaml MSM targets the BLS12-377 elliptic curve and is currently the lowest-latency implementation utilizing FPGAs published. For an MSM of order 2^26 we achieve a single-round latency of 5.518s and average power of 52W, with our design running at 278MHz. When performing multiple rounds with the same base points but random scalars, we are able...

10.1145/3626202.3637577 article EN cc-by 2024-04-01
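
For readers unfamiliar with the operation named in the abstract above, a multi-scalar multiplication computes the sum of k_i * P_i over pairs of scalars and group elements. The Python sketch below shows only that definition, using a toy additive group (integers modulo 101) as a stand-in for elliptic-curve points. The group, the modulus, and all function names are assumptions for illustration; this is not the algorithm or the partitioning used in the paper.

```python
MODULUS = 101  # toy group: integers mod 101 under addition (assumption)
IDENTITY = 0


def group_add(p, q):
    """Group operation; on a real curve this would be point addition."""
    return (p + q) % MODULUS


def scalar_mul(k, p):
    """Double-and-add: computes k * p with O(log k) group operations."""
    result = IDENTITY
    addend = p
    while k > 0:
        if k & 1:
            result = group_add(result, addend)
        addend = group_add(addend, addend)  # doubling step
        k >>= 1
    return result


def msm(scalars, points):
    """Naive MSM: sum of scalars[i] * points[i]."""
    if len(scalars) != len(points):
        raise ValueError("scalars and points must have the same length")
    acc = IDENTITY
    for k, p in zip(scalars, points):
        acc = group_add(acc, scalar_mul(k, p))
    return acc
```

For example, `msm([2, 3], [5, 7])` evaluates 2*5 + 3*7 in the toy group.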

Hardcaml: An OCaml Hardware Domain-Specific Language for Efficient and Robust Design

A.K. Ray Benjamin Devlin Fu Yong Quah Rahul Yesantharao

Hardcaml is an embedded hardware design domain specific language (DSL) implemented in the OCaml programming language. Unlike high level synthesis (HLS), Hardcaml allows for low level control of the underlying hardware for maximum productivity, while abstracting away many of the tedious aspects of traditional hardware definition languages (HDLs) such as Verilog or VHDL. The richness of OCaml's type system combined with Hardcaml's fast circuit elaboration checks reduces the chances of user-introduced bugs and erroneous connections with features like custom...

10.1145/3626202.3637586 article EN 2024-04-01

Design of a Low-cost Physiological Parameter Measurement and Monitoring Device

Gourab Sen Gupta Subhas Chandra Mukhopadhyay Benjamin Devlin Serge Demidenko

In this paper we present the design of a low-cost system that can be used to monitor physiological parameters, such as temperature and heart rate, of a human subject. The system consists of an electronic device which is worn on the wrist and finger, by an elderly or at-risk person. Using several sensors to measure different vital signs, the person is wirelessly monitored within his own home. An impact sensor has been used to detect falls. The device detects if a person is medically distressed and sends an alarm to a receiver unit that is connected to a computer. This sets off...

10.1109/imtc.2007.378997 article EN Conference proceedings - IEEE Instrumentation/Measurement Technology Conference 2007-05-01

Energy minimum operation in a reconfigurable gate-level pipelined and power-gated self synchronous FPGA

Benjamin Devlin Makoto Ikeda Kunihiro Asada

A 65nm self synchronous field programmable gate array (SSFPGA) which uses autonomous gate-level power gating with minimal control circuitry overhead for energy minimum operation is presented. The use of self synchronous signalling allows the FPGA to operate at voltages down to 370mV without any parameter tuning. We show both a 2.6× total energy reduction and a 6.4× performance improvement at the same time compared to the non-power gated SSFPGA, and compared with the latest research a 1.8× improvement in power-delay product (PDP) and a 2× performance improvement. When a similar process we...

10.1109/islped.2011.5993594 article EN 2011-08-01

A Low Power and High Throughput Self Synchronous FPGA Using 65 nm CMOS with Throughput Optimization by Pipeline Alignment

Benjamin Devlin Toru Nakura Makoto Ikeda Kunihiro Asada

We detail a self synchronous field programmable gate array (SSFPGA) with dual-pipeline (DP) architecture to conceal pre-charge time for dynamic logic, and its throughput optimization by using pipeline alignment implemented on benchmark circuits. A self synchronous LUT (SSLUT) consists of a three input tree-type structure with 8bits SRAM for programming. A self synchronous switch box (SSSB) consists of both pass transistors and buffers to route signals, with 12bits SRAM. One common block with one SSLUT and one SSSB occupies 2.2Mλ² area with 35bits SRAM, and the prototype SSFPGA with 34 ×...

10.1587/transfun.e93.a.1319 article EN IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences 2010-01-01

Energy minimum operation in a reconfigurable gate-level pipelined and power-gated self synchronous FPGA

Benjamin Devlin Makoto Ikeda Kunihiro Asada

A 65nm self synchronous field programmable gate array (SSFPGA) which uses autonomous gate-level power gating with minimal control circuitry overhead for energy minimum operation is presented. The use of self synchronous signalling allows the FPGA to operate at voltages down to 370mV without any parameter tuning. We show both a 2.6× total energy reduction and a 6.4× performance improvement at the same time compared to the non-power gated SSFPGA, and compared with the latest research a 1.8× improvement in power-delay product (PDP) and a 2× performance improvement. When a similar process we...

10.5555/2016802.2016806 article EN 2011-08-01

647 MHz, 0.642pJ/block/cycle 65nm self synchronous FPGA

Benjamin Devlin MyeongGyu Jeong Toru Nakura Makoto Ikeda Kunihiro Asada

We propose a self synchronous field programmable gate array (SSFPGA) with dual-pipeline (DP) architecture to eliminate pre-charge time for dynamic logic. A self synchronous LUT (SSLUT) consists of a three input tree-type structure with 8 bit SRAM for programming. A self synchronous switch box (SSSB) consists of both pass transistors and buffers. One common block with one SSLUT and one SSSB occupies 2.2λ² area and the prototype SSFPGA with 34x30 (1020) blocks is designed...

10.1109/esscirc.2009.5326010 article EN Proceedings of ESSCIRC 2009-09-01

Energy Minimum Operation with Self Synchronous Gate-Level Autonomous Power Gating and Voltage Scaling

Benjamin Devlin Makoto Ikeda Kunihiro Asada

A 65nm self synchronous field programmable gate array (SSFPGA) which uses autonomous gate-level power gating with minimal control circuitry overhead for energy minimum operation is presented. The use of self synchronous signaling allows the FPGA to operate at voltages down to 370mV without any parameter tuning. We show both a 2.6x total energy reduction and a 6.4x performance improvement at the same time compared to the non-power gated SSFPGA, and compared with the latest research a 1.8x improvement in power-delay product (PDP) and a 2x performance improvement. When a similar process we...

10.1587/transele.e95.c.546 article EN IEICE Transactions on Electronics 2012-01-01

Completely self-synchronous 1024-bit RSA crypt-engine in 40nm CMOS

Benjamin Devlin Makoto Ikeda Hiroshi Ueki Kazuhiko Fukushima

We have designed and measured a completely self-synchronous 1024-bit RSA crypt-engine, fabricated in 40nm CMOS. We implemented two modular exponentiation algorithms, the high-to-low (HTL) and the Montgomery power ladder (MPL), in order to show the performance of self-synchronous, gate-level pipelined architectures. Both implementations employ identical data-paths and take 804k transistors, with the only difference in the controller. Interleaved 1024b cryptographic operations take from 6.1ms to 3.1ms for HTL and 6.0ms for MPL, at the nominal supply of 1.1V.

10.1109/asscc.2013.6691044 article EN IEEE Asian Solid-State Circuits Conference (A-SSCC) 2013-11-01
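
The two exponentiation schedules named in the abstract above can be sketched in software as follows. This is a minimal Python illustration of the textbook high-to-low (left-to-right) binary method and the Montgomery power ladder on plain integers; the function names are hypothetical, and nothing here reflects the self-synchronous data-path or controller of the fabricated chip.

```python
def modexp_high_to_low(base, exponent, modulus):
    """Scan exponent bits from the most significant bit downward:
    square every step, multiply only when the bit is 1."""
    result = 1 % modulus
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        if bit == "1":
            result = (result * base) % modulus
    return result


def modexp_montgomery_ladder(base, exponent, modulus):
    """Montgomery power ladder: one multiply and one square per bit,
    regardless of the bit value. Invariant: r1 == r0 * base (mod modulus)."""
    r0 = 1 % modulus
    r1 = base % modulus
    for bit in bin(exponent)[2:]:
        if bit == "1":
            r0 = (r0 * r1) % modulus
            r1 = (r1 * r1) % modulus
        else:
            r1 = (r0 * r1) % modulus
            r0 = (r0 * r0) % modulus
    return r0
```

Both functions return the same value as Python's built-in `pow(base, exponent, modulus)`. The ladder performs the same number of operations for every exponent bit, which is why it is often discussed alongside side-channel resistance.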

Performance and side-channel attack analysis of a self synchronous Montgomery multiplier processing element for RSA in 40nm CMOS

Benjamin Devlin Hiroshi Ueki Shintaro Mori Shigenori Miyauchi Makoto Ikeda and 1 more

We propose a Montgomery multiplier composed of gate-level self synchronous processing elements (SS-PE) that can be used to create scalable-length modular multipliers with no broadcast signals for high throughput. A 40nm test circuit shows the SS-PE operates from 0.4V to 1.3V at 20°C without tuning, with 2.1 Gb/s data-throughput, 476ps delay at 1.1V, and energy per operation of 322fJ/op at 1.1V and 1.40fJ/op with voltage scaling to 0.4V. 8-bit RSA is implemented using SS-PEs, with 186Mb/s 130ns data-throughput time...

10.1109/asscc.2012.6570807 article EN IEEE Asian Solid-State Circuits Conference (A-SSCC) 2012-12-01
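
As background for the abstract above, Montgomery multiplication replaces division by the modulus with shifts and masks by working with a power-of-two radix R. The Python sketch below shows the standard word-level reduction (REDC) on plain integers. The parameter names and the choice of an 8-bit R in the usage lines are assumptions for illustration; the paper's processing-element architecture is not modeled. The negative exponent in `pow` requires Python 3.8 or later.

```python
def montgomery_params(modulus, bits):
    """Return (R, n_prime) with R = 2**bits and n_prime = -modulus^-1 mod R."""
    r = 1 << bits
    if modulus % 2 == 0 or modulus >= r:
        raise ValueError("modulus must be odd and smaller than R")
    n_prime = (-pow(modulus, -1, r)) % r
    return r, n_prime


def montgomery_multiply(a, b, modulus, bits, n_prime):
    """Return a * b * R^-1 mod modulus for 0 <= a, b < modulus."""
    mask = (1 << bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask  # makes t + m * modulus divisible by R
    u = (t + m * modulus) >> bits      # exact division by R
    return u - modulus if u >= modulus else u


# Hypothetical usage: multiply 5 * 7 mod 97 via the Montgomery domain.
N, BITS = 97, 8
R, N_PRIME = montgomery_params(N, BITS)
a_mont = (5 * R) % N                     # convert into Montgomery form
b_mont = (7 * R) % N
product_mont = montgomery_multiply(a_mont, b_mont, N, BITS, N_PRIME)
product = montgomery_multiply(product_mont, 1, N, BITS, N_PRIME)  # convert back
```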

Self Synchronous Circuits for Error Robust Operation in Sub-100nm Processes

Benjamin Devlin Makoto Ikeda Kunihiro Asada

We have previously presented a process variation robust self synchronous FPGA that uses dual pipelines (DP) for high throughput 3GHz operation. As technology shrinks, the importance of not only variation robust, but error robust systems increases. In this paper, we analyze DP robustness to single-event-upsets and propose several gate-level architectures to implement error detection and correction, autonomous disabling of faulty pipeline-stages, and programmable time-interleaved redundancy, showing that self synchronous systems are a very promising candidate...

10.1109/async.2012.13 article EN 2012-05-01

Gate-level process variation offset technique by using dual voltage supplies to achieve near-threshold energy efficient operation

Benjamin Devlin Makoto Ikeda Kunihiro Asada

We present a low overhead technique that can be used to offset both large systematic and random process delay variation in the near-threshold voltage operation region. We present an analysis of this technique applied to a 65nm CMOS self synchronous FPGA that is capable of operation from 2.0V to 0.37V. By using dual voltage supplies in gate-level pipeline stages, we show that we can offset variation and achieve energy savings per operation of up to 102x for a 200 stage pipeline.

10.1109/coolchips.2012.6216585 article EN 2012-04-01

Gate-level autonomous watchdog circuit for error robustness based on a 65nm self synchronous system

Benjamin Devlin Makoto Ikeda Kunihiro Asada

In this paper we present an autonomous watchdog circuit for error robustness which can detect logic errors caused by power supply noises and soft errors, with the smallest overheads compared to current research. The proposed watchdog circuit is realized with a dual-pipeline self synchronous system, without the need to duplicate logic. The watchdog prevents error propagation through the logic chain, and errors are successfully detected. Error tolerance to power supply bounce is measured at 67% at 1.2V. Circuit size and energy-per-operation are increased 6.9% and 16% respectively in the case of a 65nm FPGA.

10.1109/icecs.2011.6122212 article EN 2011-12-01

A gate-level pipelined 2.97GHz self synchronous FPGA in 65nm CMOS

Benjamin Devlin Makoto Ikeda Kunihiro Asada

We have designed and measured the performance against power supply bounce and aging of a Self Synchronous FPGA (SSFPGA) in 65nm CMOS which achieves 2.97GHz throughput at 1.2V. The proposed SSFPGA employs a 38×38 array of 4-input, 3-stage Self Synchronous Configurable Logic Blocks (SSCLB), with the introduction of a new dual tree-divider 4 input LUT to achieve a 4.5× throughput improvement over our previous model [1]. Energy was measured at 3.23 pJ/block/cycle using a custom built board. Accelerated aging degradation results show the SSFPGA has 8% longer time margin...

10.5555/1950815.1950829 article EN Asia and South Pacific Design Automation Conference 2011-01-25

A gate-level pipelined 2.97GHz Self Synchronous FPGA in 65nm CMOS

Benjamin Devlin Makoto Ikeda Kunihiro Asada

We have designed and measured the performance against power supply bounce and aging of a Self Synchronous FPGA (SSFPGA) in 65nm CMOS which achieves 2.97GHz throughput at 1.2V. The proposed SSFPGA employs a 38×38 array of 4-input, 3-stage Self Synchronous Configurable Logic Blocks (SSCLB), with the introduction of a new dual tree-divider 4 input LUT to achieve a 4.5× throughput improvement over our previous model. Energy was measured at 3.23 pJ/block/cycle using a custom built board. Accelerated aging degradation results show the SSFPGA has 8% longer time margin...

10.1109/aspdac.2011.5722288 article EN Asia and South Pacific Design Automation Conference (ASP-DAC) 2011-01-01

A 64-point Fourier transform chip for digital television applications

J.V. McCanny Roger Woods Chen Hui Tiong Jui Ding Benjamin Devlin

Conventional methods for video motion compensation are usually based on the movement of blocks of pixels in the spatial domain. This can give rise to "block" type artifacts, that are undesirable for professional television broadcasting applications. An attractive alternative is to perform motion compensation in the frequency domain using phase correlation. This paper describes a 64-point Fourier transform chip that performs a forward or inverse transform on complex two's complement data supplied at a rate of 13.5 MHz (one real 16 b word and one imaginary 16 b word,...

10.1109/isscc.1996.488592 article EN 2002-12-23
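
To make the operation in the abstract above concrete, the following Python sketch evaluates a forward or inverse discrete Fourier transform on complex samples by direct summation. It is a reference definition only: the function name, the floating-point arithmetic, and the placement of the 1/N scaling on the inverse are assumptions for illustration, and the chip's fixed-point architecture is not modeled.

```python
import cmath


def dft(samples, inverse=False):
    """Direct O(N^2) discrete Fourier transform of a list of complex samples.
    The inverse uses the opposite exponent sign and divides by N."""
    n = len(samples)
    sign = 1 if inverse else -1
    output = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        output.append(acc / n if inverse else acc)
    return output


# Hypothetical 64-point input: a unit impulse, whose spectrum is flat.
impulse = [1 + 0j] + [0j] * 63
spectrum = dft(impulse)
recovered = dft(spectrum, inverse=True)
```

Applying the forward transform and then the inverse returns the original samples up to floating-point rounding.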

Self-Synchronous Circuits with Completion/Error Detection as a Candidate of Future LSI Resilient for PVT Variations and Aging

Kunihiro Asada Makoto Ikeda Benjamin Devlin Tomohiro Sogabe

The unstable/unpredictable LSI operation caused by the PVT (Process Voltage Parameter) variations, along with the aging effect such as NBTI/PBTI, is one of the serious issues in current and future scaled LSIs. In these situations, where the operation environments in the field are hard to predict at the stages of circuit design and test, the conventional approach of margin-based design and test with synchronous architecture has to pay a large amount of penalty in speed in order to guarantee safe operation. Especially from the view point of delay fault, unpredictable variation...

10.1109/dft.2010.61 article EN 2010-10-01

Throughput optimization by pipeline alignment of a Self Synchronous FPGA

Benjamin Devlin Toru Nakura Makoto Ikeda Kunihiro Asada

We report on the throughput optimization of a self synchronous FPGA (SSFPGA) using benchmark circuits. We find that using the dual pipeline architecture we are able to convert designs from verilog onto our SSFPGA, and by using alignment techniques to match pipeline depth we can perform at maximum throughput. We demonstrate a 0 to 56.1 times throughput improvement with alignment techniques. The optimization is carried out within the number of logic elements in the array and pipeline buffers in the switching matrix.

10.1109/fpt.2009.5377670 article EN 2009-12-01

Evaluation on the reliable operation of a Gate-Level Pipelined Self Synchronous system against PVT and aging

Benjamin Devlin Makoto Ikeda Kunihiro Asada

The reliable operation against PVT (process, voltage, and temperature) variation and aging effects has been measured of a Gate-Level Pipelined Self Synchronous FPGA (SSFPGA) design in 65nm CMOS. The SSFPGA employs a 38×38 array of 4-input, 3-stage Self Synchronous Configurable Logic blocks. Throughput was measured at 2.97GHz at 1.2V, with correct operation from 750mV to 1.6V at 25°C. Operation with errors being inserted into the SSFPGA was compared against a conventional synchronous FPGA, which showed the SSFPGA had 4.2 times the error free operation. The aging effect has also been measured on the SSFPGA using accelerated cycle...

10.1109/iirw.2010.5706500 article EN IEEE International Integrated Reliability Workshop final report 2010-10-01

Self Synchronous Circuits for Robust Operation in Low Voltage and Soft Error Prone Environments

Benjamin Devlin Makoto Ikeda Kunihiro Asada

In this paper we show that self synchronous circuits can provide robust operation in both soft error prone and low voltage operating environments. Self synchronous circuits are shown to be self checking, where a soft error will either cause a detectable error or a halt in operation of the circuit. A watchdog circuit is proposed to autonomously detect dual-rail ‘11’ errors and prevent error propagation, with measurements in 65nm CMOS showing seamless operation from 1.6V to 0.37V. Compared to a system without the watchdog circuit, size and energy-per-operation is increased 6.9% and 16% respectively, while error tolerance...

10.1587/transele.e96.c.518 article EN IEICE Transactions on Electronics 2013-01-01
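
The dual-rail '11' check mentioned in the abstract above can be modeled in a few lines. In the Python sketch below each bit is carried on two wires, and the mapping of wire pairs to states (00 as spacer, 01 and 10 as valid data, 11 as illegal) is a commonly used dual-rail convention assumed here for illustration. The function names are hypothetical and the model says nothing about the paper's actual circuit.

```python
def classify_dual_rail(true_rail, false_rail):
    """Classify one dual-rail bit given the levels of its two wires."""
    if true_rail and false_rail:
        return "error"   # '11' is never produced by fault-free logic
    if not true_rail and not false_rail:
        return "spacer"  # '00': pre-charged, no data yet
    return "one" if true_rail else "zero"


def word_has_error(word):
    """Software stand-in for a watchdog: flag any bit in the '11' state."""
    return any(classify_dual_rail(t, f) == "error" for t, f in word)


def word_is_complete(word):
    """Completion detection: every bit holds valid data and none is '11'."""
    return all(classify_dual_rail(t, f) in ("one", "zero") for t, f in word)


# Hypothetical 3-bit words as (true_rail, false_rail) pairs.
good_word = [(1, 0), (0, 1), (1, 0)]
upset_word = [(1, 0), (1, 1), (1, 0)]  # a soft error raised the second wire
```

Here `word_has_error(upset_word)` is true while `word_is_complete(upset_word)` is false, so the corrupted word is flagged instead of being passed on as valid data.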

Hardcaml: An OCaml Hardware Domain-Specific Language for Efficient and Robust Design

Andy Ray Benjamin Devlin Fu Yong Quah Rahul Yesantharao

This paper introduces Hardcaml, an embedded hardware design domain specific language (DSL) implemented in the OCaml programming language. Unlike high level synthesis (HLS), Hardcaml allows for low level control of the underlying hardware for maximum productivity, while abstracting away many of the tedious aspects of traditional hardware definition languages (HDLs) such as Verilog or VHDL. The richness of OCaml's type system combined with Hardcaml's fast circuit elaboration checks reduces the chance of user-introduced bugs and erroneous...

10.48550/arxiv.2312.15035 preprint EN cc-by arXiv (Cornell University) 2023-01-01