- Parallel Computing and Optimization Techniques
- Interconnection Networks and Systems
- Embedded Systems Design Techniques
- Blockchain Technology Applications and Security
- Low-power high-performance VLSI design
- Advanced Data Storage Technologies
- Video Coding and Compression Technologies
- Caching and Content Delivery
- Cloud Computing and Resource Management
- Advanced Memory and Neural Computing
- VLSI and Analog Circuit Testing
- Cryptographic Implementations and Security
- VLSI and FPGA Design Techniques
- CCD and CMOS Imaging Sensors
- Advanced Vision and Imaging
- Cryptography and Data Security
- Advanced Neural Network Applications
- Numerical Methods and Algorithms
- Ferroelectric and Negative Capacitance Devices
- Manufacturing Process and Optimization
- Real-Time Systems Scheduling
- Image and Video Quality Assessment
- Underwater Vehicles and Communication Systems
- Green IT and Sustainability
- Particle accelerators and beam dynamics
Xilinx (United States)
2022
UNSW Sydney
2008-2020
Google (United States)
2015-2016
University of Amsterdam
2013
Karlsruhe Institute of Technology
2013
Carnegie Mellon University
2013
Hanyang University
2013
Leiden University
2013
National Taiwan University
2013
In this paper, we propose a novel NoC architecture, called darkNoC, where multiple layers of architecturally identical, but physically different routers are integrated, leveraging the extra transistors available due to dark silicon. Each layer is separately optimized for a particular voltage-frequency range by adroit use of multi-Vt circuit optimization. At a given time, only one network layer is illuminated while all other layers are dark. We provide architectural support for seamless integration of the layers, and fast...
Blockchain technologies are on the rise, and Hyperledger Fabric is one of the most popular permissioned blockchain platforms. In this paper, we re-architect the validation phase of Fabric based on our analysis from a fine-grained breakdown of the validation phase's latency. Our optimized validation phase uses a chaincode cache during validation of transactions, initiates state database reads in parallel with validation of transactions, and writes to the ledger and databases in parallel. Our experiments reveal performance improvements of 2x for CouchDB and 1.3x for LevelDB. Notably, our optimizations can be adopted in future...
We propose approximate dividers with near-zero error bias for both integer and floating-point numbers. The integer divider, INZeD, is designed using a novel, analytically deduced error-correction method in an approximate log-based divider. The floating-point divider, FaNZeD, is based on a highly optimized mantissa divider that is inspired by INZeD. Both of the dividers are error configurable.
Pipelined MPSoCs provide a high throughput implementation platform for multimedia applications, with reduced design time and improved flexibility. Typically, a pipelined MPSoC is balanced at design-time using worst-case parameters. Where there is a widely varying workload, such designs consume an exorbitant amount of power. In this paper, we propose a novel adaptive architecture that adapts itself to varying workloads. Our architecture consists of Main Processors and Auxiliary Processors with a distributed run-time balancing approach, where each...
We propose a new error-configurable approximate unsigned integer multiplier named REALM. It incorporates a novel error-reduction method into the classical approximate log-based multiplier. Each power-of-two-interval of the input operands is partitioned into M×M segments, and an error-reduction factor for each segment is analytically determined. These error-reduction factors can be used across any power-of-two-interval, so we quantize only M² factors and store...
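As background for the log-based multiplier mentioned above, the following is a minimal Python sketch of the classical Mitchell logarithmic approximation for unsigned integers. It illustrates only the uncorrected baseline, not REALM's segment-wise error-reduction factors; the function name `mitchell_multiply` is a hypothetical choice made for this example.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for unsigned integers with Mitchell's log-based method.

    Each operand is written as 2**k * (1 + x) with 0 <= x < 1, and
    log2(1 + x) is approximated by x. The product is therefore
    approximated from k_a + k_b and x_a + x_b without a full multiplier.
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be unsigned")
    if a == 0 or b == 0:
        return 0

    ka = a.bit_length() - 1          # position of the leading one in a
    kb = b.bit_length() - 1          # position of the leading one in b
    k = ka + kb

    fa = a - (1 << ka)               # fractional part of a, scaled by 2**ka
    fb = b - (1 << kb)               # fractional part of b, scaled by 2**kb

    # (x_a + x_b) scaled by 2**k, computed with shifts and one addition.
    cross = (fa << kb) + (fb << ka)

    if cross < (1 << k):
        # x_a + x_b < 1: result is 2**k * (1 + x_a + x_b)
        return (1 << k) + cross
    # x_a + x_b >= 1: result is 2**(k + 1) * (x_a + x_b)
    return cross << 1


if __name__ == "__main__":
    for a, b in [(4, 8), (3, 3), (5, 6), (100, 200)]:
        print(a, b, a * b, mitchell_multiply(a, b))
```

The approximation is exact when either operand is a power of two and otherwise underestimates the product by at most one ninth (about 11.1%), which is the error that correction schemes for log-based multipliers aim to reduce.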
In this paper, we demonstrate how Hyperledger Fabric, one of the most popular permissioned blockchains, can benefit from network-attached acceleration. The scalability and peak performance of Fabric is primarily limited by the bottlenecks present in its block validation/commit phase. We propose Blockchain Machine, a hardware accelerator coupled with a hardware-friendly communication protocol, to act as the validator peer. It can be adapted to applications and their smart contracts, and is targeted for server FPGA...
This paper describes a rapid design methodology to create a pipeline of processors to execute streaming applications. The methodology seeks a system with the smallest area while its runtime is within a specified constraint. Initially, a heuristic is used to rapidly explore a large number of processor configurations to find the near Pareto front of the design space, and then an exact integer linear programming (ILP) formulation (EIF) is used to find an optimal solution. A reduced ILP formulation (RIF) is used instead in some cases, for example if the EIF does not find a solution in a given time window. The methodology was integrated into...
This paper describes a rapid design methodology to create a pipeline of processors to execute streaming applications. The methodology is in two separate phases: the first phase uses a heuristic to rapidly search through a large number of processor configurations (configurations differ by the base processor, the additional instructions and cache sizes) to find the near Pareto front; the second phase utilizes either the above heuristic or an ILP (Integer Linear Programming) formulation to search a smaller design space to find an appropriate final implementation. By the utilization of the fast heuristic with...
System-level dynamic power management (DPM) schemes in Multiprocessor System on Chips (MPSoCs) exploit the idleness of processors to reduce energy consumption by putting idle processors into low-power states. In the presence of multiple low-power states, the challenge is to predict the duration of the idle period with high accuracy so that the most beneficial power state can be selected for the idle processor. In this work, we propose a novel DPM scheme for adaptive pipelined MPSoCs, suitable for multimedia applications. We leverage application knowledge in the form of future workload...
A streaming application, characterized by a kernel that can be broken down into independent tasks which can be executed in a pipelined fashion, inherently allows its implementation on a pipeline of Application Specific Instruction set Processors (ASIPs), called a pipelined MPSoC. The latency and throughput requirements of streaming applications put constraints on the design of such a pipelined MPSoC, where each ASIP has a number of available configurations differing by additional instructions, instruction and data cache sizes. Thus, the design space of a pipelined MPSoC is all...
Designers of the on-chip interconnect for manycore chips are faced with the dilemma of meeting performance, power and reliability requirements for different operational scenarios. In this paper, we propose a multimode on-chip interconnect called SuperNet. This interconnect can be configured to run in three modes: energy efficient mode; performance mode; and reliability mode. Our proposed interconnect is based on two parallel multi-Vt optimized packet switched network-on-chip (NoC) meshes. We describe the circuit design techniques and architectural modifications required...
Edge training of deep neural networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness in gaining resource efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource-efficient accelerators with approximate multipliers supporting DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures and approximate multipliers is needed. This article presents ApproxTrain, an...
Network on Chip (NoC) has been envisioned as a scalable fabric for many core chips. However, NoCs can consume a considerable share of chip power. Moreover, diverse applications are executed in these multicore chips, where each application imposes a unique load on the NoC. To realise a NoC which is Energy and Delay efficient, we propose combining multiple VF optimized routers for each node (in traditional NoCs, we have only a single router per node) for an efficient NoC for Dark Silicon chips. We present a generic NoC with routers designed for different...
Estimation models play a vital role in many aspects of day to day life. Extremely complex estimation models are employed in the design space exploration of SoCs, and the efficacy of these estimation models is usually measured by the absolute error of the models compared to known actual results. Such absolute error based metrics can often result in over-designed estimation models, with a number of researchers suggesting that the fidelity of an estimation model (correlation between the ordering of the estimated points and the ordering of the actual points) should be examined instead of, or in addition to, the absolute error. In this paper, for the first time, we...
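To make the notion of fidelity concrete, here is a small Python sketch that scores an estimator by the fraction of design-point pairs whose relative ordering matches between estimated and actual values. This pairwise-agreement measure is only one common way to quantify ordering correlation and is an assumption of this example; the abstract does not say which fidelity metric the paper uses, and the name `fidelity` is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def fidelity(estimated: Sequence[float], actual: Sequence[float]) -> float:
    """Fraction of point pairs ordered the same way by estimate and by truth.

    Returns 1.0 when the estimator ranks every pair of design points
    exactly as the actual results do, regardless of absolute error.
    """
    if len(estimated) != len(actual):
        raise ValueError("estimated and actual must have the same length")

    def sign(x: float) -> int:
        return (x > 0) - (x < 0)

    pairs = list(combinations(range(len(actual)), 2))
    if not pairs:
        return 1.0  # zero or one point: nothing can be mis-ordered

    agreeing = sum(
        1
        for i, j in pairs
        if sign(estimated[i] - estimated[j]) == sign(actual[i] - actual[j])
    )
    return agreeing / len(pairs)


if __name__ == "__main__":
    actual = [10.0, 20.0, 30.0]
    print(fidelity([100.0, 150.0, 400.0], actual))  # large error, same order
    print(fidelity([11.0, 31.0, 29.0], actual))     # small error, one swap
```

In the first call the estimates are far from the actual values yet rank the points identically, while in the second the estimates are close in absolute terms but swap two points, which is the distinction between absolute error and fidelity that the abstract draws.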
The paradigm of pipelined MPSoC (processors connected in a pipeline) is well suited to the data flow nature of multimedia applications. Often design space exploration is performed to optimize execution time, latency or throughput, where the variants of the system are processor configurations due to customizable options in each of the processors. Since there can be billions of combinations of processor configurations (design points), the challenge is to quickly provide the estimates of performance metrics of those design points. Hence, in this article, we propose analytical models...
Pipelined MPSoCs provide a high throughput implementation platform for multimedia applications. They are typically balanced at design-time considering worst-case scenarios so that a given throughput can be fulfilled at all times. Such pipelined MPSoCs lack runtime adaptability and result in inefficient resource utilization and power/energy consumption under dynamic workload. In this paper, we propose a novel adaptive architecture and a distributed runtime processor manager to enable runtime adaptation in pipelined MPSoCs. The proposed architecture consists of main...
Streaming applications can be implemented with a pipeline of processors. Each processor in the pipeline can be an Application Specific Instruction Set Processor (ASIP), with the result being a heterogeneous pipelined MPSoC system. Since ASIPs can be implemented with differing configurations, finding the optimal set of configurations for a multiprocessor architecture is a difficult problem.
The pipelined Multiprocessor System on Chip (MPSoC) paradigm is well suited to the data flow nature of streaming applications. A pipelined MPSoC is a system where processing elements (PEs) are connected in a pipeline. Each PE is implemented using one of a number of processor configurations (configurations differ by instruction sets and cache sizes) available for that PE. The goal is to select a pipelined MPSoC with a mapping of a processor configuration to every PE. To estimate the run-time of a pipelined MPSoC, designers typically perform cycle-accurate simulation of the whole system. Since...
Parallel implementations of motion estimation for high definition videos typically exploit various forms of parallelism (GOP-, frame-, slice- and macroblock-level) to deliver real-time throughput. Although parallel implementations deliver real-time throughput, they often suffer from limited flexibility and scalability due to the form of parallelism and architecture used. In this work, we use Group Of MacroBlocks (GOMB) and Intra-MB (1MB) parallelism with a multi-ASIP (Application Specific Instruction set Processor) architecture to provide a flexible and scalable platform for motion estimation of high definition videos. Multiple...
Permissioned blockchain platforms heavily depend on cryptography to provide a layer of trust within the network, thus verification of cryptographic signatures often becomes the bottleneck. ECDSA is the most commonly used cryptographic scheme in permissioned blockchains. In this work, we propose an efficient implementation of ECDSA signature verification on an FPGA, in order to improve the performance of permissioned blockchains that aim to use FPGA-based hardware accelerators. We propose several optimizations for modular arithmetic (e.g., custom multipliers and fast modular reduction)...
Permissioned blockchains like Hyperledger Fabric have become quite popular for implementation of enterprise applications. Recent research has mainly focused on improving the performance of permissioned blockchains without any consideration of their power/energy consumption. In this paper, we conduct a comprehensive empirical study to understand the energy efficiency (throughput/energy) of the validator peer in Hyperledger Fabric (a major bottleneck node). We pick a number of optimizations from the literature (allocated CPUs, software block cache and...