Haris Javaid

ORCID: 0009-0008-3472-0803
Research Areas
  • Parallel Computing and Optimization Techniques
  • Interconnection Networks and Systems
  • Embedded Systems Design Techniques
  • Blockchain Technology Applications and Security
  • Low-power high-performance VLSI design
  • Advanced Data Storage Technologies
  • Video Coding and Compression Technologies
  • Caching and Content Delivery
  • Cloud Computing and Resource Management
  • Advanced Memory and Neural Computing
  • VLSI and Analog Circuit Testing
  • Cryptographic Implementations and Security
  • VLSI and FPGA Design Techniques
  • CCD and CMOS Imaging Sensors
  • Advanced Vision and Imaging
  • Cryptography and Data Security
  • Advanced Neural Network Applications
  • Numerical Methods and Algorithms
  • Ferroelectric and Negative Capacitance Devices
  • Manufacturing Process and Optimization
  • Real-Time Systems Scheduling
  • Image and Video Quality Assessment
  • Underwater Vehicles and Communication Systems
  • Green IT and Sustainability
  • Particle accelerators and beam dynamics

Xilinx (United States)
2022

UNSW Sydney
2008-2020

Google (United States)
2015-2016

University of Amsterdam
2013

Karlsruhe Institute of Technology
2013

Carnegie Mellon University
2013

Hanyang University
2013

Leiden University
2013

National Taiwan University
2013

In this paper, we propose a novel NoC architecture, called darkNoC, where multiple layers of architecturally identical, but physically different routers are integrated, leveraging the extra transistors available due to dark silicon. Each layer is separately optimized for a particular voltage-frequency range by the adroit use of multi-Vt circuit optimization. At a given time, only one network layer is illuminated while all the others are dark. We provide architectural support for seamless integration of the layers, and fast...

10.1145/2593069.2593117 article EN 2014-05-27

Blockchain technologies are on the rise, and Hyperledger Fabric is one of the most popular permissioned blockchain platforms. In this paper, we re-architect the validation phase of Fabric based on our analysis from a fine-grained breakdown of the validation phase's latency. Our optimized validation phase uses a chaincode cache during validation of transactions, initiates state database reads in parallel with validation of transactions, and writes to the ledger and databases in parallel. Our experiments reveal performance improvements of 2x for CouchDB and 1.3x for LevelDB. Notably, our optimizations can be adopted in future...

10.1109/mascots.2019.00038 article EN 2019-09-25

We propose approximate dividers with near-zero error bias for both integer and floating-point numbers. The integer divider, INZeD, is designed using a novel, analytically deduced error-correction method in an approximate log based divider. The floating-point divider, FaNZeD, is based on a highly optimized mantissa divider that is inspired by INZeD. Both of the dividers are configurable.

10.1145/3316781.3317773 article EN 2019-05-23

Pipelined MPSoCs provide a high throughput implementation platform for multimedia applications, with reduced design time and improved flexibility. Typically a pipelined MPSoC is balanced at design-time using worst-case parameters. Where there is a widely varying workload, such designs consume an exorbitant amount of power. In this paper, we propose a novel adaptive pipelined MPSoC architecture that adapts itself to varying workloads. Our architecture consists of Main Processors and Auxiliary Processors with a distributed run-time balancing approach, where each...

10.1145/2024724.2024951 article EN Proceedings of the 48th Design Automation Conference 2011-06-05

We propose a new error-configurable approximate unsigned integer multiplier named REALM. It incorporates a novel error-reduction method into the classical approximate log-based multiplier. Each power-of-two-interval of the input operands is partitioned into M×M segments, and an error-reduction factor for each segment is analytically determined. These error-reduction factors can be used across any power-of-two-interval, so we quantize only M² factors and store...

10.23919/date48585.2020.9116315 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2020-03-01
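
The classical log-based multiplier mentioned in the abstract above can be sketched as follows. This is a minimal illustration of Mitchell's logarithmic approximation only, written with exact integer arithmetic; it does not include REALM's error-reduction factors, and the function name `mitchell_multiply` is a hypothetical label chosen for this sketch.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for unsigned integers with Mitchell's log method.

    Each operand is written as 2**k * (1 + x) with 0 <= x < 1, and
    log2(1 + x) is approximated by x. This is a sketch of the classical
    scheme, not the REALM design.
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be unsigned")
    if a == 0 or b == 0:
        return 0

    k1 = a.bit_length() - 1          # position of the leading one in a
    k2 = b.bit_length() - 1          # position of the leading one in b
    r1 = a - (1 << k1)               # x1 * 2**k1
    r2 = b - (1 << k2)               # x2 * 2**k2

    base = 1 << (k1 + k2)            # 2**(k1 + k2)
    frac_sum = (r1 << k2) + (r2 << k1)   # (x1 + x2) * 2**(k1 + k2)

    if frac_sum < base:
        # x1 + x2 < 1: result is 2**(k1+k2) * (1 + x1 + x2)
        return base + frac_sum
    # x1 + x2 >= 1: result is 2**(k1+k2+1) * (x1 + x2)
    return 2 * frac_sum
```

The approximation is exact when either operand is a power of two and always underestimates otherwise, with a worst-case relative error of 1/9 (about 11.1%), reached for example at 3 × 3. This one-sided error is the kind of bias that error-reduction schemes aim to correct.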

In this paper, we demonstrate how Hyperledger Fabric, one of the most popular permissioned blockchains, can benefit from network-attached acceleration. The scalability and peak performance of Fabric is primarily limited by the bottlenecks present in its block validation/commit phase. We propose Blockchain Machine, a hardware accelerator coupled with a hardware-friendly communication protocol, to act as the validator peer. It can be adapted to applications and their smart contracts, and is targeted for server FPGA...

10.1109/icdcs54860.2022.00033 article EN 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS) 2022-07-01

This paper describes a rapid design methodology to create a pipeline of processors to execute streaming applications. The methodology seeks a system with the smallest area while its runtime is within a specified runtime constraint. Initially, a heuristic is used to rapidly explore a large number of processor configurations to find the near Pareto front of the design space, and then an exact integer linear programming (ILP) formulation (EIF) is used to find an optimal solution. A reduced ILP formulation (RIF) is used [...] or if the EIF does not find a solution in the given time window. The methodology was integrated into...

10.1109/tcad.2010.2061353 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2010-10-20
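
The idea of a Pareto front over the design space, and of picking the smallest-area design that meets a runtime constraint, can be sketched as follows. This is a brute-force illustration under stated assumptions: each design point is a hypothetical `(area, runtime)` tuple, and the code is neither the paper's heuristic nor its ILP formulations.

```python
def pareto_front(points):
    """Return the (area, runtime) points not dominated by any other point.

    A point dominates another if it is no worse in both area and runtime
    and differs from it. Lower is better for both metrics.
    """
    front = []
    best_runtime = float("inf")
    # Sort by area, then runtime; a point is kept only if it is strictly
    # faster than every point with smaller or equal area.
    for area, runtime in sorted(set(points)):
        if runtime < best_runtime:
            front.append((area, runtime))
            best_runtime = runtime
    return front


def smallest_area_within(points, runtime_limit):
    """Pick the smallest-area Pareto point whose runtime meets the limit."""
    feasible = [p for p in pareto_front(points) if p[1] <= runtime_limit]
    return min(feasible) if feasible else None


# Illustrative design points (made-up numbers): (area, runtime)
designs = [(1, 10), (2, 5), (3, 6), (4, 1)]
front = pareto_front(designs)
choice = smallest_area_within(designs, runtime_limit=5)
```

Here `(3, 6)` is dropped because `(2, 5)` is both smaller and faster. Restricting the final search to the front is what keeps the second, exact step tractable when the full design space is very large.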

This paper describes a rapid design methodology to create a pipeline of processors to execute streaming applications. The methodology is in two separate phases: the first phase uses a heuristic to rapidly search through a large number of processor configurations (configurations differ by the base processor, the additional instructions and cache sizes) to find the near Pareto front; the second phase utilizes either the above heuristic or an ILP (Integer Linear Programming) formulation to search a smaller design space to find an appropriate final implementation. By utilization of the fast heuristic with...

10.1145/1629911.1629979 article EN 2009-07-26

System-level dynamic power management (DPM) schemes in Multiprocessor System on Chips (MPSoCs) exploit the idleness of processors to reduce energy consumption by putting idle processors into low-power states. In the presence of multiple low-power states, the challenge is to predict the duration of the idle period with high accuracy so that the most beneficial power state can be selected for the idle processor. In this work, we propose a novel dynamic power management scheme for adaptive pipelined MPSoCs, suitable for multimedia applications. We leverage application knowledge in the form of future workload...

10.5555/2132325.2132465 article EN International Conference on Computer Aided Design 2011-11-07
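
Selecting the most beneficial low-power state for a predicted idle period can be sketched as follows. This is a generic energy comparison, assuming each state has a steady power draw plus a fixed energy and time cost for entering and leaving it; the `PowerState` fields and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    name: str
    power_mw: float       # power drawn while resident in the state
    transition_uj: float  # energy to enter and leave the state
    transition_us: float  # time to enter and leave the state


def energy_uj(state: PowerState, idle_us: float) -> float:
    """Energy spent over the idle period if this state is used (mW * us = nJ)."""
    resident_us = max(idle_us - state.transition_us, 0.0)
    return state.transition_uj + state.power_mw * resident_us / 1000.0


def select_state(states, predicted_idle_us: float) -> PowerState:
    """Pick the lowest-energy state whose transition fits in the idle period."""
    feasible = [s for s in states if s.transition_us <= predicted_idle_us]
    if not feasible:
        raise ValueError("no state fits the predicted idle period")
    return min(feasible, key=lambda s: energy_uj(s, predicted_idle_us))


# Illustrative states (made-up numbers).
STATES = [
    PowerState("idle", power_mw=100.0, transition_uj=0.0, transition_us=0.0),
    PowerState("clock_gated", power_mw=30.0, transition_uj=5.0, transition_us=50.0),
    PowerState("power_gated", power_mw=1.0, transition_uj=40.0, transition_us=500.0),
]
```

With these numbers a 10 µs idle period stays in `idle`, 200 µs favours `clock_gated`, and 5000 µs favours `power_gated`. The choice depends entirely on the predicted duration, which is why prediction accuracy matters so much: an overestimate pays the transition cost of a deep state without recovering it.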

A streaming application, characterized by a kernel that can be broken down into independent tasks which can be executed in a pipelined fashion, inherently allows its implementation on a pipeline of Application Specific Instruction set Processors (ASIPs), called a pipelined MPSoC. The latency and throughput requirements of streaming applications put constraints on the design of such a pipelined MPSoC, where each ASIP has a number of available configurations differing by additional instructions, and instruction and data cache sizes. Thus, the design space of a pipelined MPSoC is all...

10.1145/1878961.1878978 article EN 2010-10-24

Designers of the on-chip interconnect for manycore chips are faced with the dilemma of meeting performance, power and reliability requirements for different operational scenarios. In this paper, we propose a multimode on-chip interconnect called SuperNet. This interconnect can be configured to run in three different modes: energy efficient mode; performance mode; and reliability mode. Our proposed interconnect is based on two parallel multi-vt optimized packet switched network-on-chip (NoC) meshes. We describe the circuit design techniques and architectural modifications required...

10.1145/2744769.2744912 article EN 2015-06-02

Edge training of deep neural networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness in gaining resource efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource-efficient accelerators supporting DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures is needed. This article presents ApproxTrain, an...

10.1109/tcad.2023.3253045 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2023-03-06

System-level dynamic power management (DPM) schemes in Multiprocessor System on Chips (MPSoCs) exploit the idleness of processors to reduce energy consumption by putting idle processors into low-power states. In the presence of multiple low-power states, the challenge is to predict the duration of the idle period with high accuracy so that the most beneficial power state can be selected for the idle processor. In this work, we propose a novel dynamic power management scheme for adaptive pipelined MPSoCs, suitable for multimedia applications. We leverage application knowledge in the form of future workload...

10.1109/iccad.2011.6105394 article EN 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2011-11-01

Network on Chip (NoC) has been envisioned as a scalable fabric for many core chips. However, NoCs can consume a considerable share of chip power. Moreover, diverse applications are executed in these multicore chips, where each application imposes a unique load on the NoC. To realise a NoC which is Energy and Delay efficient, we propose combining multiple VF optimized routers for each node (in traditional NoCs, we have only a single router per node) for an efficient Dark Silicon NoC. We present a generic NoC architecture with routers designed for different...

10.5555/2755753.2757101 article EN Design, Automation, and Test in Europe 2015-03-09

Network on Chip (NoC) has been envisioned as a scalable fabric for many core chips. However, NoCs can consume a considerable share of chip power. Moreover, diverse applications are executed in these multicore chips, where each application imposes a unique load on the NoC. To realise a NoC which is Energy and Delay efficient, we propose combining multiple VF optimized routers for each node (in traditional NoCs, we have only a single router per node) for an efficient Dark Silicon NoC. We present a generic NoC architecture with routers designed for different...

10.7873/date.2015.0694 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2015-01-01

Estimation models play a vital role in many aspects of day to day life. Extremely complex estimation models are employed in the design space exploration of SoCs, and the efficacy of these estimation models is usually measured by the absolute error of the estimation compared to known actual results. Such absolute error based metrics can often result in over-designed estimation models, with a number of researchers suggesting that the fidelity of an estimation model (correlation between the ordering of the estimated points and the ordering of the actual points) should be examined instead of, or in addition to, the absolute error. In this paper, for the first time, we...

10.5555/2133429.2133431 article EN International Conference on Computer Aided Design 2010-11-07
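
The contrast between absolute error and fidelity described in the abstract above can be sketched as follows. The pairwise-agreement formulation used here is one common way to measure ordering correlation and is an assumption of this sketch; the paper may define fidelity differently. The function names and sample values are hypothetical.

```python
from itertools import combinations


def _sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def fidelity(estimated, actual) -> float:
    """Fraction of design-point pairs ordered the same way by both lists."""
    if len(estimated) != len(actual):
        raise ValueError("lists must have the same length")
    pairs = list(combinations(range(len(estimated)), 2))
    if not pairs:
        raise ValueError("need at least two design points")
    agree = sum(
        _sign(estimated[i] - estimated[j]) == _sign(actual[i] - actual[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mean_absolute_error(estimated, actual) -> float:
    """Average absolute difference between estimates and actual values."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


# Made-up example: estimates are far off in magnitude but rank the
# design points exactly as the actual results do.
actual_runtime = [100.0, 250.0, 400.0]
rough_estimate = [10.0, 20.0, 30.0]
```

For this example the mean absolute error is 230.0 while the fidelity is 1.0, so a coarse model that would be rejected on absolute error still picks the same best design point as the exact results.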

The paradigm of pipelined MPSoC (processors connected in a pipeline) is well suited to the data flow nature of multimedia applications. Often design space exploration is performed to optimize execution time, latency or throughput of a pipelined MPSoC, where the variants in the system are processor configurations due to customizable options in each of the processors. Since there can be billions of combinations of processor configurations (design points), the challenge is to quickly provide estimates of performance metrics of those design points. Hence, in this article, we propose analytical models...

10.1109/tpds.2013.268 article EN IEEE Transactions on Parallel and Distributed Systems 2013-10-18

Pipelined MPSoCs provide a high throughput implementation platform for multimedia applications. They are typically balanced at design-time considering worst-case scenarios so that a given throughput can be fulfilled at all times. Such pipelined MPSoCs lack runtime adaptability and result in inefficient resource utilization and power/energy consumption under dynamic workload. In this paper, we propose a novel adaptive architecture and a distributed processor manager to enable runtime adaptation of pipelined MPSoCs. The proposed architecture consists of main...

10.1109/tcad.2014.2298196 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2014-04-18

Streaming applications can be implemented with a pipeline of processors. Each processor in the pipeline can be an Application Specific Instruction Set Processor (ASIP), with the result being a heterogeneous pipelined MPSoC system. Since ASIPs can have differing configurations, finding the optimal set of configurations for a multiprocessor architecture is a difficult problem.

10.1145/1450135.1450137 article EN 2008-10-19

Estimation models play a vital role in many aspects of day to day life. Extremely complex estimation models are employed in the design space exploration of SoCs, and the efficacy of these estimation models is usually measured by the absolute error of the estimation compared to known actual results. Such absolute error based metrics can often result in over-designed estimation models, with a number of researchers suggesting that the fidelity of an estimation model (correlation between the ordering of the estimated points and the ordering of the actual points) should be examined instead of, or in addition to, the absolute error. In this paper, for the first time, we...

10.1109/iccad.2010.5653959 article EN 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2010-11-01

The pipelined Multiprocessor System on Chip (MPSoC) paradigm is well suited to the data flow nature of streaming applications. A pipelined MPSoC is a system where processing elements (PEs) are connected in a pipeline. Each PE is implemented using one of a number of processor configurations (configurations differ by instruction sets and cache sizes) available for that PE. The goal is to select a pipelined MPSoC with a mapping of a processor configuration to every PE. To estimate the run-time of a pipelined MPSoC, designers typically perform cycle-accurate simulation of the whole system. Since...

10.1109/date.2010.5457178 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2010-03-01

Parallel implementations of motion estimation for high definition videos typically exploit various forms of parallelism (GOP-, frame-, slice- and macroblock-level) to deliver real-time throughput. Although parallel implementations deliver real-time throughput, they often suffer from limited flexibility and scalability due to the form of parallelism and architecture used. In this work, we use Group Of MacroBlocks (GOMB) and Intra-MB (IMB) parallelism with a multi-ASIP (Application Specific Instruction set Processor) architecture to provide a flexible and scalable platform for motion estimation of high definition videos. Multiple...

10.1109/estimedia.2011.6088526 article EN 2011-10-01

Permissioned blockchain platforms heavily depend on cryptography to provide a layer of trust within the network, thus verification of cryptographic signatures often becomes the bottleneck. ECDSA is the most commonly used cryptographic scheme in permissioned blockchains. In this work, we propose an efficient implementation of ECDSA signature verification on an FPGA, in order to improve the performance of permissioned blockchains that aim to use FPGA-based hardware accelerators. We propose several optimizations for modular arithmetic (e.g., custom multipliers and fast modular reduction)...

10.1109/asap54787.2022.00032 article EN 2022-07-01

Permissioned blockchains like Hyperledger Fabric have become quite popular for implementation of enterprise applications. Recent research has mainly focused on improving the performance of permissioned blockchains without any consideration of their power/energy consumption. In this paper, we conduct a comprehensive empirical study to understand the energy efficiency (throughput/energy) of the validator peer in Hyperledger Fabric (a major bottleneck node). We pick a number of optimizations from the literature (allocated CPUs, software block cache and...

10.1109/icpads56603.2022.00031 article EN 2023-01-01