Ahmed Hemani

ORCID: 0000-0003-0565-9376
Research Areas
  • Embedded Systems Design Techniques
  • Parallel Computing and Optimization Techniques
  • Interconnection Networks and Systems
  • Advanced Memory and Neural Computing
  • Low-power high-performance VLSI design
  • VLSI and FPGA Design Techniques
  • Neural Networks and Applications
  • Real-Time Systems Scheduling
  • VLSI and Analog Circuit Testing
  • Formal Methods in Verification
  • Algorithms and Data Compression
  • CCD and CMOS Imaging Sensors
  • Model-Driven Software Engineering Techniques
  • Advanced Neural Network Applications
  • Genomics and Phylogenetic Studies
  • Neuroscience and Neural Engineering
  • Neural dynamics and brain function
  • Ferroelectric and Negative Capacitance Devices
  • Distributed and Parallel Computing Systems
  • Numerical Methods and Algorithms
  • Real-time simulation and control systems
  • Information and Cyber Security
  • Manufacturing Process and Optimization
  • Radiation Effects in Electronics
  • Evolutionary Algorithms and Applications

KTH Royal Institute of Technology
2015-2024

Kista Photonics Research Center
2000-2017

University of Turku
2014

Philips (Finland)
2004

Swedish Institute
1990

We propose a packet-switched platform for single-chip systems which scales well to an arbitrary number of processor-like resources. The platform, which we call Network-on-Chip (NOC), includes both the architecture and the design methodology. The NOC architecture is an m×n mesh of switches and resources; the resources are placed on the slots formed by the switches. We assume a direct layout of the 2-D mesh, providing physical- and architectural-level integration. Each switch is connected to one resource and four neighboring switches, and each resource is connected to one switch. A resource can be a processor core, memory,...

10.1109/isvlsi.2002.1016885 article EN 2003-06-25
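The mesh organization described in the abstract above can be modeled in a few lines. This is a minimal sketch, assuming switch coordinates (x, y) with x in [0, m) and y in [0, n), boundary switches that simply lack the off-mesh links, and dimension-order (XY) routing. XY routing is a common choice for 2-D meshes and is an assumption here, not necessarily the routing used in the paper.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a switch in the mesh


def neighbors(x: int, y: int, m: int, n: int) -> List[Coord]:
    """Switches directly linked to switch (x, y) in an m x n mesh.

    Interior switches get four neighbours; in this sketch boundary
    switches simply have fewer links (an assumption).
    """
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < m and 0 <= cy < n]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-order (XY) route: correct x first, then y (assumed routing)."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(neighbors(1, 1, 4, 4))    # interior switch: four neighbours
    print(neighbors(0, 0, 4, 4))    # corner switch: two neighbours
    print(xy_route((0, 0), (2, 3)))  # hop-by-hop path through the mesh
```

Because every packet first exhausts its x offset and then its y offset, the path length always equals the Manhattan distance between source and destination.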

Lowering power consumption in clock by using globally asynchronous locally synchronous design style. Authors: A. Hemani (ESD Lab, Department of Electronics, KTH, Sweden), T. Meincke, S. Kumar (Indian Institute of Technology, New Delhi, India), Postula (CSEE, University of Queensland, Brisbane, Australia), Olsson (Lund University), P. Nilsson, J. Oberg, Ellervee, D. Lundqvist (Ericsson Radio Systems AB, Stockholm). DAC...

10.1145/309847.310091 article EN 1999-06-01

Smart home technology, also known as a home automation system, allows the homeowner and residents to control and monitor smart devices like heating, ventilation, and air conditioning (HVAC), refrigerators, doors, cameras, etc. These features facilitate users by providing a safe and well-suited environment. However, at the same time, these connected devices could be exploited by cybercriminals due to overlooked inbuilt security and privacy concerns of the devices. Because of no authentication and plain-text data transmission, intruders can get...

10.1109/jsen.2021.3087779 article EN IEEE Sensors Journal 2021-06-14

This paper describes an innovative, regular, non-blocking, point-to-point and point-to-multipoint, low-latency interconnection network scheme with sliding window connectivity, which allows arbitrary parallelism among large sub-systems. The area overhead of the interconnect is only 30% of the chip, much smaller than the 80% in the case of an FPGA. The interconnect is partially and dynamically reconfigurable. Its configware is reduced 5.6 times by using binary encoding, enabling energy-efficient dynamic reconfiguration.

10.1109/asicon.2009.5351593 article EN 2009-10-01
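To make "sliding window connectivity" and the binary-encoding saving concrete, the sketch below assumes a row of columns where each column may read from any column within a hypothetical reach of ±`reach` positions, and compares one-hot against binary encoding of the source-select field. The window model, the names, and the numbers are illustrative assumptions; they do not reproduce the 5.6× figure reported in the paper, which depends on its actual configware format.

```python
import math
from typing import Callable, List


def window_sources(col: int, num_cols: int, reach: int) -> List[int]:
    """Columns that column `col` can read from under a +/- `reach` sliding window."""
    lo = max(0, col - reach)
    hi = min(num_cols - 1, col + reach)
    return list(range(lo, hi + 1))


def one_hot_bits(choices: int) -> int:
    """One configuration bit per selectable source."""
    return choices


def binary_bits(choices: int) -> int:
    """ceil(log2(choices)) bits select one of `choices` sources; one choice needs none."""
    return math.ceil(math.log2(choices)) if choices > 1 else 0


def total_select_bits(num_cols: int, reach: int, encode: Callable[[int], int]) -> int:
    """Source-select configuration bits summed over all columns."""
    return sum(encode(len(window_sources(c, num_cols, reach))) for c in range(num_cols))


if __name__ == "__main__":
    onehot = total_select_bits(8, 3, one_hot_bits)
    binary = total_select_bits(8, 3, binary_bits)
    print(onehot, binary, onehot / binary)
```

Because the window slides with the column, every column sees the same local pattern, which keeps the interconnect regular; the saving from binary encoding grows with the window size, since one-hot cost is linear and binary cost logarithmic.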

The memristor has been extensively used to facilitate the synaptic online learning of brain-inspired spiking neural networks (SNNs). However, current memristor-based work cannot support the widely used yet sophisticated trace-based learning rules, including the Spike-Timing-Dependent Plasticity (STDP) and Bayesian Confidence Propagation Neural Network (BCPNN) rules. This paper proposes a learning engine to implement trace-based online learning, consisting of memristor-based blocks and analog computing blocks. The memristor is used to mimic the trace dynamics by exploiting its nonlinear physical...

10.1109/tbcas.2023.3291021 article EN IEEE Transactions on Biomedical Circuits and Systems 2023-06-30
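For readers unfamiliar with trace-based rules, the sketch below shows the standard pair-based STDP formulation in software: each side of a synapse keeps an exponentially decaying trace that is bumped on every spike, and weight changes read the opposite side's trace. This is a plain numerical reference for what "trace dynamics" means, not the memristor circuit from the paper; all time constants and step sizes are assumed values.

```python
import math
from dataclasses import dataclass


@dataclass
class TraceSTDP:
    """Pair-based STDP for one synapse using exponentially decaying traces.

    All parameter values are illustrative assumptions.
    """
    tau_pre: float = 20.0    # ms, presynaptic trace time constant
    tau_post: float = 20.0   # ms, postsynaptic trace time constant
    a_plus: float = 0.01     # potentiation step
    a_minus: float = 0.012   # depression step
    w: float = 0.5           # synaptic weight, clipped to [0, 1]
    x_pre: float = 0.0       # presynaptic trace
    x_post: float = 0.0      # postsynaptic trace

    def step(self, dt: float, pre_spike: bool, post_spike: bool) -> None:
        # 1. Both traces decay exponentially over the elapsed time dt.
        self.x_pre *= math.exp(-dt / self.tau_pre)
        self.x_post *= math.exp(-dt / self.tau_post)
        # 2. Weight updates read the traces before this step's increments:
        #    a pre spike depresses in proportion to the post trace,
        #    a post spike potentiates in proportion to the pre trace.
        if pre_spike:
            self.w -= self.a_minus * self.x_post
        if post_spike:
            self.w += self.a_plus * self.x_pre
        self.w = min(max(self.w, 0.0), 1.0)
        # 3. Each spike bumps its own trace.
        if pre_spike:
            self.x_pre += 1.0
        if post_spike:
            self.x_post += 1.0
```

A presynaptic spike followed 10 ms later by a postsynaptic spike strengthens the synapse by `a_plus * exp(-10 / tau_pre)`; the reverse order weakens it. The exponential decay in step 1 is the nonlinear behavior a hardware trace must reproduce.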

10.1016/0893-6080(90)90020-l article EN Neural Networks 1990-01-01

This paper presents an industrial case study of using a Coarse Grain Reconfigurable Architecture (CGRA) for a multi-mode accelerator for two kernels, FFT for the LTE standard and Correlation Pool for the UMTS standard, to be executed in a mutually exclusive manner. The CGRA achieved a computational efficiency of 39.94 GOPS/watt (an OP is a multiply-add) and a silicon efficiency of 56.20 GOPS/mm². By analyzing the code and inferring the unused features, the fully...

10.1109/iscas.2013.6572129 article EN 2013 IEEE International Symposium on Circuits and Systems (ISCAS) 2013-05-01

Today, coarse-grained reconfigurable architectures (CGRAs) host multiple applications with arbitrary communication and computation patterns. Each application itself is composed of tasks, spatially mapped to different parts of the platform. Providing a worst-case operating point for all applications leads to excessive energy and power consumption. To cater to this problem, dynamic voltage and frequency scaling (DVFS) is a frequently used technique. DVFS allows scaling the voltage and/or frequency of the device based on runtime constraints. Recent...

10.1109/samos.2013.6621112 article EN 2013-07-01
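The basic DVFS decision described above can be sketched as picking, at runtime, the lowest-energy operating point that still meets a task's deadline. This sketch assumes a hypothetical table of (voltage, frequency) pairs and counts only dynamic switching energy, P = C_eff·V²·f, so energy per task is C_eff·V²·cycles; leakage and voltage-switching overheads, which a real CGRA controller must account for, are ignored.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    voltage: float    # supply voltage in volts
    freq_mhz: float   # clock frequency in MHz


def dynamic_energy(op: OperatingPoint, cycles: int, c_eff: float) -> float:
    """Switching energy: P = C_eff * V^2 * f over t = cycles / f gives C_eff * V^2 * cycles."""
    return c_eff * op.voltage ** 2 * cycles


def pick_operating_point(points: Sequence[OperatingPoint], cycles: int,
                         deadline_us: float, c_eff: float) -> Optional[OperatingPoint]:
    """Lowest-energy point whose execution time meets the deadline, or None."""
    # cycles / MHz = microseconds
    feasible = [p for p in points if cycles / p.freq_mhz <= deadline_us]
    return min(feasible, key=lambda p: dynamic_energy(p, cycles, c_eff), default=None)
```

With points (0.8 V, 200 MHz), (1.0 V, 400 MHz) and (1.2 V, 600 MHz), a 100,000-cycle task with a 300 µs deadline cannot run at 200 MHz (500 µs), so the sketch picks 1.0 V / 400 MHz rather than the faster but more energy-hungry 1.2 V point.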

This paper presents a self-adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, these platforms host multiple applications with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degrees of parallelism), and the optimal version can only be determined at runtime. For such scenarios, traditional worst-case designs and compile-time mapping decisions are neither optimal nor...

10.1109/isqed.2013.6523597 article EN 2013-03-01
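The runtime version choice described above can be illustrated with a small selector. This is a hypothetical policy, not the paper's: each version of an application is characterized by how many reconfigurable cells it occupies and its latency, and at runtime the selector takes the least parallel version that fits in the currently free cells and still meets the deadline, leaving the remaining cells to other applications.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Version:
    name: str
    parallelism: int    # reconfigurable cells occupied (hypothetical unit)
    latency_us: float   # execution time of this implementation


def select_version(versions: Sequence[Version], free_cells: int,
                   deadline_us: float) -> Optional[Version]:
    """Least-parallel version that fits the free cells and meets the deadline."""
    candidates = [v for v in versions
                  if v.parallelism <= free_cells and v.latency_us <= deadline_us]
    return min(candidates, key=lambda v: v.parallelism, default=None)
```

A compile-time mapping would have to commit to one of these versions in advance; deferring the choice lets the same application run serially when the deadline is loose and in parallel only when it has to.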

Power consumption in the clock of large, high-performance VLSIs can be reduced by adopting the globally asynchronous, locally synchronous (GALS) design style. GALS has small overheads for the global asynchronous communication and local clock generation. We propose methods to (a) evaluate the benefits of GALS, accounting for its overheads, which can be used as the basis for partitioning the system into an optimal number and size of blocks, and (b) automate the synthesis of the global asynchronous communication. Three realistic ASICs, ranging in complexity from 1 to 3 million gates, were...

10.1109/dac.1999.782202 article EN Proceedings 1999 Design Automation Conference (Cat. No. 99CH36361) 2003-01-20
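The "evaluate the benefits, accounting for overheads" step can be sketched as a net-gain function over candidate partitions. The cost model here is a deliberately simple assumption, not the paper's: the clock power saved is the global clock-tree power minus the sum of the blocks' local clock power, and the overhead is a fixed, hypothetical power cost per bit crossing an asynchronous interface.

```python
from typing import Dict, Tuple


def gals_net_gain(global_clock_mw: float,
                  local_clock_mw: Dict[str, float],
                  channels: Dict[Tuple[str, str], int],
                  overhead_mw_per_bit: float) -> float:
    """Net power gain of a candidate GALS partition (positive means GALS pays off).

    global_clock_mw: clock-tree power of the fully synchronous design.
    local_clock_mw: per-block local clock power, including its clock generator.
    channels: (source block, destination block) -> bits crossing asynchronously.
    overhead_mw_per_bit: assumed handshake-interface power per crossing bit.
    """
    clock_saving = global_clock_mw - sum(local_clock_mw.values())
    comm_overhead = overhead_mw_per_bit * sum(channels.values())
    return clock_saving - comm_overhead


if __name__ == "__main__":
    two_way = gals_net_gain(100.0, {"A": 30.0, "B": 35.0},
                            {("A", "B"): 32, ("B", "A"): 16}, 0.5)
    four_way = gals_net_gain(100.0, {"A": 15.0, "B": 18.0, "C": 20.0, "D": 17.0},
                             {("A", "B"): 32, ("B", "C"): 32, ("C", "D"): 32, ("D", "A"): 24}, 0.5)
    print(two_way, four_way)
```

With these made-up numbers the two-block partition gains 11 mW, while the finer four-block partition loses 30 mW because its extra asynchronous channels outweigh the additional clock saving, which is the trade-off that drives the choice of block count and size.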

We have defined a flexible, latency-insensitive design style called Globally Ratiochronous, Locally Synchronous (GRLS), based on quantized voltage levels and rationally related clock frequencies. In this paper we present the infrastructure necessary to enable distributed DVFS in such a system and analyze its overheads, quantitatively showing how, with minimal overheads, we obtain energy benefits that are close to those of a totally ideal GALS approach. The results show, coupled with the complexity and performance of GRLS, which briefly...

10.1145/1840845.1840897 article EN 2010-08-18
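One consequence of rationally related clock frequencies is that the phase relationship between two clock domains repeats every hyperperiod, so it can be enumerated at design time instead of being treated as unknown. The sketch below assumes, for illustration, that both frequencies are integer multiples (`a`, `b`) of a common base frequency; it lists, for each edge of clock A within one hyperperiod, the delay until the next edge of clock B. It illustrates the periodicity only and is not the GRLS synchronization hardware.

```python
from fractions import Fraction
from math import gcd
from typing import List


def hyperperiod_us(a: int, b: int, base_mhz: Fraction) -> Fraction:
    """Common period of clocks at a*base and b*base MHz: 1 / (gcd(a, b) * base)."""
    return 1 / (gcd(a, b) * base_mhz)


def edge_times_us(mult: int, base_mhz: Fraction, span_us: Fraction) -> List[Fraction]:
    """Rising-edge times in [0, span_us) of a clock at mult*base MHz."""
    period = 1 / (mult * base_mhz)
    count = int(span_us / period)  # exact, because all quantities are Fractions
    return [k * period for k in range(count)]


def next_edge_delays_us(a: int, b: int, base_mhz: Fraction) -> List[Fraction]:
    """For each clock-A edge in one hyperperiod, time until the next clock-B edge."""
    span = hyperperiod_us(a, b, base_mhz)
    b_period = 1 / (b * base_mhz)
    delays = []
    for t in edge_times_us(a, base_mhz, span):
        k = -(-t // b_period)  # index of the first B edge at or after t (ceiling)
        delays.append(k * b_period - t)
    return delays
```

For a 300 MHz producer and a 200 MHz consumer (`a=3`, `b=2`, base 100 MHz) the pattern is 0, 1/600 µs and 1/300 µs, and it then repeats every 10 ns; exact `Fraction` arithmetic avoids floating-point drift in the edge times.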

Faced with the slowing performance and energy benefits of technology scaling, VLSI/computer architectures have turned from parallel to massively parallel machines for personal and embedded applications in the form of multi- and many-core architectures. Additionally, in pursuit of a sweet spot between engineering efficiency and computational efficiency, Coarse Grain Reconfigurable Architectures (CGRAs) have been researched. While these architectures have been surveyed, they have not been rigorously classified to enable objective differentiation and comparison...

10.1109/ipdpsw.2012.42 article EN 2012-05-01

In the era of platforms hosting multiple applications, where inter-application communication and concurrency patterns are arbitrary, static compile-time decision making is neither optimal nor desirable. As a part of solving this problem, we present a novel method for compactly representing the configuration bitstreams of a single application with varying parallelisms as a unique, compact, customizable representation, called CGIR. The representation is thus stored and unraveled at runtime to configure the device...

10.1109/fpt.2011.6132719 article EN 2011-12-01
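The idea of storing one compact representation and unraveling it at runtime into a version with the required parallelism can be illustrated with a toy model. Everything here is hypothetical and is not the CGIR format: a single lane's configuration is stored once as (opcode, address) words, and a version with parallelism p is produced by replicating the lane p times with a fixed address stride.

```python
from dataclasses import dataclass
from typing import List, Tuple

Instr = Tuple[str, int]  # hypothetical (opcode, address) configuration word


@dataclass(frozen=True)
class CompactConfig:
    """One stored lane template that can be unravelled into several parallel versions."""
    template: Tuple[Instr, ...]      # configuration of a single lane, stored once
    lane_stride: int                 # address offset between consecutive lanes
    parallelisms: Tuple[int, ...]    # versions this template may be unravelled into

    def unravel(self, parallelism: int) -> List[List[Instr]]:
        """Expand into per-lane configurations at runtime."""
        if parallelism not in self.parallelisms:
            raise ValueError(f"parallelism {parallelism} not supported")
        return [[(op, addr + lane * self.lane_stride) for op, addr in self.template]
                for lane in range(parallelism)]

    def stored_words(self) -> int:
        """Words kept in memory for the compact form."""
        return len(self.template)

    def expanded_words(self) -> int:
        """Words needed if every version were stored as a separate bitstream."""
        return sum(p * len(self.template) for p in self.parallelisms)
```

For a three-word lane offered at parallelisms 1, 2 and 4, the toy stores 3 words instead of 21, at the cost of a small expansion step before reconfiguration.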

Artificial Neural Networks (ANNs) like CNN/DNN and LSTM are not biologically plausible and, in spite of their initial success, they cannot attain the cognitive capabilities enabled by the dynamic, hierarchical associative memory systems of biological brains. Spiking brain models, e.g., of the cortex, basal ganglia and amygdala, have a greater potential to achieve these capabilities. The Bayesian Confidence Propagation Neural Network (BCPNN) is a model of the cortex. A human-scale BCPNN running in real time requires 162 TFlops/s, 50 TBs of synaptic...

10.1007/s11265-020-01562-x article EN cc-by Journal of Signal Processing Systems 2020-07-07
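The core of the BCPNN learning rule can be shown in a simplified, rate-based form: activity and co-activity probabilities are estimated with exponential moving averages, the weight is w_ij = log(p_ij / (p_i·p_j)) and the bias is b_j = log(p_j). This sketch omits the cascade of faster z and e traces used in spiking BCPNN models, and the time constant and initial values are assumptions. It does show that every synapse carries its own p_ij trace to update, which hints at why synaptic storage and bandwidth dominate at human scale.

```python
import math
from typing import Sequence


class BCPNNLayer:
    """Rate-based BCPNN sketch: probability traces -> log-odds weights and biases."""

    def __init__(self, n_pre: int, n_post: int, tau_p: float = 100.0, eps: float = 1e-4):
        self.tau_p = tau_p
        # Small positive initial values keep every logarithm finite.
        self.p_i = [eps] * n_pre
        self.p_j = [eps] * n_post
        self.p_ij = [[eps * eps] * n_post for _ in range(n_pre)]

    def update(self, y_pre: Sequence[float], y_post: Sequence[float], dt: float) -> None:
        """Exponential moving averages of unit activities and their co-activity."""
        k = dt / self.tau_p
        for i, yi in enumerate(y_pre):
            self.p_i[i] += k * (yi - self.p_i[i])
        for j, yj in enumerate(y_post):
            self.p_j[j] += k * (yj - self.p_j[j])
        for i, yi in enumerate(y_pre):
            row = self.p_ij[i]  # one trace per synapse (i, j)
            for j, yj in enumerate(y_post):
                row[j] += k * (yi * yj - row[j])

    def weight(self, i: int, j: int) -> float:
        return math.log(self.p_ij[i][j] / (self.p_i[i] * self.p_j[j]))

    def bias(self, j: int) -> float:
        return math.log(self.p_j[j])
```

Units that are repeatedly co-active end up with positive weights (their joint probability exceeds the product of the marginals), while units that never fire together drift toward strongly negative weights.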

For a synthesis methodology to support implementation-independent design specification, the capability for design space exploration is essential. In this paper we present such a capability for a specific domain: data communication protocols. A natural way to specify the various elements of protocols is in terms of a grammar annotated with actions. Our language for protocol specification, called PRO-GRAM, is based on this idea. The hardware specification of the protocol is done by specifying the bit-patterns of the tokens that are supposed to be parsed, together with the actual input stream. By constraints and...

10.5555/524431.857933 article EN 1996-11-06
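A grammar annotated with actions, applied directly to a bit stream, can be sketched as a small recursive-descent parser. The frame format, the token bit-patterns and the grammar below are invented for illustration, and the code is ordinary Python rather than PRO-GRAM syntax; the point is that tokens are defined by bit-patterns and that actions fire as grammar rules are recognized.

```python
from typing import List

# Hypothetical token definitions: token name -> fixed bit-pattern.
TOKENS = {"FLAG": "01111110", "ACK": "0001", "DATA": "0010"}


class BitParser:
    """Parses  frame := FLAG ( ACK seq:4 | DATA len:4 byte{len} ) FLAG  with actions."""

    def __init__(self, bits: str):
        self.bits = bits
        self.pos = 0
        self.events: List[str] = []  # output of the semantic actions

    def match(self, token: str) -> bool:
        """Consume the token's bit-pattern if it appears at the current position."""
        pattern = TOKENS[token]
        if self.bits.startswith(pattern, self.pos):
            self.pos += len(pattern)
            return True
        return False

    def field(self, width: int) -> int:
        """Consume a `width`-bit unsigned field."""
        chunk = self.bits[self.pos:self.pos + width]
        if len(chunk) != width:
            raise ValueError("unexpected end of stream")
        self.pos += width
        return int(chunk, 2)

    def frame(self) -> None:
        if not self.match("FLAG"):
            raise ValueError("missing opening flag")
        if self.match("ACK"):
            self.events.append(f"ack {self.field(4)}")    # action for ACK frames
        elif self.match("DATA"):
            length = self.field(4)
            payload = [self.field(8) for _ in range(length)]
            self.events.append(f"data {payload}")          # action for DATA frames
        else:
            raise ValueError("unknown frame type")
        if not self.match("FLAG"):
            raise ValueError("missing closing flag")
```

In hardware synthesis the same grammar would become a state machine consuming a fixed number of bits per cycle; the software parser only shows how token patterns and actions fit together.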

Clock nets are the major source of power consumption in large, high-performance ASICs and a design bottleneck when it comes to tolerable clock skew. A way to obviate the global clock net is to partition the system into large synchronous blocks, each having its own clock. Data exchange with other blocks is done asynchronously using handshake signals. Adopting such a strategy requires a methodology that supports: 1) a partitioning method for dividing the system into a number of blocks such that the gain due to the removal of the global clock net exceeds the communication overhead; 2) synthesis of protocols to implement data...

10.1109/iscas.1999.780794 article EN 2003-01-20
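The handshake-based data exchange mentioned above is commonly implemented as a four-phase (return-to-zero) bundled-data protocol. The behavioral sketch below assumes that protocol; the paper's exact protocols are not stated here. The sender makes data stable and raises `req`, the receiver latches and raises `ack`, then both signals return to zero before the next word.

```python
from typing import List, Optional, Tuple


class Sender:
    """Puts a word on the data lines, then raises req (bundled data)."""

    def __init__(self, words: List[int], log: List[str]):
        self.queue = list(words)
        self.req = 0
        self.data: Optional[int] = None
        self.log = log

    def step(self, ack: int) -> None:
        if self.req == 0 and ack == 0 and self.queue:
            self.data = self.queue.pop(0)  # data is stable before req rises
            self.req = 1
            self.log.append("req+")
        elif self.req == 1 and ack == 1:
            self.req = 0                   # receiver has latched the word
            self.log.append("req-")


class Receiver:
    """Latches data when req rises, acknowledges, then returns ack to zero."""

    def __init__(self, log: List[str]):
        self.ack = 0
        self.received: List[int] = []
        self.log = log

    def step(self, req: int, data: Optional[int]) -> None:
        if req == 1 and self.ack == 0:
            self.received.append(data)
            self.ack = 1
            self.log.append("ack+")
        elif req == 0 and self.ack == 1:
            self.ack = 0
            self.log.append("ack-")


def transfer(words: List[int]) -> Tuple[List[int], List[str]]:
    """Run both sides until no signal changes; return received words and signal trace."""
    log: List[str] = []
    tx, rx = Sender(words, log), Receiver(log)
    while True:
        before = (tx.req, rx.ack, len(tx.queue))
        tx.step(rx.ack)
        rx.step(tx.req, tx.data)
        if (tx.req, rx.ack, len(tx.queue)) == before:
            return rx.received, log  # handshake is idle
```

Transferring two words produces the trace `req+ ack+ req- ack-` twice; each word costs four signal transitions, which is part of the communication overhead that the partitioning step has to weigh against the clock-power gain.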

A coarse-grained morphable Datapath Unit (mDPU) has been proposed. This mDPU implements a multiplier in a smart way that enables the component adders to be reused when we do not need the multiplier. The pipelined design further enhances efficiency by creating a balanced datapath in the temporal sense. Together, these two features result in an mDPU that optimally uses silicon and time. A judicious set of granular instructions is enabled to show that the mDPU can implement typical signal processing functions. A radix-2 64-point FFT is implemented in 90 nm technology using...

10.1109/sips.2009.5336246 article EN 2009-10-01
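Why a multiplier's component adders can be reused is easiest to see from a shift-and-add multiplier, where each set bit of one operand adds a shifted copy of the other. This is a conceptual software model, not the mDPU's actual microarchitecture: the hypothetical `adder` function stands for one hardware adder, used either as a stage of the multiplier or directly for independent additions.

```python
from typing import List, Tuple

NUM_ADDERS = 8  # hypothetical number of adders in one unit


def adder(a: int, b: int, width: int = 16) -> int:
    """One adder of the unit: unsigned, wraps at `width` bits."""
    return (a + b) & ((1 << width) - 1)


def multiply_mode(a: int, b: int, width: int = 8) -> int:
    """Unsigned shift-and-add multiply: each set bit of b adds a shifted copy of a."""
    acc = 0
    for bit in range(width):          # one adder stage per bit of b
        if (b >> bit) & 1:
            acc = adder(acc, a << bit, 2 * width)
    return acc


def add_mode(pairs: List[Tuple[int, int]]) -> List[int]:
    """Same adders used directly, one independent 16-bit addition each."""
    if len(pairs) > NUM_ADDERS:
        raise ValueError("more additions than available adders")
    return [adder(x, y) for x, y in pairs]
```

In multiply mode the adders form a chain accumulating partial products (13 × 11 = 143); in add mode they work independently, so silicon spent on the multiplier is not idle when no multiplication is needed.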

Purpose: The purpose of this paper is to address three main problems resulting from uncertainty in information security management: dynamically changing security requirements of an organization; externalities caused by a security system; and obsolete evaluation of security concerns. Design/methodology/approach: To address these critical concerns, a framework based on options reasoning borrowed from corporate finance is proposed and adapted to information security architecture decision making for handling these issues at the organizational level. The adaptation as a methodology...

10.1108/09685221111115836 article EN Information Management & Computer Security 2011-03-19

SYLVA is a system-level synthesis framework that transforms DSP sub-systems modeled as synchronous data flow into hardware implementations in ASICs, FPGAs or CGRAs. SYLVA synthesizes in terms of pre-characterized function implementations (FIMPs). It explores the design space in three dimensions: the number of FIMPs, the type of FIMPs, and the pipeline parallelism between producing and consuming FIMPs. We introduce a timing interface model to enable reuse of FIMPs and the automatic generation of the Global Inter-connect and Control (GLIC) that glues the FIMPs together into a working system. SYLVA has...

10.5555/2555692.2555708 article EN 2013-09-29
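Two of the three exploration dimensions named above (type of FIMP and number of FIMPs) can be sketched as a brute-force enumeration for a single data-flow actor. The FIMP library, the throughput model (instances divided by cycles per sample) and the area units are hypothetical; the third dimension, pipeline parallelism between producing and consuming FIMPs, is not modeled here.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class Fimp:
    name: str
    cycles_per_sample: int  # initiation interval of one instance (assumed model)
    area: float             # hypothetical area units per instance


def explore(library: List[Fimp], max_instances: int,
            required_rate: float) -> List[Tuple[str, int, float]]:
    """All (FIMP type, instance count, total area) meeting the rate, cheapest first.

    required_rate: samples per cycle the actor must sustain.
    """
    points = []
    for fimp, count in product(library, range(1, max_instances + 1)):
        if count / fimp.cycles_per_sample >= required_rate:
            points.append((fimp.name, count, count * fimp.area))
    return sorted(points, key=lambda p: p[2])
```

With a serial FFT FIMP (8 cycles/sample, area 1.0) and a parallel one (2 cycles/sample, area 3.0) at a required rate of 0.5 samples/cycle, one parallel instance (area 3.0) beats four serial instances (area 4.0); changing the library costs can flip that choice, which is why the search is automated.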

In this paper, we describe a versatile address generation scheme for the distributed storage resources of the coarse-grain Parallel Distributed Digital Signal Processing (PDDSP) reconfigurable architecture under development in our group. This scheme uses address generation units (AGUs) to decouple the address generation logic from the compute logic and to exploit parallelism (ILP and TLP). To achieve this, the proposed scheme supports standard DSP addressing modes like linear and vectorized, circular buffer and bit-reverse addressing, all parameterizable in range and increment/decrement offsets...

10.1109/asap.2011.6043232 article EN 2011-09-01
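The addressing modes listed above are standard DSP patterns and can be written as small address generators. This sketch shows the generic definitions of these modes with parameters for start, stride, range and count; it is not the PDDSP AGU's configuration format, and the parameter names are assumptions.

```python
from typing import Iterator, List


def linear(start: int, step: int, count: int) -> Iterator[int]:
    """start, start+step, start+2*step, ... (a negative step decrements)."""
    for k in range(count):
        yield start + k * step


def vectorized(start: int, width: int, stride: int, count: int) -> Iterator[List[int]]:
    """`count` vectors of `width` consecutive addresses, `stride` apart."""
    for k in range(count):
        first = start + k * stride
        yield list(range(first, first + width))


def circular(base: int, size: int, start: int, step: int, count: int) -> Iterator[int]:
    """Addresses that wrap around inside the buffer [base, base + size)."""
    offset = start
    for _ in range(count):
        yield base + offset
        offset = (offset + step) % size


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed(base: int, n: int) -> Iterator[int]:
    """Bit-reversed order over n = 2**k elements, as used by radix-2 FFTs."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    for k in range(n):
        yield base + bit_reverse(k, bits)
```

For example, `list(bit_reversed(0, 8))` gives the familiar FFT input order 0, 4, 2, 6, 1, 5, 3, 7. Generating such sequences in a dedicated unit is what frees the compute logic from address arithmetic.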