NFDI4DS | UHH-SEMS - Publication Details

Ahmed Hemani

ORCID: 0000-0003-0565-9376

About

Contact & Profiles

A5026355063

Research Areas

Embedded Systems Design Techniques
Parallel Computing and Optimization Techniques
Interconnection Networks and Systems
Advanced Memory and Neural Computing
Low-power high-performance VLSI design
VLSI and FPGA Design Techniques
Neural Networks and Applications
Real-Time Systems Scheduling
VLSI and Analog Circuit Testing
Formal Methods in Verification
Algorithms and Data Compression
CCD and CMOS Imaging Sensors
Model-Driven Software Engineering Techniques
Advanced Neural Network Applications
Genomics and Phylogenetic Studies
Neuroscience and Neural Engineering
Neural dynamics and brain function
Ferroelectric and Negative Capacitance Devices
Distributed and Parallel Computing Systems
Numerical Methods and Algorithms
Real-time simulation and control systems
Information and Cyber Security
Manufacturing Process and Optimization
Radiation Effects in Electronics
Evolutionary Algorithms and Applications

KTH Royal Institute of Technology
2015-2024

Kista Photonics Research Center
2000-2017

University of Turku
2014

Philips (Finland)
2004

Swedish Institute
1990

A network on chip architecture and design methodology

OPENALEX - Publications

Shashi Kumar Axel Jantsch Juha-Pekka Soininen Martti Forsell Mikael Millberg and 3 more

We propose a packet switched platform for single chip systems which scales well to an arbitrary number of processor like resources. The platform, which we call Network-on-Chip (NOC), includes both the architecture and the design methodology. The NOC architecture is an m×n mesh of switches, and resources are placed on the slots formed by the switches. We assume a direct layout of the 2-D mesh of switches and resources, providing physical- and architectural-level design integration. Each switch is connected to one resource and four neighboring switches, and each resource is connected to one switch. A resource can be a processor core, memory,...

10.1109/isvlsi.2002.1016885 article EN 2003-06-25
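The mesh topology described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_mesh` and `resource_of` are hypothetical, and the sketch assumes a non-wrapping mesh, so switches on the border have fewer than four neighbors.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # (row, column) of a switch in the mesh


def build_mesh(m: int, n: int) -> Dict[Coord, List[Coord]]:
    """Return the switch-to-switch links of an m x n mesh (no wrap-around assumed)."""
    links: Dict[Coord, List[Coord]] = {}
    for r in range(m):
        for c in range(n):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            # Keep only neighbors that lie inside the mesh boundary.
            links[(r, c)] = [(i, j) for i, j in candidates if 0 <= i < m and 0 <= j < n]
    return links


def resource_of(switch: Coord) -> str:
    """Each switch serves exactly one resource; the naming scheme is illustrative."""
    return f"resource_{switch[0]}_{switch[1]}"


mesh = build_mesh(3, 4)
print(len(mesh), "switches; interior switch (1, 1) links to", mesh[(1, 1)])
print("switch (2, 3) serves", resource_of((2, 3)))
```

An interior switch has four neighboring switches plus its one resource, which matches the connectivity stated in the abstract; how the paper treats border switches is not stated in the truncated text.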

Lowering power consumption in clock by using globally asynchronous locally synchronous design style

OPENALEX - Publications

Ahmed Hemani T. Meincke S. Kumar Adam Postuła Thomas Olsson and 4 more

Authors: A. Hemani, ESD Lab, Department of Electronics, KTH, Sweden; T. Meincke; S. Kumar, Indian Institute of Technology, New Delhi, India; Postula, CSEE, University of Queensland, Brisbane, Australia; Olsson, Lund University; P. Nilsson; J. Oberg; Ellervee; D. Lundqvist, Ericsson Radio Systems AB, Stockholm. DAC...

10.1145/309847.310091 article EN 1999-06-01

PCSS: Privacy Preserving Communication Scheme for SDN Enabled Smart Homes

OPENALEX - Publications

Waseem Iqbal Haider Abbas Bilal Rauf Yawar Abbas Bangash Muhammad Faisal Amjad and 1 more

Smart home technology, also known as a home automation system, allows the homeowner and residents to control and monitor smart devices like heating, ventilation, air conditioning (HVAC), refrigerators, doors, cameras etc. These features facilitate users by providing a safe and well-suited environment. However, at the same time these connected devices could be exploited by cybercriminals due to overlooked inbuilt security and privacy concerns of smart devices. Because of no authentication and plain text data transmission, intruders can get...

10.1109/jsen.2021.3087779 article EN IEEE Sensors Journal 2021-06-14

Hardware/software partitioning and minimizing memory interface traffic

OPENALEX - Publications

Axel Jantsch Peeter Ellervee Ahmed Hemani Johnny Öberg Hannu Tenhunen

10.5555/198174.198249 article EN European Design Automation Conference 1994-09-23

Partially reconfigurable interconnection network for dynamically reprogrammable resource array

OPENALEX - Publications

Muhammad Ali Shami Ahmed Hemani

This paper describes an innovative regular non-blocking, point-to-point, point-to-multipoint, low latency interconnection network scheme with sliding window connectivity, which allows arbitrary parallelism among large sub-systems. The area overhead of the interconnect is only 30% of the chip, which is much smaller as compared to 80% in the case of an FPGA. The interconnect is partially and dynamically reconfigurable. The configware is reduced 5.6 times by using binary encoding for energy efficient dynamic reconfiguration.

10.1109/asicon.2009.5351593 article EN 2009-10-01

A Memristor-Based Learning Engine for Synaptic Trace-Based Online Learning

OPENALEX - Publications

Deyu Wang Jiawei Xu Feng Li Lianhao Zhang Chengwei Cao and 5 more

The memristor has been extensively used to facilitate the synaptic online learning of brain-inspired spiking neural networks (SNNs). However, current memristor-based work cannot support the widely used yet sophisticated trace-based learning rules, including the Spike-Timing-Dependent Plasticity (STDP) and Bayesian Confidence Propagation Neural Network (BCPNN) learning rules. This paper proposes a learning engine to implement trace-based online learning, consisting of memristor-based blocks and analog computing blocks. The memristor is used to mimic the synaptic trace dynamics by exploiting the nonlinear physical...

10.1109/tbcas.2023.3291021 article EN IEEE Transactions on Biomedical Circuits and Systems 2023-06-30

Cell placement by self-organisation

OPENALEX - Publications

Ahmed Hemani Adam Postuła

10.1016/0893-6080(90)90020-l article EN Neural Networks 1990-01-01

39.9 GOPs/watt multi-mode CGRA accelerator for a multi-standard basestation

OPENALEX - Publications

Nasim Farahini Shuo Li Muhammad Adeel Tajammul Muhammad Ali Shami Chen Guo and 2 more

This paper presents an industrial case study of using a Coarse Grain Reconfigurable Architecture (CGRA) for a multi-mode accelerator for two kernels: FFT for the LTE standard and the Correlation Pool for the UMTS standard, to be executed in a mutually exclusive manner. The CGRA achieved a computational efficiency of 39.94 GOPS/watt (OP is multiply-add) and a silicon efficiency of 56.20 GOPS/mm². By analyzing the code and inferring the unused features of the fully...

10.1109/iscas.2013.6572129 article EN 2013 IEEE International Symposium on Circuits and Systems (ISCAS) 2013-05-01

Energy-aware-task-parallelism for efficient dynamic voltage, and frequency scaling, in CGRAs

OPENALEX - Publications

Syed M. A. H. Jafri Muhammad Adeel Tajammul Ahmed Hemani Kolin Paul Juha Plosila and 1 more

Today, coarse grained reconfigurable architectures (CGRAs) host multiple applications, with arbitrary communication and computation patterns. Each application itself is composed of multiple tasks, spatially mapped to different parts of the platform. Providing a worst-case operating point to all applications leads to excessive energy and power consumption. To cater to this problem, dynamic voltage and frequency scaling (DVFS) is a frequently used technique. DVFS allows scaling the voltage and/or frequency of the device, based on runtime constraints. Recent...

10.1109/samos.2013.6621112 article EN 2013-07-01
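Why scaling voltage and frequency below the worst-case operating point saves power can be illustrated with the standard CMOS dynamic-power approximation P = C·V²·f. The sketch below is a textbook illustration, not the method of the paper above; the capacitance, voltage, and frequency values are hypothetical, and leakage power is ignored.

```python
def dynamic_power(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """Textbook CMOS dynamic-power approximation: P = C * V^2 * f (leakage ignored)."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


def relative_power(voltage_scale: float, frequency_scale: float) -> float:
    """Power relative to the reference point after scaling V and f by the given factors."""
    return voltage_scale ** 2 * frequency_scale


# Hypothetical operating points: a worst-case point and a scaled-down point.
worst_case = dynamic_power(1e-9, 1.0, 500e6)
scaled = dynamic_power(1e-9, 0.8, 250e6)
print(f"scaled point uses {scaled / worst_case:.2f} of the worst-case dynamic power")
```

Because power depends quadratically on voltage, lowering the voltage together with the frequency saves more than lowering the frequency alone.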

Energy-aware coarse-grained reconfigurable architectures using dynamically reconfigurable isolation cells

OPENALEX - Publications

Syed M. A. H. Jafri Ozan Bag Ahmed Hemani Nasim Farahini Kolin Paul and 2 more

This paper presents a self adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications, with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degrees of parallelism), and the optimal version can only be determined at runtime. For such scenarios, traditional worst case designs and compile time mapping decisions are neither optimal nor...

10.1109/isqed.2013.6523597 article EN 2013-03-01

Lowering power consumption in clock by using globally asynchronous locally synchronous design style

OPENALEX - Publications

Ahmed Hemani T. Meincke Shashi Kumar A. Postula T. Olsson and 4 more

Power consumption in the clock of large high performance VLSIs can be reduced by adopting a globally asynchronous, locally synchronous (GALS) design style. GALS has small overheads for the global asynchronous communication and local clock generation. We propose methods to (a) evaluate the benefits of GALS and account for its overheads, which can be used as the basis for partitioning the system into an optimal number/size of synchronous blocks, and (b) automate the synthesis of the global asynchronous communication. Three realistic ASICs, ranging in complexity from 1 to 3 million gates, were...

10.1109/dac.1999.782202 article EN Proceedings 1999 Design Automation Conference (Cat. No. 99CH36361) 2003-01-20

Distributed DVFS using rationally-related frequencies and discrete voltage levels

OPENALEX - Publications

Jean-Michel Chabloz Ahmed Hemani

We have defined a flexible latency-insensitive design style called Globally Ratiochronous Locally Synchronous (GRLS), based on quantized voltage levels and rationally-related clock frequencies. In this paper we present the infrastructure necessary to enable Distributed DVFS in such a system and analyze its overheads, quantitatively showing how, with minimal overheads, we obtain energy benefits that are close to those of a totally ideal GALS approach. The show, coupled complexity performance GRLS, which briefly...

10.1145/1840845.1840897 article EN 2010-08-18

Classification of Massively Parallel Computer Architectures

OPENALEX - Publications

Muhammad Ali Shami Ahmed Hemani

Faced with slowing performance and energy benefits of technology scaling, VLSI/Computer architectures have turned from parallel to massively parallel machines for personal and embedded applications in the form of multi and many core architectures. Additionally, in the pursuit of finding the sweet spot between engineering and computational efficiency, Coarse Grain Reconfigurable Architectures (CGRAs) have been researched. While these articles have been surveyed, they have not been rigorously classified to enable objective differentiation and comparison...

10.1109/ipdpsw.2012.42 article EN 2012-05-01

Compact generic intermediate representation (CGIR) to enable late binding in coarse grained reconfigurable architectures

OPENALEX - Publications

Syed M. A. H. Jafri Ahmed Hemani Kolin Paul Juha Plosila Hannu Tenhunen

In the era of platforms hosting multiple applications, where inter-application communication and concurrency patterns are arbitrary, static compile time decision making is neither optimal nor desirable. As a part of solving this problem, we present a novel method for compactly representing multiple configuration bitstreams of a single application, with varying parallelisms, as a unique, compact, and customizable representation, called CGIR. The representation thus stored is unraveled at runtime to configure the device...

10.1109/fpt.2011.6132719 article EN 2011-12-01

eBrainII: a 3 kW Realtime Custom 3D DRAM Integrated ASIC Implementation of a Biologically Plausible Model of a Human Scale Cortex

OPENALEX - Publications

Dimitrios Stathis Chirag Sudarshan Yu Yang Matthias Jung Christian Weis and 3 more

Artificial Neural Networks (ANNs) like CNN/DNN and LSTM are not biologically plausible and, in spite of their initial success, they cannot attain the cognitive capabilities enabled by the dynamic hierarchical associative memory systems of biological brains. Spiking brain models, e.g. of the cortex, basal ganglia and amygdala, have a greater potential to achieve such capabilities. The Bayesian Confidence Propagation Neural Network (BCPNN) is a model of the cortex. A human scale BCPNN in real time requires 162 TFlops/s, 50 TBs of synaptic...

10.1007/s11265-020-01562-x article EN cc-by Journal of Signal Processing Systems 2020-07-07

Grammar-based hardware synthesis of data communication protocols

OPENALEX - Publications

Johnny Öberg Anshul Kumar Ahmed Hemani

For a synthesis methodology to support implementation independent design specification, a capability for design space exploration is essential. In this paper we present such a methodology for a specific domain: data communication protocols. A natural way to specify various elements of protocols is in terms of a grammar annotated with actions. Our language for protocol specification, called PRO-GRAM, is based on this idea. The hardware specification the done by specifying bit-patterns tokens supposed parse together actual input stream. By constraints and...

10.5555/524431.857933 article EN 1996-11-06

Globally asynchronous locally synchronous architecture for large high-performance ASICs

OPENALEX - Publications

T. Meincke Ahmed Hemani S. Kumar Peeter Ellervee Johnny Öberg and 4 more

Clock nets are the major source of power consumption in large, high-performance ASICs and a design bottleneck when it comes to tolerable clock skew. A way to obviate the global clock net is to partition the design into large synchronous blocks, each having its own clock. Data with other blocks is exchanged asynchronously using handshake signals. Adopting such a strategy requires a methodology that supports: 1) a partitioning method dividing the design into a number of blocks such that the gain due to removal of the global clock net exceeds the communication overhead, and 2) synthesis of protocols to implement data...

10.1109/iscas.1999.780794 article EN 2003-01-20

Morphable DPU: Smart and efficient data path for signal processing applications

OPENALEX - Publications

Muhammad Ali Shami Ahmed Hemani

A coarse grained morphable Datapath Unit (mDPU) has been proposed. This mDPU implements the multiplier in a smart way that enables the component adders to be reused when we do not need the multiplier. The pipelined design further enhances the mDPU by creating a balanced datapath in the temporal sense. These two features result in a design that optimally uses silicon and time. A judicious set of granular instructions is enabled to show that the mDPU can implement typical signal processing functions. A radix-2 64 point FFT is implemented in 90 nm technology using...

10.1109/sips.2009.5336246 article EN 2009-10-01

Addressing dynamic issues in information security management

OPENALEX - Publications

Haider Abbas Christer Sven Magnusson Louise Yngström Ahmed Hemani

Purpose: The purpose of this paper is to address three main problems resulting from uncertainty in information security management: dynamically changing security requirements of an organization; externalities caused by a security system; and obsolete evaluation of security concerns. Design/methodology/approach: In order to address these critical concerns, a framework based on options reasoning borrowed from corporate finance is proposed and adapted to the evaluation of security architecture and decision making for handling these issues at the organizational level. The adaptation as methodology...

10.1108/09685221111115836 article EN Information Management & Computer Security 2011-03-19

System level synthesis of hardware for DSP applications using pre-characterized function implementations

OPENALEX - Publications

Shuo Li Nasim Farahini Ahmed Hemani Kathrin Rosvall Ingo Sander

SYLVA is a system level synthesis framework that transforms DSP sub-systems modeled as synchronous data flow into hardware implementations in ASIC, FPGAs or CGRAs. SYLVA synthesizes in terms of pre-characterized function implementations (FIMPs). It explores the design space in three dimensions: the number of FIMPs, the type of FIMPs, and the pipeline parallelism between producing and consuming FIMPs. We introduce a timing and interface model of FIMPs to enable reuse and automatic generation of Global Inter-connect and Control (GLIC) to glue the FIMPs together into a working system. SYLVA has...

10.5555/2555692.2555708 article EN 2013-09-29

Address generation scheme for a coarse grain reconfigurable architecture

OPENALEX - Publications

Muhammad Ali Shami Ahmed Hemani

In this paper, we describe a versatile address generation scheme for the distributed storage resources of a coarse grain Parallel Distributed Digital Signal Processing (PDDSP) reconfigurable architecture under development in our group. This scheme proposes address generation units (AGUs) to decouple the address generation logic from the compute logic and exploit parallelism (ILP and TLP). To achieve this, the proposed scheme supports standard DSP addressing modes like linear, vectorized, circular buffer and bit-reverse addressing, all parameterizable range increment/decrement offsets is...

10.1109/asap.2011.6043232 article EN 2011-09-01
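The standard DSP addressing modes named in the abstract above (linear, circular buffer, and bit-reverse) can be sketched in software as follows. This is a generic illustration of the modes, not the AGU design from the paper; all function names and parameters are hypothetical.

```python
from typing import List


def linear_addresses(start: int, step: int, count: int) -> List[int]:
    """Linear mode: start, start + step, start + 2*step, ..."""
    return [start + i * step for i in range(count)]


def circular_addresses(base: int, size: int, start: int, step: int, count: int) -> List[int]:
    """Circular-buffer mode: offsets wrap around inside a buffer of `size` words."""
    return [base + (start + i * step) % size for i in range(count)]


def bit_reverse(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reverse_addresses(base: int, bits: int) -> List[int]:
    """Bit-reverse mode, as used to reorder data for a radix-2 FFT of 2**bits points."""
    return [base + bit_reverse(i, bits) for i in range(1 << bits)]


print(linear_addresses(0, 2, 4))
print(circular_addresses(100, 4, 2, 1, 5))
print(bit_reverse_addresses(0, 3))
```

Keeping each mode a pure function of its parameters mirrors the idea of decoupling address generation from computation: the compute side only consumes the address stream.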
