NFDI4DS | UHH-SEMS - Publication Details

Ali Afzali‐Kusha

ORCID: 0000-0001-8614-2007

About

Contact & Profiles

A5074063358

Research Areas

Low-power high-performance VLSI design
Advancements in Semiconductor Devices and Circuit Design
Semiconductor materials and devices
Interconnection Networks and Systems
Parallel Computing and Optimization Techniques
Analog and Mixed-Signal Circuit Design
Embedded Systems Design Techniques
Advanced Memory and Neural Computing
VLSI and FPGA Design Techniques
VLSI and Analog Circuit Testing
Ferroelectric and Negative Capacitance Devices
Silicon Carbide Semiconductor Technologies
Integrated Circuits and Semiconductor Failure Analysis
Radiation Effects in Electronics
Supercapacitor Materials and Fabrication
Quantum-Dot Cellular Automata
Thin-Film Transistor Technologies
Advancements in PLL and VCO Technologies
Advanced Neural Network Applications
CCD and CMOS Imaging Sensors
Neuroscience and Neural Engineering
Silicon and Solar Cell Technologies
Energy Efficient Wireless Sensor Networks
Cryptography and Residue Arithmetic
Digital Filter Design and Implementation

University of Tehran
2014-2023

Institute for Research in Fundamental Sciences
2003-2023

University of Southern California
2010

Ilam University
2007

Amirkabir University of Technology
2006

Institute for Cognitive Science Studies
2003

Dual-Quality 4:2 Compressors for Utilizing in Dynamic Accuracy Configurable Multipliers

OPENALEX - Publications

Omid Akbari Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram

In this paper, we propose four 4:2 compressors, which have the flexibility of switching between the exact and approximate operating modes. In the approximate mode, these dual-quality compressors provide higher speeds and lower power consumptions at the cost of lower accuracy. Each of these compressors has its own level of accuracy in the approximate mode as well as different delays and power dissipations in the approximate and exact modes. Using these compressors in the structures of parallel multipliers provides configurable multipliers whose accuracies (as well as their powers and speeds) may change dynamically during the runtime. The efficiencies of these compressors in a 32-bit Dadda...

10.1109/tvlsi.2016.2643003 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2017-01-17

RAP-CLA: A Reconfigurable Approximate Carry Look-Ahead Adder

OPENALEX - Publications

Omid Akbari Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram

In this brief, we propose a fast yet energy-efficient reconfigurable approximate carry look-ahead adder (RAP-CLA). This adder has the ability of switching between the approximate and exact operating modes, making it suitable for both error-resilient and exact applications. The structure, which is more area and power efficient than state-of-the-art adders, is achieved by some modifications to the conventional carry look-ahead adder (CLA). The efficacy of the proposed RAP-CLA is evaluated by comparing its characteristics to those of two adders as well as the conventional (exact) CLA in a 15...

10.1109/tcsii.2016.2633307 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2016-11-29

Ground plane fin-shaped field effect transistor (GP-FinFET): A FinFET for low leakage power circuits

OPENALEX - Publications

Mehdi Saremi Ali Afzali‐Kusha Saeed Mohammadi

10.1016/j.mee.2012.01.009 article EN Microelectronic Engineering 2012-02-06

TOSAM: An Energy-Efficient Truncation- and Rounding-Based Scalable Approximate Multiplier

OPENALEX - Publications

Shaghayegh Vahdat Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram

A scalable approximate multiplier, called truncation- and rounding-based scalable approximate multiplier (TOSAM), is presented, which reduces the number of partial products by truncating each of the input operands based on their leading one-bit position. In the proposed design, multiplication is performed by shift, add, and small fixed-width multiplication operations, resulting in large improvements in the energy consumption and area occupation compared to those of the exact multiplier. To improve the total accuracy, the input operands of the multiplication part are rounded to the nearest odd number. Because the input operands are truncated...

10.1109/tvlsi.2018.2890712 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2019-01-25

Substrate Noise Coupling in SoC Design: Modeling, Avoidance, and Validation

OPENALEX - Publications

Ali Afzali‐Kusha Makoto Nagata Nishath Verghese D.J. Allstot

Issues related to substrate noise in system-on-chip design are described, including the physical phenomena responsible for its creation, coupling and transmission mechanisms and media, parameters affecting coupling strength, and its impact on mixed-signal integrated circuits. Design guidelines and best practices to minimize the generation, transmission, and reception of substrate noise are outlined, and different modeling approaches and computer simulation methods used in quantifying the noise coupling phenomena are presented. Finally, experiments that validate the modeling approaches and mitigation techniques are reviewed.

10.1109/jproc.2006.886029 article EN Proceedings of the IEEE 2006-12-01

Approximate Reverse Carry Propagate Adder for Energy-Efficient DSP Applications

OPENALEX - Publications

Masoud Pashaeifar Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram

In this paper, a reverse carry propagate adder (RCPA) is presented. In the RCPA structure, the carry signal propagates in a counter-flow manner from the most significant bit to the least significant bit; hence, the carry input signal has higher significance than the output carry. This method of carry propagation leads to higher stability in the presence of delay variations. Three implementations of the reverse carry propagate full-adder (RCPFA) cell with different delay, power, energy, and accuracy levels are introduced. The proposed structure may be combined with an exact (forward) carry adder to form hybrid adders...

10.1109/tvlsi.2018.2859939 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2018-08-16

A near-threshold 7T SRAM cell with high write and read margins and low write time for sub-20 nm FinFET technologies

OPENALEX - Publications

Mohammad Ansari Hassan Afzali-Kusha Behzad Ebrahimi Zainalabedin Navabi Ali Afzali‐Kusha and 1 more

10.1016/j.vlsi.2015.02.002 article EN Integration 2015-02-20

Block-Based Carry Speculative Approximate Adder for Energy-Efficient Applications

OPENALEX - Publications

Farhad Ebrahimi-Azandaryani Omid Akbari Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram

In this brief, a low energy consumption block-based carry speculative approximate adder is proposed. Its structure is based on partitioning the adder into some non-overlapped summation blocks whose structures may be selected from both the carry propagate and parallel-prefix adders. Here, the carry output of each block is speculated based on the input operands of the block itself and those of the next block. In this adder, the length of the carry chain is reduced to two blocks (worst case), where in most cases only one block is employed to calculate the carry output, leading to a lower average delay. In addition, to increase the accuracy...

10.1109/tcsii.2019.2901060 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2019-02-22

Res-DNN: A Residue Number System-Based DNN Accelerator Unit

OPENALEX - Publications

Nasim Samimi Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram

In this article, a technique based on using the Residue Number System (RNS) is suggested to improve the energy efficiency of Deep Neural Networks (DNNs). In the DNN architecture, which is fully RNS-based, only the weights and primary inputs in the main memory are in the binary number system (BNS). The architecture, called Res-DNN, offers a high energy saving while requiring a higher bit count for the data to handle overflow compared to that of the BNS one. Scaling techniques in the processing elements are employed in the RNS-based computations to make the computation bit widths the same as...

10.1109/tcsi.2019.2951083 article EN publisher-specific-oa IEEE Transactions on Circuits and Systems I Regular Papers 2019-11-25
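
As a rough illustration of the residue number system idea named in the abstract above, the following sketch performs a multiply-accumulate on residues and converts the result back with the Chinese remainder theorem. The moduli set (7, 8, 9) and all function names are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not model the Res-DNN architecture or its scaling techniques.

```python
from math import prod

# Hypothetical moduli set {2^n - 1, 2^n, 2^n + 1} with n = 3 (pairwise coprime).
MODULI = (7, 8, 9)
# Results must stay below this value, otherwise they wrap around (overflow).
DYNAMIC_RANGE = prod(MODULI)  # 504


def to_rns(x):
    """Convert a non-negative integer to its tuple of residues."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Add two RNS numbers channel by channel (no carries between channels)."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Multiply two RNS numbers channel by channel."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Reconstruct the integer via the Chinese remainder theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = DYNAMIC_RANGE // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % DYNAMIC_RANGE


def rns_mac(weights, inputs):
    """Multiply-accumulate carried out entirely on residues."""
    acc = to_rns(0)
    for w, x in zip(weights, inputs):
        acc = rns_add(acc, rns_mul(to_rns(w), to_rns(x)))
    return from_rns(acc)
```

Each residue channel works on small operands and needs no carry from the other channels, which is the property such designs try to exploit; the price is that the dynamic range must be chosen large enough for the accumulated result.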

EDXY – A low cost congestion-aware routing algorithm for network-on-chips

OPENALEX - Publications

Pejman Lotfi-Kamran Amirsajjad Rahmani Masoud Daneshtalab Ali Afzali‐Kusha Zainalabedin Navabi

10.1016/j.sysarc.2010.05.002 article EN Journal of Systems Architecture 2010-05-13

Design and Analysis of Two Low-Power SRAM Cell Structures

OPENALEX - Publications

G. Razavipour Ali Afzali‐Kusha Massoud Pedram

In this paper, two static random access memory (SRAM) cells that reduce the power dissipation due to gate and subthreshold leakage currents are presented. The first cell structure results in reduced gate voltages for the NMOS pass transistors, and thus lowers the gate leakage current. It reduces the subthreshold leakage current by increasing the ground level during the idle (inactive) mode. The second cell structure makes use of PMOS pass transistors to lower the gate leakage current. In addition, dual threshold voltage technology with forward body biasing is utilized to reduce the subthreshold leakage current while maintaining performance. Compared...

10.1109/tvlsi.2008.2004590 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2009-03-19

High-Speed and Energy-Efficient Carry Skip Adder Operating Under a Wide Range of Supply Voltage Levels

OPENALEX - Publications

Milad Bahadori Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram

In this paper, we present a carry skip adder (CSKA) structure that has higher speed yet lower energy consumption compared with the conventional one. The speed enhancement is achieved by applying concatenation and incrementation schemes to improve the efficiency of the conventional CSKA (Conv-CSKA) structure. In addition, instead of utilizing multiplexer logic, the proposed structure makes use of AND-OR-Invert (AOI) and OR-AND-Invert (OAI) compound gates for the skip logic. The structure may be realized with both fixed stage size and variable stage size styles, wherein the latter further...

10.1109/tvlsi.2015.2405133 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2015-03-11

TruncApp: A truncation-based approximate divider for energy efficient DSP applications

OPENALEX - Publications

Shaghayegh Vahdat Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram Zainalabedin Navabi

In this paper, we present a high speed yet energy efficient approximate divider where the division operation is performed by multiplying the dividend by the inverse of the divisor. In this structure, the truncated value of the dividend is multiplied exactly (approximately) by the approximate inverse value of the divisor. To assess the efficacy of the proposed divider, its design parameters are extracted and compared to those of a number of prior art dividers in a 45nm CMOS technology. Results reveal that the proposed structure provides 66% and 52% improvements in the area and energy consumption, respectively, compared to the most advanced divider....

10.23919/date.2017.7927254 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2017-03-01

SEERAD: A High Speed yet Energy-Efficient Rounding-based Approximate Divider

OPENALEX - Publications

Reza Zendegani Mehdi Kamal Arash Fayyazi Ali Afzali‐Kusha Saeed Safari and 1 more

In this paper, a high speed yet energy-efficient approximate divider for error resilient applications is proposed. For the division operation, the divisor is rounded to a value with a specific form, resulting in the transformation of the division operation to the multiplication one. The proposed divider enjoys the flexibility of increasing the accuracy at the price of higher delay and hardware usage. The efficacy of the proposed divider is evaluated in comparison with three different implementations of the SRT divider. The results show that the delay and energy consumption are, on average, 14 and 300 times smaller than...

10.3850/9783981537079_0521 article EN 2016-01-01

An Ultra Low-Power Memristive Neuromorphic Circuit for Internet of Things Smart Sensors

OPENALEX - Publications

Arash Fayyazi Mohammad Ansari Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram

In this paper, we propose an ultra low-power analog neuromorphic circuit to be trained to process the sensory data in the Internet of Things smart sensors where low-power and efficient computing is required. To reduce the operating voltage while maintaining the performance, we focus on designing a memristive neuromorphic circuit without employing operational amplifiers. Therefore, we use CMOS inverters as the neurons in our circuit. We also propose mixed-signal input/output interfaces to make the circuit connectable to other digital components such as an embedded processor....

10.1109/jiot.2018.2799948 article EN IEEE Internet of Things Journal 2018-01-30

X-CGRA: An Energy-Efficient Approximate Coarse-Grained Reconfigurable Architecture

OPENALEX - Publications

Omid Akbari Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram Muhammad Shafique

In this article, we present an energy-efficient approximate CGRA (X-CGRA). Instead of conventional exact arithmetic units, it employs configurable approximate adders and multipliers in the so-called quality-scalable processing elements (QSPEs). Furthermore, the structure and functionality of other architectural components, like the context memory, are modified based on the operating modes of the QSPEs. The quality reconfigurability of X-CGRA makes it amenable for both error-resilient and nonresilient applications. To map applications...

10.1109/tcad.2019.2937738 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2019-08-27

POLAR: A Pipelined/Overlapped FPGA-Based LSTM Accelerator

OPENALEX - Publications

Erfan Bank-Tavakoli Seyed Abolfazl Ghasemzadeh Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram

In this brief, a low resource utilization field-programmable gate array (FPGA)-based long short-term memory (LSTM) network architecture for accelerating the inference phase is presented. The architecture has low-power and high-speed features that are achieved through overlapping the timing of the operations and pipelining the datapath. Moreover, the architecture requires a negligible internal memory size for storing the intermediate data, leading to a simple routing, which provides a lower interconnect delay (higher operating frequency). A designer may...

10.1109/tvlsi.2019.2947639 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2019-11-01

Low-power single- and double-edge-triggered flip-flops for high-speed applications

OPENALEX - Publications

Seid Hadi Rasouli Ahmad Khademzadeh Ali Afzali‐Kusha Mehrdad Nourani

The paper presents new low-power flip-flops which are faster compared to previously proposed structures. The single-edge-triggered flip-flop, called the MHLFF (modified hybrid latch flip-flop), reduces the power dissipation of the HLFF (hybrid latch flip-flop) by avoiding unnecessary node transitions. To reduce the power consumption of the flip-flop further, the double-edge-triggered modified hybrid latch flip-flop (DMHLFF) is also proposed. The power consumption in the clock tree is reduced by halving the clock frequency for the same throughput. In addition to the low power, the speed is higher while the area is not...

10.1049/ip-cds:20041241 article EN IEE Proceedings - Circuits Devices and Systems 2005-01-01

Data Encoding Techniques for Reducing Energy Consumption in Network-on-Chip

OPENALEX - Publications

Nima Jafarzadeh Maurizio Palesi Ahmad Khademzadeh Ali Afzali‐Kusha

As technology shrinks, the power dissipated by the links of a network-on-chip (NoC) starts to compete with the power dissipated by the other elements of the communication subsystem, namely, the routers and the network interfaces (NIs). In this paper, we present a set of data encoding schemes aimed at reducing the power dissipated by the links of an NoC. The proposed schemes are general and transparent with respect to the underlying NoC fabric (i.e., their application does not require any modification of the routers and link architecture). Experiments carried out on both synthetic and real traffic scenarios show the effectiveness...

10.1109/tvlsi.2013.2251020 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2013-03-28

LETAM: A low energy truncation-based approximate multiplier

OPENALEX - Publications

Shaghayegh Vahdat Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram

10.1016/j.compeleceng.2017.08.019 article EN publisher-specific-oa Computers & Electrical Engineering 2017-10-01

A Theoretical Framework for Quality Estimation and Optimization of DSP Applications Using Low-Power Approximate Adders

OPENALEX - Publications

Masoud Pashaeifar Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram

In this paper, we present a framework for analytically estimating the output quality of common digital signal processing (DSP) blocks that utilize approximate adders. The framework is based on considering the error of approximate adders as an additive noise (approximation noise) that disturbs the DSP block in question. A theoretical modeling approach describing the power of the approximation noise, which is the integral of the noise power spectral density over the bandwidth, is developed. The output qualities of DSP blocks, such as finite impulse response filter, discrete cosine transform, and fast...

10.1109/tcsi.2018.2856757 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2018-07-27

TheSPoT: Thermal Stress-Aware Power and Temperature Management for Multiprocessor Systems-on-Chip

OPENALEX - Publications

Arman Iranfar Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram David Atienza

Thermal stress, including temperature gradients in time and space as well as thermal cycling, influences lifetime reliability and performance of modern multiprocessor systems-on-chip (MPSoCs). Conventional power and temperature management techniques considering the peak temperature/power consumption do not provide a comprehensive solution to avoid high spatial and temporal thermal variations. This work presents TheSPoT, a novel multilevel thermal stress-aware power and temperature management approach for MPSoCs. At the top level, core consolidation and deconsolidation is...

10.1109/tcad.2017.2768417 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2017-10-31

PX-CGRA: Polymorphic approximate coarse-grained reconfigurable architecture

OPENALEX - Publications

Omid Akbari Mehdi Kamal Ali Afzali‐Kusha Massoud Pedram Muhammad Shafique

Coarse-Grained Reconfigurable Architectures (CGRAs) provide a tradeoff between the energy-efficiency of Application Specific Integrated Circuits (ASICs) and the flexibility of General Purpose Processors (GPPs). State-of-the-art CGRAs only support exact architectures and precise application executions. However, a majority of streaming applications such as multimedia and digital signal processing, which are amenable to CGRAs, are inherently error resilient. Therefore, these applications can greatly benefit from the emerging trend...

10.23919/date.2018.8342045 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2018-03-01

Evaluation of Pseudo Adaptive XY Routing Using an Object Oriented Model for NOC

OPENALEX - Publications

M. Dehyadgari Mohsen Nickray Ali Afzali‐Kusha Zainalabedin Navabi

In this paper, we propose a pseudo adaptive XY routing, which is an extension of the classic XY routing. We consider a mesh topology for evaluating the proposed routing. Our switches use the proposed routing algorithm. The load in the center of the network in ordinary XY routing is much higher than the total average. This extra load on the center of the network can cause a hot spot. The main objective of our algorithm is to distribute the network load. One of the advantages of distributing the load is a balanced temperature in the mesh. The proposed routing has two modes, deterministic and adaptive; the status of the neighbors of each switch is used to decide which mode must be selected....

10.1109/icm.2005.1590068 article EN International Conference on Microelectronics 2006-02-15
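
The classic XY routing that the abstract above extends can be sketched as follows: a packet first travels along the X dimension until its column matches the destination, then along the Y dimension. This is only the deterministic baseline, not the pseudo adaptive algorithm of the paper; the function names and the node-visit load metric are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def xy_route(src, dst):
    """Return the list of (x, y) switches visited from src to dst."""
    x, y = src
    path = [(x, y)]
    # Phase 1: move along X until the destination column is reached.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: move along Y until the destination row is reached.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def node_load(size):
    """Count how often each switch of a size x size mesh is visited
    when every node sends one packet to every node (uniform traffic)."""
    nodes = list(product(range(size), repeat=2))
    load = Counter()
    for src, dst in product(nodes, repeat=2):
        load.update(xy_route(src, dst))
    return load
```

Under this uniform-traffic assumption, `node_load(3)` gives the central switch the highest visit count, which is consistent with the abstract's remark that ordinary XY routing concentrates load in the center of the mesh.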

BZ-FAD: A Low-Power Low-Area Multiplier Based on Shift-and-Add Architecture

OPENALEX - Publications

M. Mottaghi-Dastjerdi Ali Afzali‐Kusha Massoud Pedram

In this paper, a low-power structure called bypass zero, feed A directly (BZ-FAD) for shift-and-add multipliers is proposed. The architecture considerably lowers the switching activity of conventional multipliers. The modifications to the multiplier, which multiplies A by B, include the removal of the shifting of the B register, direct feeding of A to the adder, bypassing the adder whenever...

10.1109/tvlsi.2008.2004544 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2009-01-16
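
The zero-bypass idea mentioned in the abstract above can be illustrated with a small behavioral sketch of shift-and-add multiplication that skips the addition whenever the current multiplier bit is zero and counts how often the adder is actually used. This is a software analogy under assumed names (`shift_add_multiply`, `adder_uses`), not a model of the BZ-FAD hardware architecture.

```python
def shift_add_multiply(a, b, width=8):
    """Multiply unsigned a by the low `width` bits of b.

    Returns (product, adder_uses), where adder_uses counts the steps
    in which an addition was actually performed.
    """
    product = 0
    adder_uses = 0
    for i in range(width):
        if (b >> i) & 1:
            # Multiplier bit is 1: add the shifted multiplicand.
            product += a << i
            adder_uses += 1
        # Multiplier bit is 0: the addition is bypassed for this step.
    return product, adder_uses
```

For example, `shift_add_multiply(13, 11)` needs only three additions because 11 is 1011 in binary; fewer adder activations is the kind of switching-activity saving the abstract refers to.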
