Ali Afzali‐Kusha

ORCID: 0000-0001-8614-2007
Research Areas
  • Low-power high-performance VLSI design
  • Advancements in Semiconductor Devices and Circuit Design
  • Semiconductor materials and devices
  • Interconnection Networks and Systems
  • Parallel Computing and Optimization Techniques
  • Analog and Mixed-Signal Circuit Design
  • Embedded Systems Design Techniques
  • Advanced Memory and Neural Computing
  • VLSI and FPGA Design Techniques
  • VLSI and Analog Circuit Testing
  • Ferroelectric and Negative Capacitance Devices
  • Silicon Carbide Semiconductor Technologies
  • Integrated Circuits and Semiconductor Failure Analysis
  • Radiation Effects in Electronics
  • Supercapacitor Materials and Fabrication
  • Quantum-Dot Cellular Automata
  • Thin-Film Transistor Technologies
  • Advancements in PLL and VCO Technologies
  • Advanced Neural Network Applications
  • CCD and CMOS Imaging Sensors
  • Neuroscience and Neural Engineering
  • Silicon and Solar Cell Technologies
  • Energy Efficient Wireless Sensor Networks
  • Cryptography and Residue Arithmetic
  • Digital Filter Design and Implementation

University of Tehran
2014-2023

Institute for Research in Fundamental Sciences
2003-2023

University of Southern California
2010

Ilam University
2007

Amirkabir University of Technology
2006

Institute for Cognitive Science Studies
2003

In this paper, we propose four 4:2 compressors, which have the flexibility of switching between exact and approximate operating modes. In the approximate mode, these dual-quality compressors provide higher speeds and lower power consumption at the cost of lower accuracy. Each compressor has its own level of accuracy in the approximate mode, as well as different delays and power dissipations in the approximate and exact modes. Using these compressors in the structures of parallel multipliers provides configurable multipliers whose accuracies (as well as their powers and speeds) may change dynamically during runtime. The efficiencies of these compressors in a 32-bit Dadda...

10.1109/tvlsi.2016.2643003 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2017-01-17

In this brief, we propose a fast yet energy-efficient reconfigurable approximate carry look-ahead adder (RAP-CLA). This adder has the ability of switching between the approximate and exact operating modes, making it suitable for both error-resilient and exact applications. The structure, which is more area and power efficient than state-of-the-art adders, is achieved by some modifications to the conventional carry look-ahead adder (CLA). The efficacy of the proposed RAP-CLA is evaluated by comparing its characteristics to those of two adders as well as the (exact) CLA in 15...

10.1109/tcsii.2016.2633307 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2016-11-29

A scalable approximate multiplier, called truncation- and rounding-based scalable approximate multiplier (TOSAM), is presented, which reduces the number of partial products by truncating each of the input operands based on their leading one-bit position. In the proposed design, multiplication is performed by shift, add, and small fixed-width multiplication operations, resulting in large improvements in the energy consumption and area occupation compared to those of the exact multiplier. To improve the total accuracy, the operands of the multiplication part are rounded to the nearest odd number. Because truncated...

10.1109/tvlsi.2018.2890712 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2019-01-25

Issues related to substrate noise in system-on-chip design are described, including the physical phenomena responsible for its creation, the coupling and transmission mechanisms and media, the parameters affecting its strength, and its impact on mixed-signal integrated circuits. Design guidelines and best practices to minimize the generation, transmission, and reception of substrate noise are outlined, and different modeling approaches and computer simulation methods used in quantifying the noise are presented. Finally, experiments that validate the mitigation techniques are reviewed.

10.1109/jproc.2006.886029 article EN Proceedings of the IEEE 2006-12-01

In this paper, a reverse carry propagate adder (RCPA) is presented. In the RCPA structure, the carry signal propagates in a counter-flow manner from the most significant bit to the least significant bit; hence, the carry input signal has higher significance than the output carry. This method of carry propagation leads to higher stability in the presence of delay variations. Three implementations of the reverse carry propagate full-adder (RCPFA) cell with different delay, power, energy, and accuracy levels are introduced. The proposed structure may be combined with an exact (forward) carry adder to form hybrid adders...

10.1109/tvlsi.2018.2859939 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2018-08-16

In this brief, a low energy consumption block-based carry speculative approximate adder is proposed. Its structure is based on partitioning the adder into some non-overlapped summation blocks whose structures may be selected from both the carry propagate and parallel-prefix adders. Here, the carry output of each block is speculated based on the input operands of the block itself and those of the next block. In this adder, the length of the carry chain is reduced to two blocks (worst case), where in most cases only one block is employed to calculate the carry output, leading to a lower average delay. In addition, to increase the accuracy...

10.1109/tcsii.2019.2901060 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2019-02-22

In this article, a technique based on using the Residue Number System (RNS) is suggested to improve the energy efficiency of Deep Neural Networks (DNNs). In the DNN architecture, which is fully RNS-based, only the weights and the primary inputs in the main memory are in the binary number system (BNS). The architecture, called Res-DNN, offers high energy saving while requiring a higher bit count for data to handle overflow compared to that of the BNS one. Scaling techniques in the processing elements are employed in the RNS-based computations to make the computation widths the same as...

10.1109/tcsi.2019.2951083 article EN publisher-specific-oa IEEE Transactions on Circuits and Systems I Regular Papers 2019-11-25
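
As a minimal sketch of the residue arithmetic that the abstract above refers to, the following shows generic RNS addition and multiplication with reconstruction by the Chinese Remainder Theorem. It is not the Res-DNN architecture: the moduli set `(7, 8, 9)` and all function names are hypothetical choices made only for illustration, and results are valid only while they stay below the product of the moduli (504 here).

```python
from math import prod

# Hypothetical pairwise-coprime moduli; dynamic range is 7 * 8 * 9 = 504.
MODULI = (7, 8, 9)


def to_rns(x):
    """Convert a non-negative integer to its residue tuple."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Add two RNS numbers channel by channel (no carries between channels)."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Multiply two RNS numbers channel by channel."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(r):
    """Reconstruct the integer from its residues (Chinese Remainder Theorem)."""
    big_m = prod(MODULI)
    total = 0
    for residue, m in zip(r, MODULI):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        total += residue * partial * pow(partial, -1, m)
    return total % big_m


# A multiply-accumulate step, 12 * 5 + 3, carried out entirely on residues.
mac = rns_add(rns_mul(to_rns(12), to_rns(5)), to_rns(3))
print(from_rns(mac))
```

Each channel works on a few bits only and never exchanges carries with the others, which is the general property that makes RNS attractive for multiply-accumulate workloads.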

In this paper, two static random access memory (SRAM) cells that reduce the static power dissipation due to gate and subthreshold leakage currents are presented. The first cell structure results in reduced gate voltages for the NMOS pass transistors, and thus lowers the gate leakage current. It reduces the subthreshold leakage current by increasing the ground level during the idle (inactive) mode. The second cell structure makes use of PMOS pass transistors to lower the gate leakage current. In addition, dual threshold voltage technology with forward body biasing is utilized to reduce the subthreshold leakage while maintaining performance. Compared...

10.1109/tvlsi.2008.2004590 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2009-03-19

In this paper, we present a carry skip adder (CSKA) structure that has a higher speed yet lower energy consumption compared with the conventional one. The speed enhancement is achieved by applying concatenation and incrementation schemes to improve the efficiency of the conventional CSKA (Conv-CSKA) structure. In addition, instead of utilizing multiplexer logic, the proposed structure makes use of AND-OR-Invert (AOI) and OR-AND-Invert (OAI) compound gates for the skip logic. The structure may be realized with both fixed stage size and variable stage size styles, wherein the latter further...

10.1109/tvlsi.2015.2405133 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2015-03-11

In this paper, we present a high speed yet energy efficient approximate divider where the division operation is performed by multiplying the dividend by the inverse of the divisor. In this structure, the truncated value of the dividend is multiplied exactly (approximately) by the inverse of the divisor. To assess the efficacy of the proposed divider, its design parameters are extracted and compared to those of a number of prior art dividers in a 45nm CMOS technology. Results reveal that the proposed structure provides 66% and 52% improvements in area and energy consumption, respectively, compared to the most advanced divider...

10.23919/date.2017.7927254 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2017-03-01

In this paper, a high speed yet energy-efficient approximate divider for error resilient applications is proposed. For the division operation, the divisor is rounded to a value with a specific form, resulting in the transformation of the division operation to a multiplication one. The proposed divider enjoys the flexibility of increasing the accuracy at the price of higher delay and hardware usage. The efficacy of the proposed divider is evaluated in comparison with three different implementations of the SRT divider. The results show that the delay and energy consumption are, on average, 14 and 300 times smaller than...

10.3850/9783981537079_0521 article EN 2016-01-01

In this paper, we propose an ultra low-power analog neuromorphic circuit to be trained to process sensory data in Internet of Things smart sensors, where efficient computing is required. To reduce the operating voltage while maintaining performance, we focus on designing a memristive neuromorphic circuit without employing operational amplifiers. Therefore, we use CMOS inverters as the neurons in our circuit. We also propose mixed-signal input/output interfaces to make the circuit connectable to other digital components such as an embedded processor...

10.1109/jiot.2018.2799948 article EN IEEE Internet of Things Journal 2018-01-30

In this article, we present an energy-efficient approximate CGRA (X-CGRA). Instead of conventional exact arithmetic units, it employs configurable approximate adders and multipliers in the so-called quality-scalable processing elements (QSPEs). Furthermore, the structure and functionality of other architectural components, like the context memory, are modified based on the operating modes of the QSPEs. The quality reconfigurability of X-CGRA makes it amenable for both error-resilient and nonresilient applications. To map applications...

10.1109/tcad.2019.2937738 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2019-08-27

In this brief, a low resource utilization field-programmable gate array (FPGA)-based long short-term memory (LSTM) network architecture for accelerating the inference phase is presented. The architecture has low-power and high-speed features that are achieved through overlapping the timing of operations and pipelining the datapath. Moreover, the architecture requires a negligible internal memory size for storing intermediate data, leading to simple routing, which provides lower interconnect delay (higher operating frequency). A designer may...

10.1109/tvlsi.2019.2947639 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2019-11-01

The paper presents new low-power flip-flops which are faster compared to previously proposed structures. The single-edge-triggered flip-flop, called the MHLFF (modified hybrid latch flip-flop), reduces the power dissipation of the HLFF (hybrid latch flip-flop) by avoiding unnecessary node transitions. To reduce the power consumption of the flip-flop further, a double-edge-triggered modified hybrid latch flip-flop (DMHLFF) is also proposed. The power in the clock tree is reduced by halving the clock frequency for the same throughput. In addition to low power, the speed is higher while the area is not...

10.1049/ip-cds:20041241 article EN IEE Proceedings - Circuits Devices and Systems 2005-01-01

As technology shrinks, the power dissipated by the links of a network-on-chip (NoC) starts to compete with the power dissipated by the other elements of the communication subsystem, namely, the routers and the network interfaces (NIs). In this paper, we present a set of data encoding schemes aimed at reducing the power dissipated by the links of an NoC. The proposed schemes are general and transparent with respect to the underlying NoC fabric (i.e., their application does not require any modification of the routers and link architecture). Experiments carried out on both synthetic and real traffic scenarios show the effectiveness...

10.1109/tvlsi.2013.2251020 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2013-03-28

10.1016/j.compeleceng.2017.08.019 article EN publisher-specific-oa Computers & Electrical Engineering 2017-10-01

In this paper, we present a framework for analytically estimating the output quality of common digital signal processing (DSP) blocks that utilize approximate adders. The framework is based on considering the error of approximate adders as an additive noise (approximation noise) that disturbs the DSP block in question. A theoretical modeling approach for describing the power of the approximation noise, which is the integral of the power spectral density over the bandwidth, is developed. The output qualities of DSP blocks, such as the finite impulse response filter, discrete cosine transform, and fast...

10.1109/tcsi.2018.2856757 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2018-07-27

Thermal stress, including temperature gradients in time and space as well as thermal cycling, influences the lifetime reliability and performance of modern multiprocessor systems-on-chip (MPSoCs). Conventional power and thermal management techniques considering the peak temperature/power consumption do not provide a comprehensive solution to avoid high spatial and temporal thermal variations. This work presents TheSPoT, a novel multilevel thermal stress-aware power and temperature management approach for MPSoCs. At the top level, core consolidation and deconsolidation is...

10.1109/tcad.2017.2768417 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2017-10-31

Coarse-Grained Reconfigurable Architectures (CGRAs) provide a tradeoff between the energy-efficiency of Application Specific Integrated Circuits (ASICs) and the flexibility of General Purpose Processors (GPPs). State-of-the-art CGRAs only support exact architectures and precise application executions. However, a majority of streaming applications, such as multimedia and digital signal processing, which are amenable to CGRAs, are inherently error resilient. Therefore, these applications can greatly benefit from the emerging trend...

10.23919/date.2018.8342045 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015 2018-03-01

In this paper we propose a pseudo adaptive routing which is an extension of the classic XY routing. We consider a mesh topology for evaluating the proposed routing. Our switches use the pseudo adaptive XY routing algorithm. The load in the center of a mesh network using ordinary XY routing is much higher than the total average. This extra load on the center can cause a hot spot. The main objective of our algorithm is to distribute the load. One of the advantages of distributing the load is a balanced temperature in the mesh. The algorithm has two modes, deterministic and adaptive, and the status of the neighbors of each switch is used to decide which mode must be selected...

10.1109/icm.2005.1590068 article EN International Conference on Microelectronics 2006-02-15
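
For readers unfamiliar with the baseline that the abstract above extends, here is a minimal sketch of classic deterministic XY routing on a 2D mesh. It does not implement the proposed pseudo adaptive algorithm (the abstract is truncated before its details); the function name `xy_route` and the `(x, y)` coordinate convention are assumptions made for this illustration.

```python
def xy_route(src, dst):
    """Return the list of (x, y) switches visited from src to dst.

    Classic XY routing: travel along the X dimension until the column
    matches the destination, then travel along the Y dimension.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Phase 1: correct the X coordinate.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: correct the Y coordinate.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```

Because every packet turns at most once, and always from X to Y, traffic between opposite sides of the mesh tends to cross the middle rows and columns, which is consistent with the center congestion the abstract describes.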

In this paper, a low-power structure called bypass zero, feed A directly (BZ-FAD) for shift-and-add multipliers is proposed. The architecture considerably lowers the switching activity of conventional multipliers. The modifications to the multiplier, which multiplies A by B, include the removal of the shifting of the B register, direct feeding of A to the adder, bypassing the adder whenever...

10.1109/tvlsi.2008.2004544 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2009-01-16
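
The idea of skipping the adder on zero multiplier bits can be sketched in software as follows. This is only a behavioral illustration of a generic shift-and-add multiplier with zero bypass, not the BZ-FAD hardware architecture; the function name, the `width` parameter, and the returned addition count are hypothetical, and the operands are assumed to be unsigned with `b` fitting in `width` bits.

```python
def shift_add_multiply(a, b, width=8):
    """Multiply unsigned a by b, adding only for the 1 bits of b.

    Returns (product, additions) so the effect of bypassing the adder
    on zero bits is visible.
    """
    product = 0
    additions = 0
    for i in range(width):
        if (b >> i) & 1:
            # Bit i of B is 1: add A, aligned to bit position i.
            product += a << i
            additions += 1
        # Bit i of B is 0: the adder is bypassed, nothing to do.
    return product, additions


print(shift_add_multiply(13, 10))
```

In this model the number of adder activations equals the number of 1 bits in `b`, rather than the full operand width.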