- Interconnection Networks and Systems
- Parallel Computing and Optimization Techniques
- Low-power high-performance VLSI design
- Embedded Systems Design Techniques
- VLSI and Analog Circuit Testing
- Advanced Memory and Neural Computing
- VLSI and FPGA Design Techniques
- Radiation Effects in Electronics
- Supercapacitor Materials and Fabrication
- Quantum-Dot Cellular Automata
- Ferroelectric and Negative Capacitance Devices
- Software-Defined Networks and 5G
- Advancements in Semiconductor Devices and Circuit Design
- Advanced Optical Network Technologies
- Advanced MIMO Systems Optimization
- Coding theory and cryptography
- Advancements in Battery Materials
- Advanced Neural Network Applications
- Power Line Communications and Noise
- Graphene research and applications
- Cellular Automata and Applications
- Advanced Bandit Algorithms Research
- Advanced Graph Neural Networks
- Manufacturing Process and Optimization
- Cryptography and Residue Arithmetic
Democritus University of Thrace
2016-2025
University of Patras
2003-2018
University of Western Macedonia
2010-2012
Foundation for Research and Technology Hellas
2008-2009
National Technical University of Athens
1997-2005
Research Academic Computer Technology Institute
2004-2005
National Polytechnic School
2003
Manufacturing, through the Industry 4.0 concept, is moving to the next phase of digitalization. Supported by innovative technologies such as the Internet of Things, cloud technology, and Augmented and Virtual Reality, this transition will also play an important role in manufacturing education, supporting advanced life-long training of a skilled workforce. Advanced, networked learning ecosystems, called Education 4.0, develop skills and build competences for the new era of manufacturing. Towards that, this work presents how the adoption of cyber-physical...
Parallel-prefix adders offer a highly efficient solution to the binary addition problem and are well-suited for VLSI implementations. A novel framework is introduced, which allows the design of parallel-prefix Ling adders. The proposed approach saves one logic level of implementation compared to structures based on the traditional definition of the carry-lookahead equations and reduces the fanout requirements of the design. Experimental results reveal that delay reductions of up to 14 percent are achieved when the fastest architectures are implemented with the presented equations.
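As a rough illustration of the Ling recurrence this line of work builds on (a minimal bit-level sketch with hypothetical function names, not the paper's framework), the snippet below checks that the Ling pseudo-carry h[i] = g[i] | (t[i-1] & h[i-1]), with the real carry recovered as c[i] = t[i] & h[i], produces the same sum as the conventional carry recurrence c[i] = g[i] | (t[i] & c[i-1]).

```python
# Sketch only: compares the classic carry recurrence with the Ling
# pseudo-carry formulation for binary addition (g = generate, p = XOR
# propagate, t = OR propagate/transfer).

def add_via_carries(a, b, n):
    """Reference addition using the conventional carry recurrence."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    c, s = 0, 0
    for i in range(n):
        s |= (p[i] ^ c) << i           # sum bit uses the incoming carry
        c = g[i] | (t[i] & c)          # carry out of bit i
    return s | (c << n)

def add_via_ling(a, b, n):
    """Same addition computed through Ling pseudo-carries h[i]."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    h, c_in, s = 0, 0, 0
    for i in range(n):
        s |= (p[i] ^ c_in) << i
        h = g[i] | ((t[i - 1] if i > 0 else 0) & h)  # Ling recurrence
        c_in = t[i] & h                # real carry recovered from h
    return s | (c_in << n)

if __name__ == "__main__":
    import random
    for _ in range(1000):
        a, b = random.getrandbits(16), random.getrandbits(16)
        assert add_via_carries(a, b, 16) == add_via_ling(a, b, 16) == a + b
```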
Structured sparsity has been proposed as an efficient way to prune the complexity of Machine Learning (ML) applications and to simplify the handling of sparse data in hardware. Accelerating ML models, whether for training or inference, relies heavily on matrix multiplications that can be efficiently executed on vector processors or custom engines. This work aims to integrate the simplicity of structured sparsity into the execution flow to speed up the corresponding matrix multiplications. Initially, the implementation of structured-sparse matrix multiplication...
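For intuition only, the sketch below shows one common structured-sparsity pattern (2:4, i.e. at most two non-zeros per group of four weights) stored as values plus in-group indices, so a dot product touches only the kept positions. The 2:4 choice and all function names are assumptions for the example, not the format or hardware proposed in this work.

```python
# Illustrative 2:4 structured sparsity: compress each group of 4 weights to
# its 2 largest-magnitude entries and multiply using only those positions.
import numpy as np

def compress_2_4(w):
    """Keep the 2 largest-magnitude entries of every group of 4 weights."""
    vals, idxs = [], []
    for g in range(0, len(w), 4):
        group = w[g:g + 4]
        keep = sorted(int(k) for k in np.argsort(np.abs(group))[-2:])
        idxs.append(keep)
        vals.append([group[k] for k in keep])
    return np.array(vals), np.array(idxs)

def sparse_dot(vals, idxs, x):
    """Dot product that reads only the activations selected by the indices."""
    acc = 0.0
    for g, (v, k) in enumerate(zip(vals, idxs)):
        acc += v[0] * x[4 * g + k[0]] + v[1] * x[4 * g + k[1]]
    return acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w, x = rng.standard_normal(16), rng.standard_normal(16)
    vals, idxs = compress_2_4(w)
    # The compressed product matches the dense product of the pruned weights.
    w_pruned = np.zeros_like(w)
    for g in range(4):
        w_pruned[4 * g + idxs[g][0]] = vals[g][0]
        w_pruned[4 * g + idxs[g][1]] = vals[g][1]
    assert np.isclose(sparse_dot(vals, idxs, x), w_pruned @ x)
```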
In this work, we propose a new algorithm for designing diminished-1 modulo 2^n + 1 multipliers. The implementation of the proposed algorithm requires n + 3 partial products that are reduced by a tree architecture into two summands, which are finally added by a diminished-1 modulo 2^n + 1 adder. The proposed multipliers, compared to existing implementations, offer enhanced operation speed, and their regular structure allows efficient VLSI implementations.
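To make the number representation concrete, here is a small arithmetic reference model of diminished-1 modulo 2^n + 1 multiplication (the representation such multipliers assume, not the partial-product algorithm proposed above; function names are illustrative and the special zero encoding is left out).

```python
# Diminished-1 representation: a non-zero operand A in [1, 2**n] is stored
# as d = A - 1 using n bits; products are reduced modulo 2**n + 1.

def to_dim1(a, n):
    assert 1 <= a <= 2 ** n, "zero operands usually need a separate flag"
    return a - 1

def from_dim1(d, n):
    return d + 1

def dim1_mul(da, db, n):
    """Multiply two diminished-1 operands, returning a diminished-1 result."""
    m = 2 ** n + 1
    prod = ((da + 1) * (db + 1)) % m     # true product modulo 2**n + 1
    assert prod != 0, "a result of 0 mod 2**n+1 needs the special zero encoding"
    return prod - 1

if __name__ == "__main__":
    n = 8
    for a, b in [(3, 200), (255, 256), (17, 89)]:
        d = dim1_mul(to_dim1(a, n), to_dim1(b, n), n)
        assert from_dim1(d, n) == (a * b) % (2 ** n + 1)
```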
Two architectures for modulo 2^n + 1 adders are introduced in this paper. The first one is built around a sparse carry computation unit that computes only some of the carries of the addition. This approach is enabled by the introduction of the inverted circular idempotency property of the parallel-prefix carry operator, and its regularity and area efficiency are further enhanced by a new prefix operator. The resulting diminished-1 adders can be...
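A behavioural reference of the arithmetic these adders implement may help: diminished-1 modulo 2^n + 1 addition adds the two operands and feeds the complemented carry-out back in. This is only the functional rule, not the sparse prefix architecture of the paper, and the function name is hypothetical.

```python
# Diminished-1 modulo 2**n + 1 addition with an inverted end-around carry.
# Results equal to 0 mod 2**n + 1 need the usual special zero encoding and
# are skipped in the check below.

def dim1_add(da, db, n):
    mask = (1 << n) - 1
    t = da + db
    c_out = t >> n                      # carry out of the n-bit addition
    return (t + (1 - c_out)) & mask     # complemented carry fed back in

if __name__ == "__main__":
    n, m = 8, 2 ** 8 + 1
    for a in range(1, m):
        for b in range(1, m):
            if (a + b) % m == 0:
                continue                # zero result: special encoding
            assert dim1_add(a - 1, b - 1, n) == (a + b) % m - 1
```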
In the last decade, Teaching Factories, which enable a two-way knowledge transfer in manufacturing education, have been built up by industry and academia. Such initiatives, from the local to the worldwide level, help both parties mutually benefit. This paper introduces a framework for the delivery of industrial learning and the training of young engineers, creating at the same time the prerequisites for SMEs to explore new technologies through the Teaching Factory paradigm. In particular, in this framework, participants will be receivers of valuable knowledge and will be able...
The need for efficient implementation of simple crossbar schedulers has increased in recent years due to the advent of on-chip interconnection networks that require low-latency message delivery. The core function of any scheduler is arbitration, which resolves conflicting requests for the same output. Since the delay of the arbiters directly determines the operation speed of the scheduler, the design of faster arbiters is of paramount importance. In this paper, we present a new bit-level algorithm and circuit techniques for programmable priority arbiters that offer...
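As a behavioural sketch of what a programmable-priority arbiter does (not the bit-level circuit technique presented in the paper; the round-robin masking scheme and function names are assumptions), the request vector is masked from the priority pointer upwards, and if that region is empty the arbiter wraps around; the lowest set bit of the chosen region wins.

```python
# Behavioural model of a programmable-priority (round-robin style) arbiter.

def fixed_priority_grant(req):
    """Grant the lowest-indexed request (req & -req isolates that bit)."""
    return req & -req

def programmable_priority_grant(req, ptr, n):
    """Grant the first request at or above position `ptr`, wrapping around."""
    upper = req & ~((1 << ptr) - 1)     # requests with index >= ptr
    region = upper if upper else req    # wrap around if none above the pointer
    return fixed_priority_grant(region) & ((1 << n) - 1)

if __name__ == "__main__":
    n = 8
    req = 0b0100_1010                   # requests at positions 1, 3, 6
    assert programmable_priority_grant(req, 0, n) == 1 << 1
    assert programmable_priority_grant(req, 2, n) == 1 << 3
    assert programmable_priority_grant(req, 7, n) == 1 << 1   # wrap-around
```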
In this paper, a new leading-zero counter (or detector) is presented. New Boolean relations for the bits of the count are derived that allow their computation to be performed using standard carry-lookahead techniques. Using the proposed approach, various design choices can be explored, leading to different circuit topologies for the counting unit. The proposed circuits are efficiently implemented either in static or in dynamic logic and require significantly less energy per operation compared to already known architectures. Their integration with...
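For reference, the snippet below models the leading-zero-count function itself with a simple recursive-halving scheme, where each level contributes one bit of the count depending on whether the upper half is all zeros. This is only a behavioural model with hypothetical names; the carry-lookahead-style Boolean relations derived in the paper are not reproduced.

```python
# Behavioural leading-zero counting, checked against a direct reference.

def lzc_reference(x, n):
    """Number of leading zeros of an n-bit value."""
    return n - x.bit_length()

def lzc_halving(x, n):
    """Recursive halving: one count bit is decided per level."""
    assert n & (n - 1) == 0, "power-of-two width assumed for the sketch"
    if n == 1:
        return 1 - x
    half = n // 2
    upper = x >> half
    if upper == 0:                      # count bit = 1, search the lower half
        return half + lzc_halving(x & ((1 << half) - 1), half)
    return lzc_halving(upper, half)     # count bit = 0, search the upper half

if __name__ == "__main__":
    import random
    for _ in range(1000):
        v = random.getrandbits(32)
        assert lzc_halving(v, 32) == lzc_reference(v, 32)
```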
Scalable Network-on-Chip (NoC) architectures should achieve high-throughput and low-latency operation without exceeding the stringent area/energy constraints of modern Systems-on-Chip (SoC), even when operating at a high clock frequency. Such requirements directly impact the NoC routers and interfaces comprising the architecture. This paper focuses on router micro-architecture and presents ShortPath, a pipelined router architecture that enables high-speed implementations by parallelizing as much as possible - resorting...
Machine learning adoption has seen a widespread bloom in recent years, with neural network implementations being at the forefront. In light of these developments, vector processors are currently experiencing a resurgence of interest, due to their inherent amenability to accelerating the data-parallel algorithms required in machine learning environments. In this paper, we propose a scalable and high-performance RISC-V vector processor core. The presented core employs a triptych of novel mechanisms that work synergistically to achieve the desired...
Approximate computation has evolved recently as a viable alternative for maximizing energy efficiency. One aspect of approximate computing involves the design of hardware units that return a sufficiently accurate result for the examined occasion, rather than an exact result. As long as such units are allowed to compute approximately, they can be designed in multiple new ways. In this work, we focus on the synthesis of approximate parallel-prefix adders. Instead of exploring specific approximate architectures, as done by state-of-the-art approaches,...
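To ground the idea of trading accuracy for a shorter carry chain, here is one well-known approximation (a lower-part OR adder), shown purely as an example of an approximate adder; it is not the synthesis flow of this work, and the parameter names are illustrative.

```python
# Lower-part OR adder: the k least-significant bits are approximated with a
# bitwise OR and no carry enters the exact upper part, so the error stays
# confined to the approximated bits.
import random

def approx_add(a, b, n, k):
    mask = (1 << k) - 1
    low = (a | b) & mask                      # approximate lower part, no carries
    high = ((a >> k) + (b >> k)) << k         # exact upper part
    return (high | low) & ((1 << (n + 1)) - 1)

if __name__ == "__main__":
    n, k = 16, 4
    worst = 0
    for _ in range(10000):
        a, b = random.getrandbits(n), random.getrandbits(n)
        worst = max(worst, abs((a + b) - approx_add(a, b, n, k)))
    assert worst < (1 << (k + 1))             # error bounded by the cut point
    print("worst observed error:", worst)
```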
The acceleration of deep-learning kernels in hardware relies on matrix multiplications that are executed efficiently on Systolic Arrays (SA). To effectively trade off training/inference quality with hardware cost, SA accelerators employ reduced-precision Floating-Point (FP) arithmetic. In this work, we demonstrate the need for new pipeline organizations to reduce the latency and improve the energy efficiency of FP operators in the chained multiply-add operation imposed by the structure of the SA. The proposed skewed design reorganizes...
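The chained multiply-add structure referred to above can be pictured with a tiny functional model: each processing element adds its local product to the partial sum arriving from the previous element. This is only a behavioural sketch with assumed names, not the skewed pipeline organization proposed here.

```python
# Behavioural model of a systolic-array column as a chain of multiply-adds.

def systolic_column(weights, activations):
    """Dot product formed by chaining multiply-add PEs."""
    partial = 0.0
    for w, a in zip(weights, activations):
        partial = partial + w * a       # one fused multiply-add per PE
    return partial

def systolic_matmul(W, X):
    """Matrix product where every output element flows through one column."""
    return [[systolic_column(w_row, x_col) for x_col in zip(*X)] for w_row in W]

if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0]]
    X = [[5.0, 6.0], [7.0, 8.0]]
    assert systolic_matmul(W, X) == [[19.0, 22.0], [43.0, 50.0]]
```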
Large systems-on-chip (SoCs) and chip multiprocessors (CMPs), incorporating tens to hundreds of cores, create a significant integration challenge. Interconnecting a huge number of architectural modules in an efficient manner calls for scalable solutions that offer both high-throughput and low-latency communication. The switches are the basic building blocks of such interconnection networks, and their design critically affects the performance of the whole system. So far, innovation in switch design has relied mostly...
The efficiency of modern Networks-on-Chip (NoC) is no longer judged solely by their physical scalability, but also by their ability to deliver high performance, Quality-of-Service (QoS), and flow isolation at the minimum possible cost. Although traditional architectures supporting Virtual Channels (VCs) offer the resources for partitioning and isolation, an adversarial workload can still interfere with and degrade the performance of other workloads that are active in a different set of VCs. In this paper, we present PhaseNoC,...
As multicore systems transition to the many-core realm, the pressure on the interconnection network is substantially elevated. The network-on-chip (NoC) is expected to undertake the expanding demands of ever-increasing numbers of processing elements, while its area/power footprint remains severely constrained. Hence, low-cost NoC designs that achieve high-throughput and low-latency operation are imperative for future scalability. While the buffers of the routers are key enablers of high performance, they are also major consumers of area and power. In...
The need for higher throughput and lower communication latency in modern networks-on-chip (NoC) has led to low- and high-radix topologies that exploit the speed provided by on-chip wires, after appropriate wire engineering, to transfer flits over longer distances in a single clock cycle. In this paper, motivated by the same principle of fast link traversal, we propose the RapidLink NoC architecture, which exploits said fast wires to transfer flits rapidly between adjacent routers using double-data-rate (DDR) link traversals. RapidLink is enhanced with...
Convolution is one of the most critical operations in various application domains, and its computation should combine high performance with energy efficiency. This requirement holds both for standard convolution and for other spatial variants, such as dilated, strided, or transposed convolutions. In this work, we focus on the design of a streaming engine, called LazyDCstream, that is tuned for dilated convolution. LazyDCstream utilizes a sliding-window architecture for input data reuse and leverages the already-known decomposition...
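One widely known decomposition of dilated convolution (shown here in 1-D as a functional sketch, not the LazyDCstream datapath; all names are illustrative) splits the input into d phases, runs a standard convolution on each phase, and interleaves the results.

```python
# A 1-D convolution with dilation d equals d standard convolutions, one per
# input phase, with their outputs interleaved.

def dilated_conv1d(x, w, d):
    """Direct dilated convolution (valid mode)."""
    taps = (len(w) - 1) * d
    return [sum(w[k] * x[i + k * d] for k in range(len(w)))
            for i in range(len(x) - taps)]

def dilated_via_phases(x, w, d):
    """Same result computed as d standard convolutions on subsampled phases."""
    out = [0.0] * (len(x) - (len(w) - 1) * d)
    for p in range(d):
        phase = x[p::d]                             # every d-th sample, offset p
        for q in range(len(phase) - len(w) + 1):
            i = q * d + p
            if i < len(out):
                out[i] = sum(w[k] * phase[q + k] for k in range(len(w)))
    return out

if __name__ == "__main__":
    x = [float(v) for v in range(20)]
    w = [1.0, -2.0, 0.5]
    assert dilated_via_phases(x, w, 3) == dilated_conv1d(x, w, 3)
```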
Two architectures for parallel-prefix modulo 2^n - 1 adders are presented in this paper. For large wordlengths, we introduce sparse architectures that achieve a significant reduction of the wiring complexity without imposing any delay penalty. Then, a Ling-carry formulation of modulo 2^n - 1 addition is presented. Ling adders save one logic level of implementation and provide high-speed solutions for smaller adder widths, where the wiring complexity is small. The...
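As a reference for the arithmetic these architectures implement, modulo 2^n - 1 addition re-enters the carry-out at the least-significant position (end-around carry). The sketch below models only that rule, not the sparse or Ling prefix organizations of the paper, and the function name is hypothetical.

```python
# Modulo 2**n - 1 addition with an end-around carry.  Zero has the usual
# double representation: both 0 and 2**n - 1.

def mod_2n_minus_1_add(a, b, n):
    mask = (1 << n) - 1
    t = a + b
    return (t + (t >> n)) & mask        # carry out re-enters at the LSB

if __name__ == "__main__":
    n, m = 6, (1 << 6) - 1
    for a in range(1 << n):
        for b in range(1 << n):
            s = mod_2n_minus_1_add(a, b, n)
            assert s % m == (a + b) % m  # equal up to the double zero encoding
```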