- Parallel Computing and Optimization Techniques
- Interconnection Networks and Systems
- Advanced Memory and Neural Computing
- Embedded Systems Design Techniques
- Distributed and Parallel Computing Systems
- Advanced Neural Network Applications
- Ferroelectric and Negative Capacitance Devices
- Cloud Computing and Resource Management
- IoT and Edge/Fog Computing
- Advanced Data Storage Technologies
- Low-power high-performance VLSI design
- Real-Time Systems Scheduling
- Semiconductor materials and devices
- Neural Networks and Applications
- Advanced Image and Video Retrieval Techniques
- Distributed systems and fault tolerance
- Brain Tumor Detection and Classification
- Semantic Web and Ontologies
- Data Management and Algorithms
- Web Data Mining and Analysis
- Stochastic Gradient Optimization Techniques
- Caching and Content Delivery
- Advanced Database Systems and Queries
- Handwritten Text Recognition Techniques
- Remote-Sensing Image Classification
Southern Illinois University Carbondale
2016-2025
Seattle University
2022
Nvidia (United States)
2022
Ford Motor Company (United States)
2018
National Technical University of Athens
2003-2014
National and Kapodistrian University of Athens
2003-2011
Institute of Communication and Computer Systems
2010-2011
University of the Aegean
2006-2008
In this paper, a new algorithm for vehicle license plate identification is proposed, on the basis of a novel adaptive image segmentation technique (sliding concentric windows) and connected component analysis in conjunction with a character recognition neural network. The algorithm was tested with 1334 natural-scene gray-level images of different backgrounds and ambient illumination. The camera focused on the plate, while the angle of view and the distance from the vehicle varied according to the experimental setup. The license plates properly segmented were 1287 over...
Current research in the area of Neural Networks (NN) has resulted in performance advancements for a variety of complex problems. In particular, embedded system applications rely more and more on the utilization of convolutional NNs to provide services such as image/audio classification and object detection. The core arithmetic computation performed during NN inference is the multiply-accumulate (MAC) operation. In order to meet tighter throughput constraints, NN accelerators integrate thousands of MAC units resulting in significant...
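As a minimal illustration of the multiply-accumulate (MAC) operation named in the abstract above, the sketch below computes one neuron's pre-activation as a chain of MAC steps. The function name `mac_dot` and the sample values are hypothetical and are not taken from the paper; a hardware accelerator would perform many such steps in parallel across its MAC units.

```python
def mac_dot(weights, activations):
    """Compute a dot product as a sequence of multiply-accumulate steps.

    Illustrative only: names and values are assumptions, not the paper's design.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    acc = 0
    for w, a in zip(weights, activations):
        acc += w * a  # one MAC: multiply, then add into the accumulator
    return acc


if __name__ == "__main__":
    # 1*4 + 2*5 + 3*6
    print(mac_dot([1, 2, 3], [4, 5, 6]))
```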
Neural processing units (NPUs) are becoming an integral part of all modern computing systems due to their substantial role in accelerating neural networks (NNs). The significant improvements in cost-energy-performance stem from the massive array of multiply-accumulate (MAC) units that remarkably boosts the throughput of NN inference. In this work, we first investigate the thermal challenges that NPUs bring, revealing how MAC arrays, which form the heart of any NPU, impose serious thermal bottlenecks to on-chip systems due to their excessive power densities....
In this paper, we introduce an energy efficient edge computing solution to collaboratively utilize Multi-access Edge Computing (MEC) and Fully Autonomous Aerial Systems (FAAS) to support the computing demands of the Internet of Things (IoT) nodes residing in Areas of Interest (AoIs) and executing machine learning tasks. The Satisfaction Games are adopted to determine whether the IoT nodes' optimal partial task should be offloaded to the MEC server or to a hovering FAAS above the AoI. The decision is taken by considering the IoT nodes' latency, energy consumption,...
Deep Neural Networks (DNNs) have shown significant advantages in a wide variety of domains. However, DNNs are becoming computationally intensive and energy hungry at an exponential pace, while at the same time, there is a vast demand for running sophisticated DNN-based services on resource constrained embedded devices. In this paper, we target energy-efficient inference on embedded DNN accelerators. To that end, we propose an automated framework to compress DNNs in a hardware-aware manner by jointly employing pruning...
Today's prevalent solutions for modern embedded systems and general computing employ many processing units connected by an on-chip network, leaving behind complex superscalar architectures. In this paper, we couple the concept of distributed computing with parallel applications and present a workload-aware distributed run-time framework for malleable applications on many-core platforms. The presented framework is responsible for serving, in a distributed way and at run-time, the needs of malleable applications, maximizing resource utilization, avoiding dominating effects and taking into...
In this work, we introduce a control variate approximation technique for low error approximate Deep Neural Network (DNN) accelerators. The control variate technique is used in Monte Carlo methods to achieve variance reduction. Our approach significantly decreases the induced error due to approximate multiplications in DNN inference, without requiring time-exhaustive retraining compared to the state-of-the-art. Leveraging our method, we use highly approximated multipliers to generate power-optimized DNN accelerators. Our experimental evaluation on six DNNs, for Cifar-10 and...
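The abstract above refers to the control variate technique from Monte Carlo methods. The sketch below shows only that generic statistical idea, not the paper's accelerator design: it estimates E[exp(U)] for U uniform on [0, 1], using U itself (known mean 0.5) as the control variate. The function name, sample count, and seed are assumptions chosen for illustration.

```python
import math
import random
import statistics


def control_variate_estimate(n=10_000, seed=0):
    """Estimate E[exp(U)], U ~ Uniform(0, 1), with and without a control variate.

    Generic Monte Carlo illustration; all names and parameters are hypothetical.
    Returns (plain_mean, cv_mean, plain_var, cv_var).
    """
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    f = [math.exp(x) for x in u]

    known_mean_u = 0.5  # exact mean of the control variate U
    mean_f = statistics.fmean(f)
    mean_u = statistics.fmean(u)

    # Coefficient c = Cov(f, U) / Var(U), estimated from the samples.
    cov = sum((fi - mean_f) * (ui - mean_u) for fi, ui in zip(f, u)) / (n - 1)
    c = cov / statistics.variance(u)

    # Subtract the scaled, zero-mean correction from every sample.
    adjusted = [fi - c * (ui - known_mean_u) for fi, ui in zip(f, u)]

    return (
        mean_f,
        statistics.fmean(adjusted),
        statistics.variance(f),
        statistics.variance(adjusted),
    )


if __name__ == "__main__":
    plain_mean, cv_mean, plain_var, cv_var = control_variate_estimate()
    print(f"exact value      : {math.e - 1:.5f}")
    print(f"plain estimate   : {plain_mean:.5f} (sample variance {plain_var:.5f})")
    print(f"adjusted estimate: {cv_mean:.5f} (sample variance {cv_var:.5f})")
```

Because exp(U) and U are strongly correlated, the adjusted samples vary far less than the raw ones, which is the variance reduction the abstract mentions.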
We present GeoLLM-Squad, a geospatial Copilot that introduces the novel multi-agent paradigm to remote sensing (RS) workflows. Unlike existing single-agent approaches that rely on monolithic large language models (LLM), GeoLLM-Squad separates agentic orchestration from geospatial task-solving, by delegating RS tasks to specialized sub-agents. Built on the open-source AutoGen and GeoLLM-Engine frameworks, our work enables the modular integration of diverse applications, spanning urban monitoring, forestry protection,...
The rapid growth of Machine Learning (ML) has increased demand for DNN hardware accelerators, but their embodied carbon footprint poses significant environmental challenges. This paper leverages approximate computing to design sustainable accelerators by minimizing the Carbon Delay Product (CDP). Using gate-level pruning and precision scaling, we generate area-aware approximate multipliers and optimize the accelerator design with a genetic algorithm. Results demonstrate reduced embodied carbon while meeting performance and accuracy requirements.
Internet-of-Things (IoT) consists of interconnected devices with sensing, monitoring and processing functionalities that work in a cooperative way to offer services. Smart buildings, self-driving cars, house monitoring and management, and city electricity and pollution monitoring are some examples where IoT systems have been already deployed. Amongst the different kinds of devices in IoT, cameras have a key role, since they can capture rich and resourceful content. The number of embedded cameras is rising rapidly, establishing the term Internet-of-Video Things...
Recent Deep Neural Networks (DNNs) managed to deliver superhuman accuracy levels on many AI tasks. Several applications rely more and more on DNNs to deliver sophisticated services, and DNN accelerators are becoming integral components of modern systems-on-chips. DNNs perform millions of arithmetic operations per inference, and DNN accelerators integrate thousands of multiply-accumulate units, leading to increased energy requirements. Approximate computing principles are employed to significantly lower the energy consumption of DNN accelerators at the cost of some accuracy loss. Nevertheless, recent...
Recent breakthroughs in Neural Networks (NNs) have made DNN accelerators ubiquitous and led to an ever-increasing quest on adopting them from Cloud to edge computing. However, state-of-the-art DNN accelerators pack immense computational power in a relatively confined area, inducing significant on-chip power densities that lead to intolerable thermal bottlenecks. Existing state of the art focuses on using approximate multipliers only to trade off efficiency with inference accuracy. In this work, we present a thermal-aware...
Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced application workloads that comprise multiple DNN applications, raising new challenges regarding workload distribution. Equipped with a diverse set of accelerators, newer embedded systems present architectural heterogeneity, which current run-time controllers are unable to fully utilize. To enable high throughput in multi-DNN workloads, such a controller ought to explore hundreds of thousands...
We address the problem of custom Dynamic Memory Management (DMM) in Multi-Processor System-on-Chip (MPSoC) architectures. Customization is enabled through the definition of a design space that captures in a global, modular and parameterized manner the primitive building blocks of multi-threaded DMM. A systematic exploration methodology is proposed to efficiently traverse the design space. Customized Pareto DMM configurations are automatically generated through the development of software tools implementing the methodology. Experimental...
Modern embedded systems tend to employ a plethora of inter-connected components, leaving behind complex superscalar/centralized approaches. This evolution in system architecture has driven rapid changes in the field of application development too, by increasing the usage of multi-agent and demanding applications. Thus, multi-agent applications raise the challenge of efficient resource management, especially in cases where changes occur at run-time with rapid pace. This paper couples the concept of multi-agent systems and resource management techniques in order to develop a distributed framework for multiagent...
Modern applications rely more and more on the simultaneous execution of multiple DNNs, and Heterogeneous DNN Accelerators (HDAs) prevail as a solution to this trend. In this work, we propose, implement, and evaluate low precision Neural Processing Units (NPUs) which serve as building blocks to construct HDAs, to address the efficient deployment of multi-DNN workloads. Moreover, we design and evaluate HDA designs that increase the overall throughput, while reducing the energy consumption during NN inference. At <italic...
Real-time applications are raising the challenge of unpredictability. This is an extremely difficult problem in the context of modern, dynamic, multiprocessor platforms which, while providing potentially high performance, make the task of timing prediction extremely difficult. In this paper, we present a flexible distributed run-time application mapping framework for both homogeneous and heterogeneous multi-core platforms that adapts to the application's needs and the application's execution restrictions. The novel idea of this article is the application of autonomic management...
Multiprocessor systems-on-chip (MPSoCs) have attracted significant attention since they are recognized as a scalable paradigm to interconnect and organize a high number of cores. Current multicore embedded systems exhibit increased levels of dynamic behavior, leading to unexpected memory footprint variations unknown at design time. Dynamic memory management (DMM) is a promising solution for such types of dynamic systems. Although some efficient dynamic memory managers have been proposed for conventional bus-based MPSoC platforms, there are no DMM...
Vision-based robotic applications exhibit increased computational complexity. This problem becomes even more important in mission critical application domains. The SPARTAN project focuses on the tight and optimal implementation of computer vision algorithms targeting rover navigation for space applications. For evaluation purposes, these algorithms will be implemented with a co-design methodology onto a Virtex-6 FPGA device.
Modern vehicles are enhanced with increased computation, communication and sensing capabilities, providing a variety of new features that pave the way for the deployment of more sophisticated services. Specifically, smart cars employ hundreds of sensors and electronic systems in order to obtain situational and environmental information. This rapid growth of on-vehicle multi-sensor inputs along with off-vehicle data streams introduces the smart car era. Thus, systematic techniques for combining information provided by on-...
Transistor aging is one of the major concerns that challenges designers in advanced technologies. It profoundly degrades the reliability of circuits during their lifetime as it slows down transistors, resulting in errors due to timing violations unless large guardbands are included, which leads to considerable performance losses. When it comes to Neural Processing Units (NPUs), where increasing the inference speed is the primary goal, such performance losses cannot be tolerated. In this work, we first propose a reliability-aware...