- Parallel Computing and Optimization Techniques
- Cloud Computing and Resource Management
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Embedded Systems Design Techniques
- Interconnection Networks and Systems
- Low-power high-performance VLSI design
- Radiation Effects in Electronics
- Robotics and Sensor-Based Localization
- Advanced Vision and Imaging
- Computational Geometry and Mesh Generation
- Computer Graphics and Visualization Techniques
- Algorithms and Data Compression
- IoT and Edge/Fog Computing
- Video Coding and Compression Technologies
- Advanced Memory and Neural Computing
- Advancements in Semiconductor Devices and Circuit Design
- Semiconductor materials and devices
- Robotic Path Planning Algorithms
- Augmented Reality Applications
- VLSI and FPGA Design Techniques
- Graph Theory and Algorithms
- Error Correcting Code Techniques
- Ferroelectric and Negative Capacitance Devices
- Real-Time Systems Scheduling
University of Thessaly
2015-2024
University of Patras
2000-2017
Centre for Research and Technology Hellas
2015-2017
Hellenic Open University
2014
Sorbonne Université
2013
Technological Educational Institute of Western Greece
2013
Laboratoire de Recherche en Informatique de Paris 6
2013
Instituto de Telecomunicações
2012
University of Coimbra
2012
Intracom Telecom (Greece)
2010
With high-end systems featuring multicore/multithreaded processors and high component density, power-aware high-performance multithreading libraries become a critical element of the system software stack. Online power and performance adaptation of multithreaded code from within user-level runtime libraries is a relatively new and unexplored area of research. We present a user-level library framework for nearly optimal online adaptation of multithreaded codes for low-power, high-performance execution. Our framework operates by regulating concurrency and changing the processors/threads...
Computing has recently reached an inflection point with the introduction of multi-core processors. On-chip thread-level parallelism is doubling approximately every other year. Concurrency lends itself naturally to allowing a program to trade performance for power savings by regulating the number of active cores; however, in several domains users are unwilling to sacrifice performance to save power. We present a prediction model for identifying energy-efficient operating points of concurrency in well-tuned multithreaded scientific...
The problem of automatically generating hardware modules from a high-level representation of an application has been at the research forefront in the last few years. In this paper, we use OpenCL, an industry-supported standard for writing programs that execute on multicore platforms and accelerators such as GPUs. Our architectural synthesis tool, SOpenCL (Silicon-OpenCL), adapts OpenCL into a novel hardware design flow which efficiently maps coarse- and fine-grained parallelism onto an FPGA reconfigurable fabric. SOpenCL is...
We present Streamflow, a new multithreaded memory manager designed for low-overhead, high-performance memory allocation while transparently favoring locality. Streamflow enables low-overhead simultaneous allocation by multiple threads and adapts to sequential allocation at speeds comparable to that of custom sequential allocators. It favors the transparent exploitation of temporal and spatial object access locality, and reduces allocator-induced cache conflicts and false sharing, all using a unified design based on segregated heaps. Streamflow introduces an...
Dependable computing on unreliable substrates is the next challenge the computing community needs to overcome, due to both manufacturing limitations in low geometries and the necessity to aggressively minimize power consumption. System designers often need to analyze the way hardware faults manifest as errors at the architectural level and how these errors affect application correctness. This paper introduces GemFI, a fault injection tool based on the cycle-accurate full system simulator Gem5. GemFI provides fault injection methods and is easily extensible to support...
This paper addresses the problem of orchestrating and scheduling parallelism at multiple levels of granularity on heterogeneous multicore processors. We present mechanisms and policies for adaptive exploitation of layered parallelism on the Cell Broadband Engine. Our policies combine event-driven task scheduling with malleable loop-level parallelism, which is exploited from the runtime system whenever task-level parallelism leaves idle cores. We present a scheduler for applications with layered parallelism and investigate its performance with RAxML, an application that infers large phylogenetic trees, using...
Several applications may trade off output quality for energy efficiency by computing only an approximation of their output. Current approaches to software-based approximate computing often require the programmer to specify the parts of the code or the data structures that can be approximated. A largely unaddressed challenge is how to automate the analysis of the significance of code for the output quality. To this end, we propose a methodology and toolset for automatic significance analysis. We use interval arithmetic and algorithmic differentiation in our profile-driven yet...
Computational phylogeny is a challenging application even for the most powerful supercomputers. It is also an ideal candidate for benchmarking emerging multiprocessor architectures, because it exhibits fine- and coarse-grain parallelism at multiple levels. In this paper, we present the porting, optimization, and evaluation of RAxML on the Cell Broadband Engine. RAxML is a provably efficient, hill-climbing algorithm for computing phylogenetic trees, based on the maximum likelihood (ML) method. The Cell Broadband Engine, a heterogeneous multi-core...
With the latest high-end computing nodes combining shared-memory multiprocessing with hardware multithreading, new scheduling policies are necessary for workloads consisting of multithreaded applications. The use of hybrid multiprocessors presents schedulers with the problem of job pairing, i.e., deciding which specific jobs can share each processor with minimum performance penalty, by running on different execution contexts. Therefore, schedulers are expected to decide not only which job mix will execute simultaneously across...
The bus that connects processors to memory is known to be a major architectural bottleneck in SMPs. However, both software and scheduling policies for these systems generally focus on memory hierarchy optimizations and do not address the bus bandwidth limitations directly. We first present experimental results which indicate that bus saturation can cause up to an almost three-fold slowdown of applications. Motivated by these results, we introduce two scheduling policies that take into account the bus bandwidth consumption of applications, with the necessary information provided by performance...
CPUs typically operate at a voltage which is higher than what is strictly required, using voltage margins to account for process variability and to anticipate any combination of adverse operating conditions. However, these worst-case scenarios occur rarely, if ever, so the overly pessimistic margins result in excessive power dissipation, which leads to decreased performance under power capping. In this paper, we investigate the impact of reducing the voltage beyond the nominal level on the efficiency of CPU power capping mechanisms, on three commercial systems,...
We introduce Fluidity, a framework enabling the flexible and adaptive deployment of serverless modular applications in systems comprising cloud, edge, and mobile nodes. Based on a declarative description of the application requirements, a custom placement policy, and a formal system infrastructure description, Fluidity plans and executes an initial deployment of the application components in the cloud-edge-mobile continuum. Furthermore, at runtime, Fluidity monitors the resource availability and the position of mobile nodes, and adapts the deployment accordingly, without any manual intervention from...
We introduce a task-based programming model and runtime system that exploit the observation that not all parts of a program are equally significant for the accuracy of the end result, in order to trade off the quality of program outputs for increased energy efficiency. This is done in a structured and flexible way, allowing for easy exploitation of different points in the quality/energy space, without adversely affecting application performance. The runtime system can apply a number of different policies to decide whether it will execute less-significant tasks accurately or...
The proliferation of heterogeneous computing platforms presents the parallel computing community with new challenges. One such challenge entails evaluating the efficacy of such parallel architectures and identifying the architectural innovations that ultimately benefit applications. To address this challenge, we need benchmarks that capture the execution patterns (i.e., dwarfs or motifs) of applications, both present and future, in order to guide future hardware design. Furthermore, we desire a common programming model for the benchmarks that facilitates code...
Hardware designers and engineers typically need to explore a multi-parametric design space in order to find the best configuration for their designs, using simulations that can take weeks to months to complete. For example, designers of special-purpose chips need to explore parameters such as the optimal bit width and data representation. This is the case for the development of complex algorithms such as Low-Density Parity-Check (LDPC) decoders used in modern communication systems. Currently, high-performance computing offers a wide set of acceleration options,...
The explosive growth of Internet-connected devices will soon result in a flood of generated data, which will increase the demand for network bandwidth as well as compute power to process the generated data. Consequently, there is a need for more energy-efficient servers to empower traditional centralized Cloud data-centers as well as emerging decentralized data-centers at the Edges of the Cloud. In this paper, we present our approach, which aims at developing a new class of micro-servers (UniServer) that exceed the conservative energy and performance scaling boundaries by introducing...
Given the importance of parallel mesh generation in large-scale scientific applications and the proliferation of multilevel SMT-based architectures, it is imperative to obtain insight on the interaction between meshing algorithms and these systems. We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level and fine-grain parallelism at the element level. This multigrain data parallel approach targets clusters built from low-end, commercially available SMTs. Our experimental...
Wide-angle (fisheye) lenses are often used in virtual reality and computer vision applications to widen the field of view of conventional cameras. Those lenses, however, distort images. For most real-world applications the video stream needs to be transformed, in real time (20 frames/sec or better), back to the natural-looking, central perspective space. This paper presents the implementation, optimization and characterization of a fisheye lens distortion correction application on three platforms: a conventional, homogeneous...