NFDI4DS | UHH-SEMS - Publication Details

Synthesis of Platform Architectures from OpenCL Programs

OPENALEX - Publications

Muhsen Owaida Nikolaos Bellas Konstantis Daloukas Christos D. Antonopoulos

The problem of automatically generating hardware modules from a high level representation of an application has been at the research forefront in the last few years. In this paper, we use OpenCL, an industry supported standard for writing programs that execute on multicore platforms and accelerators such as GPUs. Our architectural synthesis tool, SOpenCL (Silicon-OpenCL), adapts OpenCL into a novel design flow which efficiently maps coarse and fine-grained parallelism onto FPGA reconfigurable fabric. ...

10.1109/fccm.2011.19 article EN 2011-05-01

GemFI: A Fault Injection Tool for Studying the Behavior of Applications on Unreliable Substrates

Konstantinos Parasyris Georgios Tziantzoulis Christos D. Antonopoulos Nikolaos Bellas

Dependable computing on unreliable substrates is the next challenge the computing community needs to overcome, due both to manufacturing limitations in low geometries and to the necessity to aggressively minimize power consumption. System designers often need to analyze the way hardware faults manifest as errors at the architectural level and how these errors affect application correctness. This paper introduces GemFI, a fault injection tool based on the cycle accurate full system simulator Gem5. GemFI provides fault injection methods and is easily extensible to support...

10.1109/dsn.2014.96 article EN 2014-06-01

Architectural and compiler support for energy reduction in the memory hierarchy of high performance microprocessors

Nikolaos Bellas Ibrahim Hajj George Stamoulis Constantine D. Polychronopoulos

In this paper we propose a technique that uses an additional mini cache located between the I-Cache and the CPU core, and buffers instructions that are nested within loops and are otherwise continuously fetched from the I-Cache. This mechanism is combined with code modifications, through the compiler, that greatly simplify the required hardware, eliminate unnecessary instruction fetching, and consequently reduce signal switching activity and the dissipated energy.

10.1145/280756.280788 article EN 1998-01-01

Using dynamic cache management techniques to reduce energy in a high-performance processor

Nikolaos Bellas I.N. Hajj Constantine D. Polychronopoulos

Published in ISLPED '99: Proceedings of the 1999 International Symposium on Low Power Electronics and Design, August 1999. Nikolaos Bellas is listed with the Department of Electrical & Computer Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, 1308 West Main Street, Urbana, IL.

10.1145/313817.313856 article EN 1999-01-01

Energy and performance improvements in microprocessor design using a loop cache

Nikolaos Bellas I.N. Hajj Constantine D. Polychronopoulos George Stamoulis

Energy dissipated in on-chip caches represents a substantial portion of the energy budget of today's processors. Extrapolating current trends, this portion is likely to increase in the near future, since the devices devoted to the caches occupy an increasingly larger percentage of the total area of the chip. We extend the work proposed by J. Kin et al. (1997), in which an extra, small cache (called filter cache) is inserted between the CPU data path and the L1 cache and serves most of the references initiated from the CPU. In our scheme, the compiler is used to generate code that exploits...

10.1109/iccd.1999.808570 article EN 2003-01-20

Towards automatic significance analysis for approximate computing

Vassilis Vassiliadis Jan Riehme Jens Deussen Konstantinos Parasyris Christos D. Antonopoulos and 3 more

Several applications may trade off output quality for energy efficiency by computing only an approximation of their output. Current approaches to software-based approximate computing often require the programmer to specify the parts of the code or the data structures that can be approximated. A largely unaddressed challenge is how to automate the analysis of the significance of code for the output quality. To this end, we propose a methodology and toolset for automatic significance analysis. We use interval arithmetic and algorithmic differentiation in our profile-driven yet...

10.1145/2854038.2854058 article EN 2016-02-29

Architectural and compiler techniques for energy reduction in high-performance microprocessors

Nikolaos Bellas I.N. Hajj Constantine D. Polychronopoulos George Stamoulis

In this paper, we focus on low-power design techniques for high-performance processors at the architectural and compiler levels. We focus mainly on developing methods for reducing the energy dissipated in the on-chip caches. Energy dissipated in caches represents a substantial portion of the energy budget of today's processors. Extrapolating current trends, this portion is likely to increase in the near future, since the devices devoted to the caches occupy an increasingly larger percentage of the total area of the chip. We propose a method that uses an additional minicache located between the I-Cache...

10.1109/92.845897 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2000-06-01

OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures

Konstantinos F. Krommydas Wu-chun Feng Christos D. Antonopoulos Nikolaos Bellas

10.1007/s11265-015-1051-z article EN Journal of Signal Processing Systems 2015-10-06

The Impact of CPU Voltage Margins on Power-Constrained Execution

Panos Koutsovasilis Christos D. Antonopoulos Nikolaos Bellas Spyros Lalis George N. Papadimitriou and 2 more

CPUs typically operate at a voltage which is higher than what is strictly required, using voltage margins to account for process variability and to anticipate any combination of adverse operating conditions. However, these worst-case scenarios occur rarely, if ever, thus the voltage margins are overly pessimistic, resulting in excessive power dissipation, which leads to decreased performance under power capping. In this paper, we investigate the impact of reducing voltage margins beyond the nominal level on the efficiency of CPU power capping mechanisms, on three commercial systems,...

10.1109/tsusc.2020.3045195 article EN IEEE Transactions on Sustainable Computing 2020-12-17

Fluidity: Providing flexible deployment and adaptation policy experimentation for serverless and distributed applications spanning cloud–edge–mobile environments

Foivos Pournaropoulos Alexandros Patras Christos D. Antonopoulos Nikolaos Bellas Spyros Lalis

We introduce Fluidity, a framework enabling the flexible and adaptive deployment of serverless and modular applications in systems comprising cloud, edge, and mobile nodes. Based on a declarative description of the application requirements, a custom placement policy, and a formal system infrastructure description, Fluidity plans and executes an initial deployment of the application components in the cloud–edge–mobile continuum. Furthermore, at runtime, Fluidity monitors the resource availability and the position of mobile nodes, and adapts the deployment accordingly, without any manual intervention from...

10.1016/j.future.2024.03.031 article EN cc-by Future Generation Computer Systems 2024-03-21

Real-Time Fisheye Lens Distortion Correction Using Automatically Generated Streaming Accelerators

Nikolaos Bellas Sek Chai Malcolm Dwyer Dan Linzmeier

Fisheye lenses are often used in scientific or virtual reality applications to enlarge the field of view of a conventional camera. Fisheye lens distortion correction is an image processing application which transforms the distorted fisheye images back to the natural-looking perspective space. This application is characterized by non-linear streaming memory access patterns that make main memory bandwidth a key performance limiter. We have developed a system on a custom board that includes a Xilinx Virtex-4 FPGA. We express the application in a high level language, and we...

10.1109/fccm.2009.16 article EN 2009-01-01

A programming model and runtime system for significance-aware energy-efficient computing

Vassilis Vassiliadis Konstantinos Parasyris Charalampos Chalios Christos D. Antonopoulos Spyros Lalis and 3 more

We introduce a task-based programming model and runtime system that exploit the observation that not all parts of a program are equally significant for the accuracy of the end-result, in order to trade off the quality of program outputs for increased energy-efficiency. This is done in a structured and flexible way, allowing for easy exploitation of different points in the quality/energy space, without adversely affecting application performance. The runtime system can apply a number of policies to decide whether it will execute less-significant tasks accurately or...

10.1145/2688500.2688546 article EN 2015-01-24

On the characterization of OpenCL dwarfs on fixed and reconfigurable platforms

Konstantinos F. Krommydas Wu-chun Feng Muhsen Owaida Christos D. Antonopoulos Nikolaos Bellas

The proliferation of heterogeneous computing platforms presents the parallel computing community with new challenges. One such challenge entails evaluating the efficacy of such parallel architectures and identifying the architectural innovations that ultimately benefit applications. To address this challenge, we need benchmarks that capture the execution patterns (i.e., dwarfs or motifs) of applications, both present and future, in order to guide future hardware design. Furthermore, we desire a common programming model for the benchmarks that facilitates code...

10.1109/asap.2014.6868650 article EN 2014-06-01

FPGA implementation of a license plate recognition SoC using automatically generated streaming accelerators

Nikolaos Bellas Sek Chai Malcolm Dwyer Dan Linzmeier

Modern FPGA platforms provide the hardware and software infrastructure for building a bus-based system on chip (SoC) that meets application requirements. The designer can customize the hardware by selecting from a large number of pre-defined peripherals and fixed IP functions and by providing new hardware, typically expressed using RTL. Hardware accelerators that provide application-specific extensions to the computational capabilities of a system are an efficient mechanism to enhance performance and reduce power dissipation. What is missing is an integrated...

10.5555/1898953.1899132 article EN International Parallel and Distributed Processing Symposium 2006-04-25

FPGA implementation of a license plate recognition SoC using automatically generated streaming accelerators

Nikolaos Bellas Sek Chai Malcolm Dwyer Dan Linzmeier

Modern FPGA platforms provide the hardware and software infrastructure for building a bus-based system on chip (SoC) that meets application requirements. The designer can customize the hardware by selecting from a large number of pre-defined peripherals and fixed IP functions and by providing new hardware, typically expressed using RTL. Hardware accelerators that provide application-specific extensions to the computational capabilities of a system are an efficient mechanism to enhance performance and reduce power dissipation. What is missing is an integrated...

10.1109/ipdps.2006.1639437 article EN 2006-01-01

Shortening Design Time through Multiplatform Simulations with a Portable OpenCL Golden-model: The LDPC Decoder Case

Gabriel Falcão Muhsen Owaida David Novo Madhura Purnaprajna Nikolaos Bellas and 4 more

Hardware designers and engineers typically need to explore a multi-parametric design space in order to find the best configuration for their designs, using simulations that can take weeks to months to complete. For example, designers of special purpose chips need to explore parameters such as the optimal bit width and data representation. This is the case for the development of complex algorithms such as Low-Density Parity-Check (LDPC) decoders used in modern communication systems. Currently, high-performance computing offers a wide set of acceleration options,...

10.1109/fccm.2012.46 article EN 2012-04-01

Implementation of the AVS video decoder on a heterogeneous dual-core SIMD processor

Maria Koziri Dimitrios Zacharis Ioannis Katsavounidis Nikolaos Bellas

Multi-core Application Specific Instruction Processors (ASIPs) are increasingly used in multimedia applications due to their high performance and programmability. Nonetheless, their efficient use requires extensive modifications to the initial code in order to exploit the features of the underlying architecture. In this paper, through the example of implementing the Advance Video Coding (AVS) decoder on a heterogeneous dual-core SIMD processor, we present a guide for developers who wish to perform task-level decomposition of any video decoder...

10.1109/tce.2011.5955207 article EN IEEE Transactions on Consumer Electronics 2011-05-01

Exploiting Significance of Computations for Energy-Constrained Approximate Computing

Vassilis Vassiliadis Charalampos Chalios Konstantinos Parasyris Christos D. Antonopoulos Spyros Lalis and 3 more

10.1007/s10766-016-0409-6 article EN International Journal of Parallel Programming 2016-03-24

Using dynamic cache management techniques to reduce energy in general purpose processors

Nikolaos Bellas I.N. Hajj Constantine D. Polychronopoulos

The memory hierarchy of high-performance and embedded processors has been shown to be one of the major energy consumers. For example, the Level-1 (L1) instruction cache (I-Cache) of the StrongARM processor accounts for 27% of the power dissipation of the whole chip, whereas the instruction fetch unit (IFU) and the I-Cache of Intel's Pentium Pro processor are the single most important power consuming modules with 14% of the total power dissipation [2]. Extrapolating current trends, this portion is likely to increase in the near future, since the devices devoted to the caches occupy an increasingly larger...

10.1109/92.902264 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2000-12-01

Fisheye lens distortion correction on multicore and hardware accelerator platforms

Konstantis Daloukas Christos D. Antonopoulos Nikolaos Bellas Sek Chai

Wide-angle (fisheye) lenses are often used in virtual reality and computer vision applications to widen the field of view of conventional cameras. Those lenses, however, distort the images. For most real-world applications, the video stream needs to be transformed, in real time (20 frames/sec or better), back to the natural-looking, central perspective space. This paper presents the implementation, optimization and characterization of a fisheye lens distortion correction application on three platforms: a conventional, homogeneous...

10.1109/ipdps.2010.5470360 article EN 2010-01-01

An energy-efficient and error-resilient server ecosystem exceeding conservative scaling limits

Georgios Karakonstantis Konstantinos Tovletoglou Lev Mukhanov Hans Vandierendonck Dimitrios S. Nikolopoulos and 21 more

The explosive growth of Internet-connected devices will soon result in a flood of generated data, which will increase the demand for network bandwidth as well as for compute power to process the generated data. Consequently, there is a need for more energy efficient servers to empower traditional centralized Cloud data-centers as well as emerging decentralized data-centers at the Edges of the Cloud. In this paper, we present our approach, which aims at developing a new class of micro-servers - the UniServer - that exceed the conservative energy and performance scaling boundaries by introducing...

10.23919/date.2018.8342175 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2018-03-01

A programming model and runtime system for significance-aware energy-efficient computing

Vassilis Vassiliadis Konstantinos Parasyris Charalampos Chalios Christos D. Antonopoulos Spyros Lalis and 3 more

We introduce a task-based programming model and runtime system that exploit the observation that not all parts of a program are equally significant for the accuracy of the end-result, in order to trade off the quality of program outputs for increased energy-efficiency. This is done in a structured and flexible way, allowing for easy exploitation of different points in the quality/energy space, without adversely affecting application performance. The runtime system can apply a number of policies to decide whether it will execute less-significant tasks accurately or...

10.1145/2858788.2688546 article EN ACM SIGPLAN Notices 2015-01-24

A Framework for Evaluating Software on Reduced Margins Hardware

Konstantinos Parasyris Panos Koutsovasilis Vassilis Vassiliadis Christos D. Antonopoulos Nikolaos Bellas and 1 more

To improve power efficiency, researchers are experimenting with dynamically adjusting the voltage and frequency margins of systems to just above the minimum required for reliable operation. Traditionally, manufacturers did not allow reducing these margins. Consequently, existing studies use system simulators, or software fault-injection methodologies, which are slow, inaccurate and cannot be applied on realistic workloads. However, recent CPUs allow operation outside the nominal voltage/frequency envelope. We...

10.1109/dsn.2018.00043 article EN 2018-06-01

Enhancing Design Space Exploration by Extending CPU/GPU Specifications onto FPGAs

Muhsen Owaida Gabriel Falcão João Andrade Christos D. Antonopoulos Nikolaos Bellas and 5 more

The design cycle for complex special-purpose computing systems is extremely costly and time-consuming. It involves a multiparametric design space exploration for optimization, followed by design verification. Designers of special purpose VLSI implementations often need to explore parameters, such as optimal bitwidth and data representation, through time-consuming Monte Carlo simulations. A prominent example of this simulation-based exploration process is the design of decoders for error correcting systems, such as the Low-Density Parity-Check (LDPC) codes...

10.1145/2656207 article EN ACM Transactions on Embedded Computing Systems 2015-02-17

Reconfigurable System-on-Chip Architectures for Robust Visual SLAM on Humanoid Robots

Maria Rafaela Gkeka Alexandros Patras Nikolaos Tavoularis Stylianos Piperakis Emmanouil Hourdakis and 4 more

Visual Simultaneous Localization and Mapping (vSLAM) is the method of employing an optical sensor to map the robot’s observable surroundings while also identifying the robot’s pose in relation to that map. The accuracy and speed of vSLAM calculations can have a very significant impact on the performance and effectiveness of subsequent tasks that need to be executed by the robot, making it a key building component for current robotic designs. The application of vSLAM in the area of humanoid robotics is particularly difficult due to the robot’s unsteady locomotion. This paper...

10.1145/3570210 article EN ACM Transactions on Embedded Computing Systems 2022-11-09