NFDI4DS | UHH-SEMS - Publication Details

Pedro Trancoso

ORCID: 0000-0002-2776-9253

About

Contact & Profiles

OpenAlex ID: A5043427541

Research Areas

Parallel Computing and Optimization Techniques
Advanced Data Storage Technologies
Interconnection Networks and Systems
Cloud Computing and Resource Management
Distributed and Parallel Computing Systems
Embedded Systems Design Techniques
Distributed systems and fault tolerance
Advanced Neural Network Applications
Advanced Database Systems and Queries
Low-power high-performance VLSI design
Advanced Memory and Neural Computing
Data Management and Algorithms
Algorithms and Data Compression
Radiation Effects in Electronics
Caching and Content Delivery
IoT and Edge/Fog Computing
CCD and CMOS Imaging Sensors
Graph Theory and Algorithms
Generative Adversarial Networks and Image Synthesis
Genomics and Phylogenetic Studies
Peer-to-Peer Network Technologies
Quantum Computing Algorithms and Architecture
Semantic Web and Ontologies
Advanced Image and Video Retrieval Techniques
Green IT and Sustainability

Chalmers University of Technology
2017-2024

University of Cyprus
2010-2020

Gratz College
2020

Cyprus University of Technology
2017

An-Najah National University
2016

Intercollege
2005

University of Illinois Urbana-Champaign
2002-2003

Urbana University
2003

National Center for Supercomputing Applications
2002

Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
1993

TERAFLUX: Harnessing dataflow in next generation teradevices

OPENALEX - Publications

Roberto Giorgi, Rosa M. Badía, François Bodin, Albert Cohen, Paraskevas Evripidou, and 24 more

10.1016/j.micpro.2014.04.001 article EN Microprocessors and Microsystems 2014-04-18

The memory performance of DSS commercial workloads in shared-memory multiprocessors

Pedro Trancoso, Josep-L. Larriba-Pey, Zijian Zhang, Josep Torrellas

Although cache-coherent shared-memory multiprocessors are often used to run commercial workloads, little work has been done to characterize how well these machines support such workloads. In particular, we do not have much insight into the demands of these workloads on the memory subsystem of these machines. In this paper, we analyze in detail the memory access patterns of several queries that are representative of Decision Support System (DSS) databases. Our analysis shows that memory use differs largely depending on how the queries access the database data, namely via indices or...

10.1109/hpca.1997.569680 article EN 2002-11-22

Data-Driven Multithreading Using Conventional Microprocessors

Costas Kyriacou, Paraskevas Evripidou, Pedro Trancoso

This paper describes the data-driven multithreading (DDM) model and how it may be implemented using off-the-shelf microprocessors. Data-driven multithreading is a nonblocking multithreading execution model that tolerates internode latency by scheduling threads for execution based on data availability. Scheduling based on data availability can be used to exploit cache management policies that significantly reduce cache misses. Such policies include firing a thread for execution only if its data is already placed in the cache. We call this cache management policy the CacheFlow policy. The core of the DDM implementation...

10.1109/tpds.2006.136 article EN IEEE Transactions on Parallel and Distributed Systems 2006-09-08

Fine-grain Parallelism Using Multi-core, Cell/BE, and GPU Systems: Accelerating the Phylogenetic Likelihood Function

Frederico Pratas, Pedro Trancoso, Alexandros Stamatakis, Leonel Sousa

We are currently faced with the situation where applications have increasing computational demands and there is a wide selection of parallel processor systems. In this paper we focus on exploiting fine-grain parallelism for a demanding bioinformatics application, MrBayes, and its phylogenetic likelihood functions (PLF), using different architectures. Our experiments compare side-by-side the scalability and performance achieved using general-purpose multi-core processors, the Cell/BE, and graphics processing units (GPU). The...

10.1109/icpp.2009.30 article EN International Conference on Parallel Processing 2009-09-01

Trends in High-Performance Computing

Volodymyr Kindratenko, Pedro Trancoso

HPC system architectures are shifting from the traditional clusters of homogeneous nodes to clusters of heterogeneous nodes and accelerators. The high-performance computing (HPC) technologies of the future are developed today and showcased in leadership-class compute systems, i.e., supercomputers. These machines are usually designed to achieve the highest possible performance in terms of the number of 64-bit floating-point operations per second (flops). Their architecture has evolved from early custom design systems to the current clusters of commodity multisocket, multicore systems.

10.1109/mcse.2011.52 article EN Computing in Science & Engineering 2011-04-26

Dynamic count filters

Josep Aguilar-Saborit, Pedro Trancoso, Víctor Muntés-Mulero, Josep-L. Larriba-Pey

Bloom filters are not able to handle deletes and inserts on multisets over time. This is important in many situations when streamed data evolve rapidly and change patterns frequently. Counting Bloom Filters (CBF) have been proposed to overcome this limitation and allow for the dynamic evolution of Bloom filters. The only dynamic approach to a compact and efficient representation of CBF is the Spectral Bloom Filters (SBF). In this paper we propose the Dynamic Count Filters (DCF) as a new dynamic and space-time efficient representation of CBF. Although DCF does not make a compact use of memory, it is shown to be faster and more space efficient than...

10.1145/1121995.1122000 article EN ACM SIGMOD Record 2006-03-01

The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices

Marco Solinas, Rosa M. Badía, François Bodin, Albert Cohen, Paraskevas Evripidou, and 21 more

Thanks to the improvements in semiconductor technologies, extreme-scale systems such as teradevices (i.e., chips composed of 1000 billion transistors) will enable systems with 1000+ general-purpose cores per chip, probably by 2020. Three major challenges have been identified: programmability, manageable architecture design, and reliability. TERAFLUX is a Future and Emerging Technologies (FET) large-scale project funded by the European Union, which addresses these challenges at once by leveraging dataflow principles. This paper describes...

10.1109/dsd.2013.39 preprint EN 2013-09-01

TFlux: A Portable Platform for Data-Driven Multithreading on Commodity Multicore Systems

Kyriakos Stavrou, Marios Nikolaides, Demos Pavlou, Samer Arandi, Paraskevas Evripidou, and 1 more

In this paper we present thread flux (TFlux), a complete system that supports the data-driven multithreading (DDM) model of execution. TFlux virtualizes any details of the underlying system, therefore offering the same programming model independently of the architecture. To achieve this goal, TFlux has a runtime support that is built on top of a commodity operating system. Scheduling of threads is performed by the thread synchronization unit (TSU), which can be implemented either as a hardware or a software module. In addition, TFlux includes a preprocessor that, along with...

10.1109/icpp.2008.74 article EN 2008-09-01

Hybrid2: Combining Caching and Migration in Hybrid Memory Systems

Evangelos Vasilakis, Vassilis Papaefstathiou, Pedro Trancoso, Ioannis Sourdis

This paper considers a hybrid memory system composed of memory technologies with different characteristics; in particular a small, near memory exhibiting high bandwidth, i.e., 3D-stacked DRAM, and a larger, far memory offering capacity at lower bandwidth, i.e., off-chip DRAM. In the past, the near memory of such a system has been used either as a DRAM cache or as part of a flat address space combined with a migration mechanism. Caches and migration offer different tradeoffs (between performance, main memory capacity, data transfer costs, etc.) and share similar challenges related to data-transfer granularity...

10.1109/hpca47549.2020.00059 article EN 2020-02-01

Evaluation of heterogeneous AIoT Accelerators within VEDLIoT

René Griessl, Florian Porrmann, Nils Kucza, K. Mika, Jens Hagemeyer, and 10 more

Within VEDLIoT, a project targeting the development of energy-efficient Deep Learning for distributed AIoT applications, several accelerator platforms based on technologies like CPUs, embedded GPUs, FPGAs, or specialized ASICs are evaluated. The VEDLIoT approach is based on modular and scalable cognitive IoT hardware platforms. Modular microserver technology enables the integration of different, heterogeneous accelerators into one platform. Benchmarking of the different accelerators takes into account performance, energy efficiency...

10.23919/date56975.2023.10137021 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2023-04-01

An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs

Fareed Qararyah, Muhammad Waqar Azhar, Pedro Trancoso

Resource-efficient Convolutional Neural Networks (CNNs) are gaining more attention. These CNNs have relatively low computational and memory requirements. A common denominator among such CNNs is having more heterogeneity than traditional CNNs. This heterogeneity is present at two levels: intra-layer type and inter-layer type. Generic accelerators do not capture these levels of heterogeneity, which harms their efficiency. Consequently, researchers have proposed model-specific accelerators with dedicated engines. When designing an accelerator...

10.1145/3639823 article EN ACM Transactions on Architecture and Code Optimization 2024-01-08

Thermal-Aware Scheduling for Future Chip Multiprocessors

Kyriakos Stavrou, Pedro Trancoso

10.1155/2007/48926 article EN EURASIP Journal on Embedded Systems 2007-01-01

Fine-grain parallelism using multi-core, Cell/BE, and GPU Systems

Frederico Pratas, Pedro Trancoso, Leonel Sousa, Alexandros Stamatakis, Guochun Shi, and 1 more

10.1016/j.parco.2011.08.002 article EN Parallel Computing 2011-08-14

LLC-Guided Data Migration in Hybrid Memory Systems

Evangelos Vasilakis, Vassilis Papaefstathiou, Pedro Trancoso, Ioannis Sourdis

Although 3D-stacked DRAM offers substantially higher bandwidth than commodity DDR DIMMs, it cannot yet provide the necessary capacity to replace the bulk of the memory. A promising alternative is to use flat address space, hybrid memory systems of two or more levels, each exhibiting different performance characteristics. One such existing approach employs a near, high-bandwidth memory, placed on top of the processor die, combined with a far memory, placed off-chip. Migrating data from the far to the near memory has significant potential, but also entails...

10.1109/ipdps.2019.00101 article EN IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2019-05-01

Detailed characterization of a quad Pentium Pro server running TPC-D

Qiang Cao, Pedro Trancoso, Josep-L. Larriba-Pey, Josep Torrellas, R.L. Knighten, and 1 more

While database workloads consume a major fraction of the cycles in today's machines, there are only a few public-domain performance studies that characterize in detail how these workloads exercise the machines. This fact is due to the complexity of setting up and tuning database workloads, the high cost of the equipment required to evaluate them, and the frequent use of proprietary systems. In this paper, we help redress this problem by presenting a detailed characterization of the TPC-D benchmark running on a 4-processor Pentium Pro SMP multiprocessor with Windows...

10.1109/iccd.1999.808414 article EN 2003-01-20

Exploring Graphics Processor Performance for General Purpose Applications

Pedro Trancoso, Maria Charalambous

Graphics processors are designed to perform many floating-point operations per second. Consequently, they are an attractive architecture for high-performance computing at a low cost. Nevertheless, it is still not very clear how to exploit all their potential for general-purpose applications. In this work we present a comprehensive study of the performance of an application executing on the GPU. In addition, we analyze the possibility of using the graphics card to extend the life-time of a computer system. In our experiments we compare the execution...

10.1109/dsd.2005.40 article EN Euromicro Conference on Digital System Design (DSD) 2005-12-22

Thermal-Aware Scheduling: A Solution for Future Chip Multiprocessors Thermal Problems

Kyriacos Stavrou, Pedro Trancoso

The increased complexity and operating frequency in current microprocessors is resulting in a decrease in the performance improvements. In order to keep up with the expected performance gains, major manufacturers have started to offer chip-multiprocessor architectures. Nevertheless, the integration of several cores on the same chip leads to increased heat dissipation and consequently to additional costs, reduced reliability, and performance loss, among others. In this paper we propose a thermal-aware scheduling (TAS) technique that aims to minimize all these problems. When...

10.1109/dsd.2006.88 article EN Euromicro Conference on Digital System Design (DSD) 2006-01-01

Data parallel acceleration of decision support queries using Cell/BE and GPUs

Pedro Trancoso, Δέσπω Όθωνος, Artemakis Artemiou

Decision Support System (DSS) workloads are known to be one of the most time-consuming database workloads that process large data sets. Traditionally, DSS queries have been accelerated using large-scale multiprocessors. The topic addressed in this work is to analyze the benefits of using high-performance/low-cost processors such as GPUs and the Cell/BE to accelerate DSS query execution. In order to overcome the programming effort of developing code for the different architectures, we explore the use of a platform, Rapidmind, which offers...

10.1145/1531743.1531763 article EN 2009-05-18

An energy-efficient and error-resilient server ecosystem exceeding conservative scaling limits

Georgios Karakonstantis, Konstantinos Tovletoglou, Lev Mukhanov, Hans Vandierendonck, Dimitrios S. Nikolopoulos, and 21 more

The explosive growth of Internet-connected devices will soon result in a flood of generated data, which will increase the demand for network bandwidth as well as compute power to process the generated data. Consequently, there is a need for more energy-efficient servers to empower traditional centralized Cloud data-centers as well as emerging decentralized data-centers at the Edges of the Cloud. In this paper, we present our approach, which aims at developing a new class of micro-servers, the UniServer, that exceed the conservative energy and performance scaling boundaries by introducing...

10.23919/date.2018.8342175 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2018-03-01

The impact of speeding up critical sections with data prefetching and forwarding

Pedro Trancoso, Josep Torrellas

While shared-memory multiprocessing offers a simple model for process synchronization, actual synchronization may be expensive. Indeed, processors may have to wait a long time to acquire the lock of a critical section. In addition, a processor may stall waiting for all its pending accesses to complete before releasing the lock. To address this problem, we target well-known optimization techniques specifically to speed up critical sections. We reduce the time taken by critical sections by applying data prefetching and forwarding to minimize the number of misses...

10.1109/icpp.1996.538562 article EN 2002-12-24

Energy-Efficient Runtime Management of Heterogeneous Multicores using Online Projection

Stavros Tzilis, Pedro Trancoso, Ioannis Sourdis

Heterogeneous multicores offer flexibility in the form of different core types and Dynamic Voltage and Frequency Scaling (DVFS), defining a vast configuration space. The optimal configuration choice is not always straightforward, even for single applications, and becomes a very difficult problem for dynamically changing scenarios of concurrent applications with unpredictable spawn and termination times and individual performance requirements. This article proposes an integrated approach for runtime decision making for energy efficiency...

10.1145/3293446 article EN ACM Transactions on Architecture and Code Optimization 2018-12-31

VEDLIoT: Very Efficient Deep Learning in IoT

M. Shamim Kaiser, René Griessl, Nils Kucza, C. Haumann, Lennart Tigges, and 31 more

The VEDLIoT project targets the development of energy-efficient Deep Learning for distributed AIoT applications. A holistic approach is used to optimize algorithms while also dealing with safety and security challenges. The approach is based on a modular and scalable cognitive IoT hardware platform. Using modular microserver technology enables the user to configure the hardware to satisfy a wide range of applications. VEDLIoT offers a complete design flow for Next-Generation IoT devices required for collaboratively solving complex Deep Learning applications across distributed systems. The methods are tested...

10.23919/date54114.2022.9774653 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2022-03-14

Thermal-Aware Scheduling for Future Chip Multiprocessors

Kyriakos Stavrou, Pedro Trancoso

10.1186/1687-3963-2007-048926 article EN EURASIP Journal on Embedded Systems 2007-01-01
