Pedro Trancoso

ORCID: 0000-0002-2776-9253
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Cloud Computing and Resource Management
  • Distributed and Parallel Computing Systems
  • Embedded Systems Design Techniques
  • Distributed systems and fault tolerance
  • Advanced Neural Network Applications
  • Advanced Database Systems and Queries
  • Low-power high-performance VLSI design
  • Advanced Memory and Neural Computing
  • Data Management and Algorithms
  • Algorithms and Data Compression
  • Radiation Effects in Electronics
  • Caching and Content Delivery
  • IoT and Edge/Fog Computing
  • CCD and CMOS Imaging Sensors
  • Graph Theory and Algorithms
  • Generative Adversarial Networks and Image Synthesis
  • Genomics and Phylogenetic Studies
  • Peer-to-Peer Network Technologies
  • Quantum Computing Algorithms and Architecture
  • Semantic Web and Ontologies
  • Advanced Image and Video Retrieval Techniques
  • Green IT and Sustainability

Chalmers University of Technology
2017-2024

University of Cyprus
2010-2020

Gratz College
2020

Cyprus University of Technology
2017

An-Najah National University
2016

Intercollege
2005

University of Illinois Urbana-Champaign
2002-2003

Urbana University
2003

National Center for Supercomputing Applications
2002

Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
1993

Although cache-coherent shared-memory multiprocessors are often used to run commercial workloads, little work has been done to characterize how well these machines support such workloads. In particular, we do not have much insight into the demands of commercial workloads on the memory subsystem of these machines. In this paper, we analyze in detail the memory access patterns of several queries that are representative of Decision Support System (DSS) databases. Our analysis shows that memory use differs largely depending on how the queries access the database data, namely via indices or...

10.1109/hpca.1997.569680 article EN 2002-11-22

This paper describes the data-driven multithreading (DDM) model and how it may be implemented using off-the-shelf microprocessors. Data-driven multithreading is a nonblocking multithreading execution model that tolerates internode latency by scheduling threads for execution based on data availability. Scheduling based on data availability can be used to exploit cache management policies that significantly reduce cache misses. Such policies include firing a thread for execution only if its data is already placed in the cache. We call this cache management policy the CacheFlow policy. The core of the DDM implementation...

10.1109/tpds.2006.136 article EN IEEE Transactions on Parallel and Distributed Systems 2006-09-08

We are currently faced with the situation where applications have increasing computational demands and there is a wide selection of parallel processor systems. In this paper we focus on exploiting fine-grain parallelism for a demanding bioinformatics application, MrBayes, and its phylogenetic likelihood functions (PLF) using different architectures. Our experiments compare side-by-side the scalability and performance achieved using general-purpose multi-core processors, the Cell/BE, and graphics processing units (GPU). The...

10.1109/icpp.2009.30 article EN International Conference on Parallel Processing 2009-09-01

HPC system architectures are shifting from the traditional clusters of homogeneous nodes to clusters of heterogeneous nodes and accelerators. The future high-performance computing (HPC) technologies developed today showcase leadership-class compute systems, supercomputers. These machines are usually designed to achieve the highest possible performance in terms of the number of 64-bit floating-point operations per second (flops). Their architecture has evolved from early custom design systems to the current clusters of commodity multisocket, multicore systems.

10.1109/mcse.2011.52 article EN Computing in Science & Engineering 2011-04-26

Bloom filters are not able to handle deletes and inserts on multisets over time. This is important in many situations when streamed data evolve rapidly and change patterns frequently. Counting Bloom Filters (CBF) have been proposed to overcome this limitation and allow for the dynamic evolution of Bloom filters. The only approach for a compact and efficient representation of the CBF is the Spectral Bloom Filter (SBF). In this paper we propose the Dynamic Count Filter (DCF) as a new space-time efficient representation of the CBF. Although DCF does not make a compact use of memory, it shows to be faster and more space efficient than...

10.1145/1121995.1122000 article EN ACM SIGMOD Record 2006-03-01

Thanks to the improvements in semiconductor technologies, extreme-scale systems such as teradevices (i.e., composed by 1000 billion of transistors) will enable systems with 1000+ general purpose cores per chip, probably by 2020. Three major challenges have been identified: programmability, manageable architecture design, and reliability. TERAFLUX is a Future and Emerging Technology (FET) large-scale project funded by the European Union, which addresses such challenges at once by leveraging the dataflow principles. This paper describes...

10.1109/dsd.2013.39 preprint EN 2013-09-01

In this paper we present thread flux (TFlux), a complete system that supports the data-driven multithreading (DDM) model of execution. TFlux virtualizes any details of the underlying system, therefore offering the same programming model independently of the architecture. To achieve this goal, TFlux has a runtime support that is built on top of a commodity operating system. Scheduling of threads is performed by the thread synchronization unit (TSU), which can be implemented either as a hardware or a software module. In addition, TFlux includes a preprocessor that, along with...

10.1109/icpp.2008.74 article EN 2008-09-01

This paper considers a hybrid memory system composed of memory technologies with different characteristics; in particular a small, near memory exhibiting high bandwidth, i.e., 3D-stacked DRAM, and a larger, far memory offering capacity at lower bandwidth, i.e., off-chip DRAM. In the past, such a system has been used either as a DRAM cache or as part of a flat address space combined with a migration mechanism. Caches and migration offer different tradeoffs (between performance, main memory capacity, data transfer costs, etc.) and share similar challenges related to data-transfer granularity...

10.1109/hpca47549.2020.00059 article EN 2020-02-01

Within VEDLIoT, a project targeting the development of energy-efficient Deep Learning for distributed AIoT applications, several accelerator platforms based on technologies like CPUs, embedded GPUs, FPGAs, or specialized ASICs are evaluated. The VEDLIoT approach is based on modular and scalable cognitive IoT hardware platforms. Modular microserver technology enables the integration of different, heterogeneous accelerators into one platform. Benchmarking of the different accelerators takes into account performance, energy efficiency...

10.23919/date56975.2023.10137021 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2023-04-01

Resource-efficient Convolutional Neural Networks (CNNs) are gaining more attention. These CNNs have relatively low computational and memory requirements. A common denominator among such CNNs is having more heterogeneity than traditional CNNs. This heterogeneity is present at two levels: intra-layer type and inter-layer type. Generic accelerators do not capture these levels of heterogeneity, which harms their efficiency. Consequently, researchers have proposed model-specific accelerators with dedicated engines. When designing an accelerator...

10.1145/3639823 article EN ACM Transactions on Architecture and Code Optimization 2024-01-08

10.1155/2007/48926 article EN EURASIP Journal on Embedded Systems 2007-01-01

Although 3D-stacked DRAM offers substantially higher bandwidth than commodity DDR DIMMs, it cannot yet provide the necessary capacity to replace the bulk of the memory. A promising alternative is to use flat address space, hybrid memory systems of two or more levels, each exhibiting different performance characteristics. One such existing approach employs a near, high bandwidth memory, placed on top of the processor die, combined with a far memory, placed off-chip. Migrating data from the far to the near memory has significant performance potential, but also entails...

10.1109/ipdps.2019.00101 article EN IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2019-05-01

While database workloads consume a major fraction of the cycles in today's machines, there are only a few public-domain performance studies that characterize in detail how these workloads exercise the machines. This fact is due to the complexity of setting up and tuning database workloads, the high cost of the equipment required to evaluate them, and the frequent use of proprietary systems. In this paper, we help redress this problem by presenting a detailed characterization of the TPC-D benchmark running on a 4-processor Pentium Pro SMP multiprocessor with Windows...

10.1109/iccd.1999.808414 article EN 2003-01-20

Graphics processors are designed to perform many floating-point operations per second. Consequently, they are an attractive architecture for high-performance computing at a low cost. Nevertheless, it is still not very clear how to exploit all their potential for general-purpose applications. In this work we present a comprehensive study of the performance of an application executing on the GPU. In addition, we analyze the possibility of using the graphics card to extend the life-time of a computer system. In our experiments we compare the execution...

10.1109/dsd.2005.40 article EN Euromicro Conference on Digital System Design (DSD) 2005-12-22

The increased complexity and operating frequency in current microprocessors is resulting in a decrease in the performance improvements. In order to keep up with the expected performance gains, major manufacturers have started to offer chip-multiprocessor architectures. Nevertheless, the integration of several cores on the same chip leads to increased heat dissipation and consequently to additional costs, a decrease in reliability, and performance loss, among others. In this paper we propose a thermal-aware scheduling (TAS) technique that aims to minimize all these problems. When...

10.1109/dsd.2006.88 article EN Euromicro Conference on Digital System Design (DSD) 2006-01-01

Decision Support System (DSS) workloads are known to be one of the most time-consuming database workloads that process large data sets. Traditionally, DSS queries have been accelerated using large-scale multiprocessors. The topic addressed in this work is to analyze the benefits of using high-performance/low-cost processors such as GPUs and the Cell/BE to accelerate DSS query execution. In order to overcome the programming effort of developing code for the different architectures, we explore the use of a platform, Rapidmind, which offers...

10.1145/1531743.1531763 article EN 2009-05-18

The explosive growth of Internet-connected devices will soon result in a flood of generated data, which will increase the demand for network bandwidth as well as compute power to process the generated data. Consequently, there is a need for more energy efficient servers to empower traditional centralized Cloud data-centers as well as emerging decentralized data-centers at the Edges of the Cloud. In this paper, we present our approach, which aims at developing a new class of micro-servers, the UniServer, that exceed the conservative energy and performance scaling boundaries by introducing...

10.23919/date.2018.8342175 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2018-03-01

While shared-memory multiprocessing offers a simple model for process synchronization, actual synchronization may be expensive. Indeed, processors may have to wait a long time to acquire the lock of a critical section. In addition, a processor may have to stall waiting for all its pending accesses to complete before releasing the lock. To address this problem, we target well-known optimization techniques specifically to speed up critical sections. We reduce the time taken by critical sections by applying data prefetching and forwarding to minimize the number of misses...

10.1109/icpp.1996.538562 article EN 2002-12-24

Heterogeneous multicores offer flexibility in the form of different core types and Dynamic Voltage and Frequency Scaling (DVFS), defining a vast configuration space. The optimal choice of configuration is not always straightforward, even for single applications, and becomes a very difficult problem for dynamically changing scenarios of concurrent applications with unpredictable spawn and termination times and individual performance requirements. This article proposes an integrated approach for runtime decision making for energy efficiency...

10.1145/3293446 article EN ACM Transactions on Architecture and Code Optimization 2018-12-31

The VEDLIoT project targets the development of energy-efficient Deep Learning for distributed AIoT applications. A holistic approach is used to optimize algorithms while also dealing with safety and security challenges. The approach is based on a modular and scalable cognitive IoT hardware platform. Using modular microserver technology enables the user to configure the hardware to satisfy a wide range of applications. VEDLIoT offers a complete design flow for Next-Generation IoT devices required for collaboratively solving complex Deep Learning applications across distributed systems. The methods are tested...

10.23919/date54114.2022.9774653 article EN Design, Automation & Test in Europe Conference & Exhibition (DATE) 2022-03-14

10.1186/1687-3963-2007-048926 article EN EURASIP Journal on Embedded Systems 2007-01-01