- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Interconnection Networks and Systems
- Distributed and Parallel Computing Systems
- Distributed systems and fault tolerance
- Radiation Effects in Electronics
- Advanced Software Engineering Methodologies
- Real-Time Systems Scheduling
- Software System Performance and Reliability
- Advanced Data Storage Technologies
- VLSI and Analog Circuit Testing
- CCD and CMOS Imaging Sensors
- Context-Aware Activity Recognition Systems
- Low-power high-performance VLSI design
- Formal Methods in Verification
- Cloud Computing and Resource Management
- Logic, programming, and type systems
- VLSI and FPGA Design Techniques
- Software Reliability and Analysis Research
- Opportunistic and Delay-Tolerant Networks
- Bluetooth and Wireless Communication Technologies
- Software Testing and Debugging Techniques
- Software Engineering Research
- Business Process Modeling and Analysis
- Digital Filter Design and Implementation
University of Lisbon
2007-2023
Hospital de Santo António
2023
Institute for Biotechnology and Bioengineering
2023
Brazilian Research in Intensive Care Network
2023
Universidade do Porto
2021
Institute for Systems Engineering and Computers
2009-2021
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
2008-2020
Pontifical Catholic University of Rio de Janeiro
2020
Universidade de São Paulo
2020
University of Southern California
2005-2017
We present here a report produced by a workshop on ‘Addressing failures in exascale computing’ held in Park City, Utah, 4–11 August 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels of a computing system, discuss existing knowledge on resilience across the various hardware and software layers of an exascale system, and build on those results, examining potential solutions from both a hardware and software perspective and focusing on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware;...
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. Authors: Mary Hall (USC Information Sciences Institute, Marina del Rey, CA), Peter Kogge (University of Notre Dame, IN), Jeff Koller, Pedro Diniz, Jacqueline Chame, Jeff Draper, Jeff LaCoss, John Granacki, Jay Brockman, Apoorv Srivastava, William Athas, Vincent Freeh, Jaewook Shin, Joonseok Park. In SC '99: Proceedings of the 1999 ACM/IEEE Conference on Supercomputing...
This article presents a new analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views the computation as composed of operations on objects. It then analyzes the program at this granularity to discover when operations commute (i.e., generate the same final result regardless of the order in which they execute). If all of the operations required to perform a given computation commute, the compiler can automatically generate parallel code. We have implemented a prototype...
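To make the idea concrete, the following minimal C++ sketch (not the authors' compiler; the Node class and its add operation are invented for illustration) shows an operation that commutes: the object's final state is the same regardless of the order in which the updates execute, which is what licenses parallel execution.

```cpp
// Minimal sketch of a commuting operation. Because add(a); add(b); and
// add(b); add(a); leave the Node in the same final state, a parallelizing
// compiler could legally run these operations concurrently (with the
// synchronization it inserts to make each operation atomic).
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

struct Node {
    long sum = 0;
    std::mutex m;                       // makes each operation atomic in the parallel version
    void add(long v) {                  // the commuting operation
        std::lock_guard<std::mutex> g(m);
        sum += v;
    }
};

int main() {
    Node n;
    std::vector<std::thread> workers;
    for (long v = 1; v <= 4; ++v)
        workers.emplace_back([&n, v] { n.add(v); });   // execution order is irrelevant
    for (auto& t : workers) t.join();
    std::cout << n.sum << "\n";         // always prints 10, regardless of interleaving
    return 0;
}
```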
The development of applications for high-performance embedded systems is typically a long and error-prone process. In addition to the required functions, developers must consider various and often conflicting non-functional application requirements such as performance and energy efficiency. The complexity of this process is exacerbated by the multitude of target architectures and associated retargetable mapping tools. This paper introduces an Aspect-Oriented Programming (AOP) approach that conveys domain knowledge...
The current practice of mapping computations to custom hardware implementations requires programmers to assume the role of hardware designers. In tuning the performance of their implementation, designers manually apply loop transformations such as loop unrolling. For example, unrolling is used to expose instruction-level parallelism at the expense of more hardware resources for concurrent operator evaluation. Because unrolling also increases the amount of data a computation requires, too much unrolling can lead to a memory-bound implementation where...
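As a rough illustration of the trade-off, the hypothetical dot-product below is shown in its original form and unrolled by a factor of four; the array names, sizes, and unroll factor are invented for the example, and this is a software analogue rather than the output of any synthesis flow.

```cpp
// Manual 4-way unrolling of a dot-product loop: the kind of source-level
// transformation applied before hardware mapping to expose instruction-level
// parallelism, at the cost of more concurrent operators and more data fetched
// per iteration.
#include <cstdio>

int main() {
    const int N = 16;
    float a[N], b[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    // Original loop: one multiply-add per iteration.
    float acc = 0.0f;
    for (int i = 0; i < N; ++i) acc += a[i] * b[i];

    // Unrolled by 4: four independent multiply-adds per iteration that a
    // synthesis tool could schedule on concurrent operators.
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < N; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float acc_unrolled = s0 + s1 + s2 + s3;

    std::printf("%f %f\n", acc, acc_unrolled);   // both print the same value
    return 0;
}
```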
This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment. We have implemented dynamic feedback in the context...
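The sketch below is a loose software analogue of this sampling/production structure, not the system described in the paper: two hand-written versions (version_a and version_b, both invented here) stand in for the compiler-generated variants, a sampling phase times each, and the production phase reuses the faster one.

```cpp
// Hedged sketch of the dynamic-feedback idea: time each precompiled version
// during a short sampling phase, then use the fastest for the production phase.
// A real system would also resample periodically to track environment changes.
#include <chrono>
#include <cstdio>
#include <vector>

long version_a(const std::vector<long>& v) {          // stands in for policy A
    long s = 0; for (long x : v) s += x; return s;
}
long version_b(const std::vector<long>& v) {          // stands in for policy B
    long s = 0;
    for (size_t i = 0; i + 1 < v.size(); i += 2) s += v[i] + v[i + 1];
    if (v.size() % 2) s += v.back();
    return s;
}

double time_of(long (*f)(const std::vector<long>&), const std::vector<long>& v) {
    auto t0 = std::chrono::steady_clock::now();
    volatile long r = f(v); (void)r;
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    std::vector<long> data(1 << 20, 1);
    // Sampling phase: measure the overhead of each version.
    double ta = time_of(version_a, data), tb = time_of(version_b, data);
    auto best = (ta <= tb) ? version_a : version_b;
    // Production phase: reuse the winner.
    long total = 0;
    for (int i = 0; i < 10; ++i) total += best(data);
    std::printf("best=%s total=%ld\n", (ta <= tb) ? "A" : "B", total);
    return 0;
}
```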
This paper presents a new analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views the computation as composed of operations on objects. It then analyzes the program at this granularity to discover when operations commute (i.e., generate the same final result regardless of the order in which they execute). If all of the operations required to perform a given computation commute, the compiler can automatically generate parallel code. We have implemented a prototype compilation...
Context: Data-intensive systems, a.k.a. big data systems (BDS), are software systems that handle a large volume of data in the presence of performance quality attributes, such as scalability and availability. Before the advent of big data management systems (e.g., Cassandra) and frameworks (e.g., Spark), organizations had to cope with large data volumes using custom-tailored solutions. In particular, a decade ago, Tecgraf/PUC-Rio developed a system to monitor a truck fleet in real time and proactively detect events from the positioning data received. Over the years, it evolved into a complex...
Summary: The development of applications for high-performance embedded systems is a long and error-prone process because, in addition to the required functionality, developers must consider various and often conflicting nonfunctional requirements such as performance and/or energy efficiency. The complexity of this process is further exacerbated by the multitude of target architectures and mapping tools. This article describes LARA, an aspect-oriented programming language that allows programmers to convey domain-specific...
Commercially available behavioral synthesis tools do not adequately support FPGA vendor-specific external memory interfaces, making it extremely difficult to exploit pipelined access modes as well as the application-specific operation scheduling critical for high-performance solutions. This lack of support substantially increases the complexity and burden on designers in mapping applications to FPGA-based computing engines. In this paper we address the problem of interfacing with aggressive external memory access modes by proposing a decoupled...
Mapping computations written in high-level programming languages to FPGA-based computing engines requires programmers to create the datapath responsible for the core of the computation as well as the control structures that generate the appropriate signals to orchestrate its execution. This paper addresses the issue of automatic generation of data storage and control structures for reconfigurable architectures using existing compiler dependence analysis techniques. We describe a set of parameterizable structures used as targets by our prototype compiler and present an algorithm to derive...
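A small, invented example of the kind of data-storage decision such dependence analysis enables: in the 3-tap filter below, consecutive iterations reuse two of their three inputs, so those values can be kept in register-like scalars (the software analogue of a tapped delay line) instead of being re-read from memory each iteration.

```cpp
// Illustration of compiler-derived data reuse: only one new memory read is
// issued per iteration; the other two operands are carried in "registers"
// (local variables) shifted along the iteration window.
#include <cstdio>

int main() {
    const int N = 12;
    int in[N], out[N];
    for (int i = 0; i < N; ++i) in[i] = i;

    int r0 = in[0], r1 = in[1];         // window of reused inputs
    for (int i = 2; i < N; ++i) {
        int r2 = in[i];                 // single new read per iteration
        out[i] = r0 + 2 * r1 + r2;      // 3-tap computation over the register window
        r0 = r1; r1 = r2;               // shift the window
    }

    for (int i = 2; i < N; ++i) std::printf("%d ", out[i]);
    std::printf("\n");
    return 0;
}
```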
As parallel machines become part of the mainstream computing environment, compilers will need to apply synchronization optimizations to deliver efficient parallel software. This paper describes a new framework for synchronization optimizations and a new set of transformations for programs that implement critical sections using mutual exclusion locks. These transformations allow the compiler to move constructs that acquire and release locks both within and between procedures and to eliminate acquire and release constructs. The paper also presents a new optimization algorithm, lock elimination, for reducing synchronization overhead. This optimization locates...
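The following C++ fragment, written only to illustrate the flavor of these transformations (it is not the paper's algorithm), shows two adjacent critical sections coalesced into a single acquire/release pair, plus a case where the lock can be removed entirely because the data is thread-private.

```cpp
// Before/after sketch of moving and eliminating lock constructs.
#include <iostream>
#include <mutex>
#include <thread>

std::mutex shared_lock;
long shared_count = 0;

// Before: two acquire/release pairs guarding back-to-back updates.
void update_naive() {
    shared_lock.lock(); shared_count += 1; shared_lock.unlock();
    shared_lock.lock(); shared_count += 2; shared_lock.unlock();
}

// After moving the first release down and the second acquire up, the inner
// pair cancels and a single critical section remains.
void update_coalesced() {
    std::lock_guard<std::mutex> g(shared_lock);
    shared_count += 1;
    shared_count += 2;
}

int main() {
    std::thread t1(update_naive), t2(update_coalesced);
    t1.join(); t2.join();

    // Lock elimination: this accumulator is private to main, so no lock is needed.
    long local = 0;
    for (int i = 0; i < 100; ++i) local += i;

    std::cout << shared_count << " " << local << "\n";
    return 0;
}
```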
This paper considers the role of performance and area estimates from behavioral synthesis in design space exploration. We have developed a compilation system that automatically maps high-level algorithms written in C to application-specific designs for Field Programmable Gate Arrays (FPGAs), through a collaboration between parallelizing compiler technology and behavioral synthesis tools. Using several code transformations, the compiler optimizes a design to increase parallelism and the utilization of external memory bandwidth, and selects the best design among a set...
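As a hedged sketch of estimate-driven exploration, the loop below walks over candidate unroll factors using invented area and cycle-count models and an arbitrary resource budget, keeping the fastest design that fits; a real exploration would obtain these numbers from the synthesis tools rather than from closed-form guesses.

```cpp
// Toy design space exploration: the area/cycle models and budget are made up
// purely to show the selection logic, not taken from any real synthesis flow.
#include <cstdio>

int main() {
    const double area_budget = 1000.0;       // hypothetical FPGA resource budget
    const int trip_count = 1024;
    int best_unroll = 1;
    double best_cycles = 1e30;

    for (int unroll = 1; unroll <= 16; unroll *= 2) {
        double area   = 80.0 * unroll;                                  // assumed: datapath copies
        double cycles = trip_count / (double)unroll + 50.0 * unroll;    // assumed: memory overhead
        if (area <= area_budget && cycles < best_cycles) {
            best_cycles = cycles;
            best_unroll = unroll;
        }
        std::printf("unroll=%2d area=%7.1f cycles=%8.1f\n", unroll, area, cycles);
    }
    std::printf("selected unroll factor: %d\n", best_unroll);
    return 0;
}
```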
Reconfigurable systems, and in particular FPGA-based custom computing machines, offer a unique opportunity to define application-specific architectures. These architectures offer performance advantages for application domains such as image processing, where the use of customized pipelines exploits the inherent coarse-grain parallelism. In this paper we describe a set of program analyses and an implementation that map a sequential, un-annotated C program into a pipelined implementation running on FPGAs, each with multiple external...
Many video and image/signal processing applications can be structured as sequences of data-dependent tasks using a consumer/producer communication paradigm and are therefore amenable to pipelined execution. This paper presents an execution technique to speed up the overall execution of successive, data-dependent tasks on a reconfigurable architecture. The technique pipelines the tasks by overlapping their execution subject to data dependences. It decouples the concurrent execution of data-paths and control units and uses custom, application-specific and data-driven, fine-grained synchronization and buffering...
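A software analogue of the producer/consumer pipelining idea is sketched below, assuming an arbitrary buffer depth and made-up stage functions: the two stages overlap their execution and synchronize at fine grain through a small bounded buffer rather than waiting for the first task to finish completely.

```cpp
// Two pipeline stages overlapping via a bounded buffer with fine-grained
// synchronization; buffer capacity and stage computations are arbitrary.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

std::queue<int> buffer;
std::mutex m;
std::condition_variable cv;
const size_t CAPACITY = 4;
bool done = false;

void stage1_produce() {                       // e.g., the first data-dependent task
    for (int i = 0; i < 16; ++i) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return buffer.size() < CAPACITY; });
        buffer.push(i * i);
        cv.notify_all();
    }
    { std::lock_guard<std::mutex> lk(m); done = true; }
    cv.notify_all();
}

void stage2_consume() {                       // e.g., the downstream task
    long sum = 0;
    while (true) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !buffer.empty() || done; });
        if (buffer.empty() && done) break;
        sum += buffer.front(); buffer.pop();
        cv.notify_all();
    }
    std::cout << "consumed sum = " << sum << "\n";
}

int main() {
    std::thread p(stage1_produce), c(stage2_consume);
    p.join(); c.join();
    return 0;
}
```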
As the scale and complexity of future High Performance Computing (HPC) systems continues to grow, the rising frequency of faults and errors and their impact on HPC applications will make it increasingly difficult to accomplish useful computation. Traditional means of fault detection and correction are either hardware based or use software redundancy. Redundancy approaches usually entail complete replication of program state or computation and therefore incur substantial overhead on application performance. Therefore, wide-scale full...
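One low-overhead alternative to full replication, shown here only as an illustrative sketch with an artificially injected fault, is an algorithm-level check: for an element-wise vector addition, the sum of the result must equal the sums of the operands, so a mismatch flags possible silent data corruption at the cost of a few extra additions rather than a second full computation.

```cpp
// Lightweight, checksum-style fault detection for c = a + b.
// The bit flip below is injected deliberately to demonstrate detection.
#include <cstdio>
#include <vector>

int main() {
    const int N = 1000;
    std::vector<long> a(N), b(N), c(N);
    long sa = 0, sb = 0;
    for (int i = 0; i < N; ++i) { a[i] = i; b[i] = 2 * i; sa += a[i]; sb += b[i]; }

    for (int i = 0; i < N; ++i) c[i] = a[i] + b[i];
    c[500] ^= 0x4;                        // simulate a silent bit flip in the result

    long sc = 0;
    for (int i = 0; i < N; ++i) sc += c[i];

    if (sc != sa + sb)
        std::printf("checksum mismatch: possible silent data corruption detected\n");
    else
        std::printf("result passes the lightweight check\n");
    return 0;
}
```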