Hiago Mayk G. de A. Rocha

ORCID: 0000-0002-0827-0131
Research Areas
  • Parallel Computing and Optimization Techniques
  • Cloud Computing and Resource Management
  • Interconnection Networks and Systems
  • Graph Theory and Algorithms
  • Embedded Systems Design Techniques
  • Advanced Graph Neural Networks
  • Advanced Data Storage Technologies
  • VLSI and FPGA Design Techniques
  • Energy Harvesting in Wireless Networks
  • Distributed and Parallel Computing Systems
  • Advanced Memory and Neural Computing

Universidade Federal da Bahia
2025

Universidade Federal do Rio Grande do Sul
2020-2023

The execution of large real-world graphs, such as web searches and social networks, has been boosted by modern HPC systems. However, their irregular communication patterns and poor data locality impose many challenges, mainly when executed on NUMA systems. As we show in this paper, there is no one-size-fits-all configuration for thread and data mapping: the best combination will vary according to the system, the graph algorithm, and the input at hand. Based on that, we propose Graphith: a framework that automatically enhances...

10.1109/pdp52278.2021.00033 article EN 2021-03-01

This paper proposes PredG, a machine learning framework that enhances graph processing performance by finding the ideal thread and data mapping on NUMA systems. PredG is agnostic to the input graph: it uses the available graphs' features to train an ANN that performs predictions as new graphs arrive, without any application execution after being trained. When evaluated over representative algorithms on three systems, its solutions are up to 41% faster than the Linux OS Default and the Best Static, on average 2% far from the Oracle,...

10.1145/3489517.3530581 article EN Proceedings of the 59th ACM/IEEE Design Automation Conference 2022-07-10

NUMA systems have become commonly used in HPC. However, to fully take advantage of these systems, the right thread-to-core allocation and page placement are essential. On top of that, considering that many parallel applications have limited scalability, applying thread throttling (i.e., artificially reducing the number of active threads) will most times further improve energy and/or performance. Because it involves many variables, previous research has not considered the aforementioned approaches altogether....

10.1109/hpcc-smartcity-dss50907.2020.00030 article EN 2020-12-01

The growing need for extracting information from large graphs has been pushing the development of parallel graph algorithms. However, the highly irregular structure of real-world graphs limits the performance and energy improvements of these applications. In this paper, we show that, in most cases, using all the available cores of a multiprocessor is not the best option in terms of the aforementioned non-functional requirements. Based on that, we propose GraphKat, a framework that enables the simultaneous processing of several algorithms/graphs...

10.1002/cpe.7419 article EN Concurrency and Computation Practice and Experience 2022-11-01

Although advances in modern GPUs have accelerated the execution of heavy data processing applications, speeding up graph processing on these systems is not a trivial task: graph applications are characterized by a high volume of irregular memory accesses that vary with the graph structure, so many times they do not reach peak performance when executing on GPUs. In many cases, the CPU is more suitable. Given that graph structures can be identified through high-level metrics (e.g., diameter and average clustering coefficient), these metrics may assist the designer...

10.1109/pdp59025.2023.00013 article EN 2023-03-01
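
The two high-level metrics named in the abstract above, diameter and average clustering coefficient, can be computed for a small graph as in the following minimal sketch. It is not taken from the paper: the adjacency-set representation, the function names, and the toy graph are assumptions made for illustration, and the diameter routine assumes a connected, undirected graph.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable vertex (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Longest shortest path; assumes the graph is connected and undirected."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


def average_clustering(adj):
    """Mean local clustering coefficient; vertices of degree < 2 count as 0."""
    total = 0.0
    for u, neigh in adj.items():
        k = len(neigh)
        if k < 2:
            continue
        # Count edges that connect two neighbours of u.
        links = sum(1 for a in neigh for b in neigh if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant vertex 3 on 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
toy_diameter = diameter(toy)
toy_clustering = average_clustering(toy)
```

For real-world graphs an all-pairs BFS like this is too slow; it is shown only to make the definitions concrete.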

This work proposes an optimized task mapping solution called Routing Model-based Genetic Algorithm (RMGA) that combines the task mapping and routing problems, using an Integer Linear Programming (ILP) model as a fitness function. We compared our proposed RMGA with other Genetic Algorithms (GAs) that address the task mapping problem using the classical flow x distance function as fitness evaluation. Experimental results evaluating communication latency demonstrate that our algorithm outperforms two GAs from the literature. It presents up to 30% lower delay when simulating...

10.1109/sbesc51047.2020.9277843 article EN 2020-11-24
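
The classical flow x distance evaluation mentioned in the abstract above can be sketched as follows. This illustrates only the baseline fitness idea, not RMGA or its ILP model. The 2D-mesh topology, the Manhattan hop distance, the task names, and the traffic volumes are all assumptions made for the example.

```python
def manhattan(a, b):
    """Hop distance between two (row, col) positions; assumes a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def flow_distance_cost(flows, mapping):
    """Sum of communication volume times hop distance over all task pairs."""
    return sum(
        volume * manhattan(mapping[src], mapping[dst])
        for (src, dst), volume in flows.items()
    )


# Hypothetical traffic between three tasks (arbitrary volume units).
flows = {("t0", "t1"): 10, ("t1", "t2"): 5, ("t0", "t2"): 1}

# Two hypothetical placements on a 2x2 mesh.
mapping_a = {"t0": (0, 0), "t1": (0, 1), "t2": (1, 1)}
mapping_b = {"t0": (0, 0), "t1": (1, 1), "t2": (0, 1)}

cost_a = flow_distance_cost(flows, mapping_a)  # heavy flow t0-t1 is 1 hop
cost_b = flow_distance_cost(flows, mapping_b)  # heavy flow t0-t1 is 2 hops
```

A genetic algorithm using this fitness would prefer `mapping_a`, since it places the heaviest flow on adjacent nodes.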

Graphs are data structures capable of representing problems from different domains, such as logistics and social networks. However, these massive graphs, stored in high-performance computing (HPC) servers, start processing from distinct source vertices (i.e., single-source: a user or a message in a network). Therefore, the amount and structure of the sub-graphs to be processed will also change depending on the source, highly influencing the graph algorithm's behavior and performance. In this paper, we propose GraphNroll,...

10.1109/sbesc60926.2023.10324068 article EN 2023-11-21

Technology scaling has been allowing a growing number of cores in processors to satisfy the increasing demand of new applications, which need to process huge amounts of data in High-Performance Computing (HPC). However, considering that many parallel applications have limited scalability, activating the maximum number of available cores to execute an application will not always provide the best outcome in energy and performance (represented by the Energy-Delay Product, or EDP). Because of that, works have already proposed different Dynamic...

10.1109/ipdpsw55747.2022.00154 article EN 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2022-05-01
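
The Energy-Delay Product named in the abstract above is the product of consumed energy and execution time, with lower values being better. The sketch below shows how it can favor a thread count below the maximum. The measurements are hypothetical numbers chosen for illustration, not results from the paper, and the function names are assumptions.

```python
def edp(energy_joules, time_seconds):
    """Energy-Delay Product: energy multiplied by execution time."""
    return energy_joules * time_seconds


def best_thread_count(measurements):
    """Return the thread count whose (energy, time) pair has the lowest EDP."""
    return min(measurements, key=lambda n: edp(*measurements[n]))


# Hypothetical measurements: threads -> (energy in joules, time in seconds).
samples = {
    4: (200.0, 10.0),   # EDP = 2000
    8: (180.0, 6.0),    # EDP = 1080
    16: (260.0, 5.0),   # EDP = 1300
}

best = best_thread_count(samples)
```

In this made-up data set, 16 threads give the shortest run time, but 8 threads give the lowest EDP because the extra threads cost more energy than the time they save.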

Several aspects limit the scalability of parallel applications, e.g., off-chip bus saturation and data synchronization. Moreover, the high cost of cooling HPC systems, which can outweigh the cost of developing the system itself, has pushed the application's execution to another level of requirements in terms of performance and energy. In this work, we propose AtTune: a heuristic-based framework for tuning the number of processes/threads and the CPU frequency to optimize applications' execution. AtTune is transparent to the user, independent of the input...

10.5753/sbesc_estendido.2020.13105 article EN 2020-11-23

Thread-level parallelism (TLP) has been widely used to optimize the use of computational resources (e.g., cache memories and CPU functional units) in high-performance systems. However, since some applications do not scale with the number of threads, resources will sit idle when the application is executed with its ideal number of threads. In this sense, the concurrent execution of parallel applications can be used to provide better utilization of resources without impacting the performance and energy consumption of the system as a whole. That said, we...

10.5753/wscad.2020.14058 article PT 2020-10-21

Asymmetric multicore processors (AMPs) combine high-performance cores with more energy-efficient ones, capitalizing on the diverse performance demands of modern devices (e.g., smartphones and tablets). Although hardware players have been designing powerful AMPs for desktop and server computers, such as the Apple M1 and the Intel Alder Lake family, these processors impose new challenges on parallel computing researchers regarding how to properly use them to their fullest. As we show in this paper, the best number of threads and which combination...

10.1109/sbesc60926.2023.10324167 article EN 2023-11-21

Intermittent systems are ultra-low-power batteryless devices that are increasing in popularity. These systems operate with energy extracted entirely from the environment. Since most environments cannot ensure sufficient and steady power supply conditions, intermittent systems suffer frequent power outages, where computation is interrupted due to lack of energy. While numerous works have enabled such systems via many different techniques, there are no versatile and configurable tools for functionality, rapid development/design space...

10.1109/sbcci50935.2020.9189926 article EN 2020-08-01