José E. Moreira

ORCID: 0000-0001-7029-6327
Research Areas
  • Parallel Computing and Optimization Techniques
  • Distributed and Parallel Computing Systems
  • Interconnection Networks and Systems
  • Advanced Data Storage Technologies
  • Embedded Systems Design Techniques
  • Cloud Computing and Resource Management
  • Distributed systems and fault tolerance
  • Graph Theory and Algorithms
  • Numerical Methods and Algorithms
  • Scientific Computing and Data Management
  • Advanced Graph Neural Networks
  • Algorithms and Data Compression
  • Cellular Automata and Applications
  • Real-Time Systems Scheduling
  • Low-power high-performance VLSI design
  • Computational Physics and Python Applications
  • Software System Performance and Reliability
  • Logic, programming, and type systems
  • Advanced Database Systems and Queries
  • Security and Verification in Computing
  • Scheduling and Optimization Algorithms
  • Advanced Neural Network Applications
  • Stochastic processes and financial applications
  • Forecasting Techniques and Applications
  • Advanced Queuing Theory Analysis

IBM Research - Thomas J. Watson Research Center
2013-2024

University of Aveiro
2011-2024

IBM (United States)
2010-2023

IBM (Belgium)
2022

Alliance for Safe Kids
2018

Indiana University
2016

Amazon Research Foundation
2006

Institut d'Investigació Biomèdica de Girona
2005

Universitat Politècnica de Catalunya
2005

IBM Research - Zurich
2004

This paper gives an overview of the BlueGene/L Supercomputer. This is a jointly funded research partnership between IBM and the Lawrence Livermore National Laboratory as part of the United States Department of Energy ASCI Advanced Architecture Research Program. Application performance and scaling studies have recently been initiated with partners at a number of academic and government institutions, including the San Diego Supercomputer Center and the California Institute of Technology. This massively parallel system of 65,536 nodes is based on a new...

10.5555/762761.762787 article EN Conference on High Performance Computing (Supercomputing) 2002-11-16

The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze, while incidence matrices are often better for...

10.1109/hpec.2016.7761646 preprint EN 2016-09-01
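The adjacency-matrix view described in this abstract can be illustrated with a breadth-first search written as repeated vector-matrix products over the Boolean (OR, AND) semiring. This is a plain-Python sketch for illustration only: it does not use the GraphBLAS API, dense lists stand in for sparse storage, and the function names `vxm_or_and` and `bfs_levels` are hypothetical.

```python
def vxm_or_and(frontier, adj, visited):
    """Return frontier^T * adj over the Boolean (OR, AND) semiring,
    masked so that already-visited vertices are excluded."""
    n = len(adj)
    result = [False] * n
    for i, active in enumerate(frontier):
        if not active:
            continue
        for j in range(n):
            # OR-accumulate (frontier[i] AND adj[i][j]) into result[j].
            if adj[i][j] and not visited[j]:
                result[j] = True
    return result


def bfs_levels(adj, source):
    """BFS from `source` on a dense 0/1 adjacency matrix; -1 marks unreachable vertices."""
    n = len(adj)
    level = [-1] * n
    frontier = [False] * n
    frontier[source] = True
    depth = 0
    while any(frontier):
        for v in range(n):
            if frontier[v]:
                level[v] = depth
        visited = [lv != -1 for lv in level]
        frontier = vxm_or_and(frontier, adj, visited)
        depth += 1
    return level


# Hypothetical 5-vertex directed graph: 0->1, 0->2, 1->3, 2->3; vertex 4 is isolated.
A = [
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(bfs_levels(A, 0))  # [0, 1, 1, 2, -1]
```

Each loop iteration is one masked vector-matrix product, the kind of operation the GraphBLAS standardizes; a real library would keep the matrix and frontier in sparse form.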

As the complexity of distributed computing systems increases, systems management tasks require significantly higher levels of automation; examples include diagnosis and prediction based on real-time streams of computer events, setting alarms, and performing continuous monitoring. The core of autonomic computing, a recently proposed initiative towards next-generation IT systems capable of 'self-healing', is the ability to analyze data in real time and to predict potential problems. The goal is to avoid catastrophic failures through prompt...

10.1145/956750.956799 article EN 2003-08-24

In December 1999, IBM announced the start of a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding. The project has two main goals: to advance our understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in machine architecture and software. This should enable simulations that are orders of magnitude larger than current technology permits. Major areas of investigation include: how to most effectively...

10.1147/sj.402.0310 article EN IBM Systems Journal 2001-01-01

Given the scale of massively parallel systems, the occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in these systems. However, the huge memory footprints of applications place severe limitations on the scalability of normal checkpointing techniques. Incremental checkpointing is a well-researched technique that addresses these concerns, but most implementations require paging support from the hardware and the underlying operating system, which may not always be available. In this paper, we...

10.1145/1006209.1006248 article EN 2004-06-26
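This abstract notes that most incremental checkpointing implementations depend on hardware and operating-system paging support. One way to detect modified memory without page protection is to hash fixed-size blocks and save only the blocks whose hash changed since the previous checkpoint. The following Python sketch illustrates that idea. It is not necessarily the method proposed in the paper, and the block size and the function names `block_digests`, `incremental_checkpoint`, and `restore` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block granularity, in bytes


def block_digests(memory, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of a memory image."""
    return [
        hashlib.sha256(memory[off:off + block_size]).digest()
        for off in range(0, len(memory), block_size)
    ]


def incremental_checkpoint(memory, prev_digests, block_size=BLOCK_SIZE):
    """Return ({block_index: block_bytes} for blocks that changed, new digests)."""
    digests = block_digests(memory, block_size)
    changed = {}
    for idx, digest in enumerate(digests):
        if idx >= len(prev_digests) or digest != prev_digests[idx]:
            start = idx * block_size
            changed[idx] = bytes(memory[start:start + block_size])
    return changed, digests


def restore(base, deltas, block_size=BLOCK_SIZE):
    """Rebuild a memory image by applying incremental checkpoints, in order, to a base image."""
    image = bytearray(base)
    for delta in deltas:
        for idx, block in delta.items():
            start = idx * block_size
            image[start:start + len(block)] = block
    return bytes(image)


# Usage with a tiny 16-byte block size so the effect is visible.
mem = bytearray(48)                               # three 16-byte blocks
full, digests = incremental_checkpoint(mem, [], block_size=16)
mem[20] = 7                                       # modify one byte in block 1
delta, digests = incremental_checkpoint(mem, digests, block_size=16)
print(sorted(full), sorted(delta))                # [0, 1, 2] [1]
```

Hash comparison trades paging support for extra computation, and it is probabilistic: a hash collision would hide a modified block, which is why a strong hash is used here.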

Scale-up solutions in the form of large SMPs have represented the mainstream of commercial computing for the past several years. The major server vendors continue to provide increasingly larger and more powerful machines. More recently, scale-out solutions, in the form of clusters of smaller machines, have gained increased acceptance in commercial computing. Scale-out solutions are particularly effective for high-throughput Web-centric applications. In this paper, we investigate the behavior of two competing approaches to parallelism, scale-up and scale-out, in an...

10.1109/ipdps.2007.370631 article EN 2007-01-01

The POWER8™ processor is the latest RISC (Reduced Instruction Set Computer) microprocessor from IBM. It is fabricated using the company's 22-nm Silicon on Insulator (SOI) technology with 15 layers of metal, and it has been designed to significantly improve both single-thread performance and single-core throughput over its predecessor, the POWER7® processor. The rate of increase in frequency enabled by new silicon advancements has decreased dramatically in recent generations, as compared to the historic trend. This has caused many...

10.1147/jrd.2014.2376112 article EN IBM Journal of Research and Development 2015-01-01

BlueGene/L is currently the world's fastest supercomputer. It consists of a large number of low-power dual-processor compute nodes interconnected by high-speed torus and collective networks. Because compute nodes do not have shared memory, MPI is the natural programming model for this machine. The BlueGene/L MPI library is a port of MPICH2. In this paper we discuss the implementation of MPI collectives on BlueGene/L. The MPICH2 implementation of collectives is based on point-to-point communication primitives. This turns out to be suboptimal for a number of reasons. Machine-optimized MPI collectives are necessary to harness...

10.1145/1088149.1088183 article EN 2005-06-20

This paper gives an overview of the BlueGene/L Supercomputer. This is a jointly funded research partnership between IBM and the Lawrence Livermore National Laboratory as part of the United States Department of Energy ASCI Advanced Architecture Research Program. Application performance and scaling studies have recently been initiated with partners at a number of academic and government institutions, including the San Diego Supercomputer Center and the California Institute of Technology. This massively parallel system of 65,536 nodes is based on a new...

10.1109/sc.2002.10017 article EN 2002-01-01

The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM's BlueGene/L, which can accommodate as many as 128K processors. In this paper, we present our experiences in collecting and filtering error event logs from an 8192-processor BlueGene/L prototype at IBM Rochester, which is currently ranked #8 in the Top-500 list. We analyze the logs collected from this machine over a period of 84 days starting August 26, 2004. We perform a three-step algorithm on...

10.1109/dsn.2005.50 article EN 2005-07-27
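This abstract mentions a three-step algorithm applied to the collected logs, but its steps are cut off here. As a generic illustration of one common log-filtering step, the following sketch applies temporal filtering: repeated reports of the same event category from the same location are collapsed when each arrives within a time threshold of the previous report. The record layout, the threshold, the location names, and the function name `temporal_filter` are hypothetical, and this is not presented as the paper's algorithm.

```python
def temporal_filter(events, threshold):
    """Collapse bursts of repeated events.

    events: iterable of (timestamp_seconds, location, category), sorted by time.
    An event is kept only if the same (location, category) pair was not reported
    within `threshold` seconds before it. The last-seen time is updated on every
    report, so an uninterrupted burst collapses into its first event.
    """
    last_seen = {}
    kept = []
    for ts, location, category in events:
        key = (location, category)
        prev = last_seen.get(key)
        if prev is None or ts - prev > threshold:
            kept.append((ts, location, category))
        last_seen[key] = ts
    return kept


# Hypothetical log: a burst of torus errors on nodeA plus a single one on nodeB.
log = [
    (0, "nodeA", "torus"),
    (3, "nodeB", "torus"),
    (5, "nodeA", "torus"),
    (12, "nodeA", "torus"),
    (100, "nodeA", "torus"),
]
print(temporal_filter(log, threshold=10))
# [(0, 'nodeA', 'torus'), (3, 'nodeB', 'torus'), (100, 'nodeA', 'torus')]
```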

The purpose of the GraphBLAS Forum is to standardize linear-algebraic building blocks for graph computations. An important part of this standardization effort is to translate the mathematical specification into an actual Application Programming Interface (API) that (i) is faithful to the mathematics and (ii) enables efficient implementations on modern hardware. This paper documents the approach taken by the C language subcommittee and presents the main concepts, constructs, and objects within the API. Use of the API is illustrated by showing...

10.1109/ipdpsw.2017.117 article EN 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2017-05-01

Effective scheduling strategies to improve response times, throughput, and utilization are an important consideration in large supercomputing environments. Parallel machines in these environments have traditionally used space-sharing strategies to accommodate multiple jobs at the same time by dedicating the nodes to a single job until it completes. This approach, however, can result in low system utilization and large job wait times. This paper discusses three techniques that can be used beyond simple space-sharing to improve the performance of large parallel systems. The first technique we...

10.1109/tpds.2003.1189582 article EN IEEE Transactions on Parallel and Distributed Systems 2003-03-01

First proposed as a mechanism for enhancing Web content, the Java™ language has taken off as a serious general-purpose programming language. Industry and academia alike have expressed great interest in using Java for scientific and engineering computations. Applications in these domains are characterized by intensive numerical computing and often have very high performance requirements. In this paper we discuss techniques that lead to Java codes with performance comparable to that of FORTRAN or C, the more traditional languages in this field. The techniques are centered...

10.1147/sj.391.0021 article EN IBM Systems Journal 2000-01-01

Mapping virtual processes onto physical processors is one of the most important issues in parallel computing. The problem of mapping processes/tasks onto processors is equivalent to graph embedding, which has been studied extensively. Although many techniques have been proposed for embeddings of two-dimensional grids, hypercubes, etc., there are few efforts on three-dimensional grids and tori. Motivated by the need for better task-mapping support on the Blue Gene/L supercomputer, in this paper we present the integration of embeddings into a topology mapping library that is based...

10.1145/1188455.1188576 article EN 2006-01-01
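Mapping quality on a three-dimensional torus is often judged by how many network hops separate communicating tasks. The following sketch computes the minimal hop distance with wraparound links and a hop-weighted communication cost for a given placement. The cost metric and the function names `torus_hops` and `mapping_cost` are assumptions for illustration, not the interface of the topology library described in the paper.

```python
def torus_hops(a, b, dims):
    """Minimal hop count between coordinates a and b on a torus.

    Along each dimension of size d, a message can travel either way around the ring,
    so the per-dimension distance is min(|x - y|, d - |x - y|).
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def mapping_cost(edges, placement, dims):
    """Sum of volume * hops over communicating task pairs (a hop-bytes-style metric).

    edges: iterable of (task_u, task_v, volume); placement: task -> torus coordinate.
    """
    return sum(vol * torus_hops(placement[u], placement[v], dims) for u, v, vol in edges)


dims = (8, 8, 8)
# A ring of 8 tasks, each exchanging one unit of data with its successor.
ring = [(i, (i + 1) % 8, 1) for i in range(8)]
along_x = {i: (i, 0, 0) for i in range(8)}            # neighbors 1 hop apart (wraparound included)
strided = {i: ((3 * i) % 8, 0, 0) for i in range(8)}  # neighbors 3 hops apart
print(mapping_cost(ring, along_x, dims), mapping_cost(ring, strided, dims))  # 8 24
```

Both placements use the same eight nodes; only the assignment of tasks to coordinates differs, which is exactly what a mapping library controls.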

Two different approaches have been commonly used to address problems associated with space-sharing scheduling strategies: (a) augmenting space sharing with backfilling, which performs out-of-order job scheduling; and (b) augmenting space sharing with time sharing, using a technique called coscheduling or gang scheduling. With three important experimental results (the impact of priority queue order on backfilling, the impact of overestimation of job execution times, and a comparison of scheduling techniques), this paper presents an integrated strategy that combines backfilling with gang scheduling. Using extensive...

10.1109/ipdps.2000.845975 article EN 2002-11-07
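Backfilling, one of the two approaches this abstract combines, lets later jobs start out of order when doing so does not delay jobs that already hold reservations. The following sketch implements the widely used EASY variant, in which only the job at the head of the queue holds a reservation, under simplifying assumptions: runtimes are user estimates, and the head job never needs more nodes than the machine has. It does not model the gang-scheduling half of the paper's integrated strategy, and the function name `easy_backfill` and its data layout are hypothetical.

```python
def easy_backfill(now, total_nodes, running, queue):
    """Decide which queued jobs start at time `now` under EASY backfilling.

    running: list of (end_time, nodes) for executing jobs (end times from estimates).
    queue:   list of (job_id, nodes, est_runtime) in priority order.
    Returns the list of job_ids started now.
    """
    running = list(running)
    queue = list(queue)
    free = total_nodes - sum(nodes for _, nodes in running)
    started = []

    # 1. Start jobs in priority order while the head of the queue fits.
    while queue and queue[0][1] <= free:
        job_id, nodes, est = queue.pop(0)
        started.append(job_id)
        running.append((now + est, nodes))
        free -= nodes
    if not queue:
        return started

    # 2. Reserve nodes for the blocked head job: find the "shadow time" at which
    #    enough running jobs have finished, and the nodes left over at that time.
    head_nodes = queue[0][1]
    avail, shadow = free, now
    for end, nodes in sorted(running):
        if avail >= head_nodes:
            break
        avail += nodes
        shadow = end
    extra = avail - head_nodes

    # 3. Backfill later jobs that fit now and cannot delay the head job's reservation.
    for job_id, nodes, est in queue[1:]:
        if nodes > free:
            continue
        if now + est <= shadow:      # finishes before the head job needs the nodes
            started.append(job_id)
            free -= nodes
        elif nodes <= extra:         # uses only nodes the head job will not need
            started.append(job_id)
            free -= nodes
            extra -= nodes
    return started


# 10 nodes; one running job holds 6 nodes until t=100; job A (8 nodes) is blocked.
queue = [("A", 8, 50), ("B", 2, 30), ("C", 4, 200), ("D", 2, 500)]
print(easy_backfill(0, 10, [(100, 6)], queue))  # ['B', 'D']
```

B backfills because it finishes before A's reservation at t=100, and D backfills because A will need only 8 of the 10 nodes at that time. Both tests rely on estimated runtimes, which is why overestimation matters for backfilling.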

Summary form only given. Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. We evaluate the effectiveness of a previously developed job scheduling algorithm for BlueGene/L in the presence of faults. We have developed two new job-scheduling algorithms that consider failures while scheduling jobs. We have also evaluated the impact of these algorithms on average bounded slowdown, average response time, and system utilization, considering different levels of proactive failure prediction and prevention techniques reported in the literature...

10.1109/ipdps.2004.1302991 article EN 2004-06-10

Deep Neural Networks (DNNs) have emerged as a core tool for machine learning. The computations performed during DNN training and inference are dominated by operations on the weight matrices describing the DNN. As DNNs incorporate more stages and more nodes per stage, these weight matrices may be required to be sparse because of memory limitations. The GraphBLAS.org math library standard was developed to provide high-performance manipulation of sparse weight matrices and input/output vectors. For sufficiently sparse matrices, a sparse matrix library requires significantly less memory than...

10.1109/hpec.2017.8091098 preprint EN 2017-09-01
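The sparse inference step this abstract refers to can be written as repeated sparse vector-matrix products, each followed by a ReLU. The following plain-Python sketch stores the activation vector as `{index: value}` and each weight matrix by rows as `{row: {col: weight}}`. These layouts and the function names `sparse_layer` and `sparse_inference` are hypothetical, adding the bias only to entries produced by the product is an assumption made here to keep activations sparse, and the GraphBLAS API itself is not used.

```python
def sparse_layer(y, W, bias):
    """One inference layer, ReLU(y W + b), with sparse y and W.

    y:    sparse row vector {i: value}
    W:    sparse matrix stored by rows {i: {j: weight}}
    bias: {j: value}; applied only to entries produced by y W (assumption)
    """
    z = {}
    for i, yi in y.items():
        for j, wij in W.get(i, {}).items():
            z[j] = z.get(j, 0.0) + yi * wij
    out = {}
    for j, zj in z.items():
        v = zj + bias.get(j, 0.0)
        if v > 0.0:  # ReLU; non-positive entries are dropped to keep the vector sparse
            out[j] = v
    return out


def sparse_inference(y, layers):
    """Apply a sequence of (W, bias) layers to the input vector y."""
    for W, bias in layers:
        y = sparse_layer(y, W, bias)
    return y


W1 = {0: {0: 0.5, 2: -1.0}, 1: {1: 1.0, 2: 0.25}}
print(sparse_layer({0: 1.0, 1: 2.0}, W1, {}))  # {0: 0.5, 1: 2.0}
```

Only nonzero weights and activations are ever touched, so memory and work scale with the number of nonzeros rather than with the full matrix dimensions.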

In 2013, we released a position paper to launch a community effort to define a common set of building blocks for constructing graph algorithms in the language of linear algebra. This led to the GraphBLAS. We released a specification for the C programming language binding to the GraphBLAS in 2017. Since that release, multiple libraries that conform to the specification have been produced. In this paper, we launch the next phase of this ongoing effort: a project to assemble a set of high-level graph algorithms built on top of the GraphBLAS. While many of these algorithms are well-known, with high-quality implementations available, they have not been assembled in one place and...

10.1109/ipdpsw.2019.00053 article EN 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2019-05-01

Chronic Obstructive Pulmonary Disease (COPD), the third leading cause of death globally, poses a significant public health burden. Despite its high prevalence, underdiagnosis and poor treatment adherence remain major challenges, contributing to increased hospitalization and mortality. This study aimed to assess inhalation therapy among COPD patients treated at a specialty hospital in Quito, Ecuador. A cross-sectional study was conducted on 85 patients diagnosed with COPD at a tertiary hospital in Quito. Data were collected through...

10.2147/copd.s493992 article EN cc-by-nc International Journal of COPD 2025-02-01

The Blue Gene®/L (BG/L) supercomputer, with 65,536 dual-processor compute nodes, was designed from the ground up to support efficient execution of massively parallel message-passing programs. Part of this support is an optimized implementation of the Message Passing Interface (MPI), which leverages the hardware features of BG/L. MPI for BG/L is implemented on top of a more basic infrastructure called the message layer. This layer can be used both to implement other higher-level libraries and directly by applications. MPI and the message layer are used in the two...

10.1147/rd.492.0393 article EN IBM Journal of Research and Development 2005-03-01

Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a common technique for mitigating the amount of work lost due to job failures, but its effectiveness under realistic circumstances has not been studied. In this paper, we analyze the system-level performance of periodic application checkpointing using parameters similar to those projected for BlueGene/L systems. Our results reflect simulations on a toroidal interconnect...

10.1109/ipdps.2005.337 article EN 2005-04-19

Parallel I/O plays a crucial role for most data-intensive applications running on massively parallel systems like Blue Gene/L that provide the promise of delivering enormous computational capability. We designed and implemented a highly scalable file I/O architecture for Blue Gene/L, which leverages the benefit of the hierarchical and functional partitioning design of the system software with separate cores. The architecture exploits the scalability aspect of GPFS (General Parallel File System) at the backend, while using MPI as an interface between...

10.1109/hpca.2006.1598125 article EN 2006-03-21

Although there has been some experimentation with Java as a language for numerically intensive computing, there is a perception by many that the language is unsuited for such work because of performance deficiencies. In this article we show how optimizing array bounds checks and null pointer checks creates loop nests on which aggressive optimizations can be used. Applying these optimizations by hand to a simple matrix-multiply test case leads to Java-compliant programs whose performance is in excess of 500 Mflops on a four-processor 332MHz RS/6000 model F50...

10.1145/349214.349222 article EN ACM Transactions on Programming Languages and Systems 2000-03-01

Cyclops is a new architecture for high-performance parallel computers that is being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip SMP (symmetric multiprocessor) system with multiple threads of execution, embedded memory, and integrated communications hardware. Massive intra-chip parallelism is used to tolerate functional unit latencies. Large systems with thousands of chips can be built by replicating this basic cell in a regular pattern. In this paper, we describe and evaluate two of its hardware...

10.1109/hpca.2002.995720 article EN 2004-04-23

Blue Gene/L is currently the world's fastest and most scalable supercomputer. It has demonstrated essentially linear scaling all the way to 131,072 processors in several benchmarks and real applications. The operating systems for the compute and I/O nodes of Blue Gene/L are among the components responsible for that scalability. Compute nodes are dedicated to running application processes, whereas I/O nodes are dedicated to performing system functions. The operating systems adopted for each of these node types reflect this separation of function. Compute nodes run a lightweight operating system called the compute node kernel. I/O nodes run a port of Linux...

10.1145/1188455.1188578 article EN 2006-01-01