NFDI4DS | UHH-SEMS - Publication Details

José E. Moreira

ORCID: 0000-0001-7029-6327

Publications


About

Contact & Profiles

A5055326969

Research Areas

Parallel Computing and Optimization Techniques
Distributed and Parallel Computing Systems
Interconnection Networks and Systems
Advanced Data Storage Technologies
Embedded Systems Design Techniques
Cloud Computing and Resource Management
Distributed systems and fault tolerance
Graph Theory and Algorithms
Numerical Methods and Algorithms
Scientific Computing and Data Management
Advanced Graph Neural Networks
Algorithms and Data Compression
Cellular Automata and Applications
Real-Time Systems Scheduling
Low-power high-performance VLSI design
Computational Physics and Python Applications
Software System Performance and Reliability
Logic, programming, and type systems
Advanced Database Systems and Queries
Security and Verification in Computing
Scheduling and Optimization Algorithms
Advanced Neural Network Applications
Stochastic processes and financial applications
Forecasting Techniques and Applications
Advanced Queuing Theory Analysis

IBM Research - Thomas J. Watson Research Center
2013-2024

University of Aveiro
2011-2024

IBM (United States)
2010-2023

IBM (Belgium)
2022

Alliance for Safe Kids
2018

Indiana University
2016

Amazon Research Foundation
2006

Institut d'Investigació Biomèdica de Girona
2005

Universitat Politècnica de Catalunya
2005

IBM Research - Zurich
2004

An Overview of the BlueGene/L Supercomputer

OPENALEX - Publications

N. R. Adiga George Almási George Almási Yariv Aridor Rajkishore Barik and 95 more

This paper gives an overview of the BlueGene/L Supercomputer. This is a jointly funded research partnership between IBM and the Lawrence Livermore National Laboratory as part of the United States Department of Energy ASCI Advanced Architecture Research Program. Application performance and scaling studies have recently been initiated with partners at a number of academic and government institutions, including the San Diego Supercomputer Center and the California Institute of Technology. This massively parallel system of 65,536 nodes is based on a new...

10.5555/762761.762787 article EN Conference on High Performance Computing (Supercomputing) 2002-11-16

Mathematical foundations of the GraphBLAS

Jeremy Kepner Peter Aaltonen David A. Bader Aydın Buluç Franz Franchetti and 11 more

The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for...

10.1109/hpec.2016.7761646 preprint EN 2016-09-01

Critical event prediction for proactive management in large-scale computer clusters

R.K. Sahoo Adam J. Oliner Irina Rish Manish Gupta José E. Moreira and 3 more

As the complexity of distributed computing systems increases, systems management tasks require significantly higher levels of automation; examples include diagnosis and prediction based on real-time streams of computer events, setting alarms, and performing continuous monitoring. The core of autonomic computing, a recently proposed initiative towards next-generation IT-systems capable of 'self-healing', is the ability to analyze data in real-time and to predict potential problems. The goal is to avoid catastrophic failures through prompt...

10.1145/956750.956799 article EN 2003-08-24

Blue Gene: A vision for protein science using a petaflop supercomputer

Frances Allen George Almási Wanda Andreoni D.K. Beece B. J. Berne and 47 more

In December 1999, IBM announced the start of a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding. The project has two main goals: to advance our understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. This project should enable biomolecular simulations that are orders of magnitude larger than current technology permits. Major areas of investigation include: how to most effectively...

10.1147/sj.402.0310 article EN IBM Systems Journal 2001-01-01

Adaptive incremental checkpointing for massively parallel systems

Saurabh Agarwal Rahul Garg Meeta S. Gupta José E. Moreira

Given the scale of massively parallel systems, the occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in these systems. However, the huge memory footprints of parallel applications place severe limitations on the scalability of normal checkpointing techniques. Incremental checkpointing is a well researched technique that addresses scalability concerns, but most implementations require paging support from the hardware and the underlying operating system, which may not be always available. In this paper, we...

10.1145/1006209.1006248 article EN 2004-06-26

Scale-up x Scale-out: A Case Study using Nutch/Lucene

Maged Michael José E. Moreira Doron Shiloach Robert W. Wisniewski

Scale-up solutions in the form of large SMPs have represented the mainstream of commercial computing for the past several years. The major server vendors continue to provide increasingly larger and more powerful machines. More recently, scale-out solutions, in the form of clusters of smaller machines, have gained increased acceptance for commercial computing. Scale-out solutions are particularly effective in high-throughput Web-centric applications. In this paper, we investigate the behavior of two competing approaches to parallelism, scale-up and scale-out, in an...

10.1109/ipdps.2007.370631 article EN 2007-01-01

IBM POWER8 processor core microarchitecture

Balaram Sinharoy J. A. Van Norstrand Richard J. Eickemeyer H. Q. Le J. Leenstra and 15 more

The POWER8™ processor is the latest RISC (Reduced Instruction Set Computer) microprocessor from IBM. It is fabricated using the company's 22-nm Silicon on Insulator (SOI) technology with 15 layers of metal, and it has been designed to significantly improve both single-thread performance and single-core throughput over its predecessor, the POWER7® processor. The rate of increase in processor frequency enabled by new silicon technology advancements has decreased dramatically in recent generations, as compared to the historic trend. This has caused many...

10.1147/jrd.2014.2376112 article EN IBM Journal of Research and Development 2015-01-01

Optimization of MPI collective communication on BlueGene/L systems

George Almási Philip Heidelberger Charles J Archer Xavier Martorell C. Chris Erway and 3 more

BlueGene/L is currently the world's fastest supercomputer. It consists of a large number of low power dual-processor compute nodes interconnected by high speed torus and collective networks. Because compute nodes do not have shared memory, MPI is the natural programming model for this machine. The BlueGene/L MPI library is a port of MPICH2. In this paper we discuss the implementation of MPI collectives on BlueGene/L. The MPICH2 implementation of MPI collectives is based on point-to-point communication primitives. This turns out to be suboptimal for a number of reasons. Machine-optimized MPI collectives are necessary to harness...

10.1145/1088149.1088183 article EN 2005-06-20

An Overview of the BlueGene/L Supercomputer

N. R. Adiga George Almási George Almási Yariv Aridor Rajkishore Barik and 95 more

This paper gives an overview of the BlueGene/L Supercomputer. This is a jointly funded research partnership between IBM and the Lawrence Livermore National Laboratory as part of the United States Department of Energy ASCI Advanced Architecture Research Program. Application performance and scaling studies have recently been initiated with partners at a number of academic and government institutions, including the San Diego Supercomputer Center and the California Institute of Technology. This massively parallel system of 65,536 nodes is based on a new...

10.1109/sc.2002.10017 article EN 2002-01-01

Filtering Failure Logs for a BlueGene/L Prototype

Yu Liang Y. Zhang Anand Sivasubramaniam R.K. Sahoo José E. Moreira and 1 more

The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM's BlueGene/L, which can accommodate as many as 128K processors. In this paper, we present our experiences in collecting and filtering error event logs from an 8192 processor BlueGene/L prototype at IBM Rochester, which is currently ranked #8 in the Top-500 list. We analyze the logs collected from this machine over a period of 84 days starting from August 26, 2004. We perform a three-step filtering algorithm on...

10.1109/dsn.2005.50 article EN 2005-07-27

Design of the GraphBLAS API for C

Aydın Buluç Tim Mattson Scott McMillan José E. Moreira Carl Yang

The purpose of the GraphBLAS Forum is to standardize linear-algebraic building blocks for graph computations. An important part of this standardization effort is to translate the mathematical specification into an actual Application Programming Interface (API) that (i) is faithful to the mathematics and (ii) enables efficient implementations on modern hardware. This paper documents the approach taken by the C language specification subcommittee and presents the main concepts, constructs, and objects within the GraphBLAS API. Use of the API is illustrated by showing...

10.1109/ipdpsw.2017.117 article EN IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2017-05-01

An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration

Yanyong Zhang Hubertus Franke José E. Moreira Anand Sivasubramaniam

Effective scheduling strategies to improve response times, throughput, and utilization are an important consideration in large supercomputing environments. Parallel machines in these environments have traditionally used space-sharing strategies to accommodate multiple jobs at the same time by dedicating the nodes to a single job until it completes. This approach, however, can result in low system utilization and large job wait times. This paper discusses three techniques that can be used beyond simple space-sharing to improve the performance of large parallel systems. The first technique we...

10.1109/tpds.2003.1189582 article EN IEEE Transactions on Parallel and Distributed Systems 2003-03-01

Java programming for high-performance numerical computing

José E. Moreira Samuel P. Midkiff Manish Gupta Pedro V. Artigas Marc Snir and 1 more

First proposed as a mechanism for enhancing Web content, the Java™ language has taken off as a serious general-purpose programming language. Industry and academia alike have expressed great interest in using the Java language for scientific and engineering computations. Applications in these domains are characterized by intensive numerical computing and often have very high performance requirements. In this paper we discuss techniques that lead to Java numerical codes with performance comparable to FORTRAN or C, the more traditional languages for this field. The techniques are centered...

10.1147/sj.391.0021 article EN IBM Systems Journal 2000-01-01

Blue Gene system software---Topology mapping for Blue Gene/L supercomputer

Hao Yu I‐Hsin Chung José E. Moreira

Mapping virtual processes onto physical processors is one of the most important issues in parallel computing. The problem of mapping processes/tasks onto processors is equivalent to the graph embedding problem, which has been studied extensively. Although many techniques have been proposed for embeddings of two-dimensional grids, hypercubes, etc., there are few efforts on embeddings of three-dimensional grids and tori. Motivated by better support for task mapping on the Blue Gene/L supercomputer, in this paper, we present integration topology library that based...

10.1145/1188455.1188576 article EN 2006-01-01

Improving parallel job scheduling by combining gang scheduling and backfilling techniques

Ying Zhang Hubertus Franke José E. Moreira Anand Sivasubramaniam

Two different approaches have been commonly used to address problems associated with space sharing scheduling strategies: (a) augmenting space sharing with backfilling, which performs out of order job scheduling; and (b) augmenting space sharing with time sharing, using a technique called coscheduling or gang scheduling. With three important experimental results (impact of priority queue order on backfilling, impact of overestimation of job execution times, and comparison of scheduling techniques), this paper presents an integrated strategy that combines backfilling with gang scheduling. Using extensive...

10.1109/ipdps.2000.845975 article EN 2002-11-07

Fault-aware job scheduling for BlueGene/L systems

Adam J. Oliner R.K. Sahoo José E. Moreira Manish Gupta Anand Sivasubramaniam

Summary form only given. Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. We evaluate the effectiveness of a previously developed job scheduling algorithm for BlueGene/L in the presence of faults. We have developed two new job-scheduling algorithms considering failures while scheduling the jobs. We have also evaluated the impact of these algorithms on average bounded slowdown, average response time and system utilization, considering different levels of proactive failure prediction and prevention techniques reported in the literature....

10.1109/ipdps.2004.1302991 article EN 2004-06-10

Enabling massive deep neural networks with the GraphBLAS

Jeremy Kepner Manoj Kumar José E. Moreira Pratap Pattnaik Maurício Serrano and 1 more

Deep Neural Networks (DNNs) have emerged as a core tool for machine learning. The computations performed during DNN training and inference are dominated by operations on the weight matrices describing the DNN. As DNNs incorporate more stages and more nodes per stage, these weight matrices may be required to be sparse because of memory limitations. The GraphBLAS.org math library standard was developed to provide high performance manipulation of sparse weight matrices and input/output vectors. For sufficiently sparse matrices, a sparse matrix library requires significantly less memory than...

10.1109/hpec.2017.8091098 preprint EN 2017-09-01

LAGraph: A Community Effort to Collect Graph Algorithms Built on Top of the GraphBLAS

Tim Mattson Timothy A. Davis Manoj Kumar Aydın Buluç Scott McMillan and 2 more

In 2013, we released a position paper to launch a community effort to define a common set of building blocks for constructing graph algorithms in the language of linear algebra. This led to the GraphBLAS. We released a specification for the C programming language binding to the GraphBLAS in 2017. Since that release, multiple libraries that conform to the GraphBLAS C specification have been produced. In this paper, we describe the next phase of this ongoing community effort: a project to assemble a set of high level graph algorithms built on top of the GraphBLAS. While many of these algorithms are well-known with high quality implementations available, they have not been assembled in one place and...

10.1109/ipdpsw.2019.00053 article EN IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2019-05-01

Adherence to Inhalation Therapy Among COPD Patients: A Cross-Sectional Study in a Tertiary Hospital in Quito, Ecuador

Juan S. Izquierdo‐Condoy Fernando Álvarez Estefanía Morales-Lapo Washington David Arias Calvache José E. Moreira and 2 more

Chronic Obstructive Pulmonary Disease (COPD), the third leading cause of death globally, poses a significant public health burden. Despite its high prevalence, underdiagnosis and poor treatment adherence remain major challenges, contributing to increased hospitalization and mortality. This study aimed to assess adherence to inhalation therapy among COPD patients treated at a specialty hospital in Quito, Ecuador. A cross-sectional study was conducted on 85 patients diagnosed with COPD at a tertiary hospital in Quito. Data were collected through...

10.2147/copd.s493992 article EN cc-by-nc International Journal of COPD 2025-02-01

Design and implementation of message-passing services for the Blue Gene/L supercomputer

George Almási Charles J Archer José G. Castaños John A. Gunnels C. Christopher Erway and 8 more

The Blue Gene®/L (BG/L) supercomputer, with 65,536 dual-processor compute nodes, was designed from the ground up to support efficient execution of massively parallel message-passing programs. Part of this support is an optimized implementation of the Message Passing Interface (MPI), which leverages the hardware features of BG/L. MPI for BG/L is implemented on top of a more basic message-passing infrastructure called the message layer. This message layer can be used both to implement other higher-level libraries and directly by applications. MPI and the message layer are used in the two...

10.1147/rd.492.0393 article EN IBM Journal of Research and Development 2005-03-01

Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems

Adam J. Oliner R.K. Sahoo José E. Moreira Manish Gupta

Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a common technique for mitigating the amount of work lost due to job failures, but its effectiveness under realistic circumstances has not been studied. In this paper, we analyze the system-level performance of periodic application checkpointing using parameters similar to those projected for BlueGene/L systems. Our results reflect simulations on a toroidal interconnect...

10.1109/ipdps.2005.337 article EN 2005-04-19

High Performance File I/O for The Blue Gene/L Supercomputer

Hang Yu R.K. Sahoo Christopher Howson George Almási José G. Castaños and 8 more

Parallel I/O plays a crucial role for most data-intensive applications running on massively parallel systems like Blue Gene/L, which provides the promise of delivering enormous computational capability. We designed and implemented a highly scalable parallel file I/O architecture for Blue Gene/L, which leverages the benefit of the hierarchical and functional partitioning design of the system software with separate computational and I/O cores. The architecture exploits the scalability aspect of GPFS (General Parallel File System) at the backend, while using MPI I/O as an interface between...

10.1109/hpca.2006.1598125 article EN 2006-03-21

From flop to megaflops

José E. Moreira Samuel P. Midkiff Manish Gupta

Although there has been some experimentation with Java as a language for numerically intensive computing, there is a perception by many that the language is unsuited for such work because of performance deficiencies. In this article we show how optimizing array bounds checks and null pointer checks creates loop nests on which aggressive optimizations can be used. Applying these optimizations by hand to a simple matrix-multiply test case leads to Java-compliant programs whose performance is in excess of 500 Mflops on a four-processor 332MHz RS/6000 model F50...

10.1145/349214.349222 article EN ACM Transactions on Programming Languages and Systems 2000-03-01

Evaluation of a multithreaded architecture for cellular computing

Cǎlin Caşcaval José G. Castaños Luís Ceze Monty Denneau Manoj Gupta and 4 more

Cyclops is a new architecture for high-performance parallel computers that is being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip SMP (symmetric multiprocessor) system with multiple threads of execution, embedded memory and integrated communications hardware. Massive intra-chip parallelism is used to tolerate memory and functional unit latencies. Large systems with thousands of chips can be built by replicating this basic cell in a regular pattern. In this paper, we describe the Cyclops architecture and evaluate two of its hardware...

10.1109/hpca.2002.995720 article EN 2004-04-23

Blue Gene system software---Designing a highly-scalable operating system

José E. Moreira Pat McCarthy Michael Mundy Jeff Parker Brian Wallenfelt and 8 more

Blue Gene/L is currently the world's fastest and most scalable supercomputer. It has demonstrated essentially linear scaling all the way to 131,072 processors in several benchmarks and real applications. The operating systems for the compute and I/O nodes of Blue Gene/L are among the components responsible for that scalability. Compute nodes are dedicated to running application processes, whereas I/O nodes are dedicated to performing system functions. The operating systems adopted for each of these nodes reflect this separation of function. Compute nodes run a lightweight operating system called the compute node kernel. I/O nodes run a port of the Linux...

10.1145/1188455.1188578 article EN 2006-01-01
