- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Interconnection Networks and Systems
- Advanced Data Storage Technologies
- Embedded Systems Design Techniques
- Cloud Computing and Resource Management
- Distributed systems and fault tolerance
- Graph Theory and Algorithms
- Numerical Methods and Algorithms
- Scientific Computing and Data Management
- Advanced Graph Neural Networks
- Algorithms and Data Compression
- Cellular Automata and Applications
- Real-Time Systems Scheduling
- Low-power high-performance VLSI design
- Computational Physics and Python Applications
- Software System Performance and Reliability
- Logic, programming, and type systems
- Advanced Database Systems and Queries
- Security and Verification in Computing
- Scheduling and Optimization Algorithms
- Advanced Neural Network Applications
- Stochastic processes and financial applications
- Forecasting Techniques and Applications
- Advanced Queuing Theory Analysis
IBM Research - Thomas J. Watson Research Center
2013-2024
University of Aveiro
2011-2024
IBM (United States)
2010-2023
IBM (Belgium)
2022
Alliance for Safe Kids
2018
Indiana University
2016
Amazon Research Foundation
2006
Institut d'Investigació Biomèdica de Girona
2005
Universitat Politècnica de Catalunya
2005
IBM Research - Zurich
2004
This paper gives an overview of the BlueGene/L Supercomputer. This is a jointly funded research partnership between IBM and the Lawrence Livermore National Laboratory as part of the United States Department of Energy ASCI Advanced Architecture Research Program. Application performance and scaling studies have recently been initiated with partners at a number of academic and government institutions, including the San Diego Supercomputer Center and the California Institute of Technology. This massively parallel system of 65,536 nodes is based on a new...
The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for...
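The adjacency-matrix view described in this abstract can be illustrated with a minimal sketch, which is not the GraphBLAS API itself. Breadth-first search is written as repeated vector-matrix products over a Boolean (OR, AND) semiring, using plain Python lists. The function names `adjacency_from_edges` and `bfs_levels` and the sample graph are assumptions made for illustration only.

```python
def adjacency_from_edges(n, edges):
    """Build a dense Boolean adjacency matrix: adj[i][j] is True for an edge i -> j."""
    adj = [[False] * n for _ in range(n)]
    for src, dst in edges:
        adj[src][dst] = True
    return adj


def bfs_levels(adj, source):
    """Breadth-first search as repeated vector-matrix products.

    Returns a list whose entry v is the BFS level of vertex v,
    or -1 if v is unreachable from `source`.
    """
    n = len(adj)
    levels = [-1] * n
    frontier = [False] * n
    frontier[source] = True
    level = 0
    while any(frontier):
        # Record the level of every vertex in the current frontier.
        for v in range(n):
            if frontier[v]:
                levels[v] = level
        # next = frontier * adj over the (OR, AND) semiring,
        # masked so that already-visited vertices are not revisited.
        frontier = [
            levels[j] == -1 and any(frontier[i] and adj[i][j] for i in range(n))
            for j in range(n)
        ]
        level += 1
    return levels


# Hypothetical 6-vertex directed graph; vertex 5 has no incoming edges.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
adj = adjacency_from_edges(6, edges)
print(bfs_levels(adj, 0))  # [0, 1, 1, 2, 3, -1]
```

A real GraphBLAS implementation would store the matrix in a sparse format and apply the mask inside the multiply; the dense lists here only keep the sketch self-contained.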
As the complexity of distributed computing systems increases, management tasks require significantly higher levels of automation; examples include diagnosis and prediction based on real-time streams of computer events, setting alarms, and performing continuous monitoring. The core of autonomic computing, a recently proposed initiative towards next-generation IT-systems capable of 'self-healing', is the ability to analyze data in real-time and to predict potential problems. The goal is to avoid catastrophic failures through prompt...
In December 1999, IBM announced the start of a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding. The project has two main goals: to advance our understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. This project should enable biomolecular simulations that are orders of magnitude larger than current technology permits. Major areas of investigation include: how to most effectively...
Given the scale of massively parallel systems, the occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in these systems. However, the huge memory footprints of parallel applications place severe limitations on the scalability of normal checkpointing techniques. Incremental checkpointing is a well-researched technique that addresses these scalability concerns, but most implementations require paging support from the hardware and the underlying operating system, which may not always be available. In this paper, we...
Scale-up solutions in the form of large SMPs have represented the mainstream of commercial computing for the past several years. The major server vendors continue to provide increasingly larger and more powerful machines. More recently, scale-out solutions, in the form of clusters of smaller machines, have gained increased acceptance for commercial computing. Scale-out solutions are particularly effective in high-throughput Web-centric applications. In this paper, we investigate the behavior of two competing approaches to parallelism, scale-up and scale-out, in an...
The POWER8™ processor is the latest RISC (Reduced Instruction Set Computer) microprocessor from IBM. It is fabricated using the company's 22-nm Silicon on Insulator (SOI) technology with 15 layers of metal, and it has been designed to significantly improve both single-thread performance and single-core throughput over its predecessor, the POWER7® processor. The rate of increase in processor frequency enabled by new silicon advancements has decreased dramatically in recent generations, as compared to the historic trend. This has caused many...
BlueGene/L is currently the world's fastest supercomputer. It consists of a large number of low power dual-processor compute nodes interconnected by high speed torus and collective networks. Because compute nodes do not have shared memory, MPI is the natural programming model for this machine. The BlueGene/L MPI library is a port of MPICH2. In this paper we discuss the implementation of MPI collectives on BlueGene/L. The MPICH2 implementation of MPI collectives is based on point-to-point communication primitives. This turns out to be suboptimal for a number of reasons. Machine-optimized MPI collectives are necessary to harness...
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM's BlueGene/L, which can accommodate as many as 128K processors. In this paper, we present our experiences in collecting and filtering error event logs from an 8192-processor BlueGene/L prototype at IBM Rochester, which is currently ranked #8 in the Top-500 list. We analyze the logs collected from this machine over a period of 84 days starting from August 26, 2004. We perform a three-step filtering algorithm on...
The purpose of the GraphBLAS Forum is to standardize linear-algebraic building blocks for graph computations. An important part of this standardization effort is to translate the mathematical specification into an actual Application Programming Interface (API) that (i) is faithful to the mathematics and (ii) enables efficient implementations on modern hardware. This paper documents the approach taken by the C language subcommittee and presents the main concepts, constructs, and objects within the GraphBLAS API. Use of the API is illustrated by showing...
Effective scheduling strategies to improve response times, throughput, and utilization are an important consideration in large supercomputing environments. Parallel machines in these environments have traditionally used space-sharing strategies to accommodate multiple jobs at the same time by dedicating the nodes to a single job until it completes. This approach, however, can result in low system utilization and large job wait times. This paper discusses three techniques that can be used beyond simple space-sharing to improve the performance of large parallel systems. The first technique we...
First proposed as a mechanism for enhancing Web content, the Java™ language has taken off as a serious general-purpose programming language. Industry and academia alike have expressed great interest in using the Java language for scientific and engineering computations. Applications in these domains are characterized by intensive numerical computing and often have very high performance requirements. In this paper we discuss programming techniques that lead to Java numerical codes with performance comparable to FORTRAN or C, the more traditional languages for this field. The techniques are centered...
Mapping virtual processes onto physical processors is one of the most important issues in parallel computing. The problem of mapping processes/tasks onto processors is equivalent to the graph embedding problem, which has been studied extensively. Although many techniques have been proposed for embeddings in two-dimensional grids, hypercubes, etc., there are few efforts on embeddings in three-dimensional grids and tori. Motivated by better support for task mapping on the Blue Gene/L supercomputer, in this paper we present embedding and integration techniques for the embeddings of three-dimensional grids and tori. The topology mapping library that is based...
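To make the mapping problem in this abstract concrete, the following minimal sketch scores a task placement on a three-dimensional torus by total hop count. It is not the embedding technique of the paper. The functions `torus_hops` and `mapping_cost`, the 4x4x4 torus, and the 8-task ring workload are all assumptions chosen for illustration.

```python
from itertools import product


def torus_hops(a, b, dims):
    """Minimum hop count between coordinates a and b on a torus (with wraparound links)."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def mapping_cost(task_edges, placement, dims):
    """Sum of traffic-weighted hop counts over all communicating task pairs."""
    return sum(
        weight * torus_hops(placement[u], placement[v], dims)
        for u, v, weight in task_edges
    )


dims = (4, 4, 4)  # hypothetical 4x4x4 torus

# Hypothetical workload: 8 tasks communicating in a ring, unit traffic per edge.
task_edges = [(t, (t + 1) % 8, 1) for t in range(8)]

# Naive placement: tasks in row-major order (z varies fastest).
coords = list(product(range(4), repeat=3))
naive = {t: coords[t] for t in range(8)}

# Snake placement: walk z forward along y=0, then backward along y=1,
# so that every ring neighbor is exactly one hop away.
snake_coords = [(0, 0, z) for z in range(4)] + [(0, 1, z) for z in reversed(range(4))]
snake = {t: snake_coords[t] for t in range(8)}

print(mapping_cost(task_edges, naive, dims))  # 10
print(mapping_cost(task_edges, snake, dims))  # 8
```

The snake placement reaches the lower bound of one hop per ring edge, which is the kind of improvement a topology-aware embedding aims for.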
Two different approaches have been commonly used to address problems associated with space sharing scheduling strategies: (a) augmenting space sharing with backfilling, which performs out of order job scheduling; and (b) augmenting space sharing with time sharing, using a technique called coscheduling or gang scheduling. With three important experimental results (impact of priority queue order on backfilling, impact of overestimation of job execution times, and comparison of scheduling techniques), this paper presents an integrated strategy that combines backfilling with gang scheduling. Using extensive...
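The out-of-order scheduling that backfilling performs can be sketched as follows. This is a minimal illustration of the widely used EASY-style rule (a lower-priority job may start early only if it does not delay the reservation of the job at the head of the queue), which is an assumption here and not necessarily the variant evaluated in the paper. The function name `easy_backfill` and the tuple layouts are hypothetical.

```python
def easy_backfill(now, total_nodes, running, queue):
    """Pick jobs to start at time `now` under an EASY-style backfilling rule.

    running: list of (end_time, nodes) for jobs currently executing.
    queue:   list of (job_id, nodes, estimated_runtime) in priority order.
    Returns the list of job ids to start now.
    """
    running = list(running)
    queue = list(queue)
    free = total_nodes - sum(nodes for _, nodes in running)
    started = []

    # Start jobs strictly in priority order while they fit.
    while queue and queue[0][1] <= free:
        job_id, nodes, runtime = queue.pop(0)
        free -= nodes
        running.append((now + runtime, nodes))
        started.append(job_id)
    if not queue:
        return started

    head_nodes = queue[0][1]
    if head_nodes > total_nodes:
        raise ValueError("head job requests more nodes than the machine has")

    # Reservation for the head job: the earliest time ("shadow time") at which
    # enough nodes will be free, and the nodes left over at that time ("extra").
    avail = free
    shadow = now
    for end, nodes in sorted(running):
        avail += nodes
        if avail >= head_nodes:
            shadow = end
            break
    extra = avail - head_nodes

    # Backfill: a later job may start now if it fits in the free nodes and
    # either finishes before the shadow time or uses only extra nodes.
    for job_id, nodes, runtime in queue[1:]:
        if nodes > free:
            continue
        if now + runtime <= shadow:
            free -= nodes
            started.append(job_id)
        elif nodes <= extra:
            free -= nodes
            extra -= nodes
            started.append(job_id)
    return started


# Hypothetical 10-node machine: 6 nodes are busy until t=10, so 4 are free now.
running = [(10, 6)]
queue = [("A", 8, 5), ("B", 4, 20), ("C", 3, 8), ("D", 1, 30)]
print(easy_backfill(0, 10, running, queue))  # ['C', 'D']
```

In this example job A cannot start until t=10. Job B would delay it, so B waits; C finishes before t=10, and D fits in the two nodes A leaves unused, so both are backfilled.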
Summary form only given. Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. We evaluate the effectiveness of a previously developed job scheduling algorithm for BlueGene/L in the presence of faults. We have developed two new job-scheduling algorithms considering failures while scheduling the jobs. We have also evaluated the impact of these algorithms on average bounded slowdown, average response time and system utilization, considering different levels of proactive failure prediction and prevention techniques reported in the literature....
Deep Neural Networks (DNNs) have emerged as a core tool for machine learning. The computations performed during DNN training and inference are dominated by operations on the weight matrices describing the DNN. As DNNs incorporate more stages and more nodes per stage, these weight matrices may be required to be sparse because of memory limitations. The GraphBLAS.org math library standard was developed to provide high performance manipulation of sparse weight matrices and input/output vectors. For sufficiently sparse matrices, a sparse matrix library requires significantly less memory than...
In 2013, we released a position paper to launch a community effort to define a common set of building blocks for constructing graph algorithms in the language of linear algebra. This led to the GraphBLAS. We released a specification for the C programming language binding to the GraphBLAS in 2017. Since that release, multiple libraries that conform to the GraphBLAS C specification have been produced. In this position paper, we launch the next phase of this ongoing community effort: a project to assemble a set of high level graph algorithms built on top of the GraphBLAS. While many of these algorithms are well-known with high quality implementations available, they have not been assembled in one place and...
Chronic Obstructive Pulmonary Disease (COPD), the third leading cause of death globally, poses a significant public health burden. Despite its high prevalence, underdiagnosis and poor treatment adherence remain major challenges, contributing to increased hospitalization and mortality. This study aimed to assess adherence to inhalation therapy among COPD patients treated at a specialty hospital in Quito, Ecuador. A cross-sectional study was conducted on 85 patients diagnosed with COPD at a tertiary hospital in Quito. Data were collected through...
The Blue Gene®/L (BG/L) supercomputer, with 65,536 dual-processor compute nodes, was designed from the ground up to support efficient execution of massively parallel message-passing programs. Part of this support is an optimized implementation of the Message Passing Interface (MPI), which leverages the hardware features of BG/L. MPI for BG/L is implemented on top of a more basic message-passing infrastructure called the message layer. This message layer can be used both to implement other higher-level libraries and directly by applications. MPI and the message layer are used in the two...
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a common technique for mitigating the amount of work lost due to job failures, but its effectiveness under realistic circumstances has not been studied. In this paper, we analyze the system-level performance of periodic application checkpointing using parameters similar to those projected for BlueGene/L systems. Our results reflect simulations on a toroidal interconnect...
Parallel I/O plays a crucial role for most data-intensive applications running on massively parallel systems like Blue Gene/L, which promise to deliver enormous computational capability. We designed and implemented a highly scalable parallel file I/O architecture for Blue Gene/L, which leverages the benefit of the hierarchical and functional partitioning design of the system software with separate computational and I/O cores. The architecture exploits the scalability aspect of GPFS (General Parallel File System) at the backend, while using MPI I/O as an interface between...
Although there has been some experimentation with Java as a language for numerically intensive computing, there is a perception by many that the language is unsuited for such work because of performance deficiencies. In this article we show how optimizing array bounds checks and null pointer checks creates loop nests on which aggressive optimizations can be used. Applying these optimizations by hand to a simple matrix-multiply test case leads to Java-compliant programs whose performance is in excess of 500 Mflops on a four-processor 332MHz RS/6000 model F50...
Cyclops is a new architecture for high-performance parallel computers that is being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip SMP (symmetric multiprocessor) system with multiple threads of execution, embedded memory and integrated communications hardware. Massive intra-chip parallelism is used to tolerate functional unit latencies. Large systems with thousands of chips can be built by replicating this basic cell in a regular pattern. In this paper, we describe the architecture and evaluate two of its hardware...
Blue Gene/L is currently the world's fastest and most scalable supercomputer. It has demonstrated essentially linear scaling all the way to 131,072 processors in several benchmarks and real applications. The operating systems for the compute and I/O nodes of Blue Gene/L are among the components responsible for that scalability. Compute nodes are dedicated to running application processes, whereas I/O nodes are dedicated to performing system functions. The operating systems adopted for each of these nodes reflect this separation of function. Compute nodes run a lightweight operating system called the compute node kernel. I/O nodes run a port of the Linux...