- Parallel Computing and Optimization Techniques
- Interconnection Networks and Systems
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- Software System Performance and Reliability
- Embedded Systems Design Techniques
- Software Engineering Research
- Data Visualization and Analytics
- Scientific Computing and Data Management
- Complex Network Analysis Techniques
- Advanced Neural Network Applications
- Distributed Systems and Fault Tolerance
- Advanced Memory and Neural Computing
- Protein Structure and Dynamics
- Stochastic Gradient Optimization Techniques
- Algorithms and Data Compression
- Matrix Theory and Algorithms
- Tensor Decomposition and Applications
- Software-Defined Networks and 5G
- Machine Learning and ELM
- Opinion Dynamics and Social Influence
- Caching and Content Delivery
- Manufacturing Process and Optimization
- Advanced Database Systems and Queries
University of Maryland, College Park
2019-2024
Nvidia (United Kingdom)
2024
Iowa State University
2023
University of Oregon
2023
Lawrence Livermore National Laboratory
2011-2022
Leibniz Supercomputing Centre
2019
University of Illinois Urbana-Champaign
2007-2011
Indian Institute of Technology Kanpur
2007
Parallel machines are becoming more complex with increasing core counts and heterogeneous architectures. However, the commonly used parallel programming models, C/C++ with MPI and/or OpenMP, make it difficult to write source code that is easily tuned for many targets. Newer language approaches attempt to ease this burden by providing optimization features such as automatic load balancing, overlap of computation and communication, message-driven execution, and implicit data layout optimizations. In this paper, we...
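The overlap of computation and communication mentioned above can be illustrated with a toy message-driven sketch (this uses Python's asyncio purely for illustration; it is not Charm++ or any of the runtimes the abstract refers to): post the receive first, compute while the message is in flight, then handle the message when it arrives.

```python
# Toy sketch of computation-communication overlap via message-driven
# execution. fake_recv is a hypothetical stand-in for an asynchronous
# message arrival; all values are fabricated.
import asyncio

async def fake_recv(value, delay):
    """Stand-in for a message arriving over the network after `delay` seconds."""
    await asyncio.sleep(delay)
    return value

async def worker():
    # Post the receive first, so the "message" travels while we compute...
    pending = asyncio.create_task(fake_recv(21, 0.01))
    local = sum(i * i for i in range(1000))  # overlapped local computation
    msg = await pending                      # ...then react when it arrives
    return msg * 2, local

result, local = asyncio.run(worker())
print(result)  # 42
```

A blocking receive would instead serialize the two phases; the message-driven style lets the runtime schedule whatever work is ready.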
Predictable performance is important for understanding and alleviating application performance issues; quantifying the effects of source code, compiler, or system software changes; estimating the time required for batch jobs; and determining the allocation requests for proposals. Our experiments show that on a Cray XE system, the execution time of communication-heavy parallel applications ranges from 28% faster to 41% slower than the average observed performance. Blue Gene systems, on the other hand, demonstrate no noticeable run-to-run variability. In this...
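Variability figures of this kind can be computed from repeated timings of the same job as deviations of the fastest and slowest runs from the mean. A minimal sketch (the timings below are fabricated, not data from the paper):

```python
# Hypothetical run-to-run timings (seconds) for the same job; illustrative only.
timings = [104.0, 96.5, 120.3, 98.1, 101.1]

mean_t = sum(timings) / len(timings)

# Deviation of the fastest and slowest runs from the mean, as percentages.
fastest_pct = (mean_t - min(timings)) / mean_t * 100  # % faster than average
slowest_pct = (max(timings) - mean_t) / mean_t * 100  # % slower than average

print(f"mean={mean_t:.1f}s, {fastest_pct:.1f}% faster to {slowest_pct:.1f}% slower")
```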
Parallel programs in high performance computing (HPC) continue to grow in complexity and scale in the exascale era. The diversity of hardware and parallel programming models makes developing, optimizing, and maintaining parallel software even more burdensome for developers. One way to alleviate some of these burdens is with automated development and analysis tools. Such tools can perform complex and/or remedial tasks for developers that increase their productivity and decrease the chance of error. Until recently, such tools for code have been...
Performance visualization comprises techniques that aid developers and analysts in improving the time and energy efficiency of their software. In this work, we discuss performance as it relates to visualization and survey existing approaches to performance visualization. We present an overview of what types of performance data can be collected and a categorization of the goals such visualizations address. We develop a taxonomy for the contexts in which different visualizations reside and describe the state of the art of research pertaining to each. Finally, we discuss unaddressed issues and future challenges...
Large language models are increasingly becoming a popular tool for software development. Their ability to model and generate source code has been demonstrated in a variety of contexts, including code completion, summarization, translation, and lookup. However, they often struggle with complex programs. In this paper, we study the capabilities of state-of-the-art language models to generate parallel code. In order to evaluate language models, we create a benchmark, ParEval, consisting of prompts that represent 420 different coding tasks related to scientific...
NAMD (nanoscale molecular dynamics) is a production molecular dynamics (MD) application for biomolecular simulations that include assemblages of proteins, cell membranes, and water molecules. In a biomolecular simulation, the problem size is fixed and a large number of iterations must be executed in order to understand interesting biological phenomena. Hence, we need MD applications to scale to thousands of processors, even though an individual timestep on one processor is quite small. NAMD has demonstrated its performance on several parallel...
Molecular Dynamics applications enhance our understanding of biological phenomena through bio-molecular simulations. Large-scale parallelization of MD simulations is challenging because of the small number of atoms and the small time scales involved. Load balancing in parallel programs is crucial for good performance on large machines. This paper discusses the load balancing algorithms deployed in a code called NAMD. It focuses on new schemes for the load balancers and provides an analysis of the benefits achieved. Specifically, it presents a technique...
NAMD is a portable parallel application for biomolecular simulations. NAMD pioneered the use of hybrid spatial and force decomposition, a technique now used by most scalable programs for biomolecular simulations, including Blue Matter and Desmond, developed by IBM and D. E. Shaw respectively. NAMD has been developed using Charm++ and benefits from its adaptive communication-computation overlap and dynamic load balancing. This paper focuses on new scalability challenges in biomolecular simulations: much larger machines and simulating molecular systems with millions...
Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of applications even on thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the challenges of centralized...
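The hierarchical idea can be sketched in a few lines: a root level first distributes work across groups of cores, and then each group balances its own share independently, so no single balancer ever sees the whole machine. This is only a toy illustration using a greedy largest-processing-time heuristic, not the Charm++ implementation the abstract describes; the task costs are fabricated.

```python
import heapq

def greedy_assign(tasks, n_bins):
    """Assign task costs to n_bins greedily, largest cost first (LPT)."""
    heap = [(0.0, i) for i in range(n_bins)]  # (current load, bin index)
    bins = [[] for _ in range(n_bins)]
    for cost in sorted(tasks, reverse=True):
        load, i = heapq.heappop(heap)         # least-loaded bin so far
        bins[i].append(cost)
        heapq.heappush(heap, (load + cost, i))
    return bins

def hierarchical_balance(tasks, n_groups, cores_per_group):
    # Level 1: a root balancer distributes tasks across groups.
    group_tasks = greedy_assign(tasks, n_groups)
    # Level 2: each group balances its own tasks across its cores,
    # without global coordination (this is what bounds memory and cost).
    return [greedy_assign(g, cores_per_group) for g in group_tasks]

tasks = [5, 3, 8, 2, 7, 4, 6, 1]
placement = hierarchical_balance(tasks, n_groups=2, cores_per_group=2)
print(placement)
```

Because each level only sees its own subset of tasks, the memory and decision cost per balancer stay small even as the machine grows.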
A low-diameter, fast interconnection network is going to be a prerequisite for building exascale machines. A two-level direct network has been proposed by several groups as a scalable design for future machines. IBM's PERCS topology and the dragonfly topology discussed in the DARPA hardware study are examples of this design. The presence of multiple levels leads to hot-spots on a few links when processes are grouped together at the lowest level to minimize total communication volume. This is especially true for communication graphs with a small number of neighbors per task....
Recent results have shown that topology aware mapping reduces network contention in communication-intensive kernels on massively parallel machines. We demonstrate that, on mesh interconnects, topology aware mapping also allows for the utilization of highly-efficient collectives. We map novel 2.5D dense linear algebra algorithms to exploit rectangular collectives on cuboid partitions allocated by a Blue Gene/P supercomputer. Our mappings allow the algorithms to use optimized line multicasts and reductions. Commonly used 2D algorithms cannot be mapped in this...
With the continuous rise in complexity of modern supercomputers, optimizing the performance of large-scale parallel programs is becoming increasingly challenging. Simultaneously, the growth in scale magnifies the impact of even minor inefficiencies: potentially millions of compute hours and megawatts of power consumption can be wasted on avoidable mistakes or sub-optimal algorithms. This makes performance analysis and optimization critical elements of the software development process. One of the most common forms of performance analysis is to study execution traces, which...
Interconnection networks are a critical resource for large supercomputers. The dragonfly topology, which provides a low network diameter and high bisection bandwidth, is being explored as a promising option for building multi-Petaflop/s and Exaflop/s systems. Unlike the extensively studied torus networks, the best choices of message routing and job placement strategies for this topology are not well understood. This paper aims at analyzing the behavior of a machine built using this topology for various routing strategies, job placement policies, and application communication...
The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside a span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With the positions resolved, we can study the logical extrapolation ability of transformers....
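The core of the fix is computing, for each digit token, its offset from the start of the number it belongs to; that offset then indexes a learned embedding added to the token embedding. A simplified sketch of the offset computation (the actual method in the paper also involves details, such as training-time offsets, that are omitted here):

```python
# Sketch of digit-position indices: each digit is tagged with its offset
# inside its run of digits; non-digit tokens reset the counter. In the full
# method each index selects a learned vector added to the token embedding.
def digit_position_ids(tokens):
    """Return, for each token, its offset within a run of digits (0 otherwise)."""
    ids, offset = [], 0
    for tok in tokens:
        if tok.isdigit():
            ids.append(offset)
            offset += 1
        else:
            ids.append(0)
            offset = 0
    return ids

print(digit_position_ids(list("12+345=")))  # [0, 1, 0, 0, 1, 2, 0]
```

With these indices, the model can tell the hundreds digit from the tens digit regardless of where the number sits in the sequence, which is exactly the information plain positional encodings fail to expose.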
Network contention has a significantly adverse effect on the performance of parallel applications, and it grows with the increasing size of machines. Machines of the petascale era are forcing application developers to map tasks intelligently to job partitions to achieve the best performance possible. This paper presents a framework for the automated mapping of regular communication graphs to two and three dimensional mesh and torus networks. It will save much effort on the part of developers to generate mappings for their individual applications. One component of the framework is a process topology...
The performance of massively parallel applications is often heavily impacted by the cost of communication among compute nodes. However, determining how to best use the network is a formidable task, made challenging by the ever increasing size and complexity of modern supercomputers. This paper applies visualization techniques to aid application developers in understanding network activity by enabling detailed exploration of the flow of packets through the hardware interconnect. In order to visualize this large and complex data, we employ two...
Large parallel machines with hundreds of thousands of processors are being built. Recent studies have shown that ensuring good load balance is critical for scaling certain classes of applications even on thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed algorithms, on the other hand, tend to yield poor load balance on very large machines. In this paper, we present an automatic dynamic hierarchical method that overcomes the challenges...
Task mapping on torus networks has traditionally focused on either reducing the maximum dilation or the average number of hops per byte for messages in an application. These metrics make simplified assumptions about the causes of network congestion, and do not provide an accurate correlation with execution time. Hence, these metrics cannot be used to reasonably predict or compare application performance for different mappings. In this paper, we attempt to model performance using communication data, such as the communication graph and hardware counters. We use...
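The two traditional metrics named above are easy to compute from a mapping: dilation is the hop distance a message travels, and hops per byte weights those distances by message volume. A minimal sketch on a 2D torus (the communication graph, mapping, and byte counts below are fabricated for illustration):

```python
def torus_hops(a, b, dims):
    """Minimal hop distance between coordinates a and b on a torus of size dims."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def mapping_metrics(comm, mapping, dims):
    """comm: {(task_i, task_j): bytes}; mapping: task -> torus coordinate.

    Returns (max dilation in hops, average hops per byte): the two classic
    mapping metrics that, as the paper argues, are insufficient on their own.
    """
    hops = {p: torus_hops(mapping[p[0]], mapping[p[1]], dims) for p in comm}
    total_bytes = sum(comm.values())
    hop_bytes = sum(h * comm[p] for p, h in hops.items())
    return max(hops.values()), hop_bytes / total_bytes

comm = {(0, 1): 1000, (1, 2): 500, (0, 2): 250}   # fabricated byte counts
mapping = {0: (0, 0), 1: (0, 1), 2: (3, 1)}        # fabricated placement
dims = (4, 4)
max_dilation, avg_hops_per_byte = mapping_metrics(comm, mapping, dims)
print(max_dilation, avg_hops_per_byte)
```

Note that neither number says anything about when messages share a link, which is why such metrics can rank mappings differently than measured execution time.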
Network congestion is one of the primary causes of performance degradation, variability and poor scaling in communication-heavy parallel applications. However, the mechanisms of network congestion on modern interconnection networks are not well understood. We need new approaches to analyze, model and predict this critical behaviour in order to improve the performance of large-scale applications. This paper applies supervised learning algorithms, such as forests of extremely randomized trees and gradient boosted regression trees, to perform regression analysis on communication...
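The shape of the regression problem (features derived from network counters, target equal to observed execution time) can be shown with a tiny dependency-free example. The paper uses tree ensembles; a k-nearest-neighbour regressor stands in here only to keep the sketch self-contained, and all numbers are fabricated.

```python
# Minimal supervised-learning sketch: predict execution time from network
# hardware counters. k-NN regression stands in for the tree ensembles
# (extra-trees, gradient boosted trees) used in the paper.
def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training points nearest to x."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dists[:k]) / k

# (avg network stalls, avg hops) -> observed execution time (s); fabricated data
train_X = [(0.1, 2.0), (0.2, 2.5), (0.8, 5.0), (0.9, 5.5), (0.5, 3.5)]
train_y = [10.0, 12.0, 30.0, 33.0, 20.0]
pred = knn_predict(train_X, train_y, (0.85, 5.2))
print(pred)
```

Once trained, such a model can also be inspected (e.g. via feature importances in the tree-ensemble case) to ask which counters explain congestion-induced slowdowns.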
Tuning application parameters for optimal performance is a challenging combinatorial problem. Hence, techniques for modeling the functional relationships between various input features in the parameter space and application performance are important. We show that simple statistical inference techniques are inadequate to capture these relationships. Even with more complex ensembles of models, the minimum coverage required via experimental observations is still quite large. We propose a deep learning based approach that can combine information from...
The dragonfly topology is a popular choice for building high-radix, low-diameter, hierarchical networks with high-bandwidth links. On Cray installations of the network, job placement policies and routing inefficiencies can lead to significant network congestion for single jobs and multi-job workloads. In this paper, we explore the effects of job placement, parallel workloads and network configurations on network health to develop a better understanding of inter-job interference. We have developed a functional simulator, Damselfly, to model...
This paper presents an evaluation and comparison of three topologies that are popular for building interconnection networks in large-scale supercomputers: torus, fat-tree, and dragonfly. To perform this evaluation, we propose a comprehensive methodology and present a scalable packet-level network simulator, TraceR. Our methodology includes the design of prototype systems being evaluated, the use of proxy applications to determine computation and communication load, simulating individual and multi-job workloads, and computing aggregated...