- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Distributed systems and fault tolerance
- Software System Performance and Reliability
- Cloud Computing and Resource Management
- Logic, programming, and type systems
- Interconnection Networks and Systems
- Formal Methods in Verification
- Ferroelectric and Negative Capacitance Devices
- Scientific Computing and Data Management
- Software Engineering Research
- Software Testing and Debugging Techniques
- AI-based Problem Solving and Planning
- Advanced Software Engineering Methodologies
- Model-Driven Software Engineering Techniques
Rice University
2008-2024
University of Houston
2004-2007
University of Southampton
2000
HPCToolkit is an integrated suite of tools that supports measurement, analysis, attribution, and presentation of application performance for both sequential and parallel programs. It can pinpoint and quantify scalability bottlenecks in fully optimized programs with a measurement overhead of only a few percent. Recently, new capabilities were added for collecting call path profiles of codes without any compiler support, pinpointing and quantifying bottlenecks in multithreaded programs, exploring performance information and source code...
Applications must scale well to make efficient use of today's class of petascale computers, which contain hundreds of thousands of processor cores. Inefficiencies that do not even appear in modest-scale executions can become major bottlenecks in large-scale executions. Because scaling problems are often difficult to diagnose, there is a critical need for scalable tools that guide scientists to the root causes of these problems. Load imbalance is one of the most common scaling problems. To provide actionable insight into load imbalance, we present...
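As a rough illustration of what "load imbalance" means quantitatively, the sketch below summarizes per-process busy times for one code region. It uses a common generic measure (maximum over mean time, plus the total idle time processes spend waiting for the slowest one); this is an assumption for illustration, not the method presented in the truncated abstract, and the function name `imbalance_stats` is hypothetical.

```python
from statistics import mean


def imbalance_stats(times):
    """Summarize per-process busy times (seconds) for one code region.

    Hypothetical helper: a generic imbalance summary, not a specific tool's metric.
    """
    t_max, t_avg = max(times), mean(times)
    return {
        "max": t_max,
        "mean": t_avg,
        # A ratio above 1 means some processes finish early and wait.
        "imbalance_ratio": t_max / t_avg,
        # Total idle time if every process must wait for the slowest one.
        "idle_total": sum(t_max - t for t in times),
    }


# Three processes do equal work; one does twice as much.
print(imbalance_stats([4.0, 4.0, 4.0, 8.0]))
```

With one straggler at 8 s and three processes at 4 s, the ratio is 1.6 and 12 process-seconds are spent idle, even though no single process looks slow at small scale.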
In 1998, Numrich and Reid proposed Coarray Fortran as a simple set of extensions to Fortran 95 [7]. Their principal extension was support for shared data known as coarrays. In 2005, the Fortran Standards Committee began exploring the addition of coarrays to Fortran 2008, which is now being finalized. Careful review of drafts of the emerging Fortran 2008 standard led us to identify several shortcomings with its coarray extensions. In this paper, we briefly critique these extensions, outline a new and far more expressive vision for coarrays in the language, and describe our strategy for implementing what we propose.
Applications must scale well to make efficient use of even medium-scale parallel systems. Because scaling problems are often difficult to diagnose, there is a critical need for scalable tools that guide scientists to the root causes of performance bottlenecks.
Cutting-edge science and engineering applications require petascale computing. It is, however, a significant challenge to use such computing platforms effectively. Consequently, there is a critical need for performance tools that enable scientists to understand impediments to performance on emerging systems. In this paper, we describe HPCToolkit, a suite of multi-platform tools that supports sampling-based analysis of application performance on these platforms. HPCToolkit uses sampling to pinpoint and quantify both scaling and node performance bottlenecks. We study...
As part of the U.S. Department of Energy's Scientific Discovery through Advanced Computing (SciDAC) program, science teams are tackling problems that require simulation and modeling on petascale computers. Through activities associated with the SciDAC Center for Scalable Application Development Software (CScADS) and the Performance Engineering Research Institute (PERI), Rice University is building software tools for performance analysis of scientific applications on leadership-class platforms. In this poster abstract, we...
Today's largest supercomputers have over two hundred thousand CPU cores, and even larger systems are under development. Typically, these systems are programmed using message passing. Over the past decade, there has been considerable interest in developing simpler and more expressive programming models for them. Partitioned global address space (PGAS) languages are viewed as perhaps the most promising alternative. In this paper, we report on our experience with a set of PGAS extensions to Fortran that we call Coarray Fortran 2.0...
Performance evaluation and modeling is a crucial process to enable the optimization of parallel programs. Programs written using two programming models, such as MPI and OpenMP, require an analysis to determine both their performance efficiency and the most suitable numbers of processes and threads for their execution on a given platform. To study these problems, we propose the construction of a model that is based upon a small number of parameters but is able to capture the complexity of the runtime system. We must incorporate measurements of overheads...
Call path profiling is a scalable measurement technique that has been shown to provide insight into the performance characteristics of complex modular programs. However, poor presentation of accurate and precise call path profiles obscures insight. To enable rapid analysis of an execution's bottlenecks, we make the following contributions for effectively presenting call path profiles. First, we combine a relatively small set of complementary techniques to form a coherent synthesis that is greater than its constituent parts. Second, we extend...
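To make "call path profile" concrete, the sketch below merges sampled call stacks into a calling context tree, derives inclusive and exclusive sample counts, and follows the costliest child at each level (a "hot path"). It is a generic, minimal sketch under the assumption that each sample is a list of frame names ordered outermost first; the class and function names are hypothetical and this is not HPCToolkit's implementation or data format.

```python
from dataclasses import dataclass, field


@dataclass
class CCTNode:
    """One node of a calling context tree: a unique call path prefix."""
    name: str
    exclusive: int = 0  # samples whose innermost frame is this node
    children: dict = field(default_factory=dict)

    def child(self, name):
        # Create the child on first use so identical call paths share nodes.
        if name not in self.children:
            self.children[name] = CCTNode(name)
        return self.children[name]

    def inclusive(self):
        # Inclusive cost = own samples + samples in every callee subtree.
        return self.exclusive + sum(c.inclusive() for c in self.children.values())


def build_cct(samples):
    """Merge sampled call stacks (outermost frame first) into one tree."""
    root = CCTNode("<root>")
    for stack in samples:
        node = root
        for frame in stack:
            node = node.child(frame)
        node.exclusive += 1  # attribute the sample to the innermost frame
    return root


def hot_path(root):
    """Descend through the child with the largest inclusive cost."""
    path, node = [], root
    while node.children:
        node = max(node.children.values(), key=lambda c: c.inclusive())
        path.append((node.name, node.inclusive()))
    return path


samples = [
    ["main", "solve", "matvec"],
    ["main", "solve", "matvec"],
    ["main", "solve", "dot"],
    ["main", "io"],
]
root = build_cct(samples)
print(hot_path(root))  # [('main', 4), ('solve', 3), ('matvec', 2)]
```

Keeping full calling context (rather than flat per-function totals) is what lets a viewer show that `matvec` is costly specifically when called from `solve`.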
In Numrich and Reid's 1998 proposal [17], Coarray Fortran is a simple set of extensions to Fortran 95, principal among which is support for shared data known as coarrays. Responding to shortcomings in the Fortran Standards Committee's addition of coarrays to the Fortran 2008 standard, we at Rice envisioned an extensive update that has come to be known as Coarray Fortran 2.0 [15]. In this paper, we chronicle the evolution of Coarray Fortran 2.0 as it gains asynchronous point-to-point and collective operations. We outline how these operations are implemented and describe code fragments from several...
A program analysis tool can play an important role in helping users understand and improve large application codes. Dragon is a robust interactive tool based on the Open64 compiler, which is an open-source C/C++/Fortran 77/90 compiler for Intel Itanium systems. We designed and developed Dragon to support manual optimization and parallelization of applications by exploiting the powerful analyses of the compiler. Dragon enables users to visualize and print the essential structure of their applications and to obtain information about them. Current features include call...
As part of the US Department of Energy’s Exascale Computing Project (ECP), Rice University has been refining its HPCToolkit performance tools to better support measurement and analysis of applications executing on exascale supercomputers. To efficiently collect measurements of GPU-accelerated applications, HPCToolkit employs novel non-blocking data structures to communicate between tool threads and application threads. To attribute performance information in detail to source lines, loop nests, and inlined call chains, HPCToolkit performs parallel...
OpenMP was recently proposed by a group of vendors as a programming model for shared memory parallel architectures. The growing popularity of such systems, and the rapid availability of product-strength compilers for OpenMP, seem to guarantee a broad take-up of this paradigm if appropriate tools for application development can be provided. POST is an EU-funded project that is developing a product, based on FORESYS from Simulog, which aims to reduce the human effort involved in the creation of OpenMP code. Additional research within...
Analysis and optimization of long-running applications on large-scale parallel systems is important to avoid unacceptable inefficiencies. Tracing is one of the most popular techniques for understanding the performance of parallel programs. Since tracing captures performance data in the time dimension, the size of a trace is linearly proportional to execution time. For that reason, traces of long executions of parallel programs may contain gigabytes or even terabytes of data. Presenting huge traces in a scalable fashion and identifying bottlenecks hidden in an ocean of trace data are challenging...
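The claim that trace size grows linearly with execution time can be checked with simple arithmetic. The sketch below assumes a constant event rate per process and a fixed-size record per event; both assumptions, the function name `trace_bytes`, and the example numbers are hypothetical and chosen only for illustration.

```python
def trace_bytes(ranks, events_per_sec, bytes_per_event, seconds):
    """Estimate total trace size in bytes.

    Hypothetical model: every process (rank) logs a fixed-size record per
    event at a constant rate, so size scales linearly with run time.
    """
    return ranks * events_per_sec * bytes_per_event * seconds


# 1024 processes, 1000 events/s each, 16-byte records, a one-hour run.
one_hour = trace_bytes(1024, 1000, 16, 3600)
print(one_hour)                 # 58982400000 bytes, roughly 59 GB
print(one_hour / 2**30)         # about 54.9 GiB
print(trace_bytes(1024, 1000, 16, 7200) == 2 * one_hour)  # True
```

Doubling either the run length or the process count doubles the data, which is why long, large-scale runs quickly reach the terabyte range.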
The number of cores employed by high-end systems for scientific computing is increasing rapidly. As a result, there is a pressing need for tools that can measure, model, and diagnose performance problems in highly parallel runs. We describe two tools that employ complementary approaches to performance analysis at scale, and we illustrate their use on DOE leadership-class systems.