- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Software Engineering Research
- Scientific Computing and Data Management
- Topic Modeling
- Software System Performance and Reliability
- Research Data Management Practices
- Stellar, planetary, and galactic studies
- Software Testing and Debugging Techniques
- Lattice Boltzmann Simulation Studies
- Machine Learning and Data Classification
- Distributed systems and fault tolerance
- Cloud Computing and Resource Management
- Formal Methods in Verification
- Astro and Planetary Science
- Advanced Neural Network Applications
- Embedded Systems Design Techniques
- Semantic Web and Ontologies
- Algorithms and Data Compression
- Logic, programming, and type systems
- Solar and Space Plasma Dynamics
- Natural Language Processing Techniques
- Meteorological Phenomena and Simulations
- Electrochemical sensors and biosensors
Lawrence Livermore National Laboratory
2015-2024
Iowa State University
2023
University of California, Merced
2023
Argonne National Laboratory
2023
National Taipei University of Technology
2018-2019
University of Minnesota
2008-2014
National Taipei University of Nursing and Health Science
2009-2010
We report on numerical simulations of the detailed evolution of the single-mode Rayleigh-Taylor [Lord Rayleigh, Scientific Papers II (Cambridge University Press, Cambridge, 1900), p. 200; G. I. Taylor, “The instability of liquid surfaces when accelerated in a direction perpendicular to their plane,” Proc. R. Soc. London, Ser. A 201, 192 (1950), doi:10.1098/rspa.1950.0052; S. Chandrasekhar, Hydrodynamic and Hydromagnetic Stability (Oxford University Press, Oxford, 1961)] instability to late times and high aspect ratios. In contrast to established...
We present the first three-dimensional, fully compressible gas-dynamics simulations in 4π geometry of He-shell flash convection with proton-rich fuel entrainment at the upper boundary. This work is motivated by the insufficiently understood observed consequences of H-ingestion in post-asymptotic giant branch (post-AGB) stars (Sakurai's object) and metal-poor AGB stars. Our investigation is focused on the entrainment process at the top boundary and the subsequent advection of H-rich material into deeper layers; we therefore ignore burning...
We performed three-dimensional simulations of proton-rich material entrainment into ¹²C-rich He-shell flash convection and the subsequent H-ingestion that took place in the post-asymptotic giant branch star Sakurai's object. Observations of the transient nature and anomalous abundance features are available to validate our method and assumptions, with the aim of applying them to very low-metallicity stars in the future. We include nuclear energy feedback from H burning and cover the full 4π geometry of the shell. Runs on 768³ and 1536³ grids agree...
Large Language Models (LLMs), including the LLaMA model, have exhibited their efficacy across various general-domain natural language processing (NLP) tasks. However, their performance in high-performance computing (HPC) domain tasks has been less than optimal due to the specialized expertise required to interpret the model responses. In response to this challenge, we propose HPC-GPT, a novel LLaMA-based model that has undergone supervised fine-tuning using generated QA (Question-Answer) instances for the HPC domain. To evaluate its...
Data races in multi-threaded parallel applications are notoriously damaging while extremely difficult to detect. Many tools have been developed to help programmers find data races. However, there is no dedicated OpenMP benchmark suite to systematically evaluate data race detection tools for their strengths and limitations.
Large language models (LLMs) are demonstrating significant promise as an alternate strategy to facilitate analyses and optimizations of high-performance computing programs, circumventing the need for resource-intensive manual tool creation. In this paper, we explore a novel LLM-based data race detection approach combining prompt engineering and fine-tuning techniques. We create a dedicated dataset named DRB-ML, which is derived from DataRaceBench, with fine-grain labels showing the presence of data race pairs...
Loop fusion is an important compiler optimization for improving memory hierarchy performance through enabling data reuse. Traditional compilers have approached loop fusion in a manner decoupled from other high-level loop optimizations, missing several interesting solutions. Recently, the polyhedral framework, with its ability to compose complex transformations, has proved to be promising in performing optimizations for small programs. However, our experiments on large programs using state-of-the-art frameworks reveal...
Transition metal carbides have shown potential for use in electrochemical applications due to their excellent electronic conductivity, stability and electrocatalysis.
Machine learning (ML) techniques have been widely studied to address various challenges of productively and efficiently running large-scale scientific applications on heterogeneous supercomputers. However, it is extremely difficult to generate, access, and maintain training datasets and AI models to accelerate ML-based research. The Future of Research Communications and e-Scholarship has proposed the FAIR data principles describing Findability, Accessibility, Interoperability, and Reusability. In this paper, we...
Among the most common and hardest to debug types of bugs in concurrent systems are data races. In this paper, we present an approach for verifying that an OpenMP program is data race free. We use polyhedral analysis to verify those parts of the program where we detect parallel affine loop nests. We show the applicability of polyhedral analysis with analysis-enabling transformations for data race detection in HPC applications. We evaluate our approach with the dedicated benchmark suite DataRaceBench and the LLNL Proxy Application AMG2013, which consists of 75,000 LOC. Our evaluation shows that it can...
Computational accelerators, such as manycore NVIDIA GPUs, Intel Xeon Phi, and FPGAs, are becoming common in workstations, servers, and supercomputers for scientific and engineering applications. Efficiently exploiting the massive parallelism these accelerators provide requires the design and implementation of productive programming models.
In recent years, language models (LMs), such as GPT-4, have been widely used in multiple domains, including natural language processing, visualization, and so on. However, applying them for analyzing and optimizing high-performance computing (HPC) software is still challenging due to the lack of HPC-specific support. In this paper, we design the LM4HPC framework to facilitate the research and development of HPC software analyses and optimizations using LMs. Tailored for supporting HPC datasets, AI models, and pipelines, our framework is built on top of a range...
The IBM Cell processor represents the first and most extreme of a new generation of multicore CPUs. For scientific codes that can be formulated in terms of vector computing concepts, the Cell is, as far as we know, rewarding. In this article, we present a method for implementing numerical algorithms so that they run efficiently on the Cell and other multicore CPUs. We illustrate our method using the piecewise-parabolic method (PPM) gas dynamics algorithm but believe that many other algorithms could benefit from our approach. Nevertheless, the code transformations are difficult to perform manually,...
Artificial Intelligence (AI) is being adopted in different domains at an unprecedented scale. A significant interest in the scientific community also involves leveraging machine learning (ML) to effectively run high performance computing applications. Given multiple efforts in this arena, efforts are often duplicated when existing rich data sets and ML models could be leveraged instead. The primary challenge is the lack of an ecosystem to reuse and reproduce models and datasets. In this work, we propose HPCFAIR, a modular,...
In this study, we present a novel dataset for training machine learning models translating between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is created from a range of representative open-source OpenMP benchmarks. It is also refined using a meticulous code similarity test. The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods. We showcase how this dataset significantly elevates the translation competencies of large language models (LLMs). Specifically, without...
NVIDIA's unified memory (UM) creates a pool of managed memory on top of physically separated CPU and GPU memories. UM automatically migrates page-level data on demand so programmers can quickly write CUDA codes on heterogeneous machines without tedious and error-prone manual memory management. To improve performance, NVIDIA allows advanced programmers to pass additional memory use hints to its UM driver. However, it is extremely difficult for programmers to decide when and how to efficiently use unified memory, given the complex interactions...
We report initial experience with gas dynamics simulation on the Los Alamos Roadrunner machine. In this work, we have restricted our attention to flows in which the flow Mach number is less than 2. This permits us to use a simplified version of the PPM algorithm that has been described in detail by Woodward (2006). We follow a multifluid volume fraction using the PPB moment-conserving advection scheme, enforcing both pressure and temperature equilibrium between two monatomic ideal gases within each grid...
Seismic waves fourth order (SW4) solves the seismic wave equations on Cartesian and curvilinear grids using large compute clusters with O(100,000) cores. This article discusses the porting of SW4 to run on the CORAL architecture using the RAJA performance portability abstraction layer. The performances of key kernels using RAJA and CUDA are compared to estimate the performance penalty of the abstraction layer. Code changes required for efficiency on GPUs and for minimizing time spent in Message Passing Interface (MPI) are discussed. The article describes a path for efficiently porting code bases to GPU-based...