- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Advanced Numerical Methods in Computational Mathematics
- Computational Fluid Dynamics and Aerodynamics
- Data Management and Algorithms
- Electromagnetic Simulation and Numerical Methods
- Matrix Theory and Algorithms
- Computational Geometry and Mesh Generation
- Advanced Image and Video Retrieval Techniques
- Medical Imaging Techniques and Applications
- Advanced Clustering Algorithms Research
- Interconnection Networks and Systems
- Advanced Data Storage Technologies
- Computer Graphics and Visualization Techniques
- Fluid Dynamics Simulations and Interactions
- Meteorological Phenomena and Simulations
- Cloud Computing and Resource Management
- Gas Dynamics and Kinetic Theory
- Hydrocarbon Exploration and Reservoir Analysis
- Advanced Mathematical Modeling in Engineering
- Plasma and Flow Control in Aerodynamics
- Structural Analysis of Composite Materials
- Scientific Computing and Data Management
- Cooperative Communication and Network Coding
- Software Engineering Research
Oak Ridge National Laboratory
2017-2024
National Technical Information Service
2019-2022
Office of Scientific and Technical Information
2019-2022
Los Alamos National Laboratory
2019
Sandia National Laboratories
2013-2017
Sandia National Laboratories California
2016
University of Houston
2010-2012
National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”
2002
Physicotechnical Institute
1981
Trilinos is an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. While Trilinos was originally designed for scalable solutions of large problems, the fidelity needed by many simulations is significantly greater than what one could have envisioned two decades ago. When problem sizes exceed a billion elements, even scalable applications and solver stacks require a complete revision. The second-generation Trilinos employs C++ templates in order to solve arbitrarily...
Searching for geometric objects that are close in space is a fundamental component of many applications. The performance of search algorithms comes to the forefront as the size of a problem increases, both in terms of the total object count and in the number of queries performed. Scientific applications requiring modern leadership-class supercomputers also pose an additional requirement of performance portability, i.e., being able to efficiently utilize a variety of hardware architectures. In this article, we introduce a new open-source C++...
DBSCAN is a well-known density-based clustering algorithm to discover arbitrary shape clusters. While conceptually simple in serial, the algorithm is challenging to efficiently parallelize on manycore GPU architectures. Common pitfalls, such as asynchronous range query calls, result in high thread execution divergence in many implementations. In this paper, we propose a new framework for GPU-accelerated DBSCAN, and describe two tree-based algorithms within that framework. Both algorithms fuse the search for neighbors with updating...
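For orientation, the serial form of DBSCAN that the abstract calls conceptually simple can be sketched as follows. This is a minimal illustration and not the GPU framework described in the paper: it assumes brute-force range queries instead of a tree, and the names `dbscan`, `find`, `eps`, and `min_pts` are chosen here for illustration only.

```python
from math import dist


def find(parent, i):
    """Return the representative of i, halving the path along the way."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Brute-force range queries (an assumption of this sketch);
    # neighbors[i] includes i itself.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    # Core points within eps of each other are merged with union-find.
    parent = list(range(n))
    for i in range(n):
        if not is_core[i]:
            continue
        for j in neighbors[i]:
            if is_core[j]:
                ri, rj = find(parent, i), find(parent, j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Number the clusters in order of their first core point.
    labels = [-1] * n
    cluster_ids = {}
    for i in range(n):
        if is_core[i]:
            root = find(parent, i)
            labels[i] = cluster_ids.setdefault(root, len(cluster_ids))

    # A border point joins the cluster of the first core neighbor found.
    for i in range(n):
        if not is_core[i]:
            for j in neighbors[i]:
                if is_core[j]:
                    labels[i] = labels[j]
                    break
    return labels
```

Calling `dbscan(points, eps=1.5, min_pts=3)` on two well-separated triples of points plus one distant outlier would be expected to give two clusters and one noise label. The quadratic neighbor search is the part a tree-based search replaces in practice.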
Trilinos is an object-oriented software framework for the solution of large-scale, complex multi-physics engineering and scientific problems. While the original version of Trilinos was designed for highly scalable solutions of large problems, the need for increasingly higher fidelity simulations has pushed problem sizes beyond what could have been envisioned two decades ago. When problem sizes exceed a billion elements, even highly scalable applications and solver stacks require a complete revision. The next-generation Trilinos employs C++ templates in order to solve...
Although many active scientific codes use modern Fortran, most contemporary scientific software "libraries" are implemented in C and C++. Providing their numerical, algorithmic, or data management features to Fortran codes requires writing and maintaining substantial amounts of glue code. This article introduces a tool that automatically generates native Fortran 2003 interfaces to C and C++ libraries. The tool supports C++ features that have no direct Fortran analog, such as templated functions and exceptions. A set of simple examples demonstrate the utility and scope...
Algebraic multigrid (AMG) preconditioners are considered for discretized systems of partial differential equations (PDEs) where unknowns associated with different physical quantities are not necessarily co-located at mesh points. Specifically, we investigate a $Q_2-Q_1$ mixed finite element discretization of the incompressible Navier-Stokes equations where the number of velocity nodes is much greater than the number of pressure nodes. Consequently, some velocity degrees-of-freedom (dofs) are defined at spatial locations where there are no corresponding pressure dofs....
Computing the Euclidean minimum spanning tree (Emst) is a computationally demanding step of many algorithms. While work-efficient serial and multithreaded algorithms for computing Emst are known, designing an efficient GPU algorithm is challenging due to a complex branching structure, data dependencies, and load imbalances. In this paper, we propose a single-tree Borůvka-based algorithm for computing Emst on GPUs. We use an efficient nearest neighbor algorithm and reduce the number of required distance calculations by avoiding traversing subtrees with leaf nodes...
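The Borůvka scheme named in the abstract can be illustrated with a small serial sketch. This is not the paper's single-tree GPU algorithm: it assumes a brute-force scan over all point pairs in every round, and the function name `boruvka_emst` is hypothetical.

```python
from math import dist


def boruvka_emst(points):
    """Return (total_weight, edges) of the Euclidean MST of distinct points."""
    n = len(points)
    component = list(range(n))  # component label of each point
    edges = []
    num_components = n
    while num_components > 1:
        # For every component, find its shortest outgoing edge.
        best = {}
        for i in range(n):
            for j in range(n):
                if component[i] == component[j]:
                    continue  # both endpoints already in one component
                # Ordering by (length, endpoints) breaks ties consistently,
                # so the edges selected in one round cannot form a cycle.
                cand = (dist(points[i], points[j]), min(i, j), max(i, j))
                c = component[i]
                if c not in best or cand < best[c]:
                    best[c] = cand
        # Merge components along the selected edges (duplicates removed).
        for w, i, j in set(best.values()):
            ci, cj = component[i], component[j]
            if ci == cj:
                continue  # defensive guard against re-merging
            edges.append((i, j, w))
            component = [ci if c == cj else c for c in component]
            num_components -= 1
    return sum(w for _, _, w in edges), edges
```

Each round at least halves the number of components, so the loop runs at most about log2(n) times. The brute-force inner scan is the step that a nearest neighbor search over a tree would replace.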
We develop and analyze a new multilevel preconditioner for algebraic systems arising from the finite volume discretization of 3D diffusion–reaction problems in highly heterogeneous media. The system matrices are assumed to be symmetric M-matrices. The preconditioner is based on a special coarsening algorithm and the inner Chebyshev iterative procedure. The condition number of the preconditioned matrix does not depend on the coefficients in the diffusion operator. Numerical experiments confirm theoretical results and reveal the competitiveness...
In this paper we consider implementation algorithms and applications of the discretization method for diffusion equations on polygonal (2D) and polyhedral (3D) meshes recently proposed by one of the authors in [11]. The method is based on the approximation of fluxes in the mixed finite element method by appropriate piecewise constant vector functions inside mesh cells. The new vector functions are discontinuous inside mesh cells, but their normal components are continuous on the interfaces between neighbouring cells. Numerical results are given for test problems relevant to reservoir simulation...
As a rule, Top 500 class supercomputers are extensively benchmarked as part of their acceptance testing process. However, barring publicly posted LINPACK / HPCG results, most benchmark results are often inaccessible outside the hosting institution. Moreover, these higher level benchmarks do not provide easy answers to common questions such as "What is the realizable memory bandwidth?" or "What is the launch latency on the accelerator?" To partially address these issues, we executed selected single-node micro-benchmarks...
We demonstrate the use of a modern Fortran solver interface to manage algorithms for an implicit barotropic mode solver in the Model for Prediction Across Scales-Ocean (MPAS-O). ForTrilinos, a Fortran interface to Trilinos that contains a large collection of solver capabilities written in C++, has been implemented in MPAS-O to provide access to a suite of linear solver options. By virtue of the simplified wrapper and interface generator (SWIG) automation tool that generates Fortran interfaces to C++ code, we were able to implement the solver using a familiar Fortran coding style while minimizing performance...
This paper introduces Pandora, a parallel algorithm for computing dendrograms, the hierarchical cluster trees for single linkage clustering (SLC). Current approaches construct dendrograms by partitioning a minimum spanning tree and removing edges. However, they struggle with skewed, hard-to-parallelize real-world dendrograms. Consequently, computing dendrograms is a sequential bottleneck in HDBSCAN* [21], a popular SLC variant.
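To make the relationship between a minimum spanning tree and a single linkage dendrogram concrete, the sketch below builds the dendrogram sequentially. It is not Pandora and not the top-down edge-removal approach the abstract mentions. It assumes the common bottom-up variant that sorts the tree edges by weight and merges with union-find, and the name `single_linkage_dendrogram` and the output layout are chosen here for illustration.

```python
def single_linkage_dendrogram(n, mst_edges):
    """Build a dendrogram from MST edges given as (i, j, weight).

    Leaves are numbered 0..n-1, and the k-th merge creates node n + k.
    Returns a list of (child_a, child_b, height, size) rows.
    """
    parent = list(range(n))  # union-find forest over the points
    node = list(range(n))    # dendrogram node that represents each root
    size = [1] * n           # number of points under each root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merges = []
    # Processing edges from shortest to longest yields merges bottom-up.
    for i, j, w in sorted(mst_edges, key=lambda e: e[2]):
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # cannot occur for a valid tree; kept as a guard
        merges.append((node[ri], node[rj], w, size[ri] + size[rj]))
        parent[rj] = ri
        size[ri] += size[rj]
        node[ri] = n + len(merges) - 1
    return merges
```

For four points with tree edges `(0, 1, 1.0)`, `(2, 3, 2.0)`, and `(1, 2, 5.0)`, the result joins leaves 0 and 1, then leaves 2 and 3, and finally the two resulting nodes at height 5.0. The sort and the strictly ordered merges are what make this formulation inherently sequential.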
ArborX is a performance portable geometric search library developed as part of the Exascale Computing Project (ECP). In this paper, we explore a collaboration between ArborX and the cosmological simulation code HACC. Large cosmological simulations on exascale platforms encounter a bottleneck due to the in-situ analysis requirements of halo finding, a problem of identifying dense clusters of dark matter (halos). This problem is solved by using a density-based DBSCAN clustering algorithm. With each MPI rank handling hundreds of millions of particles, it...