- Matrix Theory and Algorithms
- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Numerical Methods and Algorithms
- Electromagnetic Scattering and Analysis
- Advanced Numerical Methods in Computational Mathematics
- Scientific Computing and Data Management
- Advanced Data Storage Technologies
- Stochastic Gradient Optimization Techniques
- Cloud Computing and Resource Management
- Interconnection Networks and Systems
- Model Reduction and Neural Networks
- Advanced Optimization Algorithms Research
- Numerical methods for differential equations
- Tensor decomposition and applications
- Quantum Computing Algorithms and Architecture
- Low-power high-performance VLSI design
- Embedded Systems Design Techniques
- Research Data Management Practices
- Neural Networks and Applications
- Sparse and Compressive Sensing Techniques
- Radiation Effects in Electronics
- Algorithms and Data Compression
- Polynomial and algebraic computation
- Advanced Database Systems and Queries
University of Tennessee at Knoxville
2015-2024
Heilbronn University
2024
Technical University of Munich
2024
Karlsruhe Institute of Technology
2012-2024
Universitat Politècnica de València
2023
University of Tennessee System
2015
The efficient utilization of mixed-precision numerical linear algebra algorithms can offer attractive acceleration to scientific computing applications. Especially with the hardware integration of low-precision special-function units designed for machine learning applications, the traditional numerical linear algebra community urgently needs to reconsider the floating point formats used in the distinct operations to efficiently leverage the available compute power. In this work, we provide a comprehensive survey of mixed-precision numerical linear algebra routines, including...
In this article, we present Ginkgo, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, Ginkgo’s design principle abstracts all functionality as “linear operators,” motivating the notation of a “linear operator algebra library.” Ginkgo’s current focus is oriented toward providing sparse linear solvers to graphics processing unit (GPU) architectures, but given the library design, this focus can be easily extended to accommodate other algorithms and hardware architectures. We...
Abstract. To manage Earth in the Anthropocene, new tools, new institutions, and new forms of international cooperation will be required. Earth Virtualization Engines is proposed as an international federation of centers of excellence to empower all people to respond to the immense and urgent challenges posed by climate change.
Summary: We propose an adaptive scheme to reduce the communication overhead caused by data movement by selectively storing the diagonal blocks of a block‐Jacobi preconditioner in different precision formats (half, single, or double). This specialized preconditioner can then be combined with any Krylov subspace method for the solution of sparse linear systems to perform all arithmetic in double precision. We assess the effects of the adaptive precision preconditioner on the iteration count and data transfer cost of a preconditioned conjugate gradient solver. A preconditioned conjugate gradient method is, in general, a memory...
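The storage idea in the abstract above can be sketched in a few lines: values are rounded to a cheaper format for storage and converted back to double before any arithmetic. This is a minimal illustration, not the scheme from the paper. The helper names (`store`, `choose_precision`, `apply_block`) and the relative-error criterion with its `tolerance` parameter are assumptions made for this sketch.

```python
import struct

# struct format codes for IEEE 754 binary16, binary32 and binary64.
FORMATS = {"half": "e", "single": "f", "double": "d"}


def store(value, precision):
    """Round a double to the given storage format and return it as a double."""
    code = FORMATS[precision]
    return struct.unpack(code, struct.pack(code, value))[0]


def choose_precision(block, tolerance):
    """Pick the cheapest format whose relative rounding error is <= tolerance.

    Hypothetical criterion for illustration only; it is not the selection
    rule used in the paper.
    """
    for precision in ("half", "single", "double"):
        try:
            worst = max(
                (abs(store(v, precision) - v) / abs(v)
                 for row in block for v in row if v != 0.0),
                default=0.0,
            )
        except OverflowError:  # a value does not fit into this format
            continue
        if worst <= tolerance:
            return precision
    return "double"


def apply_block(block, precision, vector):
    """Store the block in `precision`, then multiply in double precision."""
    stored = [[store(v, precision) for v in row] for row in block]
    # Python floats are doubles, so all arithmetic below runs in double.
    return [sum(a * x for a, x in zip(row, vector)) for row in stored]


exact_block = [[0.5, 0.25], [0.25, 1.0]]   # representable in half precision
inexact_block = [[0.1, 0.0], [0.0, 0.3]]   # needs more mantissa bits
chosen = [choose_precision(b, 1e-6) for b in (exact_block, inexact_block)]
```

Only the stored representation shrinks; the multiplication itself never runs in reduced precision, which mirrors the abstract's statement that all arithmetic is performed in double precision.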
The Generalized Minimum Residual (GMRES) method is one of the most widely-used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because, in comparison to floating-point operations, communication is becoming increasingly expensive on modern computers. Since graphics processing units (GPUs) are now becoming a crucial component in computing, we investigate the effectiveness of these techniques on multicore CPUs with multiple GPUs. While we present...
Many problems in engineering and scientific computing require the solution of a large number of small systems of linear equations. Due to their high processing power, Graphics Processing Units became an attractive target for this class of problems, and routines based on the LU and the QR factorization have been provided by NVIDIA in the cuBLAS library. This work addresses the situation where the systems of equations are symmetric positive definite. The paper describes the implementation and tuning of the kernels for the Cholesky factorization and the forward and backward substitution....
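For readers unfamiliar with the batched workflow described above, the following is a plain sequential sketch of the three steps named in the abstract: Cholesky factorization, forward substitution, and backward substitution, applied independently to each small system. It says nothing about the paper's GPU kernels or their tuning, and all function names are chosen for this illustration.

```python
import math


def cholesky(a):
    """Return the lower triangular factor L with A = L L^T for an SPD matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not symmetric positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def forward_substitution(l, b):
    """Solve L y = b for a lower triangular L."""
    y = []
    for i, row in enumerate(l):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y


def backward_substitution(l, y):
    """Solve L^T x = y using the lower factor L."""
    n = len(l)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(l[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / l[i][i]
    return x


def solve_batch(matrices, right_hand_sides):
    """Solve every small SPD system in the batch independently."""
    solutions = []
    for a, b in zip(matrices, right_hand_sides):
        l = cholesky(a)
        solutions.append(backward_substitution(l, forward_substitution(l, b)))
    return solutions


batch_a = [[[4.0, 2.0], [2.0, 3.0]], [[2.0, 0.0], [0.0, 5.0]]]
batch_b = [[2.0, 1.0], [4.0, 10.0]]
batch_x = solve_batch(batch_a, batch_b)
```

Because the systems do not depend on each other, the loop in `solve_batch` is the part a batched GPU routine would run in parallel.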
Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence,...
We analyze a Balancing Domain Decomposition by Constraints (BDDC) preconditioner for the solution of three dimensional composite Discontinuous Galerkin discretizations of reaction-diffusion systems of ordinary and partial differential equations arising in cardiac cell-by-cell models like the Extracellular space, Membrane and Intracellular space (EMI) Model. These microscopic models are essential for the understanding of events in aging and structurally diseased hearts which macroscopic models relying on homogenized descriptions of the cardiac tissue,...
While testing is increasingly recognized as essential in scientific software development, it is not yet standard practice within the OpenFOAM community for developing new solvers and features. This gap stems partly from the challenges of integrating testing into typical workflows and the limited guidance on implementing effective tests. Writing tests for complex software like OpenFOAM-based projects presents unique obstacles, including the difficulty of configuring various test cases. This paper addresses these issues by discussing established test...
Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent for Irregular Matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overheads. To this end, in this article, we address the challenge of designing high performance sparse...
Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence, a change is needed in the way research software development and maintenance are...
Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more than an order of magnitude higher performance than what is available in IEEE double precision. At the same time, the gap between the compute power on the one hand and the memory bandwidth on the other hand keeps...
Abstract. To manage Earth in the Anthropocene, new tools, new institutions, and new forms of international cooperation will be required. Earth Virtualization Engines are proposed as an international federation of centers of excellence to empower all people to respond to the immense and urgent challenges posed by climate change.
The US Exascale Computing Project (ECP) has succeeded in preparing applications to run efficiently on the first reported exascale supercomputers in the world. To achieve this, it modernized the whole leadership software stack, from libraries to simulation codes. In this article, we contrast selected leadership software before and after the ECP. We discuss how sustainable research software development for leadership computing can embrace the conversation with the hardware vendors, leadership computing facilities, software community, and domain scientists who are the application developers and integrators of...
On the eve of exascale computing, traditional wisdom no longer applies. High-performance computing is gone as we know it. This article discusses a range of new algorithmic techniques emerging in the context of exascale computing, many of which defy the common wisdom of high-performance computing and are considered unorthodox, but could turn out to be a necessity in the near future.
This paper presents a heterogeneous CPU-GPU implementation for a sparse iterative eigensolver -- the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG). For the key routine generating the Krylov search spaces via the product of a sparse matrix and a block of vectors, we propose a GPU kernel based on a modified sliced ELLPACK format. Blocking a set of vectors and processing them simultaneously accelerates the computation of a set of consecutive SpMVs significantly. Comparing the performance against similar routines from Intel's MKL...
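To make the format and the blocking idea above concrete, here is a small sequential sketch of plain sliced ELLPACK storage and a product with a block of vectors. It uses the standard sliced ELLPACK layout, not the modified variant the paper proposes, and it is not a GPU kernel. The names `to_sliced_ell` and `spmm` and the `slice_size` parameter are assumptions for this illustration.

```python
def to_sliced_ell(rows, slice_size):
    """Convert rows given as lists of (column, value) pairs to sliced ELLPACK.

    Each slice of `slice_size` rows is padded only to the longest row in
    that slice; padding entries use column 0 with value 0.0.
    """
    slices = []
    for start in range(0, len(rows), slice_size):
        chunk = rows[start:start + slice_size]
        width = max((len(r) for r in chunk), default=0)
        cols = [[c for c, _ in r] + [0] * (width - len(r)) for r in chunk]
        vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in chunk]
        slices.append((cols, vals))
    return slices


def spmm(slices, vectors):
    """Multiply the sparse matrix by a block of vectors in a single pass.

    `vectors[j]` holds row j of the dense block, one entry per vector, so
    each matrix entry is read once and reused for every vector.
    """
    num_vectors = len(vectors[0])
    result = []
    for cols, vals in slices:
        for row_cols, row_vals in zip(cols, vals):
            out = [0.0] * num_vectors
            for c, v in zip(row_cols, row_vals):
                for k in range(num_vectors):
                    out[k] += v * vectors[c][k]
            result.append(out)
    return result


# 3x3 example matrix with rows of different lengths.
matrix_rows = [
    [(0, 1.0), (2, 2.0)],
    [(1, 3.0)],
    [(0, 4.0), (1, 5.0), (2, 6.0)],
]
ell_slices = to_sliced_ell(matrix_rows, slice_size=2)
block = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # two vectors, stored row-wise
product = spmm(ell_slices, block)
```

Padding per slice rather than per matrix keeps the overhead of short rows local, and handling all vectors of the block inside the innermost loop is what lets one read of the matrix serve several SpMVs.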
Ginkgo is a production-ready sparse linear algebra library for high performance computing on GPU-centric architectures with a high level of performance portability and focuses on software sustainability.
The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing its data before (and after) memory accesses has received considerable attention. One idea is to store an...
In this paper we accelerate the Alternating Least Squares (ALS) algorithm used for generating product recommendations on the basis of implicit feedback datasets. We approach the algorithm with concepts proven to be successful in High Performance Computing. This includes the formulation of the algorithm as a mix of cache-optimized algorithm-specific kernels and standard BLAS routines, acceleration via graphics processing units (GPUs), use of parallel batched kernels, and autotuning to identify performance winners. For benchmark datasets,...