Hartwig Anzt

ORCID: 0000-0003-2177-952X
Research Areas
  • Matrix Theory and Algorithms
  • Parallel Computing and Optimization Techniques
  • Distributed and Parallel Computing Systems
  • Numerical Methods and Algorithms
  • Electromagnetic Scattering and Analysis
  • Advanced Numerical Methods in Computational Mathematics
  • Scientific Computing and Data Management
  • Advanced Data Storage Technologies
  • Stochastic Gradient Optimization Techniques
  • Cloud Computing and Resource Management
  • Interconnection Networks and Systems
  • Model Reduction and Neural Networks
  • Advanced Optimization Algorithms Research
  • Numerical methods for differential equations
  • Tensor decomposition and applications
  • Quantum Computing Algorithms and Architecture
  • Low-power high-performance VLSI design
  • Embedded Systems Design Techniques
  • Research Data Management Practices
  • Neural Networks and Applications
  • Sparse and Compressive Sensing Techniques
  • Radiation Effects in Electronics
  • Algorithms and Data Compression
  • Polynomial and algebraic computation
  • Advanced Database Systems and Queries

University of Tennessee at Knoxville
2015-2024

Heilbronn University
2024

Technical University of Munich
2024

Karlsruhe Institute of Technology
2012-2024

Universitat Politècnica de València
2023

University of Tennessee System
2015

The efficient utilization of mixed-precision numerical linear algebra algorithms can offer attractive acceleration to scientific computing applications. Especially with the hardware integration of low-precision special-function units designed for machine learning applications, the traditional numerical algorithms community urgently needs to reconsider the floating point formats used in the distinct operations to efficiently leverage the available compute power. In this work, we provide a comprehensive survey of mixed-precision numerical linear algebra routines, including...

10.1177/10943420211003313 article EN The International Journal of High Performance Computing Applications 2021-03-19

In this article, we present Ginkgo, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, Ginkgo's design principle abstracts all functionality as "linear operators," motivating the notation of a "linear operator algebra library." Ginkgo's current focus is oriented toward providing sparse linear algebra functionality for high performance graphics processing unit (GPU) architectures, but given the library design, the focus can be easily extended to accommodate other algorithms and hardware architectures. We...

10.1145/3480935 article EN ACM Transactions on Mathematical Software 2022-02-16
Björn Stevens Stefan Adami Tariq Ali Hartwig Anzt Zafer Aslan Sabine Attinger Jaana Bäck Johanna Baehr Péter Bauer Natacha B. Bernier Bob Bishop Hendryk Bockelmann Sandrine Bony Guy Brasseur David N. Bresch Sean Breyer Gilbert Brunet Pier Luigi Buttigieg Junji Cao Christelle Castet Yafang Cheng Ayantika Dey Choudhury Deborah R. Coen Susanne Crewell Atish Dabholkar Qing Dai Francisco J. Doblas‐Reyes Dale R. Durran Ayoub El Gaidi Charlie Ewen Eleftheria Exarchou Veronika Eyring Florencia Falkinhoff David Farrell Piers M. Forster Ariane Frassoni Claudia Frauen Oliver Fuhrer Shahzad Gani Edwin P. Gerber Debra Goldfarb Jens Grieger Nicolas Gruber Wilco Hazeleger Rolf Herken Chris Hewitt Torsten Hoefler Huang‐Hsiung Hsu Daniela Jacob Alexandra Jahn Christian Jakob Thomas Jung Christopher Kadow In‐Sik Kang Sarah M. Kang Karthik Kashinath Katharina Kleinen‐von Königslöw Daniel Klocke Uta Kloenne Milan Klöwer Chihiro Kodama Stefan Kollet Tobias Kölling Jenni Kontkanen Steve Kopp Michal Koran Markku Kulmala Hanna K. Lappalainen Fakhria Latifi Bryan Lawrence June‐Yi Lee Quentin Lejeun Christian Lessig Chao Li Thomas Lippert Jürg Luterbacher Pekka Manninen Jochem Marotzke Satoshi Matsouoka Charlotte Merchant Peter Messmer Gero Michel Kristel Michielsen Tomoki Miyakawa Jens Daniel Müller Ramsha Munir Sandeep Narayanasetti Ousmane Ndiaye Carlos A. Nobre Achim Oberg Riko Oki Tuba Özkan-Haller T. N. Palmer Stan Posey Andreas F. Prein Odessa Primus Mike Pritchard Julie Pullen Dian Putrasahan Johannes Quaas

Abstract. To manage Earth in the Anthropocene, new tools, new institutions, and new forms of international cooperation will be required. Earth Virtualization Engines is proposed as an international federation of centers of excellence to empower all people to respond to the immense and urgent challenges posed by climate change.

10.5194/essd-16-2113-2024 article EN cc-by Earth system science data 2024-04-30

Summary: We propose an adaptive scheme to reduce the communication overhead caused by data movement by selectively storing the diagonal blocks of a block‐Jacobi preconditioner in different precision formats (half, single, or double). This specialized preconditioner can then be combined with any Krylov subspace method for the solution of sparse linear systems to perform all arithmetic in double precision. We assess the effects of the adaptive precision preconditioner on the iteration count and data transfer cost of a preconditioned conjugate gradient solver. A preconditioned conjugate gradient method is, in general, a memory...

10.1002/cpe.4460 article EN Concurrency and Computation Practice and Experience 2018-03-12
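The idea of storing each diagonal block in the cheapest adequate format can be illustrated with a minimal sketch. The selection rule below (lowest format whose entry-wise rounding error stays under a tolerance) is an assumption made for illustration, since the truncated abstract does not state the actual criterion; the names `round_trip`, `storage_error`, and `choose_format` are hypothetical.

```python
import struct

# Storage formats from lowest to highest precision: IEEE half, single, double.
FORMATS = ("e", "f", "d")


def round_trip(value, fmt):
    """Return `value` as it would read back after being stored in format `fmt`."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def storage_error(block, fmt):
    """Largest entry-wise rounding error of `block` in `fmt`, relative to its largest entry."""
    entries = [v for row in block for v in row]
    scale = max(abs(v) for v in entries) or 1.0
    try:
        return max(abs(v - round_trip(v, fmt)) for v in entries) / scale
    except OverflowError:
        # At least one value does not fit into the format at all.
        return float("inf")


def choose_format(block, tol):
    """Pick the cheapest format whose rounding error stays within `tol` (hypothetical rule)."""
    for fmt in FORMATS:
        if storage_error(block, fmt) <= tol:
            return fmt
    return "d"


blocks = [
    [[1.0, 0.5], [0.5, 2.0]],      # exactly representable in half precision
    [[0.1, 0.0], [0.0, 0.1]],      # half precision is too coarse for tol=1e-6
    [[1.0e6, 0.0], [0.0, 1.0e6]],  # overflows half precision
]
chosen = [choose_format(b, tol=1e-6) for b in blocks]
```

Only the storage format varies per block here; any arithmetic on the values read back would still run in double precision, which matches the split between storage and computation described in the abstract.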

The Generalized Minimum Residual (GMRES) method is one of the most widely-used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because, in comparison to floating-point operations, communication is becoming increasingly expensive on modern computers. Since graphics processing units (GPUs) are now a crucial component in computing, we investigate the effectiveness of these techniques on multicore CPUs with multiple GPUs. While we present...

10.1109/ipdps.2014.48 article EN 2014-05-01

Many problems in engineering and scientific computing require the solution of a large number of small systems of linear equations. Due to their high processing power, Graphics Processing Units became an attractive target for this class of problems, and routines based on the LU and the QR factorization have been provided by NVIDIA in the cuBLAS library. This work addresses the situation where the systems of equations are symmetric positive definite. The paper describes the implementation and tuning of the kernels for the Cholesky factorization and the forward and backward substitution....

10.1109/tpds.2015.2481890 article EN IEEE Transactions on Parallel and Distributed Systems 2015-09-24
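The three steps named in the abstract (Cholesky factorization, forward substitution, backward substitution) can be sketched for a batch of small symmetric positive definite systems. This is a plain sequential Python illustration, not the GPU kernels the paper describes; the function names are hypothetical.

```python
import math


def cholesky(a):
    """Return the lower-triangular factor L with A = L L^T for a small SPD matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not symmetric positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def solve_spd(a, b):
    """Solve A x = b via Cholesky, forward substitution, then backward substitution."""
    n = len(a)
    l = cholesky(a)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T x = y
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def solve_batch(matrices, rhs):
    """Solve many independent small systems; each pair (A, b) is handled on its own."""
    return [solve_spd(a, b) for a, b in zip(matrices, rhs)]


batch_a = [[[4.0, 2.0], [2.0, 3.0]], [[2.0, 0.0], [0.0, 8.0]]]
batch_b = [[2.0, 1.0], [2.0, 4.0]]
solutions = solve_batch(batch_a, batch_b)
```

Because the systems in a batch do not depend on each other, the loop in `solve_batch` is the part a batched implementation would run concurrently.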

Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence,...

10.12688/f1000research.23224.2 preprint EN cc-by F1000Research 2021-01-26

We analyze a Balancing Domain Decomposition by Constraints (BDDC) preconditioner for the solution of three dimensional composite Discontinuous Galerkin discretizations of reaction-diffusion systems of ordinary and partial differential equations arising in cardiac cell-by-cell models like the Extracellular space, Membrane and Intracellular space (EMI) Model. These microscopic models are essential for the understanding of events in aging and structurally diseased hearts which macroscopic models relying on homogenized descriptions of the cardiac tissue,...

10.48550/arxiv.2502.07722 preprint EN arXiv (Cornell University) 2025-02-11

While testing is increasingly recognized as essential in scientific software development, it is not yet standard practice within the OpenFOAM community for developing new solvers and features. This gap stems partly from the challenges of integrating testing into typical workflows and the limited guidance on implementing effective tests. Writing tests for complex software like OpenFOAM-based projects presents unique obstacles, including the difficulty of configuring various test cases. This paper addresses these issues by discussing established test...

10.51560/ofj.v5.134 article EN OpenFOAM® Journal 2025-04-26

Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent for Irregular Matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overheads. To this end, in this article, we address the challenge of designing high performance sparse...

10.1145/3380930 article EN ACM Transactions on Parallel Computing 2020-03-29
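The padding side of this trade-off can be quantified with a small sketch. It assumes two padded layouts chosen purely for illustration, an ELL-style layout that pads every row to the longest row and a sliced variant that pads only within groups of rows; the truncated abstract does not say which formats the article uses, and all function names are hypothetical.

```python
def ell_slots(row_lengths):
    """Storage slots when every row is padded to the longest row (ELL-style layout)."""
    return len(row_lengths) * max(row_lengths, default=0)


def sliced_ell_slots(row_lengths, slice_size):
    """Storage slots when rows are padded only within slices of `slice_size` rows."""
    total = 0
    for start in range(0, len(row_lengths), slice_size):
        chunk = row_lengths[start:start + slice_size]
        total += len(chunk) * max(chunk)
    return total


def padding_fraction(slots, nonzeros):
    """Share of stored slots that hold artificial zeros."""
    return (slots - nonzeros) / slots if slots else 0.0


# Number of nonzeros per row of a small irregular matrix (one long row).
row_lengths = [1, 1, 1, 5]
nnz = sum(row_lengths)
ell = padding_fraction(ell_slots(row_lengths), nnz)
sliced = padding_fraction(sliced_ell_slots(row_lengths, slice_size=2), nnz)
```

For these row lengths the fully padded layout stores 20 slots for 8 nonzeros, while padding per slice of two rows stores 12, which shows how a single long row drives the artificial overhead.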

Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence, a change is needed in the way research software development and maintenance are...

10.12688/f1000research.23224.1 preprint EN cc-by F1000Research 2020-04-27

Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also, the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more than an order of magnitude higher performance than what is available in IEEE double precision. At the same time, the gap between the compute power on the one hand and the memory bandwidth on the other hand keeps...

10.48550/arxiv.2007.06674 preprint EN other-oa arXiv (Cornell University) 2020-01-01
Björn Stevens Stefan Adami Tariq Ali Hartwig Anzt Zafer Aslan Sabine Attinger Jaana Bäck Johanna Baehr Péter Bauer Natacha B. Bernier Bob Bishop Hendryk Bockelmann Sandrine Bony V. S. Bouchet Guy Brasseur David N. Bresch Sean Breyer Gilbert Brunet Pier Luigi Buttigieg Junji Cao Christelle Castet Yafang Cheng Ayantika Dey Choudhury Deborah R. Coen Susanne Crewell Atish Dabholkar Qing Dai Francisco J. Doblas‐Reyes Dale R. Durran Ayoub El Gaidi Charlie Ewen Eleftheria Exarchou Veronika Eyring Florencia Falkinhoff David Farrell Piers M. Forster Ariane Frassoni Claudia Frauen Oliver Fuhrer Shahzad Gani Edwin P. Gerber Debra Goldfarb Jens Grieger Nicolas Gruber Wilco Hazeleger Rolf Herken Chris Hewitt Torsten Hoefler Huang‐Hsiung Hsu Daniela Jacob Alexandra Jahn Christian Jakob Thomas Jung Christopher Kadow In‐Sik Kang Sarah M. Kang Karthik Kashinath Katharina Kleinen‐von Königslöw Daniel Klocke Uta Kloenne Milan Klöwer Chihiro Kodama Stefan Kollet Tobias Kölling Jenni Kontkanen Steve Kopp Michal Koran Markku Kulmala Hanna K. Lappalainen Fakhria Latifi Bryan Lawrence June‐Yi Lee Quentin Lejeun Christian Lessig Chao Li Thomas Lippert Jürg Luterbacher Pekka Manninen Jochem Marotzke Satoshi Matsouoka Charlotte Merchant Peter Messmer Gero Michel Kristel Michielsen Tomoki Miyakawa Jens Daniel Müller Ramsha Munir Sandeep Narayanasetti Ousmane Ndiaye Carlos A. Nobre Achim Oberg Riko Oki Tuba Özkan-Haller T. N. Palmer Stan Posey Andreas F. Prein Odessa Primus Mike Pritchard Julie Pullen Dian Putrasahan

Abstract. To manage Earth in the Anthropocene, new tools, new institutions, and new forms of international cooperation will be required. Earth Virtualization Engines are proposed as an international federation of centers of excellence to empower all people to respond to the immense and urgent challenges posed by climate change.

10.5194/essd-2023-376 preprint EN cc-by 2023-09-22

The US Exascale Computing Project (ECP) has succeeded in preparing applications to run efficiently on the first reported exascale supercomputers in the world. To achieve this, it modernized the whole leadership software stack, from libraries to simulation codes. In this article, we contrast selected leadership software before and after ECP. We discuss how sustainable research software development for leadership computing can embrace the conversation with the hardware vendors, leadership computing facilities, software community, and domain scientists who are the application developers and integrators of...

10.1109/mcse.2024.3387302 article EN cc-by Computing in Science & Engineering 2024-01-01

On the eve of exascale computing, traditional wisdom no longer applies. High-performance computing is gone as we know it. This article discusses a range of new algorithmic techniques emerging in the context of exascale computing, many of which defy the common wisdom of high-performance computing and are considered unorthodox, but could turn out to be a necessity in the near future.

10.1109/mcse.2017.48 article EN Computing in Science & Engineering 2017-04-28

This paper presents a heterogeneous CPU-GPU implementation for a sparse iterative eigensolver: the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG). For the key routine generating the Krylov search spaces via the product of a sparse matrix and a block of vectors, we propose a GPU kernel based on a modified sliced ELLPACK format. Blocking a set of vectors and processing them simultaneously accelerates the computation of a set of consecutive SpMVs significantly. Comparing the performance against similar routines from Intel's MKL...

10.5555/2872599.2872609 article EN IEEE International Conference on High Performance Computing, Data, and Analytics 2015-04-12

Ginkgo is a production-ready sparse linear algebra library for high performance computing on GPU-centric architectures with a high level of performance portability and focuses on software sustainability.

10.21105/joss.02260 article EN cc-by The Journal of Open Source Software 2020-08-31

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing its data before (and after) memory accesses has received considerable attention. One idea is to store an...

10.1145/3441850 article EN ACM Transactions on Mathematical Software 2021-04-26
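The notion of compressing data for memory accesses while keeping the arithmetic in full precision can be shown with a minimal sketch. It assumes single precision as the compact storage format and uses Python's standard `array` module as a stand-in for the memory layout; the function name `dot_from_compressed` is hypothetical and the example is not taken from the article.

```python
from array import array


def dot_from_compressed(x_stored, y_stored):
    """Accumulate in double precision while reading operands from compact storage."""
    total = 0.0
    for xi, yi in zip(x_stored, y_stored):
        # Values read from an array become Python floats (IEEE doubles),
        # so the multiply and add run in double precision.
        total += xi * yi
    return total


values = [0.1, 0.2, 0.3, 0.4]
x_single = array("f", values)  # compact storage, typically 4 bytes per entry
x_double = array("d", values)  # reference storage, typically 8 bytes per entry
ones = array("d", [1.0] * len(values))

compressed_sum = dot_from_compressed(x_single, ones)
reference_sum = dot_from_compressed(x_double, ones)
```

The compact copy moves half as many bytes per entry, and the result differs from the reference only by the rounding introduced when the values were stored, not by any loss in the accumulation itself.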

In this paper we accelerate the Alternating Least Squares (ALS) algorithm used for generating product recommendations on the basis of implicit feedback datasets. We approach the algorithm with concepts proven to be successful in High Performance Computing. This includes the formulation of the algorithm as a mix of cache-optimized algorithm-specific kernels and standard BLAS routines, acceleration via graphics processing units (GPUs), use of parallel batched kernels, and autotuning to identify performance winners. For benchmark datasets,...

10.1109/bigdata.2015.7363811 article EN 2015 IEEE International Conference on Big Data (Big Data) 2015-10-01