NFDI4DS | UHH-SEMS - Publication Details

Robert Strzodka

ORCID: 0000-0003-0468-0472

About

Contact & Profiles

A5047006398

Research Areas

Parallel Computing and Optimization Techniques
Computer Graphics and Visualization Techniques
Matrix Theory and Algorithms
Distributed and Parallel Computing Systems
Advanced Numerical Methods in Computational Mathematics
Advanced Vision and Imaging
Medical Image Segmentation Techniques
Advanced Data Storage Technologies
Numerical Methods and Algorithms
Interconnection Networks and Systems
Advanced Numerical Analysis Techniques
3D Shape Modeling and Analysis
Computational Geometry and Mesh Generation
Tensor decomposition and applications
Graph Theory and Algorithms
Digital Image Processing Techniques
Data Visualization and Analytics
Electromagnetic Scattering and Analysis
Digital Filter Design and Implementation
Video Surveillance and Tracking Methods
Robotics and Sensor-Based Localization
Advanced Mathematical Modeling in Engineering
Numerical methods for differential equations
Generative Adversarial Networks and Image Synthesis
Iterative Methods for Nonlinear Equations

Heidelberg University
2020-2024

Max Planck Institute for Informatics
2008-2014

Max Planck Society
2010-2013

Nvidia (United States)
2013

Stanford University
2006-2008

Center of Advanced European Studies and Research
2003-2006

Caesar Systems (United Kingdom)
2006

University of Duisburg-Essen
2002

University of Bonn
2000-2001

AmgX: A Library for GPU Accelerated Algebraic Multigrid and Preconditioned Iterative Methods

OPENALEX - Publications

Maxim Naumov M. Arsaev Patrice Castonguay Jonathan Cohen Julien Demouth and 7 more

The solution of large sparse linear systems arises in many applications, such as computational fluid dynamics and oil reservoir simulation. In realistic cases the matrices are often so large that they require large scale distributed parallel computing to obtain the solution of interest in a reasonable time. In this paper we discuss the design and implementation of the AmgX library, which provides drop-in GPU acceleration of distributed algebraic multigrid (AMG) and preconditioned iterative methods. The AmgX library implements both classical and aggregation-based AMG...

10.1137/140980260 article EN SIAM Journal on Scientific Computing 2015-01-01

Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations

Dominik Göddeke Robert Strzodka Stefan Turek

In this survey paper, we compare native double precision solvers with emulated- and mixed-precision solvers of linear systems of equations as they typically arise in finite element discretisations. The emulation utilises two single float numbers to achieve higher precision, while the mixed precision iterative refinement computes residuals and updates of the solution vector in double precision but solves the residual systems in single precision. Both techniques have been known since the 1960s, but little attention has been devoted to their performance aspects. Motivated by...

10.1080/17445760601122076 article EN International Journal of Parallel Emergent and Distributed Systems 2007-06-14

Exploring weak scalability for FEM calculations on a GPU-enhanced cluster

Dominik Göddeke Robert Strzodka Jamaludin Mohd-Yusof Patrick McCormick Sven H. M. Buijssen and 2 more

10.1016/j.parco.2007.09.002 article EN Parallel Computing 2007-10-01

Glift

Aaron Lefohn Shubhabrata Sengupta Joe Kniss Robert Strzodka John D. Owens

This article presents Glift, an abstraction and generic template library for defining complex, random-access graphics processor (GPU) data structures. Like modern CPU data structure libraries, Glift enables GPU programmers to separate algorithms from data structure definitions, thereby greatly simplifying algorithmic development and enabling reusable and interchangeable data structures. We characterize a large body of previously published GPU data structures in terms of our abstraction and present several new data structures. The new data structures, a stack, quadtree, and octree, are explained...

10.1145/1122501.1122505 article EN ACM Transactions on Graphics 2006-01-01

Cyclic Reduction Tridiagonal Solvers on GPUs Applied to Mixed-Precision Multigrid

Dominik Göddeke Robert Strzodka

We have previously suggested mixed precision iterative solvers specifically tailored to the solution of sparse linear equation systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated for a number of hardware platforms, in particular, single-precision GPUs as accelerators to the general purpose CPU. This paper reevaluates the situation with new solvers that run entirely on the GPU: we demonstrate that they constitute a significant performance gain over native double...

10.1109/tpds.2010.61 article EN IEEE Transactions on Parallel and Distributed Systems 2010-04-09

General Purpose Computation on Graphics Hardware

Aaron Lefohn Ian Buck Patrick McCormick John D. Owens Timothy J. Purcell and 1 more

10.1109/vis.2005.43 article EN 2006-01-05

Using GPUs to improve multigrid solver performance on a cluster

Dominik Göddeke Robert Strzodka Jamaludin Mohd-Yusof Patrick McCormick Hilmar Wobker and 2 more

This article explores the coupling of coarse and fine-grained parallelism for Finite Element simulations based on efficient parallel multigrid solvers. The focus lies on both system performance and a minimally invasive integration of hardware acceleration into an existing software package, requiring no changes to application code. Because of their excellent price/performance ratio, we demonstrate the viability of our approach by using commodity graphics processors (GPUs) as efficient multigrid preconditioners. We address the issue of limited precision...

10.1504/ijcse.2008.021111 article EN International Journal of Computational Science and Engineering 2008-01-01

Cache Accurate Time Skewing in Iterative Stencil Computations

Robert Strzodka Mohammed Shaheen Dawid Pająk Hans‐Peter Seidel

We present a time skewing algorithm that breaks the memory wall for certain iterative stencil computations. A stencil computation, even with constant weights, is a completely memory-bound algorithm. For example, for a large 3D domain of 500³ doubles and 100 iterations on a quad-core Xeon X5482 3.2GHz system, a hand-vectorized and parallelized naive 7-point stencil implementation achieves only 1.4 GFLOPS because the system...

10.1109/icpp.2011.47 article EN International Conference on Parallel Processing 2011-09-01

Level set segmentation in graphics hardware

Martin Rumpf Robert Strzodka

Implicit active contours are a very flexible technique in the segmentation of digital images. A novel type of hardware implementation is presented here to approach real time applications. We propose to exploit the high performance of modern graphics cards for numerical computations. Vectors are regarded as images and linear algebraic operations on vectors are realized by image blending. Thus, the implementation benefits from the high memory bandwidth and the economy of command transfers, while the restricted precision does not affect the qualitative behavior...

10.1109/icip.2001.958320 article EN 2002-11-13

Generalized distance transforms and skeletons in graphics hardware

Robert Strzodka Alexandru Telea

We present a framework for computing generalized distance transforms and skeletons of two-dimensional objects using graphics hardware. Our method is based on the concept of footprint splatting. Combining different splats produces weighted distance transforms for different metrics, as well as the corresponding skeletons and Voronoi diagrams. We present a hierarchical acceleration scheme and a subdivision scheme that allows visualizing the computed skeletons with subpixel accuracy in real time. Our splatting approach allows one to easily change all the metric parameters, treat any 2D boundaries, produce...

10.5555/2384225.2384258 article EN Eurographics 2004-05-19

Pipelined Mixed Precision Algorithms on FPGAs for Fast and Accurate PDE Solvers from Low Precision Components

Robert Strzodka Dominik Göddeke

FPGAs are becoming more and more attractive for high precision scientific computations. One of the main problems in efficient resource utilization is the quadratically growing resource usage of multipliers depending on the operand size. Many research efforts have been devoted to the optimization of individual arithmetic and linear algebra operations. In this paper the authors take a higher level approach and seek to reduce the intermediate computational precision on the algorithmic level by optimizing the accuracy towards the final result of an algorithm. In our case the accurate...

10.1109/fccm.2006.57 article EN 2006-04-01

Cache oblivious parallelograms in iterative stencil computations

Robert Strzodka Mohammed Shaheen Dawid Pająk Hans‐Peter Seidel

We present a new cache oblivious scheme for iterative stencil computations that performs beyond system bandwidth limitations as though gigabytes of data could reside in an enormous on-chip cache. We compare execution times for 2D and 3D spatial domains with up to 128 million double precision elements for constant and variable stencils against hand-optimized naive code and the automatic polyhedral parallelizer and locality optimizer PluTo and demonstrate the clear superiority of our results.

10.1145/1810085.1810096 article EN 2010-06-02

An Implementation of Tensor Product Patch Smoothers on GPUs

Cu Cui Paul Grosse-Bley Guido Kanschat Robert Strzodka

10.1137/24m1642706 article EN SIAM Journal on Scientific Computing 2025-03-05

Image Registration by a Regularized Gradient Flow. A Streaming Implementation in DX9 Graphics Hardware

Robert Strzodka Marc Droske Martin Rumpf

10.1007/s00607-004-0087-x article EN Computing 2004-09-28

Real-time motion estimation and visualization on graphics cards

Robert Strzodka Christoph S. Garbe

We present a tool for real-time visualization of motion features in 2D image sequences. The motion is estimated through an eigenvector analysis of the spatio-temporal structure tensor at every pixel location. This approach is computationally demanding but allows reliable velocity estimates as well as quality indicators for the obtained results. We use a color map and a region of interest selector for the visualization of the velocities. On the selected velocities we apply a hierarchical smoothing scheme which allows the choice of the desired scale of the motion field. We demonstrate several...

10.1109/visual.2004.88 article EN IEEE Visualization 2005-03-21

Nonlinear diffusion in graphics hardware

Martin Rumpf Robert Strzodka

10.2312/vissym/vissym01/075-084 article EN Eurographics 2001-05-28

A graphics hardware implementation of the generalized Hough transform for fast object recognition, scale, and 3D pose detection

Robert Strzodka Ivo Ihrke Marcus Magnor

The generalized Hough transform constitutes a well-known approach to object recognition and pose detection. To attain reliable detection results, however, a very large number of candidate object poses and scale factors need to be considered. We employ an inexpensive, consumer-market graphics card as the "poor man's" parallel processing system. We describe the implementation of a fast and enhanced version of the generalized Hough transform on graphics hardware. Thanks to the high bandwidth of on-board texture memory, a single pose can be evaluated in less than 3 ms, independent...

10.1109/iciap.2003.1234048 article EN 2004-02-03

Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU

Dominik Göddeke Hilmar Wobker Robert Strzodka Jamaludin Mohd-Yusof Patrick McCormick and 1 more

We have previously presented an approach to include graphics processing units as co-processors in a parallel Finite Element multigrid solver called FEAST. In this paper we show that the acceleration transfers to real applications built on top of FEAST, without any modifications of the application code. The chosen solid mechanics code is well suited to assess the practicability of our approach due to higher accuracy requirements and a more diverse CPU/co-processor interaction. We demonstrate in detail that the single precision execution...

10.1504/ijcse.2009.029162 article EN International Journal of Computational Science and Engineering 2009-01-01

Scientific computation for simulations on programmable graphics hardware

Robert Strzodka Michael Doggett Andreas Kolb

10.1016/j.simpat.2005.08.001 article EN Simulation Modelling Practice and Theory 2005-09-15

NUMA Aware Iterative Stencil Computations on Many-Core Systems

Mohammed Shaheen Robert Strzodka

Temporal blocking in iterative stencil computations makes it possible to surpass the performance of peak system bandwidth that holds for a single stencil computation. However, the effectiveness of temporal blocking depends strongly on the tiling scheme, which must account for the contradicting goals of spatio-temporal data locality, regular memory access patterns, parallelization into many independent tasks, and data-to-core affinity for NUMA-aware data distribution. Despite the prevalence of cache coherent non-uniform memory access (ccNUMA) in today's many-core systems,...

10.1109/ipdps.2012.50 article EN 2012-05-01

Dynamic adaptive shadow maps on graphics hardware

Aaron Lefohn Shubhabrata Sengupta Joe Kniss Robert Strzodka John D. Owens

We present a novel implementation of adaptive shadow maps (ASMs) that performs all shadow lookups and scene analysis on the GPU, enabling interactive rendering with ASMs while moving both the light and camera. Adaptive shadow maps offer a rigorous solution to projective and perspective shadow map aliasing while maintaining the simplicity of a purely image-based technique. The complexity of the ASM data structure, however, has prevented full...

10.1145/1187112.1187126 article EN 2005-01-01
