- Parallel Computing and Optimization Techniques
- Computer Graphics and Visualization Techniques
- Matrix Theory and Algorithms
- Distributed and Parallel Computing Systems
- Advanced Numerical Methods in Computational Mathematics
- Advanced Vision and Imaging
- Medical Image Segmentation Techniques
- Advanced Data Storage Technologies
- Numerical Methods and Algorithms
- Interconnection Networks and Systems
- Advanced Numerical Analysis Techniques
- 3D Shape Modeling and Analysis
- Computational Geometry and Mesh Generation
- Tensor Decomposition and Applications
- Graph Theory and Algorithms
- Digital Image Processing Techniques
- Data Visualization and Analytics
- Electromagnetic Scattering and Analysis
- Digital Filter Design and Implementation
- Video Surveillance and Tracking Methods
- Robotics and Sensor-Based Localization
- Advanced Mathematical Modeling in Engineering
- Numerical Methods for Differential Equations
- Generative Adversarial Networks and Image Synthesis
- Iterative Methods for Nonlinear Equations
Heidelberg University
2020-2024
Max Planck Institute for Informatics
2008-2014
Max Planck Society
2010-2013
Nvidia (United States)
2013
Stanford University
2006-2008
Center of Advanced European Studies and Research
2003-2006
Caesar Systems (United Kingdom)
2006
University of Duisburg-Essen
2002
University of Bonn
2000-2001
The solution of large sparse linear systems arises in many applications, such as computational fluid dynamics and oil reservoir simulation. In realistic cases the matrices are often so large that they require large-scale distributed parallel computing to obtain the solution of interest in a reasonable time. In this paper we discuss the design and implementation of the AmgX library, which provides drop-in GPU acceleration of distributed algebraic multigrid (AMG) and preconditioned iterative methods. The AmgX library implements both classical and aggregation-based AMG...
In this survey paper, we compare native double precision solvers with emulated- and mixed-precision solvers of linear systems of equations as they typically arise in finite element discretisations. The emulation utilises two single float numbers to achieve higher precision, while the mixed precision iterative refinement computes residuals and updates the solution vector in double precision but solves the residual systems in single precision. Both techniques have been known since the 1960s, but little attention has been devoted to their performance aspects. Motivated by...
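The mixed precision iterative refinement described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small dense system, uses Gaussian elimination as a stand-in for the inner solver, and emulates single precision by rounding through `struct`. The names `to_f32`, `solve_f32`, `residual`, and `mixed_precision_refine` are hypothetical.

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest IEEE single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_f32(A, b):
    """Gaussian elimination with partial pivoting, rounding every result to
    single precision (an assumed stand-in for a single precision solver)."""
    n = len(b)
    M = [[to_f32(v) for v in row] + [to_f32(b[i])] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = to_f32(M[r][k] / M[k][k])
            for c in range(k, n + 1):
                M[r][c] = to_f32(M[r][c] - to_f32(f * M[k][c]))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for c in range(i + 1, n):
            s = to_f32(s - to_f32(M[i][c] * x[c]))
        x[i] = to_f32(s / M[i][i])
    return x

def residual(A, x, b):
    """Residual b - A x, computed in double precision."""
    return [b[i] - sum(A[i][j] * x[j] for j in range(len(x)))
            for i in range(len(b))]

def mixed_precision_refine(A, b, tol=1e-12, max_iter=20):
    """Iterative refinement: residual and update in double precision,
    correction solve in single precision."""
    x = [0.0] * len(b)
    for _ in range(max_iter):
        r = residual(A, x, b)                   # double precision residual
        if max(abs(v) for v in r) <= tol:
            break
        d = solve_f32(A, r)                     # single precision correction
        x = [xi + di for xi, di in zip(x, d)]   # double precision update
    return x

# Usage: a small diagonally dominant system chosen only for illustration.
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x = mixed_precision_refine(A, b)
```

The single precision solve alone leaves a residual limited by single precision rounding; repeating it on the double precision residual recovers a solution accurate to the chosen tolerance for well-conditioned systems such as this one.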
This article presents Glift, an abstraction and generic template library for defining complex, random-access graphics processor (GPU) data structures. Like modern CPU data structure libraries, Glift enables GPU programmers to separate algorithms from data structure definitions, thereby greatly simplifying algorithmic development and enabling reusable and interchangeable data structures. We characterize a large body of previously published GPU data structures in terms of our abstraction and present several new GPU data structures. The new data structures, a stack, quadtree, and octree, are explained...
We have previously suggested mixed precision iterative solvers specifically tailored to the solution of sparse linear equation systems as they typically arise in the finite element discretization of partial differential equations. These schemes have been evaluated for a number of hardware platforms, in particular, single-precision GPUs as accelerators to the general purpose CPU. This paper reevaluates the situation with new mixed precision solvers that run entirely on the GPU: we demonstrate that mixed precision schemes constitute a significant performance gain over native double...
This article explores the coupling of coarse and fine-grained parallelism for Finite Element simulations based on efficient parallel multigrid solvers. The focus lies on both system performance and a minimally invasive integration of hardware acceleration into an existing software package, requiring no changes to application code. Because of their excellent price performance ratio, we demonstrate the viability of our approach by using commodity graphics processors (GPUs) as preconditioners. We address the issue of limited precision...
We present a time skewing algorithm that breaks the memory wall for certain iterative stencil computations. A stencil computation, even with constant weights, is a completely memory-bound algorithm. For example, for a large 3D domain of 500^3 doubles and 100 iterations on a quad-core Xeon X5482 3.2GHz system, a hand-vectorized and parallelized naive 7-point stencil implementation achieves only 1.4 GFLOPS because the system...
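For orientation, the naive constant-weight 7-point stencil that serves as the baseline above can be sketched as follows. This is an assumed minimal form, not the paper's vectorized code and not the time skewing algorithm itself; `stencil_sweep`, `iterate`, and the weight values are hypothetical.

```python
def stencil_sweep(u, n, w_center, w_neighbor):
    """One naive sweep of a constant-weight 7-point stencil over the interior
    of an n x n x n grid stored as a flat list; boundary values are kept."""
    v = list(u)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                c = (i * n + j) * n + k
                v[c] = w_center * u[c] + w_neighbor * (
                    u[c - 1] + u[c + 1]              # neighbors along k
                    + u[c - n] + u[c + n]            # neighbors along j
                    + u[c - n * n] + u[c + n * n]    # neighbors along i
                )
    return v

def iterate(u, n, steps, w_center=0.25, w_neighbor=0.125):
    """Apply the sweep repeatedly; each step streams the whole grid once."""
    for _ in range(steps):
        u = stencil_sweep(u, n, w_center, w_neighbor)
    return u

# Usage: a tiny grid with a single non-zero point in the middle.
n = 5
u = [0.0] * (n * n * n)
u[(2 * n + 2) * n + 2] = 1.0
u = iterate(u, n, steps=1)
```

Each sweep reads and writes every grid point while doing only a handful of arithmetic operations per point, which is why the naive form is limited by memory bandwidth once the grid no longer fits in cache.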
Implicit active contours are a very flexible technique in the segmentation of digital images. A novel type of hardware implementation is presented here to approach real time applications. We propose to exploit the high performance of modern graphics cards for numerical computations. Vectors are regarded as images and linear algebraic operations on vectors are realized by image blending. Thus, the implementation benefits from the high memory bandwidth and the economy of command transfers, while the restricted precision does not affect the qualitative behavior...
We present a framework for computing generalized distance transforms and skeletons of two-dimensional objects using graphics hardware. Our method is based on the concept of footprint splatting. Combining different splats produces weighted distance transforms for different metrics, as well as the corresponding skeletons and Voronoi diagrams. We present a hierarchical acceleration scheme and a subdivision scheme that allows visualizing the computed skeletons with subpixel accuracy in real time. Our splatting approach allows one to easily change all the metric parameters, treat any 2D boundaries, and produce...
FPGAs are becoming more and more attractive for high precision scientific computations. One of the main problems in efficient resource utilization is the quadratically growing resource usage of multipliers depending on the operand size. Many research efforts have been devoted to the optimization of individual arithmetic and linear algebra operations. In this paper the authors take a higher level approach and seek to reduce the intermediate computational precision on the algorithmic level by optimizing the accuracy towards the final result of an algorithm. In our case accurate...
We present a new cache oblivious scheme for iterative stencil computations that performs beyond system bandwidth limitations as though gigabytes of data could reside in an enormous on-chip cache. We compare execution times for 2D and 3D spatial domains with up to 128 million double precision elements for constant and variable stencils against hand-optimized naive code and the automatic polyhedral parallelizer and locality optimizer PluTo and demonstrate the clear superiority of our results.
We present a tool for real-time visualization of motion features in 2D image sequences. The motion is estimated through an eigenvector analysis of the spatio-temporal structure tensor at every pixel location. This approach is computationally demanding but allows reliable velocity estimates as well as quality indicators for the obtained results. We use a color map and a region of interest selector for the visualization of the velocities. On the selected velocities we apply a hierarchical smoothing scheme which allows the choice of the desired scale of the motion field. We demonstrate several...
The generalized Hough transform constitutes a well-known approach to object recognition and pose detection. To attain reliable detection results, however, a very large number of candidate object poses and scale factors need to be considered. We employ an inexpensive, consumer-market graphics card as the "poor man's" parallel processing system. We describe the implementation of a fast and enhanced version of the generalized Hough transform on graphics hardware. Thanks to the high bandwidth of on-board texture memory, a single pose can be evaluated in less than 3 ms, independent...
We have previously presented an approach to include graphics processing units as co-processors in a parallel Finite Element multigrid solver called FEAST. In this paper we show that the acceleration transfers to real applications built on top of FEAST, without any modifications of the application code. The chosen solid mechanics code is well suited to assess the practicability of our approach due to higher accuracy requirements and a more diverse CPU/co-processor interaction. We demonstrate in detail that the single precision execution...
Temporal blocking in iterative stencil computations makes it possible to surpass the performance of peak system bandwidth that holds for a single stencil computation. However, the effectiveness of temporal blocking depends strongly on the tiling scheme, which must account for the contradicting goals of spatio-temporal data locality, regular memory access patterns, parallelization into many independent tasks, and data-to-core affinity for NUMA-aware data distribution. Despite the prevalence of cache coherent non-uniform memory access (ccNUMA) in today's many-core systems,...
Author(s): Lefohn, Aaron; Sengupta, Shubhabrata; Kniss, Joe M.; Strzodka, Robert; Owens, John D. | Abstract: We present a novel implementation of adaptive shadow maps (ASMs) that performs all shadow lookups and scene analysis on the GPU, enabling interactive rendering with ASMs while moving both the light and camera. Adaptive shadow maps offer a rigorous solution to projective and perspective shadow map aliasing while maintaining the simplicity of a purely image-based technique. The complexity of the ASM data structure, however, has prevented full...