- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Interconnection Networks and Systems
- Cloud Computing and Resource Management
- Embedded Systems Design Techniques
- Advanced Data Storage Technologies
- Algorithms and Data Compression
- Software System Performance and Reliability
- Advanced Software Engineering Methodologies
- Real-Time Systems Scheduling
- Distributed systems and fault tolerance
- Air Quality and Health Impacts
- Scientific Computing and Data Management
- Formal Methods in Verification
- Climate Change and Health Impacts
- Logic, programming, and type systems
- Graph Theory and Algorithms
- Simulation Techniques and Applications
- Quantum Computing Algorithms and Architecture
- Computer Graphics and Visualization Techniques
- Complex Network Analysis Techniques
- Opinion Dynamics and Social Influence
- Game Theory and Applications
- Cellular Automata and Applications
- Low-power high-performance VLSI design
Linköping University
2015-2024
Hasso Plattner Institute
2016-2017
Rhenish Institute for Environmental Research
2010-2014
University of Cologne
2010-2014
University of Kaiserslautern
2012
Linnaeus University
2011
Karlsruhe Institute of Technology
2011
Kreditanstalt für Wiederaufbau
2009
Universität Trier
1995-2003
Saarland University
1994-2002
We present SkePU, a C++ template library which provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP backend. It also supports multi-GPU systems.
Recent studies have shown an association of short-term exposure to fine particulate matter (PM) with transient increases in blood pressure (BP), but it is unclear whether long-term exposure has an effect on arterial BP and hypertension.
Many modern parallel computing systems are heterogeneous at their node level. Such nodes may comprise general-purpose CPUs and accelerators (such as GPUs or Intel Xeon Phi) that provide high performance with suitable energy-consumption characteristics. However, exploiting the available performance of heterogeneous architectures may be challenging. There are various parallel programming frameworks (such as OpenMP, OpenCL, OpenACC, and CUDA), and selecting the one that is suitable for a target context is not straightforward. In this paper, we study empirically...
PEPPHER, a three-year European FP7 project, addresses efficient utilization of hybrid (heterogeneous) computer systems consisting of multicore CPUs with GPU-type accelerators. This article outlines the PEPPHER performance-aware component model, performance prediction means, runtime system, and other aspects of the project. A larger example demonstrates the performance portability of the PEPPHER approach across hybrid systems with one to four GPUs.
In this article we present SkePU 2, the next generation of the SkePU C++ skeleton programming framework for heterogeneous parallel systems. We critically examine the design and limitations of the SkePU 1 programming interface. We present a new, flexible, and type-safe interface for skeleton programming in SkePU 2, and a source-to-source transformation tool which knows about SkePU 2 constructs such as skeletons and user functions. We demonstrate how the source-to-source compiler transforms programs to enable efficient execution on parallel heterogeneous systems. We show how SkePU 2 enables new use-cases and applications by increasing the flexibility from SkePU 1, and how programming errors can be...
SkePU is a C++ template library that provides a simple and unified interface for specifying data-parallel computations with the help of skeletons on GPUs using CUDA and OpenCL. The interface is also general enough to support other architectures, and SkePU implements both a sequential CPU and a parallel OpenMP backend. It also supports multi-GPU systems. Currently available skeletons in SkePU include map, reduce, mapreduce, map-with-overlap, maparray, and scan. The performance of the generated code is comparable to that of hand-written code, even for more complex applications such...
We discuss three complementary approaches that can provide both portability and an increased level of abstraction for the programming of heterogeneous multicore systems. Together, these approaches also support performance portability, as currently investigated in the EU FP7 project PEPPHER. In particular, we consider (1) a library-based approach, here represented by the integration of the SkePU C++ skeleton library with the StarPU runtime system for dynamic scheduling and dynamic selection of suitable execution units for parallel tasks; (2)...
We describe the principles of a novel framework for performance-aware composition of sequential and explicitly parallel software components with implementation variants. Automatic composition results in a table-driven implementation that, for each call of a component, looks up the expected best implementation variant, processor allocation, and schedule given the current problem and processor group sizes. The dispatch tables are computed off-line at component deployment time by an interleaved dynamic programming algorithm from time-prediction meta-code provided by the component supplier....
Effectively exploiting massively parallel architectures is a major challenge that stream programming can help facilitate. We investigate the problem of generating energy-optimal code for a collection of streaming tasks that include parallelizable or moldable tasks on a generic manycore processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin at user level in a data-driven way. A...
We present the third generation of the C++-based open-source skeleton programming framework SkePU. Its main new features include new skeletons, new data container types, support for returning multiple objects from skeleton instances and user functions, support for specifying alternative platform-specific user functions to exploit e.g. custom SIMD instructions, generalized scheduling variants for the multicore CPU backends, and a new cluster-backend targeting the MPI interface provided by the StarPU task-based runtime system. We have also revised...
The Cell Broadband Engine is a heterogeneous multicore processor for high-performance computing and gaming. Its architecture allows an impressive peak performance but, at the same time, makes it very hard to write efficient code. The need to simultaneously exploit SIMD instructions, coordinate parallel execution of slave processors, overlap DMA memory traffic with computation, keep data properly aligned in memory, and explicitly manage the small on-chip memory buffers leads to complex code. In this work, we adopt the skeleton...
The PEPPHER component model defines an environment for annotation of native C/C++ based components for homogeneous and heterogeneous multicore and manycore systems, including GPU and multi-GPU based systems. For the same computational functionality, captured as a component, different sequential and explicitly parallel implementation variants using various types of execution units might be provided, together with metadata such as exposed tunable parameters. The goal is to compose an application from its components and variants such that, depending on...
This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides...
In this paper we present two algorithms for integrated code generation for clustered VLIW architectures. One algorithm is a heuristic based on genetic algorithms, the other is based on integer linear programming. The performance of the algorithms is compared on a portion of the Mediabench [10] benchmark suite. We found the results of the genetic algorithm to be within one or two clock cycles of optimal in the cases where the optimum is known. In addition, the heuristic produces results in predictable time also when the integer linear program fails.
In this work we report results from a new integrated method of automatically generating parallel code from Modelica models by combining parallelization at two levels of abstraction. Performing inline expansion of a Runge-Kutta solver combined with fine-grained automatic parallelization of the right-hand side of the resulting equation system opens up new possibilities for generating high-performance code, which is becoming increasingly relevant when multi-core computers are commonplace. An implementation, in the form of a backend module for the OpenModelica...