- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Embedded Systems Design Techniques
- Computer Graphics and Visualization Techniques
- Computational Geometry and Mesh Generation
- Interconnection Networks and Systems
- Geological Modeling and Analysis
- Seismic Imaging and Inversion Techniques
- Stochastic Gradient Optimization Techniques
- Seismology and Earthquake Studies
- 3D Shape Modeling and Analysis
- Advanced Data Compression Techniques
- Methane Hydrates and Related Phenomena
- Advanced Numerical Methods in Computational Mathematics
Norwegian University of Science and Technology
2016-2017
Simula Research Laboratory
2012-2016
University of Oslo
2012-2015
Energy efficiency is an important aspect of future exascale systems, mainly due to rising energy costs. Although high-performance computing (HPC) applications are compute-centric, they still exhibit varying computational characteristics in different regions of the program, such as compute-, memory-, and I/O-bound code regions. Some of today's clusters already offer mechanisms to adjust system resources to the requirements of an application, e.g., by controlling the CPU frequency. However, manually tuning for improved a...
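One common way to decide per-region tuning of this kind is to look at arithmetic intensity. The sketch below is our own illustration (not the paper's method), with made-up thresholds and region data, showing how a memory-bound region can be flagged as a candidate for a lower CPU frequency:

```python
# Hypothetical sketch: classify code regions by arithmetic intensity
# (flops per byte of memory traffic) to decide whether lowering the CPU
# frequency is likely to save energy without hurting performance.
# Thresholds and region numbers are illustrative assumptions.

def classify_region(flops, bytes_moved, threshold=1.0):
    """Label a region compute-bound or memory-bound by its flop/byte ratio."""
    intensity = flops / bytes_moved
    return "compute-bound" if intensity >= threshold else "memory-bound"

def suggest_frequency(label, f_max=3.0, f_min=1.2):
    """Memory-bound regions tolerate a lower core frequency (GHz)."""
    return f_max if label == "compute-bound" else f_min

regions = {
    "stencil_update": (2.0e9, 4.0e9),   # few flops per byte -> memory-bound
    "dense_matmul":   (8.0e9, 1.0e9),   # many flops per byte -> compute-bound
}

for name, (flops, nbytes) in regions.items():
    label = classify_region(flops, nbytes)
    print(name, label, suggest_frequency(label))
```

In practice such a classifier would be driven by hardware performance counters rather than hard-coded numbers.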
In the context of multiple GPUs that share the same PCIe bus, we propose a new communication scheme that leads to a more effective overlap of communication and computation. Multiple CUDA streams and OpenMP threads are adopted so that data can simultaneously be sent and received. A representative 3D stencil example is used to demonstrate the effectiveness of our scheme. We compare its performance with a state-of-the-art MPI-based scheme. Results show that our approach outperforms that scheme, being up to 1.85× faster. However, the results also indicate current underlying...
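The core idea of overlapping halo exchange with interior computation can be sketched in plain Python threads (the paper itself uses CUDA streams plus OpenMP threads; the 1D three-point stencil below is only an illustrative stand-in):

```python
# Sketch of communication/computation overlap: update the interior of a
# 1D stencil while the boundary ("halo") values are installed concurrently
# by a second thread. The halo thread touches only u[0] and u[-1], the
# interior loop only reads u[1]..u[-2], so the two can safely run at once.

import threading

def halo_exchange(u, left_halo, right_halo, done):
    # Stand-in for a PCIe/MPI transfer: install neighbour values.
    u[0], u[-1] = left_halo, right_halo
    done.set()

def stencil_step(u, left_halo, right_halo):
    new = list(u)
    done = threading.Event()
    t = threading.Thread(target=halo_exchange,
                         args=(u, left_halo, right_halo, done))
    t.start()                        # communication runs concurrently...
    for i in range(2, len(u) - 2):   # ...while the interior is updated
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    done.wait()                      # halos needed for boundary-adjacent points
    for i in (1, len(u) - 2):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    t.join()
    new[0], new[-1] = u[0], u[-1]
    return new

print(stencil_step([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], -1.0, 6.0))
```

On real hardware the same pattern uses one CUDA stream per transfer direction so sends and receives over the shared PCIe bus proceed while kernels update the interior.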
A recent trend in modern high-performance computing environments is the introduction of powerful, energy-efficient hardware accelerators such as GPUs and Xeon Phi coprocessors. These specialized devices coexist with CPUs and are optimized for highly parallel applications. In regular computing-intensive applications with predictable data access patterns, these accelerators often far outperform CPUs and thus relegate the latter to pure control functions instead of computations. For irregular applications, however, the performance gap...
There is a consensus that exascale systems should operate within a power envelope of 20 MW. Consequently, energy conservation is still considered the most crucial constraint if such systems are to be realized.
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneously using CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU programming approach that combines MPI, OpenMP and CUDA. To effectively hide the overhead of various inter- and intra-node communications, a new level of task...
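A first question in any such hybrid approach is how to split the work so that CPU and GPU finish at the same time. The sketch below is a hypothetical illustration (the throughput numbers are made-up placeholders, not measurements from the paper) of a static partitioning by relative throughput:

```python
# Hypothetical sketch of hybrid CPU+GPU load partitioning: split the rows
# of a domain between host and device in proportion to their measured
# throughputs, so that neither side idles while the other finishes.

def partition_rows(n_rows, cpu_rate, gpu_rate):
    """Return (cpu_rows, gpu_rows) proportional to relative throughput."""
    cpu_rows = round(n_rows * cpu_rate / (cpu_rate + gpu_rate))
    return cpu_rows, n_rows - cpu_rows

# A memory-bound kernel where the CPU still contributes ~20% of the
# aggregate bandwidth (placeholder rates in GB/s).
print(partition_rows(1000, 50.0, 200.0))   # -> (200, 800)
```

In a real code the rates would be calibrated at runtime, and the split re-balanced as the computation proceeds.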
We study the problem of contention for memory bandwidth between computation and communication in supercomputers that feature multicore CPUs. The problem arises when computation and communication are overlapped and both operations compete for the same memory bandwidth. The effect is most visible at the limits of scalability, where both operations take similar amounts of time and thus must be taken into account in order to reach the maximum scalability of memory-bound applications. Typical examples of codes affected by this are sparse matrix-vector computations, graph algorithms, and many machine learning problems, as they...
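The effect can be illustrated with a back-of-the-envelope model (our own simplification, not the paper's): overlap is only "free" while the combined bandwidth demand of computation and communication stays under the machine's total; beyond that, both streams are throttled proportionally.

```python
# Crude shared-bandwidth contention model. If computation needs b_comp GB/s
# and communication needs b_comm GB/s out of b_total available, overlapping
# the two phases only hides the shorter one while demand fits the budget;
# otherwise both slow down by the oversubscription factor.

def overlapped_time(t_comp, t_comm, b_comp, b_comm, b_total):
    """Execution time of two perfectly overlapped phases under a bandwidth cap."""
    demand = b_comp + b_comm
    if demand <= b_total:
        return max(t_comp, t_comm)           # ideal overlap
    slowdown = demand / b_total              # both streams throttled
    return max(t_comp, t_comm) * slowdown    # proportional-sharing model

# Ideal: 10 s of compute fully hides 8 s of communication.
print(overlapped_time(10.0, 8.0, 40.0, 20.0, 80.0))   # -> 10.0
# Contended: 120 GB/s demanded against 80 GB/s available.
print(overlapped_time(10.0, 8.0, 80.0, 40.0, 80.0))   # -> 15.0
```

Even this toy model shows why overlap gains shrink exactly where they matter most, at the scalability limit where both phases are bandwidth-bound.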
We present a novel method for 3D anisotropic front propagation and apply it to the simulation of geological folding. The new iterative algorithm has a simple structure and abundant parallelism, and is easily adapted to multithreaded architectures using OpenMP. Moreover, we have used an automated C-to-CUDA source code translator, Mint, to achieve greatly enhanced computing speed on GPUs. Both the OpenMP and CUDA implementations have been tested and benchmarked on several examples.
Two new algorithms for the numerical solution of static Hamilton-Jacobi equations are presented. These are designed to work efficiently on different parallel computing architectures, and results from multicore CPU and GPU implementations are reported and discussed. The experiments show that the proposed strategies scale well with the computational power of the hardware. The performance of the methods is investigated for two types of formulations: the isotropic eikonal equation and an anisotropic formulation used to simulate geological folding....
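To make the structure of such solvers concrete, here is a minimal sketch of one classic iterative scheme for the isotropic eikonal equation |∇u| = 1, the fast sweeping method (this is a textbook baseline, not the paper's own algorithms): repeated grid sweeps in alternating orderings apply a local upwind update, and it is exactly this regular, local structure that exposes the parallelism the abstracts refer to.

```python
# Fast sweeping for |grad u| = 1 on an n x n grid with spacing h,
# u = 0 at the given source cells. Each cell is updated from its upwind
# neighbours; four alternating sweep orderings propagate the front.

import math

INF = float("inf")

def fast_sweep_eikonal(n, h, sources, n_sweeps=4):
    u = [[INF] * n for _ in range(n)]
    for (i, j) in sources:
        u[i][j] = 0.0
    orders = [(range(n), range(n)),
              (range(n - 1, -1, -1), range(n)),
              (range(n), range(n - 1, -1, -1)),
              (range(n - 1, -1, -1), range(n - 1, -1, -1))]
    for _ in range(n_sweeps):
        for ii, jj in orders:
            for i in ii:
                for j in jj:
                    if (i, j) in sources:
                        continue
                    a = min(u[i - 1][j] if i > 0 else INF,
                            u[i + 1][j] if i < n - 1 else INF)
                    b = min(u[i][j - 1] if j > 0 else INF,
                            u[i][j + 1] if j < n - 1 else INF)
                    if a == INF and b == INF:
                        continue
                    if abs(a - b) >= h:   # one-sided (causal) update
                        cand = min(a, b) + h
                    else:                 # two-sided quadratic update
                        cand = (a + b + math.sqrt(2 * h * h - (a - b) ** 2)) / 2
                    u[i][j] = min(u[i][j], cand)
    return u

u = fast_sweep_eikonal(11, 1.0, {(5, 5)})
print(u[5][0], u[0][5])   # distances along the axes from the centre source
```

An anisotropic formulation, such as the one used for geological folding, replaces the local update with one derived from the anisotropic Hamiltonian, but the sweep structure stays the same.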
This paper studies the CUDA programming challenges of using multiple GPUs inside a single machine to carry out plane-by-plane updates in parallel 3D sweeping algorithms. In particular, care must be taken to mask the overhead of the various data movements between the GPUs. Multiple OpenMP threads on the CPU side should be combined with multiple streams per GPU to hide the transfer cost related to the halo computation of each 2D plane. Moreover, the technique of peer-to-peer data motion can be used to reduce the impact of the volumetric shuffles that have to be done mandatory...
Using large-scale multicore systems to get the maximum performance and energy efficiency with manageable programmability is a major challenge. The partitioned global address space (PGAS) programming model enhances programmability by providing a global address space over distributed computing systems. However, so far the performance of PGAS on multicore-based parallel architectures has not been investigated thoroughly. In this paper we use a set of selected kernels from the well-known NAS Parallel Benchmarks to evaluate the UPC language, which is a widely used implementation...