Mohammed Sourouri

ORCID: 0000-0003-1231-6355
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Distributed and Parallel Computing Systems
  • Embedded Systems Design Techniques
  • Computer Graphics and Visualization Techniques
  • Computational Geometry and Mesh Generation
  • Interconnection Networks and Systems
  • Geological Modeling and Analysis
  • Seismic Imaging and Inversion Techniques
  • Stochastic Gradient Optimization Techniques
  • Seismology and Earthquake Studies
  • 3D Shape Modeling and Analysis
  • Advanced Data Compression Techniques
  • Methane Hydrates and Related Phenomena
  • Advanced Numerical Methods in Computational Mathematics

Norwegian University of Science and Technology
2016-2017

Simula Research Laboratory
2012-2016

University of Oslo
2012-2015

Energy efficiency is an important aspect of future exascale systems, mainly due to rising energy cost. Although high-performance computing (HPC) applications are compute-centric, they still exhibit varying computational characteristics in different regions of the program, such as compute-, memory-, and I/O-bound code regions. Some of today's clusters already offer mechanisms to adjust the system to the resource requirements of an application, e.g., by controlling the CPU frequency. However, manually tuning for improved a...

10.1007/s00607-016-0532-7 article EN cc-by Computing 2017-01-10

In the context of multiple GPUs that share the same PCIe bus, we propose a new communication scheme that leads to a more effective overlap of communication and computation. Multiple CUDA streams and OpenMP threads are adopted so that data can simultaneously be sent and received. A representative 3D stencil example is used to demonstrate the effectiveness of our scheme. We compare the performance of our scheme with an MPI-based state-of-the-art scheme. Results show that our approach outperforms the state-of-the-art scheme, being up to 1.85× faster. However, our results also indicate that the current underlying...

10.1109/padsw.2014.7097919 article EN 2014-12-01

A recent trend in modern high-performance computing environments is the introduction of powerful, energy-efficient hardware accelerators such as GPUs and Xeon Phi coprocessors. These specialized devices coexist with CPUs and are optimized for highly parallel applications. In regular computing-intensive applications with predictable data access patterns, these devices often far outperform CPUs and thus relegate the latter to pure control functions instead of computations. For irregular applications, however, the performance gap...

10.1109/mm.2015.70 article EN IEEE Micro 2015-07-01

There is a consensus that exascale systems should operate within a power envelope of 20 MW. Consequently, energy conservation is still considered the most crucial constraint if such systems are to be realized.

10.1145/3126908.3126945 article EN 2017-11-08

On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneous usage of CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU programming approach that combines MPI, OpenMP and CUDA. To effectively hide the overhead of various inter- and intra-node communications, a new level of task...

10.1109/cse.2015.33 article EN 2015-10-01

We study the problem of contention for memory bandwidth between computation and communication in supercomputers that feature multicore CPUs. The problem arises when communication and computation are overlapped and both operations compete for the same memory bandwidth. This contention is most visible at the limits of scalability, when communication and computation take similar amounts of time, and thus must be taken into account in order to reach maximum scalability in memory-bandwidth-bound applications. Typical examples of codes affected by this contention are sparse matrix-vector computations, graph algorithms, and many machine learning problems, as they...

10.1109/padsw.2018.8644601 article EN 2018-12-01

We present a novel method for 3D anisotropic front propagation and apply it to the simulation of geological folding. The new iterative algorithm has a simple structure and abundant parallelism, and is easily adapted to multithreaded architectures using OpenMP. Moreover, we have used an automated C-to-CUDA source code translator, Mint, to achieve greatly enhanced computing speed on GPUs. Both the OpenMP and CUDA implementations have been tested and benchmarked on several examples.

10.1016/j.procs.2012.04.101 article EN Procedia Computer Science 2012-01-01

Two new algorithms for the numerical solution of static Hamilton-Jacobi equations are presented. These algorithms are designed to work efficiently on different parallel computing architectures, and results for multicore CPU and GPU implementations are reported and discussed. The experiments show that the proposed strategies scale well with the computational power of the hardware. The performance of the methods is investigated for two types of formulations: the isotropic eikonal equation and an anisotropic formulation used to simulate geological folding...

10.1186/2190-5983-4-10 article EN cc-by Journal of Mathematics in Industry 2014-01-01

This paper studies the CUDA programming challenges of using multiple GPUs inside a single machine to carry out plane-by-plane updates in parallel 3D sweeping algorithms. In particular, care must be taken to mask the overhead of various data movements between the GPUs. Multiple OpenMP threads on the CPU side should be combined with multiple CUDA streams per GPU to hide the data transfer cost related to the halo computation on each 2D plane. Moreover, the technique of peer-to-peer data motion can be used to reduce the impact of volumetric data shuffles that have to be done between mandatory...

10.1016/j.procs.2015.05.339 article EN Procedia Computer Science 2015-01-01

Using large-scale multicore systems to get the maximum performance and energy efficiency with manageable programmability is a major challenge. The partitioned global address space (PGAS) programming model enhances programmability by providing a global address space over large-scale computing systems. However, so far the performance and energy efficiency of the PGAS model on multicore-based parallel architectures have not been investigated thoroughly. In this paper we use a set of selected kernels from the well-known NAS Parallel Benchmarks to evaluate the performance and energy efficiency of the UPC programming language, which is a widely used implementation...

10.1109/hpcsim.2016.7568416 preprint EN 2016-07-01