- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- EEG and Brain-Computer Interfaces
- Computational Physics and Python Applications
- Algorithms and Data Compression
- Advanced Neural Network Applications
- Scientific Computing and Data Management
- Caching and Content Delivery
- Interconnection Networks and Systems
- Petri Nets in System Modeling
- Non-Invasive Vital Sign Monitoring
- Advanced MRI Techniques and Applications
- Quantum Computing Algorithms and Architecture
- Real-Time Systems Scheduling
- Distributed Systems and Fault Tolerance
- Formal Methods in Verification
- Security and Verification in Computing
- Context-Aware Activity Recognition Systems
- Online Learning and Analytics
- Teaching and Learning Programming
- Neural Dynamics and Brain Function
- Functional Brain Connectivity Studies
- Medical Image Segmentation Techniques
University of Hagen
2019-2025
Argonne National Laboratory
2017-2025
Forschungszentrum Jülich
2019-2024
Fraunhofer Institute for Industrial Mathematics
2013-2016
Fraunhofer Society
2014-2015
Supply Chain Competence Center (Germany)
2014
Heidelberg University
2013-2014
Geospatial Research (United Kingdom)
2013
Abstract In recent years, brain research has indisputably entered a new epoch, driven by substantial methodological advances and digitally enabled data integration and modelling at multiple scales, from molecules to the whole brain. Most importantly, major advances are emerging at the intersection of neuroscience with technology and computing. This new science of the brain combines high-quality research, data integration across multiple scales, a new culture of multidisciplinary large-scale collaboration, and translation into applications. As pioneered in Europe’s Human Brain Project...
The advent of exascale supercomputers heralds a new era of scientific discovery, yet it introduces significant architectural challenges that must be overcome for MPI applications to fully exploit their potential. Among these is the adoption of heterogeneous architectures, particularly the integration of GPUs to accelerate computation. Additionally, the complexity of multithreaded programming models has become a critical factor in achieving performance at scale. The efficient utilization of hardware acceleration...
Modern GPUs are powerful high-core-count processors that are no longer used solely for graphics applications, but are also employed to accelerate computationally intensive general-purpose tasks. For utmost performance, GPUs are distributed throughout the cluster to process parallel programs. In fact, many recent high-performance computing systems in the TOP500 list are heterogeneous architectures. Despite being highly effective processing units, GPUs on different hosts are incapable of communicating with each other without assistance from a CPU. As...
This paper provides an in-depth analysis of the software overheads in the MPI performance-critical path and exposes mandatory performance overheads that are unavoidable based on the MPI-3.1 specification. We first present a highly optimized implementation of the MPI standard in which the communication stack, all the way from the application to the low-level network API, takes only a few tens of instructions. We carefully study these instructions and analyze the root cause of the specific requirements under the current standard. We recommend potential changes that can...
Python as a programming language is increasingly gaining importance, especially in data science, scientific, and parallel programming. It is easier and faster to learn than classical languages such as C. However, this usability often comes at the cost of performance: applications written in Python are considered to be much slower than those in C or FORTRAN. Further, it does not allow direct usage of GPUs, aside from pre-compiled libraries. However, the Numba package promises comparable performance for compute-intensive parts of an application and supports CUDA, which...
Due to their massive parallelism and high performance per Watt, GPUs are gaining popularity in computing and are a strong candidate for future exascale systems. But communication and data transfer in GPU-accelerated systems remain a challenging problem. Since the GPU normally is not able to control a network device, today a hybrid-programming model is preferred, whereby the GPU is used for calculation and the CPU handles the communication. As a result, communication between distributed GPUs suffers from unnecessary overhead, introduced by switching the control flow to the CPUs and vice versa....
UCX is an open-source communication framework with a two-level API design targeted at addressing the needs of large supercomputing systems. The lower-level interface, UCT, adds minimal overhead to data transfer but requires considerable effort from the user. The higher-level interface, UCP, is easier to use, but adds some overhead to communication. This work focuses on charting UCP's performance over InfiniBand, motivated by its usage as middleware for high-level communication libraries. We analyze the performance shortcomings that stem from its design and the sources of these losses. In...
GPUs enjoy high popularity in High Performance Computing, due to their massive parallelism and performance per Watt. Despite this popularity, data transfer between multiple GPUs in a cluster remains a problem. Most communication models require the CPU to control the data flow, and intermediate staging copies in host memory are often inevitable. These two facts lead to higher CPU utilization. As a result, overall performance decreases and power consumption increases. Collective operations like reduce and allreduce are very common in scientific simulations...
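The collective operations mentioned above follow well-known communication patterns. As an illustration (not the specific algorithm of this paper), the classic recursive-doubling allreduce can be sketched in plain Python; the list stands in for the per-rank partial results, and each "exchange" would be a device-to-device transfer on a GPU cluster:

```python
# Sketch of the recursive-doubling allreduce pattern: log2(p) rounds,
# where in round k each rank r exchanges its partial result with rank
# r XOR 2^k. Assumes the number of ranks p is a power of two.
def allreduce_recursive_doubling(values):
    p = len(values)          # number of simulated ranks
    vals = list(values)
    k = 1
    while k < p:
        nxt = list(vals)
        for r in range(p):
            partner = r ^ k  # exchange partner in this round
            nxt[r] = vals[r] + vals[partner]
        vals = nxt
        k *= 2
    return vals              # every rank now holds the global sum

print(allreduce_recursive_doubling([1, 2, 3, 4]))  # [10, 10, 10, 10]
```

The point of the pattern is that every rank obtains the full reduction after only log2(p) communication rounds, instead of gathering everything on one rank and broadcasting back.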
Heterogeneity in memory is becoming increasingly common in high-end computing. Several modern supercomputers, such as those based on the Intel Knights Landing or NVIDIA P100 GPU architectures, already showcase multiple memory domains that are directly accessible by user applications, including on-chip high-bandwidth memory and off-chip traditional DDR memory. The next generation of supercomputers is expected to take this architectural trend one step further, with NVRAM as an additional byte-addressable option. Despite...
Accelerated computing has become pervasive for increasing computational power and energy efficiency in terms of GFLOPs/Watt. For application areas with the highest demands, for instance high performance computing, data warehousing, and analytics, accelerators like GPUs or Intel's MICs are distributed throughout the cluster. Since current analyses and predictions show that data movement will be the main contributor to energy consumption, we are entering an era of communication-centric heterogeneous systems operating under hard...
Wearables providing fall detection can enable faster emergency services for the elderly, yet privacy concerns limit the acceptance of this technology. In this work, we evaluate a machine learning algorithm, called Bonsai, embedded on edge devices to detect falls. The prototype is Arduino-based and can be integrated into fabrics of clothes, belts, or other accessories. Detection is performed offline on the device. We used data from public datasets of movement events to train a tree-based model. We evaluated different combinations...
In High-Performance Computing (HPC), GPU-based accelerators are pervasive for two reasons: first, GPUs provide much higher raw computational power than traditional CPUs. Second, their power consumption increases sub-linearly with the performance increase, making them more energy-efficient in terms of GFLOPS/Watt. Although these advantages are limited to a selected set of workloads, most HPC applications can benefit a lot from GPUs. The top 11 entries of the current Green500 list (November 2013) are all GPU-accelerated systems,...
Abstract Python is becoming increasingly popular in scientific computing. The package MPI for Python (mpi4py) allows writing efficient parallel programs that scale across multiple nodes. However, it does not support the transfer of non-contiguous data via slices, which is a well-known feature of NumPy. In this work, we therefore evaluate several methods to support the direct transfer of non-contiguous arrays in mpi4py. This significantly simplifies the code, while the performance basically stays the same. In a PingPong-, Stencil- and Lattice-Boltzmann-Benchmark,...
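The underlying issue is that mpi4py's fast, buffer-based communication expects contiguous memory, while many natural NumPy slices are strided views. A minimal sketch of the distinction (and the explicit staging copy that such methods aim to make unnecessary), using plain NumPy:

```python
import numpy as np

# A row slice of a C-ordered array is contiguous; a column slice is a
# strided view and cannot be handed directly to a buffer-based transfer.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

row = a[1, :]  # contiguous view
col = a[:, 1]  # non-contiguous view (element stride = one row)

print(row.flags['C_CONTIGUOUS'])  # True
print(col.flags['C_CONTIGUOUS'])  # False

# The portable workaround is an explicit staging copy before the send,
# e.g. something like comm.Send(np.ascontiguousarray(col), dest=1),
# at the cost of extra memory traffic on every transfer.
packed = np.ascontiguousarray(col)
print(packed.flags['C_CONTIGUOUS'])  # True
```

Direct support for strided transfers removes this copy-and-pack boilerplate from user code, which is the simplification referred to above.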
The use of Deep Learning methods has been identified as a key opportunity for enabling the processing of extreme-scale scientific datasets. Feeding data into compute nodes equipped with several high-end GPUs at a sufficiently high rate is a known challenge. Facilitating these datasets thus requires the ability to store petabytes of data as well as to access the data with very high bandwidth. In this work, we look at two use cases from cytoarchitectonic brain mapping. These applications are challenging for the underlying IO system. We present an in-depth...
Due to their massive parallelism and high performance per Watt, GPUs have gained popularity in high-performance computing and are a strong candidate for future exascale systems. But communication and data transfer in GPU-accelerated systems remain a challenging problem. Since the GPU normally is not able to control a network device, a hybrid-programming model is preferred, whereby the GPU is used for calculation and the CPU handles the communication. As a result, communication between distributed GPUs suffers from unnecessary overhead, introduced by switching...
GPUs are widely used in high performance computing, due to their computational power and performance per Watt. Still, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performance, but also power consumption. The most common way to utilize a GPU cluster is a hybrid model, in which the GPU is used to accelerate the computation while the CPU is responsible for the communication. This approach always requires a dedicated CPU thread, which consumes additional CPU cycles and therefore increases the power consumption of the complete...
We investigate cryptanalytic applications comprised of many independent tasks that exhibit a stochastic runtime distribution. We compare four algorithms for executing such applications on GPUs. We demonstrate that for different distributions, problem sizes, and platforms the best strategy varies. We support our analytic results by extensive experiments on two GPUs from opposite sides of the performance spectrum: a high-performance GPU (Nvidia Volta) and an energy-saving system on chip (Jetson Nano).
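Why the best execution strategy depends on the runtime distribution can be seen with a small simulation (an illustrative model, not one of the four algorithms compared in the paper): when tasks run in lockstep batches, every lane in a batch is held until the batch's slowest task finishes, so skewed runtimes waste more lane-time as batches grow.

```python
import random

def lockstep_utilization(runtimes, batch_size):
    # Fraction of lane-time spent on useful work when tasks execute in
    # lockstep batches: a batch occupies all its lanes for max(), not mean().
    busy = sum(runtimes)
    occupied = 0.0
    for i in range(0, len(runtimes), batch_size):
        chunk = runtimes[i:i + batch_size]
        occupied += len(chunk) * max(chunk)
    return busy / occupied

random.seed(42)
uniform_tasks = [1.0] * 1024                                    # constant runtimes
skewed_tasks = [random.expovariate(1.0) for _ in range(1024)]   # heavy-tailed

for bs in (1, 32, 1024):
    print(bs,
          round(lockstep_utilization(uniform_tasks, bs), 3),
          round(lockstep_utilization(skewed_tasks, bs), 3))
```

With constant runtimes, utilization stays at 1.0 for every batch size, while heavy-tailed runtimes lose utilization as the batch grows; this is the kind of distribution- and size-dependence that makes no single strategy uniformly best.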