- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- EEG and Brain-Computer Interfaces
- Computational Physics and Python Applications
- Algorithms and Data Compression
- Advanced Neural Network Applications
- Scientific Computing and Data Management
- Caching and Content Delivery
- Interconnection Networks and Systems
- Petri Nets in System Modeling
- Non-Invasive Vital Sign Monitoring
- Advanced MRI Techniques and Applications
- Quantum Computing Algorithms and Architecture
- Real-Time Systems Scheduling
- Distributed Systems and Fault Tolerance
- Formal Methods in Verification
- Security and Verification in Computing
- Context-Aware Activity Recognition Systems
- Online Learning and Analytics
- Teaching and Learning Programming
- Neural Dynamics and Brain Function
- Functional Brain Connectivity Studies
- Medical Image Segmentation Techniques
University of Hagen
2019-2025
Argonne National Laboratory
2017-2025
Forschungszentrum Jülich
2019-2024
Fraunhofer Institute for Industrial Mathematics
2013-2016
Fraunhofer Society
2014-2015
Supply Chain Competence Center (Germany)
2014
Heidelberg University
2013-2014
Geospatial Research (United Kingdom)
2013
Abstract In recent years, brain research has indisputably entered a new epoch, driven by substantial methodological advances and digitally enabled data integration and modelling at multiple scales, from molecules to the whole brain. Most importantly, major advances are emerging at the intersection of neuroscience with technology and computing. This new science of the brain combines high-quality research, data integration across multiple scales, a new culture of multidisciplinary large-scale collaboration, and translation into applications. As pioneered in Europe’s Human Brain Project...
The advent of exascale supercomputers heralds a new era of scientific discovery, yet it introduces significant architectural challenges that must be overcome for MPI applications to fully exploit their potential. Among these is the adoption of heterogeneous architectures, particularly the integration of GPUs to accelerate computation. Additionally, the complexity of multithreaded programming models has become a critical factor in achieving performance at scale. The efficient utilization of hardware acceleration...
Modern GPUs are powerful high-core-count processors that are no longer used solely for graphics applications, but are also employed to accelerate computationally intensive general-purpose tasks. For utmost performance, GPUs are distributed throughout the cluster to process parallel programs. In fact, many recent high-performance computing systems in the TOP500 list are heterogeneous architectures. Despite being highly effective processing units, GPUs on different hosts are incapable of communicating with each other without assistance from a CPU. As...
This paper provides an in-depth analysis of the software overheads in the MPI performance-critical path and exposes mandatory performance overheads that are unavoidable based on the MPI-3.1 specification. We first present a highly optimized implementation of the MPI standard in which the communication stack, all the way from the application to the low-level network API, takes only a few tens of instructions. We carefully study these instructions and analyze the root cause of the specific requirements under the current standard. We recommend potential changes that can...
Python as a programming language is increasingly gaining importance, especially in data science, scientific, and parallel programming. It is easier and faster to learn than classical languages such as C. However, this usability often comes at the cost of performance: applications written in Python are considered to be much slower than those in C or FORTRAN. Further, it does not allow direct usage of GPUs, aside from pre-compiled libraries. However, the Numba package promises comparable performance for compute-intensive parts of an application and supports CUDA, which...
Due to their massive parallelism and high performance per Watt, GPUs are gaining popularity in computing and are a strong candidate for future exascale systems. But communication and data transfer in GPU-accelerated systems remain a challenging problem. Since the GPU normally is not able to control a network device, today a hybrid-programming model is preferred, whereby the GPU is used for calculation and the CPU handles the communication. As a result, communication between distributed GPUs suffers from unnecessary overhead, introduced by switching the control flow to the CPUs and vice versa....
UCX is an open-source communication framework with a two-level API design targeted at addressing the needs of large supercomputing systems. The lower-level interface, UCT, adds minimal overhead to data transfer but requires considerable effort from the user. The higher-level interface, UCP, is easier to use, but adds some overhead to communication. This work focuses on charting UCP's performance over InfiniBand, motivated by its usage as middleware for high-level communication libraries. We analyze the performance shortcomings that stem from its design and the sources of these losses. In...
GPUs enjoy high popularity in High Performance Computing, due to their massive parallelism and performance per Watt. Despite this popularity, data transfer between multiple GPUs in a cluster remains a problem. Most communication models require the CPU to control the data flow, and intermediate staging copies in host memory are often inevitable. These two facts lead to higher CPU utilization. As a result, overall performance decreases and power consumption increases. Collective operations like reduce and allreduce are very common in scientific simulations...
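The collective operations mentioned above follow well-known communication patterns. As an illustration (not the specific algorithm of this paper), the classic recursive-doubling allreduce can be sketched in plain Python; the list stands in for the per-rank partial results, and each "exchange" would be a device-to-device transfer on a GPU cluster:

```python
# Sketch of the recursive-doubling allreduce pattern: log2(p) rounds,
# where in round k each rank r exchanges its partial result with rank
# r XOR 2^k. Assumes the number of ranks p is a power of two.
def allreduce_recursive_doubling(values):
    p = len(values)          # number of simulated ranks
    vals = list(values)
    k = 1
    while k < p:
        nxt = list(vals)
        for r in range(p):
            partner = r ^ k  # exchange partner in this round
            nxt[r] = vals[r] + vals[partner]
        vals = nxt
        k *= 2
    return vals              # every rank now holds the global sum

print(allreduce_recursive_doubling([1, 2, 3, 4]))  # [10, 10, 10, 10]
```

The point of the pattern is that every rank obtains the full reduction after only log2(p) communication rounds, instead of gathering everything on one rank and broadcasting back.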
Heterogeneity in memory is becoming increasingly common in high-end computing. Several modern supercomputers, such as those based on the Intel Knights Landing or NVIDIA P100 GPU architectures, already showcase multiple memory domains that are directly accessible by user applications, including on-chip high-bandwidth memory and off-chip traditional DDR memory. The next generation of supercomputers is expected to take this architectural trend one step further, with NVRAM as an additional byte-addressable option. Despite...
Accelerated computing has become pervasive for increasing computational power and energy efficiency in terms of GFLOPs/Watt. For application areas with the highest demands, for instance high performance computing, data warehousing, and analytics, accelerators like GPUs or Intel's MICs are distributed throughout the cluster. Since current analyses and predictions show that data movement will be the main contributor to energy consumption, we are entering an era of communication-centric heterogeneous systems operating under hard...
Wearables providing fall detection can enable faster emergency services for the elderly, yet privacy concerns limit the acceptance of this technology. In this work, we evaluate a machine learning algorithm, called Bonsai, embedded on edge devices to detect falls. The prototype is Arduino-based and can be integrated into fabrics of clothes, belts, or other accessories. Detection is performed offline on the device. We used data from public datasets of movement events to train a tree-based model. We evaluated different combinations...
In High-Performance Computing (HPC), GPU-based accelerators are pervasive for two reasons: first, GPUs provide much higher raw computational power than traditional CPUs. Second, their power consumption increases sub-linearly with the performance increase, making them more energy-efficient in terms of GFLOPS/Watt. Although these advantages are limited to a selected set of workloads, most HPC applications can benefit a lot from GPUs. The top 11 entries of the current Green500 list (November 2013) are all GPU-accelerated systems,...
Abstract Python is becoming increasingly popular in scientific computing. The package MPI for Python (mpi4py) allows writing efficient parallel programs that scale across multiple nodes. However, it does not support the transfer of non-contiguous data via slices, which is a well-known feature of NumPy. In this work, we therefore evaluate several methods to support the direct transfer of non-contiguous arrays in mpi4py. This significantly simplifies the code, while the performance basically stays the same. In a PingPong-, Stencil- and Lattice-Boltzmann-Benchmark,...
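The underlying issue is that mpi4py's fast, buffer-based communication expects contiguous memory, while many natural NumPy slices are strided views. A minimal sketch of the distinction (and the explicit staging copy that such methods aim to make unnecessary), using plain NumPy:

```python
import numpy as np

# A row slice of a C-ordered array is contiguous; a column slice is a
# strided view and cannot be handed directly to a buffer-based transfer.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

row = a[1, :]  # contiguous view
col = a[:, 1]  # non-contiguous view (element stride = one row)

print(row.flags['C_CONTIGUOUS'])  # True
print(col.flags['C_CONTIGUOUS'])  # False

# The portable workaround is an explicit staging copy before the send,
# e.g. something like comm.Send(np.ascontiguousarray(col), dest=1),
# at the cost of extra memory traffic on every transfer.
packed = np.ascontiguousarray(col)
print(packed.flags['C_CONTIGUOUS'])  # True
```

Direct support for strided transfers removes this copy-and-pack boilerplate from user code, which is the simplification referred to above.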
The use of Deep Learning methods has been identified as a key opportunity for enabling the processing of extreme-scale scientific datasets. Feeding data into compute nodes equipped with several high-end GPUs at a sufficiently high rate is a known challenge. Facilitating these datasets thus requires the ability to store petabytes of data as well as to access the data with very high bandwidth. In this work, we look at two use cases from cytoarchitectonic brain mapping. These applications are challenging for the underlying IO system. We present an in-depth...
Due to their massive parallelism and high performance per Watt, GPUs have gained popularity in high-performance computing and are a strong candidate for future exascale systems. But communication and data transfer in GPU-accelerated systems remain a challenging problem. Since the GPU normally is not able to control a network device, a hybrid-programming model is preferred, whereby the GPU is used for calculation and the CPU handles the communication. As a result, communication between distributed GPUs suffers from unnecessary overhead, introduced by switching...
GPUs are widely used in high performance computing, due to their computational power and performance per Watt. Still, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performance, but also power consumption. The most common way to utilize a GPU cluster is a hybrid model, in which the GPU is used to accelerate the computation while the CPU is responsible for the communication. This approach always requires a dedicated CPU thread, which consumes additional CPU cycles and therefore increases the power consumption of the complete...
We investigate cryptanalytic applications comprised of many independent tasks that exhibit a stochastic runtime distribution. We compare four algorithms for executing such applications on GPUs. We demonstrate that for different distributions, problem sizes, and platforms the best strategy varies. We support our analytic results by extensive experiments on two GPUs from opposite sides of the performance spectrum: a high-performance GPU (Nvidia Volta) and an energy-saving system on chip (Jetson Nano).
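Why the best execution strategy depends on the runtime distribution can be seen with a small simulation (an illustrative model, not one of the four algorithms compared in the paper): when tasks run in lockstep batches, every lane in a batch is held until the batch's slowest task finishes, so skewed runtimes waste more lane-time as batches grow.

```python
import random

def lockstep_utilization(runtimes, batch_size):
    # Fraction of lane-time spent on useful work when tasks execute in
    # lockstep batches: a batch occupies all its lanes for max(), not mean().
    busy = sum(runtimes)
    occupied = 0.0
    for i in range(0, len(runtimes), batch_size):
        chunk = runtimes[i:i + batch_size]
        occupied += len(chunk) * max(chunk)
    return busy / occupied

random.seed(42)
uniform_tasks = [1.0] * 1024                                    # constant runtimes
skewed_tasks = [random.expovariate(1.0) for _ in range(1024)]   # heavy-tailed

for bs in (1, 32, 1024):
    print(bs,
          round(lockstep_utilization(uniform_tasks, bs), 3),
          round(lockstep_utilization(skewed_tasks, bs), 3))
```

With constant runtimes, utilization stays at 1.0 for every batch size, while heavy-tailed runtimes lose utilization as the batch grows; this is the kind of distribution- and size-dependence that makes no single strategy uniformly best.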