NFDI4DS | UHH-SEMS - Publication Details

Harnessing CUDA-Q's MPS for Tensor Network Simulations of Large-Scale Quantum Circuits

OPENALEX - Publications

Gabin Schieffer Stefano Markidis Ivy Bo Peng

Quantum computer simulators are an indispensable tool for prototyping quantum algorithms and verifying the functioning of existing hardware. The current largest quantum computers feature more than one thousand qubits, challenging their classical simulators. State-vector quantum simulators are challenged by the exponential increase of representable quantum states with respect to the number of qubits, making more than fifty qubits practically unfeasible. A more appealing approach for simulating quantum computers is adopting the tensor network approach, whose memory requirements fundamentally...

10.48550/arxiv.2501.15939 preprint EN arXiv (Cornell University) 2025-01-27

Harnessing CUDA-Q’s MPS for Tensor Network Simulations of Large-Scale Quantum Circuits

OPENALEX - Publications

Gabin Schieffer Stefano Markidis Ivy Bo Peng

10.1109/pdp66500.2025.00022 article EN 2025-03-12

A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems

OPENALEX - Publications

Jacob Wahlgren Gabin Schieffer Maya Gokhale Ivy Bo Peng

Memory disaggregation has recently been adopted in data centers to improve resource utilization, motivated by cost and sustainability. Recent studies on large-scale HPC facilities have also highlighted memory underutilization. A promising and non-disruptive option for memory disaggregation is rack-scale memory pooling, where node-local memory is supplemented by shared memory pools. This work outlines the prospects and requirements for adoption and clarifies several misconceptions. We propose a quantitative method for dissecting application requirements on the memory system from the top...

10.1145/3581784.3607108 article EN 2023-10-30

On the Rise of AMD Matrix Cores: Performance, Power Efficiency, and Programmability

OPENALEX - Publications

Gabin Schieffer Daniel Araújo De Medeiros Jennifer Faj Aniruddha Marathe Ivy Bo Peng

10.1109/ispass61541.2024.00022 article EN 2024-05-05

Kub: Enabling Elastic HPC Workloads on Containerized Environments

OPENALEX - Publications

Daniel Medeiros Jacob Wahlgren Gabin Schieffer Ivy Bo Peng

The conventional model of resource allocation in HPC systems is static. Thus, a job cannot leverage newly available resources in the system or release underutilized resources during execution. In this paper, we present Kub, a methodology that enables elastic execution of HPC workloads on Kubernetes so that the resources allocated to a job can be dynamically scaled during execution. One main optimization of our method is to maximize the reuse of the originally allocated resources so that the disruption to the running job is minimized. The scaling procedure is coordinated among nodes through remote procedure calls on Kubernetes for deploying workloads in the cloud. We...

10.1109/sbac-pad59825.2023.00031 article EN 2023-10-17

Understanding Layered Portability from HPC to Cloud in Containerized Environments

OPENALEX - Publications

Daniel Medeiros Gabin Schieffer Jacob Wahlgren Ivy Bo Peng

Recent development in lightweight OS-level virtualization, containers, provides a potential solution for running HPC applications on the cloud platform. In this work, we focus on the impact of different layers in a containerized environment when migrating HPC containers from a dedicated HPC system to a cloud platform. On three ARM-based platforms, including the latest Nvidia Grace CPU, we use six representative HPC applications to characterize the impact of container virtualization, host OS and kernel, and rootless and privileged container execution. Our results indicate less than 4% container overhead in DGEMM,...

10.48550/arxiv.2406.11760 preprint EN arXiv (Cornell University) 2024-06-17

Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper

OPENALEX - Publications

Gabin Schieffer Jacob Wahlgren Jie Ren Jennifer Faj Ivy Bo Peng

Memory management across discrete CPU and GPU physical memory is traditionally achieved through explicit GPU allocations and data copy or unified virtual memory. The Grace Hopper Superchip, for the first time, supports an integrated CPU-GPU system page table, hardware-level addressing of system allocated memory, and cache-coherent NVLink-C2C interconnect, bringing an alternative solution for enabling a Unified Memory system. In this work, we provide the first in-depth study of the system memory management on the Grace Hopper Superchip, in both in-memory and memory oversubscription scenarios. We provide a suite...

10.48550/arxiv.2407.07850 preprint EN arXiv (Cornell University) 2024-07-10

Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper

OPENALEX - Publications

Gabin Schieffer Jacob Wahlgren Jie Ren Jennifer Faj Ivy Bo Peng

Memory management across discrete CPU and GPU physical memory is traditionally achieved through explicit GPU allocations and data copy or unified virtual memory. The Grace Hopper Superchip, for the first time, supports an integrated CPU-GPU system page table, hardware-level addressing of system allocated memory, and cache-coherent NVLink-C2C interconnect, bringing an alternative solution for enabling a Unified Memory system. In this work, we provide the first in-depth study of the system memory management on the Grace Hopper Superchip, in both in-memory and memory oversubscription scenarios. We provide a suite...

10.1145/3673038.3673110 article EN cc-by 2024-08-08

OpenCUBE: Building an Open Source Cloud Blueprint with EPI Systems

OPENALEX - Publications

Ivy Bo Peng Martin Schulz Utz‐Uwe Haus Craig Prunty Pedro Marcuello and 6 more

OpenCUBE aims to develop an open-source full software stack for a Cloud computing blueprint deployed on EPI hardware, adaptable to emerging workloads across the computing continuum. OpenCUBE prioritizes energy awareness and utilizes open APIs, Open Source components, advanced SiPearl Rhea processors, and a RISC-V accelerator. The project leverages representative workloads, such as cloud-native workflows and workflows of weather forecast data management, molecular docking, and space weather, for evaluation and validation.

10.48550/arxiv.2410.10423 preprint EN arXiv (Cornell University) 2024-10-14

Accelerating Drug Discovery in AutoDock-GPU with Tensor Cores

OPENALEX - Publications

Gabin Schieffer Ivy Bo Peng

In drug discovery, molecular docking aims at characterizing the binding of a drug-like molecule to a macromolecule. AutoDock-GPU, a state-of-the-art docking software, estimates the geometrical conformation of a docked ligand-protein complex by minimizing a scoring function. Our profiling results indicate that the current reduction operation that is heavily used in the scoring function is sub-optimal. Thus, we developed a method to accelerate the sum reduction of four-element vectors using matrix operations on NVIDIA Tensor Cores. We integrated the new reduction operation into...

10.48550/arxiv.2410.10447 preprint EN arXiv (Cornell University) 2024-10-14

A GPU-accelerated Molecular Docking Workflow with Kubernetes and Apache Airflow

OPENALEX - Publications

Daniel Medeiros Gabin Schieffer Jacob Wahlgren Ivy Bo Peng

Complex workflows play a critical role in accelerating scientific discovery. In many scientific domains, efficient workflow management can lead to faster scientific output and broader user groups. Workflows that can leverage resources across the boundary between cloud and HPC are a strong driver for the convergence of HPC and cloud. This study investigates the transition and deployment of a GPU-accelerated molecular docking workflow that was designed for HPC systems onto a cloud-native environment with Kubernetes and Apache Airflow. The case study focuses on state-of-the-art...

10.48550/arxiv.2410.10634 preprint EN arXiv (Cornell University) 2024-10-14

Kub: Enabling Elastic HPC Workloads on Containerized Environments

OPENALEX - Publications

Daniel Medeiros Jacob Wahlgren Gabin Schieffer Ivy Bo Peng

The conventional model of resource allocation in HPC systems is static. Thus, a job cannot leverage newly available resources in the system or release underutilized resources during execution. In this paper, we present Kub, a methodology that enables elastic execution of HPC workloads on Kubernetes so that the resources allocated to a job can be dynamically scaled during execution. One main optimization of our method is to maximize the reuse of the originally allocated resources so that the disruption to the running job is minimized. The scaling procedure is coordinated among nodes through remote procedure calls on Kubernetes for deploying workloads in the cloud. We...

10.48550/arxiv.2410.10655 preprint EN arXiv (Cornell University) 2024-10-14

Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric

OPENALEX - Publications

Gabin Schieffer R. S. Shi Stefano Markidis Andreas Herten Jennifer Faj and 1 more

Modern GPU systems are constantly evolving to meet the needs of computing-intensive applications in scientific and machine learning domains. However, there is typically a gap between the hardware capacity and the achievable application performance. This work aims to provide a better understanding of the Infinity Fabric interconnects on AMD GPUs and CPUs. We propose a test and evaluation methodology for characterizing the performance of data movements on multi-GPU systems, stressing different communication options on AMD MI250X GPUs,...

10.48550/arxiv.2410.00801 preprint EN arXiv (Cornell University) 2024-10-01

Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE

OPENALEX - Publications

Samuel Miksits R. S. Shi Maya Gokhale Jacob Wahlgren Gabin Schieffer and 1 more

High-end ARM processors are emerging in data centers and HPC systems, posing as a strong contender to x86 machines. Memory-centric profiling is an important approach for dissecting an application's bottlenecks on memory access and guiding optimizations. Many existing memory profiling tools leverage hardware performance counters and precise event sampling, such as Intel PEBS and AMD IBS, to achieve high accuracy and low overhead. In this work, we present a multi-level memory profiling tool for ARM processors, leveraging Statistical Profiling Extension (SPE)....

10.48550/arxiv.2410.01514 preprint EN arXiv (Cornell University) 2024-10-02

Disaggregated Memory with SmartNIC Offloading: a Case Study on Graph Processing

OPENALEX - Publications

Jacob Wahlgren Gabin Schieffer Maya Gokhale Roger Pearce Ivy Bo Peng

Disaggregated memory breaks the boundary of monolithic servers to enable memory provisioning on demand. Using network-attached memory to provide memory expansion for memory-intensive applications on compute nodes can improve the overall memory utilization on a cluster and reduce the total cost of ownership. However, current software solutions for leveraging network-attached memory must consume resources on the compute node for memory management tasks. Emerging off-path smartNICs provide general-purpose programmability at low-cost low-power cores. This work provides a general architecture design that...

10.48550/arxiv.2410.02599 preprint EN arXiv (Cornell University) 2024-10-03

Disaggregated Memory with SmartNIC Offloading: a Case Study on Graph Processing

OPENALEX - Publications

Jacob Wahlgren Gabin Schieffer Maya Gokhale Roger Pearce Ivy Bo Peng

10.1109/sbac-pad63648.2024.00022 article EN 2024-11-13

Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric

OPENALEX - Publications

Gabin Schieffer R. S. Shi Stefano Markidis Andreas Herten Jennifer Faj and 1 more

10.1109/scw63240.2024.00079 article EN 2024-11-17

Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE

OPENALEX - Publications

Samuel Miksits R. S. Shi Maya Gokhale Jacob Wahlgren Gabin Schieffer and 1 more

10.1109/scw63240.2024.00139 article EN 2024-11-17

Boosting the Performance of Object Tracking with a Half-Precision Particle Filter on GPU

OPENALEX - Publications

Gabin Schieffer Nattawat Pornthisan Daniel Araújo De Medeiros Stefano Markidis Jacob Wahlgren and 1 more

High-performance GPU-accelerated particle filter methods are critical for object detection applications, ranging from autonomous driving, robot localization, to time-series prediction. In this work, we investigate the design, development and optimization of particle-filter using half-precision on CUDA cores and compare their performance and accuracy with single- and double-precision baselines on Nvidia V100, A100, A40 and T4 GPUs. To mitigate numerical instability and precision losses, we introduce algorithmic...

10.48550/arxiv.2308.00763 preprint EN other-oa arXiv (Cornell University) 2023-01-01