Ivy Bo Peng

ORCID: 0000-0003-4158-3583
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Distributed and Parallel Computing Systems
  • Cloud Computing and Resource Management
  • Ionosphere and magnetosphere dynamics
  • Solar and Space Plasma Dynamics
  • Magnetic confinement fusion research
  • Scientific Computing and Data Management
  • Interconnection Networks and Systems
  • Embedded Systems Design Techniques
  • Distributed systems and fault tolerance
  • Advanced Memory and Neural Computing
  • Gas Dynamics and Kinetic Theory
  • Astro and Planetary Science
  • Geomagnetism and Paleomagnetism Studies
  • Advanced Neural Network Applications
  • Caching and Content Delivery
  • Plasma Diagnostics and Applications
  • Quantum Computing Algorithms and Architecture
  • Earthquake Detection and Analysis
  • Low-power high-performance VLSI design
  • Seismology and Earthquake Studies
  • Target Tracking and Data Fusion in Sensor Networks
  • Laser-induced spectroscopy and plasma
  • Gamma-ray bursts and supernovae

KTH Royal Institute of Technology
2014-2024

Lawrence Livermore National Laboratory
2019-2023

Sandia National Laboratories California
2021

Oak Ridge National Laboratory
2018-2019

Huawei Technologies (China)
2017

The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called Tensor Core, that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 Tflops/s in mixed precision. In this paper, we investigate current approaches to program NVIDIA Tensor Cores, their performances, and the precision loss due to computation in mixed precision. Currently, there are three different ways of programming Tensor Cores: CUDA...

10.1109/ipdpsw.2018.00091 article EN IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2018-05-01
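A minimal sketch of the Tensor Core operation D = A × B + C on 4x4 matrices, emulated in Python with FP16-rounded inputs and FP32 accumulation, plus the peak-throughput arithmetic. Rounding the running sum to FP32 after every addition is a simplifying assumption (the hardware's internal accumulation order is not modeled). The 64 fused multiply-adds per Tensor Core per clock and the 1.53 GHz boost clock are assumed V100 figures that the abstract does not state. With them, 640 × 64 × 2 × 1.53 GHz ≈ 125 Tflops/s. The function names are illustrative.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mma_4x4(a, b, c):
    """Emulate D = A x B + C for 4x4 matrices: FP16 inputs, FP32 accumulation."""
    d = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = to_fp32(c[i][j])
            for k in range(4):
                # An FP16 x FP16 product fits exactly in FP32; round the running sum to FP32.
                acc = to_fp32(acc + to_fp16(a[i][k]) * to_fp16(b[k][j]))
            d[i][j] = acc
    return d


def peak_tflops(tensor_cores=640, fma_per_core_per_clock=64, clock_ghz=1.53):
    """Peak throughput, counting each fused multiply-add as two floating-point operations."""
    return tensor_cores * fma_per_core_per_clock * 2 * clock_ghz * 1e9 / 1e12


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    tenths = [[0.1] * 4 for _ in range(4)]
    zeros = [[0.0] * 4 for _ in range(4)]
    # 0.1 is not representable in FP16, so the result exposes the input rounding error.
    print(mma_4x4(identity, tenths, zeros)[0][0])  # 0.0999755859375
    print(round(peak_tflops(), 1))  # 125.3
```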

Collisionless shock nonstationarity arising from microscale physics influences shock structure and particle acceleration mechanisms. Nonstationarity has been difficult to quantify due to the small spatial and temporal scales. We use the closely spaced (subgyroscale), high-time-resolution measurements from one rapid crossing of Earth's quasiperpendicular bow shock by the Magnetospheric Multiscale (MMS) spacecraft to compare competing nonstationarity processes. Using MMS's high-cadence kinetic plasma measurements, we show that the shock exhibits nonstationarity in the form of ripples.

10.1103/physrevlett.117.165101 article EN cc-by Physical Review Letters 2016-10-12

We have recently developed a new modeling capability to embed the implicit particle-in-cell (PIC) model iPIC3D into the Block-Adaptive-Tree-Solarwind-Roe-Upwind-Scheme magnetohydrodynamic (MHD) model. The MHD with embedded PIC domains (MHD-EPIC) algorithm is a two-way coupled kinetic-fluid model. As one of the very first applications of the MHD-EPIC algorithm, we simulate the interaction between Jupiter's magnetospheric plasma and Ganymede's magnetosphere. We compare the MHD-EPIC simulations with pure Hall MHD simulations and compare both model results with Galileo...

10.1002/2015ja021997 article EN publisher-specific-oa Journal of Geophysical Research Space Physics 2016-01-22

Byte-addressable non-volatile memory (NVM) features high density, DRAM-comparable performance, and persistence. These characteristics position NVM as a promising new tier in the memory hierarchy. Nevertheless, NVM has asymmetric read and write performance and considerably higher energy consumption than DRAM. Our work provides an in-depth evaluation of the first commercially available byte-addressable NVM: the Intel Optane DC persistent memory. The first part of our study quantifies the latency, bandwidth, power efficiency, and energy consumption under eight...

10.1145/3357526.3357568 article EN Proceedings of the International Symposium on Memory Systems 2019-09-30

We perform a three-dimensional (3D) global simulation of Earth's magnetosphere with kinetic reconnection physics to study the flux transfer events (FTEs) and dayside magnetic reconnection with the recently developed magnetohydrodynamics with embedded particle-in-cell model (MHD-EPIC). During the one-hour long simulation, the FTEs are generated quasi-periodically near the subsolar point and move toward the poles. We find that the magnetic field signature of FTEs at their early formation stage is similar to a 'crater FTE', which is characterized by a magnetic field strength dip at the FTE center....

10.1002/2017ja024186 article EN publisher-specific-oa Journal of Geophysical Research Space Physics 2017-09-18

We investigate the use of artificially increased ion and electron kinetic scales in global plasma simulations. We argue that as long as the global and inertial scales remain well separated, (1) the overall solution is not strongly sensitive to the value of the inertial scale, while (2) the inertial-scale dynamics will also be similar to that of the original system, but it occurs at a larger spatial scale, and (3) structures at intermediate scales, such as magnetic islands, grow in a self-similar manner. To test the validity and limitations of our scaling hypotheses, we carry out many simulations...

10.1002/2017ja024189 article EN publisher-specific-oa Journal of Geophysical Research Space Physics 2017-09-18
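A minimal sketch of the ion inertial length d_i = c/ω_pi and of one way to enlarge it artificially. At fixed mass density ρ = n m, substituting n = ρ/m gives d_i = c (m/q) sqrt(ε0/ρ), so multiplying the ion mass-per-charge ratio by a factor scales d_i by the same factor. Whether this mirrors the paper's exact scaling procedure is an assumption. The function names are illustrative and the constants are CODATA 2018 values.

```python
import math

# Physical constants in SI units (CODATA 2018 values).
SPEED_OF_LIGHT = 2.99792458e8            # m/s
VACUUM_PERMITTIVITY = 8.8541878128e-12   # F/m
ELEMENTARY_CHARGE = 1.602176634e-19      # C
PROTON_MASS = 1.67262192369e-27          # kg


def ion_inertial_length(n_per_m3, mass=PROTON_MASS, charge=ELEMENTARY_CHARGE):
    """d_i = c / omega_pi, with omega_pi = sqrt(n q^2 / (eps0 m)); returns meters."""
    omega_pi = math.sqrt(n_per_m3 * charge**2 / (VACUUM_PERMITTIVITY * mass))
    return SPEED_OF_LIGHT / omega_pi


def scaled_inertial_length(mass_density, mass_per_charge, scale):
    """d_i at fixed mass density after multiplying the ion mass-per-charge ratio by `scale`.

    Substituting n = rho / m into d_i gives d_i = c * (m / q) * sqrt(eps0 / rho),
    so d_i grows linearly with m / q when rho is held fixed.
    """
    return SPEED_OF_LIGHT * mass_per_charge * scale * math.sqrt(VACUUM_PERMITTIVITY / mass_density)


if __name__ == "__main__":
    n = 5e6  # 5 protons per cubic centimeter, expressed per cubic meter
    rho = n * PROTON_MASS
    for scale in (1.0, 16.0):
        d_i = scaled_inertial_length(rho, PROTON_MASS / ELEMENTARY_CHARGE, scale)
        print(f"scale {scale:>4}: d_i = {d_i / 1e3:.1f} km")
```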

Graphics Processing Units (GPUs) have become the standard in accelerating scientific applications on heterogeneous systems. However, as GPUs are getting faster, one potential performance bottleneck with GPU-accelerated applications is the overhead from launching several fine-grained kernels. CUDA Graph addresses these challenges by enabling a graph-based execution model that captures operations as nodes and dependences as edges in a static graph, thereby consolidating multiple kernel launches into one graph launch. We propose...

10.48550/arxiv.2501.09398 preprint EN arXiv (Cornell University) 2025-01-16
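A toy model of the graph-based execution idea: operations are recorded as nodes with dependence edges, then issued by a single host-side launch in dependency order. In CUDA this role is played by a `cudaGraph_t`, built by stream capture (`cudaStreamBeginCapture` / `cudaStreamEndCapture`) or explicitly, then instantiated with `cudaGraphInstantiate` and launched with `cudaGraphLaunch`. The `ToyGraph` class and the `launch_overhead_us` cost model below are hypothetical Python stand-ins, not that API, and the per-launch cost is an assumed input.

```python
from collections import defaultdict, deque


class ToyGraph:
    """Hypothetical stand-in for a captured execution graph (not the CUDA API)."""

    def __init__(self):
        self.ops = {}                 # node name -> callable performing the operation
        self.deps = defaultdict(set)  # node name -> names of nodes it depends on

    def add_node(self, name, fn, depends_on=()):
        """Record an operation; every name in depends_on must also be added as a node."""
        self.ops[name] = fn
        self.deps[name].update(depends_on)

    def launch(self):
        """A single 'launch' executes all nodes in topological (dependency) order."""
        indegree = {n: len(self.deps[n]) for n in self.ops}
        children = defaultdict(list)
        for n in self.ops:
            for d in self.deps[n]:
                children[d].append(n)
        ready = deque(n for n in self.ops if indegree[n] == 0)
        order = []
        while ready:
            n = ready.popleft()
            self.ops[n]()
            order.append(n)
            for child in children[n]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        if len(order) != len(self.ops):
            raise ValueError("graph has a cycle or an undefined dependency")
        return order


def launch_overhead_us(num_kernels, per_launch_us, use_graph):
    """Toy model of host-side launch overhead only (kernel run time is ignored)."""
    return per_launch_us * (1 if use_graph else num_kernels)


if __name__ == "__main__":
    g = ToyGraph()
    g.add_node("copy_in", lambda: None)
    g.add_node("kernel_a", lambda: None, ["copy_in"])
    g.add_node("kernel_b", lambda: None, ["copy_in"])
    g.add_node("copy_out", lambda: None, ["kernel_a", "kernel_b"])
    print(g.launch())
```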

Quantum computer simulators are an indispensable tool for prototyping quantum algorithms and verifying the functioning of existing quantum hardware. The current largest quantum computers feature more than one thousand qubits, challenging their classical simulators. State-vector simulators are challenged by the exponential increase of representable quantum states with respect to the number of qubits, making the simulation of fifty qubits practically unfeasible. A more appealing approach for simulating quantum computers is adopting the tensor network approach, whose memory requirements fundamentally...

10.48550/arxiv.2501.15939 preprint EN arXiv (Cornell University) 2025-01-27
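A minimal sketch of why state-vector simulation hits a wall near fifty qubits: an n-qubit state holds 2^n complex amplitudes, so at 16 bytes per double-precision complex amplitude, 50 qubits need 16 × 2^50 bytes (16 PiB). The gate routine shows the basic state-vector update. The little-endian qubit ordering and the function names are illustrative assumptions, and the tensor-network approach the abstract advocates is not shown.

```python
import math

BYTES_PER_AMPLITUDE = 16  # one complex128 amplitude: two float64 values


def statevector_bytes(num_qubits):
    """Memory needed to store all 2**n complex amplitudes of an n-qubit state."""
    return BYTES_PER_AMPLITUDE * 2**num_qubits


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` (bit `target` of the basis index)."""
    stride = 1 << target
    new = list(state)
    for i in range(len(state)):
        if (i & stride) == 0:  # visit each amplitude pair (i, i | stride) once
            j = i | stride
            a0, a1 = state[i], state[j]
            new[i] = gate[0][0] * a0 + gate[0][1] * a1
            new[j] = gate[1][0] * a0 + gate[1][1] * a1
    return new


if __name__ == "__main__":
    print(statevector_bytes(50) / 2**50, "PiB")  # 16.0 PiB
    h = 1 / math.sqrt(2)
    hadamard = [[h, h], [h, -h]]
    print(apply_single_qubit_gate([1, 0, 0, 0], hadamard, 0))  # equal superposition on qubit 0
```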

This paper investigates the architectural features and performance potential of the Apple Silicon M-Series SoCs (M1, M2, M3, and M4) for HPC. We provide a detailed review of the CPU and GPU designs, the unified memory architecture, and coprocessors such as Advanced Matrix Extensions (AMX). We design and develop benchmarks in the Metal Shading Language and Objective-C++ to assess computational performance. We also measure power consumption and efficiency using Apple's powermetrics tool. Our results show that M-Series chips offer relatively high...

10.48550/arxiv.2502.05317 preprint EN arXiv (Cornell University) 2025-02-07

Traditional scientific and emerging data analytics applications require fast, power-efficient, large, and persistent memories. Combining all these characteristics within a single memory technology is expensive, and hence future supercomputers will feature different memory technologies side-by-side. However, it is a complex task to program hybrid-memory systems and to identify the best object-to-memory mapping. We envision that programmers will probably resort to using default configurations that only require minimal interventions on...

10.1145/3092255.3092273 article EN 2017-06-09
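One simple way to pick an object-to-memory mapping is a greedy heuristic that fills the fast memory with the objects that have the most accesses per byte. The sketch below is a hypothetical illustration of that idea, not the approach proposed in the paper. The `accesses` counts would have to come from profiling, and object sizes are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str
    size_bytes: int  # assumed > 0
    accesses: int    # hypothetical profiled access count for this object


def greedy_placement(objects, fast_capacity_bytes):
    """Map each object name to "fast" or "slow" memory.

    Objects are visited in decreasing access density (accesses per byte); each one
    goes to fast memory if it still fits, otherwise to slow memory.
    """
    placement = {}
    remaining = fast_capacity_bytes
    for obj in sorted(objects, key=lambda o: o.accesses / o.size_bytes, reverse=True):
        if obj.size_bytes <= remaining:
            placement[obj.name] = "fast"
            remaining -= obj.size_bytes
        else:
            placement[obj.name] = "slow"
    return placement


if __name__ == "__main__":
    objs = [DataObject("matrix", 800, 1000), DataObject("index", 100, 900), DataObject("log", 500, 50)]
    print(greedy_placement(objs, fast_capacity_bytes=900))
```

A greedy density ordering is cheap and easy to explain, but like any knapsack heuristic it can miss the optimal packing when object sizes are uneven.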

Hardware accelerators have become a de-facto standard to achieve high performance on current supercomputers, and there are indications that this trend will increase in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional DRAM memory. Theoretically, HBM can provide ~4× higher bandwidth than conventional DRAM. However, many factors impact the effective performance achieved by applications,...

10.1109/ipdpsw.2017.115 article EN IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2017-05-01
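The roofline model gives a quick way to reason about when ~4× more bandwidth helps. Attainable performance is min(peak, arithmetic intensity × bandwidth), so only memory-bound kernels can approach the full bandwidth ratio. The sketch uses hypothetical peak and bandwidth figures, not measured KNL numbers, and it ignores effects such as latency and capacity limits.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline model: performance is capped by the compute peak or by bandwidth x intensity."""
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def hbm_speedup(arithmetic_intensity, peak_gflops, dram_gbs, hbm_gbs):
    """Predicted speedup from serving a kernel's data from HBM instead of DRAM."""
    return (attainable_gflops(arithmetic_intensity, peak_gflops, hbm_gbs)
            / attainable_gflops(arithmetic_intensity, peak_gflops, dram_gbs))


if __name__ == "__main__":
    # Hypothetical machine: 2000 GFLOP/s peak, 100 GB/s DRAM, 400 GB/s HBM (a 4x ratio).
    for ai in (0.5, 10.0, 100.0):  # arithmetic intensity in flops per byte
        print(ai, hbm_speedup(ai, 2000.0, 100.0, 400.0))
```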

We present the design and implementation of a spectral code, called SpectralPlasmaSolver (SPS), for the solution of the multi-dimensional Vlasov-Maxwell equations. The method is based on a Hermite-Fourier decomposition of the particle distribution function. The code is written in Fortran and uses the PETSc library for solving the non-linear equations and preconditioning, and FFTW for the convolutions. SPS is parallelized for shared-memory machines using OpenMP. As a verification example, we discuss simulations of the two-dimensional Orszag-Tang vortex problem...

10.1088/1742-6596/719/1/012022 article EN Journal of Physics Conference Series 2016-05-01

We present a systematic attempt to study magnetic null points and the associated energy conversion in kinetic Particle-in-Cell simulations of various plasma configurations. We address three-dimensional simulations performed with the semi-implicit kinetic electromagnetic code iPic3D in different setups: variations of a Harris current sheet, dipolar and quadrupolar magnetospheres interacting with the solar wind, and a relaxing turbulent configuration with multiple null points. Spiral nulls are more likely to be created in space plasmas: in all our simulations except the lunar anomaly...

10.3847/0004-637x/819/1/52 article EN The Astrophysical Journal 2016-02-26
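Spiral and radial nulls are commonly distinguished by the eigenvalues of the field Jacobian ∂B_i/∂x_j at the null: a complex-conjugate pair indicates a spiral null, and three real eigenvalues a radial one. Because ∇·B = 0 makes the Jacobian traceless, its characteristic polynomial reduces to λ³ + pλ + q = 0 (p = sum of principal 2x2 minors, q = -det), and the sign of the cubic discriminant -4p³ - 27q² decides the type. The sketch assumes the Jacobian has already been estimated, for example by finite differences around the null. Whether the paper used exactly this criterion is an assumption, and `classify_null` is an illustrative name.

```python
def classify_null(jacobian, tol=1e-9):
    """Classify a 3D magnetic null from M[i][j] = dB_i/dx_j evaluated at the null.

    Returns "spiral" when two eigenvalues are complex conjugates (negative
    discriminant) and "radial" when all three eigenvalues are real.
    """
    m = jacobian
    trace = m[0][0] + m[1][1] + m[2][2]
    if abs(trace) > tol:
        raise ValueError("trace of dB/dx must vanish (div B = 0)")
    # Sum of the principal 2x2 minors.
    p = (m[0][0] * m[1][1] - m[0][1] * m[1][0]
         + m[0][0] * m[2][2] - m[0][2] * m[2][0]
         + m[1][1] * m[2][2] - m[1][2] * m[2][1])
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    q = -det
    discriminant = -4 * p**3 - 27 * q**2
    return "spiral" if discriminant < 0 else "radial"


if __name__ == "__main__":
    print(classify_null([[1, 0, 0], [0, 2, 0], [0, 0, -3]]))           # three real eigenvalues
    print(classify_null([[0.5, -1, 0], [1, 0.5, 0], [0, 0, -1]]))      # eigenvalues 0.5 +/- i, -1
```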

Large-scale high-performance computing (HPC) systems consist of massive compute and memory resources tightly coupled in nodes. We perform a large-scale study of memory utilization on four production HPC clusters. Our results show that more than 90% of jobs utilize less than 15% of the node memory capacity, and, for … of the time, memory utilization is below 35%. Recently, disaggregated architecture has been gaining traction because it can selectively scale up a resource to improve utilization. Based on these observations, we explore using disaggregated memory to support memory-intensive...

10.1109/sbac-pad49847.2020.00034 article EN 2020-09-01

A spectral method for kinetic plasma simulations based on the expansion of the velocity distribution function in a variable number of Hermite polynomials is presented. The method is based on a set of non-linear equations that is solved to determine the coefficients of the Hermite expansion satisfying the Vlasov and Poisson equations. In this paper, we first show that this technique combines the fluid and kinetic approaches into one framework. Second, we present an adaptive strategy to increase and decrease the number of Hermite functions dynamically during the simulation. The technique is applied to the Landau damping and two-stream...

10.1016/j.procs.2015.05.284 article EN Procedia Computer Science 2015-01-01
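A minimal sketch of a Hermite expansion of a 1D velocity distribution, assuming a normalized velocity ξ and the asymmetrically weighted basis H_n(ξ)e^(-ξ²), where H_n are the physicists' Hermite polynomials. Orthogonality gives c_n = ∫ f H_n dξ / (√π 2^n n!). For a drifting Maxwellian e^(-(ξ-u)²), the Hermite generating function gives c_n = u^n/n! exactly. The low-order terms carry the fluid moments: ∫ f dξ = √π c_0, and the mean velocity is c_1/c_0. The trapezoidal quadrature, grid, and function names are illustrative choices, not taken from the paper.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def hermite_coefficients(f, n_max, xi_max=12.0, points=4001):
    """Coefficients c_0..c_n_max of f(xi) = sum_n c_n H_n(xi) exp(-xi^2).

    Uses c_n = integral(f * H_n) / (sqrt(pi) * 2^n * n!), with the integral
    evaluated by the trapezoidal rule on [-xi_max, xi_max].
    """
    dx = 2.0 * xi_max / (points - 1)
    coeffs = []
    for n in range(n_max + 1):
        total = 0.0
        for i in range(points):
            xi = -xi_max + i * dx
            weight = 0.5 if i in (0, points - 1) else 1.0
            total += weight * f(xi) * hermite(n, xi)
        coeffs.append(total * dx / (math.sqrt(math.pi) * 2**n * math.factorial(n)))
    return coeffs


if __name__ == "__main__":
    u = 0.5  # drift of the Maxwellian in normalized units
    c = hermite_coefficients(lambda xi: math.exp(-(xi - u) ** 2), 4)
    print([round(x, 6) for x in c])
    print("mean velocity from c1/c0:", c[1] / c[0])
```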

Active development in new memory devices, such as non-volatile memories and high-bandwidth memories, brings heterogeneous memory systems (HMS) as a promising solution for implementing large-scale memory systems with cost, area, and power limitations. A typical HMS consists of a small-capacity high-performance memory and a large-capacity low-performance memory. Data placement on such systems plays a critical role in performance optimization. Existing efforts have explored coarse-grained data placement for applications with dense data structures; however, a thorough study of applications that are...

10.1145/3368826.3377922 article EN 2020-02-21

We demonstrate the improvements to an implicit Particle-in-Cell code, iPic3D, on the example of a dipolar magnetic field immersed in the flow of a plasma, and show the formation of a magnetosphere. We address the problem of modelling multi-scale phenomena during the formation of the magnetosphere by implementing an adaptive sub-cycling technique to resolve the motion of particles located close to the dipole centre, where the magnetic field intensity is maximum. In addition, we implemented new open boundary conditions to model the inflow and outflow of plasma. We present the results of a global...

10.1016/j.procs.2015.05.288 article EN Procedia Computer Science 2015-01-01
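A minimal sketch of how an adaptive sub-cycling count might be chosen per particle. A dipole field grows as (R/r)³ toward the centre, so the gyrofrequency ω_c = |q/m|B grows with it. Each particle then takes enough sub-steps that every sub-step rotates it by at most a fixed angle. The rotation bound `max_rotation`, the normalized units, and the function names are assumptions for illustration. The abstract does not state iPic3D's actual criterion.

```python
import math


def dipole_field_strength(b_surface, r_surface, r):
    """Equatorial dipole field magnitude, which falls off as (r_surface / r)**3."""
    return b_surface * (r_surface / r) ** 3


def num_subcycles(q_over_m, b, dt, max_rotation=0.1):
    """Number of sub-steps so that each rotates the particle by at most max_rotation radians.

    The gyrofrequency is omega_c = |q/m| * B, so a full step dt rotates by omega_c * dt.
    """
    omega_c = abs(q_over_m) * b
    return max(1, math.ceil(omega_c * dt / max_rotation))


if __name__ == "__main__":
    # Hypothetical normalized units: B = 8 at r = 1, q/m = 1, dt = 1, max rotation 0.5 rad.
    for r in (1.0, 2.0, 4.0):
        b = dipole_field_strength(8.0, 1.0, r)
        print(r, b, num_subcycles(1.0, b, 1.0, max_rotation=0.5))
```

Particles far from the dipole keep a single step, so the extra cost is concentrated where the field is strongest.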

The data streaming model is an effective way to tackle the challenge of data-intensive applications. As traditional HPC applications generate large volumes of data and more data-intensive applications move to HPC infrastructures, it is necessary to investigate the feasibility of combining message-passing and streaming programming models. MPI, the de facto standard for programming on HPC, cannot intuitively express the communication pattern and the functional operations required in streaming models. In this work, we designed and implemented a data streaming library, MPIStream, atop MPI to allocate data producers and consumers, to stream...

10.1145/2831129.2831131 article EN 2015-11-09
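MPIStream builds producer/consumer data streams on top of MPI. As a single-process analogue only, the sketch below swaps in Python threads and a bounded `queue.Queue`: several producers push elements and one consumer folds an operator over them as they arrive. None of the names are MPIStream's API, and the end-of-stream sentinel is an assumption of this sketch. Because elements from different producers interleave nondeterministically, the operator should be commutative and associative.

```python
import queue
import threading

END = object()  # sentinel a producer sends when its stream is exhausted


def producer(stream, items):
    """Push data elements into the shared stream, then signal completion."""
    for item in items:
        stream.put(item)
    stream.put(END)


def consumer(stream, num_producers, operator, initial):
    """Fold `operator` over elements as they arrive, without storing the whole dataset."""
    result = initial
    finished = 0
    while finished < num_producers:
        item = stream.get()
        if item is END:
            finished += 1
        else:
            result = operator(result, item)
    return result


def run_stream(chunks, operator, initial):
    """Start one producer thread per chunk and consume in the calling thread."""
    stream = queue.Queue(maxsize=64)  # bounded: producers block if the consumer falls behind
    threads = [threading.Thread(target=producer, args=(stream, chunk)) for chunk in chunks]
    for t in threads:
        t.start()
    result = consumer(stream, len(threads), operator, initial)
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(run_stream([range(0, 100), range(100, 200)], lambda acc, x: acc + x, 0))  # 19900
```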

Mars Atmosphere and Volatile EvolutioN (MAVEN) mission observations show clear evidence of the occurrence of the magnetic reconnection process in the Martian plasma tail. In this study, we use sophisticated numerical models to help us understand the effects of magnetic reconnection. The numerical models used in this study are (a) a multispecies global Hall-magnetohydrodynamic (HMHD) model and (b) the HMHD model two-way coupled with an embedded fully kinetic particle-in-cell code. Comparison with MAVEN observations clearly shows that the general interaction pattern is well reproduced...

10.1029/2017ja024729 article EN Journal of Geophysical Research Space Physics 2018-04-30

Current HPC systems provide memory resources that are statically configured and tightly coupled with compute nodes. However, workloads on HPC systems are evolving. Diverse workloads lead to a need for configurable memory resources to achieve high performance and utilization. In this study, we evaluate a memory subsystem design leveraging CXL-enabled memory pooling. Two promising use cases of composable memory subsystems are studied: fine-grained capacity provisioning and scalable bandwidth provisioning. We developed an emulator to explore the impact of various memory compositions....

10.1109/mchpc56545.2022.00007 preprint EN 2022-11-01

The emergence of high-density byte-addressable non-volatile memory (NVM) is promising to accelerate data- and compute-intensive applications. Current NVM technologies have lower performance than DRAM and, thus, are often paired with DRAM in a heterogeneous main memory. Recently, such hardware has become available. This work provides a timely evaluation of representative HPC applications from the "Seven Dwarfs" on NVM-based main memory. Our results quantify the effectiveness of DRAM-cached-NVM for accelerating HPC applications and enabling large...

10.1109/ipdps47924.2020.00098 article EN IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2020-05-01
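In a DRAM-cached-NVM configuration, DRAM acts as a hardware-managed cache in front of NVM, and Intel's Optane Memory Mode is commonly described as using a direct-mapped DRAM cache. The sketch below assumes a direct-mapped cache and ignores write-backs and line size. It shows why the working-set size matters: a cyclic trace that fits in the cache hits after the first pass, while one twice the cache size misses on every access. The function name and traces are illustrative.

```python
def direct_mapped_hit_rate(trace, cache_lines):
    """Hit rate of a direct-mapped cache where line address a can live only in slot a % cache_lines."""
    slots = [None] * cache_lines
    hits = 0
    for line in trace:
        slot = line % cache_lines
        if slots[slot] == line:
            hits += 1
        else:
            slots[slot] = line  # miss: fetch from the slow tier, evicting the previous line
    return hits / len(trace) if trace else 0.0


if __name__ == "__main__":
    print(direct_mapped_hit_rate(list(range(4)) * 3, cache_lines=4))  # fits: hits after first pass
    print(direct_mapped_hit_rate(list(range(8)) * 3, cache_lines=4))  # conflicts: every access misses
```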

Next-generation supercomputers will feature more hierarchical and heterogeneous memory systems with different memory technologies working side-by-side. A critical question is whether, at large scale, existing HPC applications and emerging data-analytics workloads will have performance improvement or degradation on these systems. We propose a systematic and fair methodology to identify the trend of application performance on hybrid-memory systems. We model the memory system of next-generation supercomputers as a combination of "fast" and "slow" memories. We then analyze the dynamic...

10.1109/hpcc-smartcity-dss.2016.0074 preprint EN 2016-12-01

We carried out a 3D fully kinetic simulation of Earth's magnetotail magnetic reconnection to study the dynamics of energetic particles. We developed and implemented a new relativistic particle mover in iPIC3D, an implicit Particle-in-Cell code, to correctly model energetic particles. Before the onset of reconnection, electrons are found localized close to the current sheet and accelerated by the lower hybrid drift instability. During reconnection, energetic particles are found in the reconnection region along the x-line and in the separatrices region. The energetic particles first present stripes and finally cover all the separatrix...

10.1017/s0022377814001123 article EN Journal of Plasma Physics 2014-12-04
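For context on what a relativistic mover must do, below is a sketch of the standard explicit relativistic Boris push in units with c = 1, advancing the momentum per unit mass u = γv. iPIC3D is an implicit code, so its relativistic mover is not assumed to follow this scheme, and the function names are illustrative. The magnetic rotation preserves |u| exactly, and positions advance with v = u/γ, which stays below c.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def boris_push_relativistic(x, u, e_field, b_field, q_over_m, dt):
    """One relativistic Boris step (c = 1); fields are sampled at the particle position."""
    half = 0.5 * q_over_m * dt
    # First half of the electric-field kick.
    u_minus = tuple(ui + half * ei for ui, ei in zip(u, e_field))
    gamma_minus = math.sqrt(1.0 + sum(c * c for c in u_minus))
    # Magnetic rotation, which leaves |u| unchanged.
    t = tuple(half * bi / gamma_minus for bi in b_field)
    t2 = sum(c * c for c in t)
    s = tuple(2.0 * ti / (1.0 + t2) for ti in t)
    u_prime = tuple(a + b for a, b in zip(u_minus, cross(u_minus, t)))
    u_plus = tuple(a + b for a, b in zip(u_minus, cross(u_prime, s)))
    # Second half of the electric-field kick.
    u_new = tuple(ui + half * ei for ui, ei in zip(u_plus, e_field))
    gamma_new = math.sqrt(1.0 + sum(c * c for c in u_new))
    # Position update with v = u / gamma.
    x_new = tuple(xi + dt * ui / gamma_new for xi, ui in zip(x, u_new))
    return x_new, u_new


if __name__ == "__main__":
    x, u = (0.0, 0.0, 0.0), (0.9, 0.0, 0.0)
    for _ in range(100):
        x, u = boris_push_relativistic(x, u, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0, 0.05)
    print(x, math.sqrt(sum(c * c for c in u)))  # |u| stays 0.9 in a pure magnetic field
```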