- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Caching and Content Delivery
- Distributed Systems and Fault Tolerance
- Interconnection Networks and Systems
- Scientific Computing and Data Management
- Advanced Neural Network Applications
- Matrix Theory and Algorithms
- Meteorological Phenomena and Simulations
- Network Traffic and Congestion Control
- Advanced Memory and Neural Computing
- Stochastic Gradient Optimization Techniques
- Computational Geometry and Mesh Generation
- Advanced Electron Microscopy Techniques and Applications
- Tropical and Extratropical Cyclones Research
- Advanced Optimization Algorithms Research
- Adversarial Robustness in Machine Learning
- Security and Verification in Computing
- X-ray Spectroscopy and Fluorescence Analysis
- Neural Networks and Applications
- Peer-to-Peer Network Technologies
- Advanced Algorithms and Applications
- Ferroelectric and Negative Capacitance Devices
RIKEN Center for Computational Science
2015-2024
Intel (United States)
2022-2024
Tokyo Institute of Technology
2022
University of Chicago
2022
National Institute of Informatics
2022
Argonne National Laboratory
2022
Fujitsu (Japan)
2022
Institut Polytechnique de Paris
2022
Lawrence Berkeley National Laboratory
2022
Columbia University
2022
Following the inventions of the telegraph, the electronic computer, and remote sensing, "big data" is bringing another revolution to weather prediction. As sensor and computer technologies advance, orders-of-magnitude bigger data are produced by new sensors and high-precision simulations. Data assimilation (DA) is a key to numerical weather prediction (NWP), integrating real-world data into simulation. However, current DA and NWP systems are not designed to handle data from next-generation big sensors. Therefore, we propose "big data assimilation"...
The extreme degree of parallelism in high-end computing requires low operating system noise so that large-scale, bulk-synchronous parallel applications can run efficiently. Noiseless execution has historically been achieved by deploying lightweight kernels (LWK), which, on the other hand, provide only a restricted set of POSIX APIs in exchange for scalability. However, the increasing prevalence of more complex application constructs, such as in-situ analysis and workflow composition, dictates the need for rich...
Stochastic gradient descent (SGD) is the most prevalent algorithm for training Deep Neural Networks (DNN). SGD iterates over the input data set in each epoch, processing samples in a random-access fashion. Because this puts enormous pressure on the I/O subsystem, a common approach in distributed HPC environments is to replicate the entire dataset to node-local SSDs. However, due to rapidly growing dataset sizes, this has become increasingly infeasible. Surprisingly, the questions of why full random access is needed and to what extent it is required have not received a lot of attention...
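The trade-off this abstract raises can be illustrated with a minimal sketch (function names, the sharding scheme, and seeding are illustrative assumptions, not the paper's method): a full per-epoch shuffle requires access to the entire dataset, while a static per-worker shard with local shuffling needs only node-local data.

```python
import random

def full_shuffle_epoch(dataset, seed):
    # Baseline: a global random permutation; every worker needs the full dataset.
    order = list(range(len(dataset)))
    random.Random(seed).shuffle(order)
    return [dataset[i] for i in order]

def partial_local_epoch(dataset, num_workers, worker_id, seed):
    # Sketch: each worker keeps a static 1/num_workers shard (e.g., on a local SSD)
    # and shuffles only within it, trading global randomness for local I/O.
    shard = dataset[worker_id::num_workers]
    order = list(range(len(shard)))
    random.Random(seed + worker_id).shuffle(order)  # per-worker seed (illustrative)
    return [shard[i] for i in order]

data = list(range(16))
epoch = partial_local_epoch(data, num_workers=4, worker_id=1, seed=0)
assert sorted(epoch) == [1, 5, 9, 13]  # worker 1 only ever touches its own shard
```

The sketch makes the I/O argument concrete: the local variant never reads outside its shard, so the question becomes how much of the global shuffle's statistical benefit survives.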
Turning towards exascale systems and beyond, it has been widely argued that the currently available software is not going to be feasible due to various requirements, such as the ability to deal with heterogeneous architectures, the need for low-level optimization targeting specific applications, the elimination of OS noise, and, at the same time, compatibility with legacy applications. To cope with these issues, a hybrid operating system design, where light-weight specialized kernels cooperate with a traditional kernel, seems adequate, and a number...
The two most common parallel execution models for many-core CPUs today are multiprocess (e.g., MPI) and multithread (e.g., OpenMP). The multiprocess model allows each process to own a private address space, although processes can explicitly allocate shared-memory regions. The multithread model shares all address space by default, although threads can move data to thread-private storage. In this paper, we present a third model called process-in-process (PiP), where multiple processes are mapped into a single virtual address space. Thus, each process still owns its process-private storage...
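The contrast between the two conventional models can be sketched in a few lines (this illustrates the default sharing semantics only, not PiP itself, which requires mapping processes into one address space at the runtime level):

```python
import threading
import multiprocessing as mp

# Multithread model: all threads share the address space by default.
shared = {"x": 0}

def thread_body():
    shared["x"] += 1          # directly visible to the parent thread

t = threading.Thread(target=thread_body)
t.start(); t.join()
assert shared["x"] == 1

# Multiprocess model: private address spaces; sharing must be explicit.
def proc_body(val):
    with val.get_lock():
        val.value += 1        # visible only because this region was explicitly shared

ctx = mp.get_context("fork")  # fork avoids re-importing this module in the child
v = ctx.Value("i", 0)
p = ctx.Process(target=proc_body, args=(v,))
p.start(); p.join()
assert v.value == 1
```

PiP sits between these: process-private storage as in the first model, but with the zero-copy data visibility of the second.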
This article proposes a pattern-based prefetching scheme with the support of adaptive cache management at the flash translation layer of solid-state drives (SSDs). It works inside SSDs and has the features of OS independence and use transparency. Specifically, it first mines frequent block access patterns that reflect the correlation among previously occurred I/O requests. Then, it compares the requests in the current time window against the identified patterns to direct prefetching data into the cache of SSDs. More importantly, to maximize cache use efficiency, we build a mathematical model...
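The mine-then-match idea can be sketched as follows (the pattern length, support threshold, and matching rule are illustrative placeholders for whatever the actual scheme uses):

```python
from collections import Counter

def mine_patterns(trace, length=2, min_support=2):
    # Mine frequent consecutive block-access patterns from an I/O history trace.
    windows = Counter(tuple(trace[i:i + length])
                      for i in range(len(trace) - length + 1))
    return {p for p, c in windows.items() if c >= min_support}

def prefetch_candidates(current_window, patterns):
    # If the tail of the current window matches a pattern's prefix,
    # the pattern's final block is a prefetch candidate.
    out = []
    for p in patterns:
        k = len(p) - 1
        if tuple(current_window[-k:]) == p[:k]:
            out.append(p[-1])
    return out

history = [3, 7, 3, 7, 1, 3, 7, 9]
pats = mine_patterns(history)            # block 7 frequently follows block 3
assert (3, 7) in pats
assert 7 in prefetch_candidates([5, 3], pats)
```

A real FTL-resident scheme would additionally bound the pattern table and, as the abstract notes, manage the cache adaptively rather than prefetching unconditionally.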
Heterogeneous architectures, where a multicore processor is accompanied by a large number of simpler but more power-efficient CPU cores optimized for parallel workloads, have been receiving a lot of attention recently. At present, these co-processors, such as the Intel Xeon Phi product family, come with limited on-board memory, which requires manually partitioning computational problems into pieces that can fit into the device's RAM, as well as efficiently overlapping computation and communication. In this paper we...
Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need for fair and effective benchmarking that is representative of real-world use cases. MLPerf™ is a community-driven standard to benchmark machine learning workloads, focusing on end-to-end performance metrics. In this paper,...
Distributed file systems have been widely deployed as back-end storage to offer I/O services for parallel/distributed applications that process large amounts of data. Data prefetching in distributed file systems is a well-known optimization technique which can mask both network and disk latency and consequently boost performance. Traditionally, data prefetching is initiated by the client file systems; however, such conventional schemes are not well suited to machines with limited memory and computing capacity. To offer an efficient prefetching approach...
Distributed virtual environments (DVE), such as multi-player online games and distributed simulations, may involve a massive number of concurrent clients. Deploying multi-server architectures is currently the most prevalent way of providing large-scale DVE services, where the virtual space is typically divided into several distinct regions, requiring each server to handle only a part of the virtual world. Inequalities in client distribution may, however, cause certain servers to become overloaded, which potentially degrades interactivity...
Most flash-based solid-state drives (SSDs) adopt an onboard dynamic random access memory (DRAM) to buffer hot write data. The write or overwrite operations can then be absorbed by the DRAM cache, given that there is sufficient locality in the applications' I/O pattern, consequently avoiding flushing data onto the underlying SSD cells. After analyzing typical real-world workloads over SSDs, we observed that the buffered data of small-size requests are more likely to be reaccessed than those of large requests. To efficiently utilize...
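The observation suggests an eviction policy that prefers flushing data written by large requests. A minimal sketch, assuming a simple LRU order and an illustrative size threshold (neither is claimed to match the paper's actual design):

```python
from collections import OrderedDict

class SizeAwareWriteBuffer:
    # Hypothetical DRAM write buffer: on overflow, evict the least-recently-used
    # entry that came from a *large* request first, since (per the observation
    # above) small-request data is more likely to be reaccessed.
    def __init__(self, capacity, small_threshold=4):
        self.capacity = capacity
        self.small_threshold = small_threshold
        self.buf = OrderedDict()     # lba -> originating request size, LRU order
        self.flushed = []            # entries written back to the flash cells

    def write(self, lba, size):
        self.buf.pop(lba, None)      # overwrite hit: absorb in DRAM
        self.buf[lba] = size
        while len(self.buf) > self.capacity:
            self._evict()

    def _evict(self):
        victim = next((l for l, s in self.buf.items() if s > self.small_threshold),
                      next(iter(self.buf)))   # fall back to plain LRU
        self.flushed.append(victim)
        del self.buf[victim]

buf = SizeAwareWriteBuffer(capacity=2)
buf.write(10, size=1)   # small write
buf.write(20, size=8)   # large write
buf.write(30, size=1)   # overflow: the large entry is flushed, not the older small one
assert buf.flushed == [20] and 10 in buf.buf
```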
Checkpoint-recovery based Virtual Machine (VM) replication is an emerging approach towards accommodating VM installations with high availability. However, it comes at the price of significant performance degradation of the application executed in the VM, due to the large amount of state that needs to be synchronized between the primary and backup machines. It is therefore critical to find new ways of attaining good performance while, at the same time, maintaining fault tolerant execution. In this paper, we present a novel approach to improve...
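The synchronization cost the abstract refers to is easiest to see in an incremental-checkpoint sketch: each epoch, only the pages dirtied since the last checkpoint are shipped to the backup. The page representation and function names here are hypothetical illustrations, not the paper's mechanism:

```python
def checkpoint_delta(prev_pages, cur_pages):
    # Compute the dirty set: pages whose contents changed since the last epoch.
    return {pid: data for pid, data in cur_pages.items()
            if prev_pages.get(pid) != data}

def replicate(primary_epochs):
    # Toy replication loop: the backup applies each epoch's delta.
    backup, prev, transferred = {}, {}, 0
    for cur in primary_epochs:
        delta = checkpoint_delta(prev, cur)
        transferred += len(delta)     # proxy for synchronization traffic
        backup.update(delta)
        prev = dict(cur)
    return backup, transferred

epochs = [{1: "a", 2: "b"}, {1: "a", 2: "c"}, {1: "a", 2: "c"}]
backup, n = replicate(epochs)
assert backup == {1: "a", 2: "c"} and n == 3   # 2 initial pages + 1 dirty page
```

The sketch shows why write-heavy guests degrade badly: the per-epoch transfer grows with the dirty set, which checkpoint epochs must wait on.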
As system sizes increase to exascale and beyond, there is a need to enhance the system software to meet the needs and challenges of applications. The evolutionary versus revolutionary debate can be set aside by providing system software that simultaneously supports existing and new programming models. The seemingly contradictory requirements of scalable performance and traditional rich APIs (POSIX, and Linux in particular) suggest a multi-kernel approach, which has led to a class of research. Traditionally, operating systems for extreme-scale computing have followed two...
Lightweight kernels (LWK) have been in use on the compute nodes of supercomputers for decades. Although many high-end systems now run Linux, interest in LWK options and alternatives has increased in the last couple of years. Future extreme-scale systems will require rethinking the operating system, and modern LWKs may well play a role in the final solution.
Read disturb is a circuit-level noise in solid-state drives (SSDs), which may corrupt existing data in SSD blocks and then cause a high read error rate and longer read latency. The approach of refresh is commonly used to avoid read errors by periodically migrating the data of hot blocks to other free blocks, but it places considerable negative impacts on I/O (Input/Output) responsiveness. This article proposes scheduling approaches for read and write operations to mitigate the effects caused by read disturb. To be specific, we first construct a model to classify...
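The baseline refresh mechanism the article aims to improve on can be sketched as a per-block read counter with threshold-triggered migration (the threshold value and class names are illustrative, not taken from the article):

```python
class ReadDisturbTracker:
    # Toy refresh policy: migrate a block's data to a free block and reset its
    # counter once its accumulated read count reaches the disturb threshold.
    def __init__(self, threshold=100):
        self.threshold = threshold
        self.reads = {}
        self.refreshed = []   # log of refresh (migration) events

    def read(self, block):
        self.reads[block] = self.reads.get(block, 0) + 1
        if self.reads[block] >= self.threshold:
            self.refreshed.append(block)   # migration competes with host I/O
            self.reads[block] = 0

t = ReadDisturbTracker(threshold=3)
for _ in range(7):
    t.read("blk0")
assert t.refreshed == ["blk0", "blk0"]   # refreshed after the 3rd and 6th read
```

Each append models a background migration that steals bandwidth from host requests, which is exactly the responsiveness impact the proposed scheduling targets.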
The increasing prevalence of co-processors, such as the Intel Xeon Phi, has been reshaping the high performance computing (HPC) landscape. The Phi comes with a large number of power-efficient CPU cores, but at the same time it is a highly memory-constrained environment, leaving memory management entirely up to application developers. To reduce programming complexity, we are focusing on transparent, operating system (OS) level hierarchical memory management.
Multi-kernels leverage today's multi-core chips to run multiple operating system (OS) kernels, typically a Light Weight Kernel (LWK) and a Linux kernel, simultaneously. The LWK provides high performance and scalability, while the Linux kernel provides compatibility. Multi-kernels show promise of being able to meet tomorrow's extreme-scale computing needs by providing strong isolation, yielding the scalability needed by classical HPC applications. McKernel and mOS started as independent research initiatives to explore the above potential. Previous...
On the verge of convergence between high-performance computing and Big Data processing, it has become increasingly prevalent to deploy large-scale data analytics workloads on high-end supercomputers. Such applications often come in the form of complex workflows with various different components, assimilating data from scientific simulations as well as measurements streamed from sensor networks, such as radars and satellites. For example, as part of the Flagship 2020 (post-K) supercomputer project of Japan, RIKEN is...
In HPC, two trends have led to the emergence and popularity of an operating-system approach in which multiple kernels are run simultaneously on each compute node. The first trend has been the increasing complexity of the HPC software environment, which has placed traditional kernel approaches under stress. Meanwhile, microprocessors with more cores are being produced, allowing specialization within a node. As is typical of an emerging field, different groups are considering many ways of deploying multi-kernels.
Page-based memory management (paging) is utilized by most of the current operating systems (OSs) due to its rich features, such as the prevention of memory fragmentation and fine-grained access control. Paged virtual memory, however, stores virtual-to-physical mappings in page tables that also reside in main memory. Because translating addresses requires walking the page tables, which in turn implies additional memory accesses, modern CPUs employ translation lookaside buffers (TLBs) to cache mappings. Nevertheless, TLBs are limited in size...
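The translation path described above can be sketched as a small simulation, assuming a single-level page table, a 4 KiB page size, and LRU replacement (real hardware uses multi-level tables and set-associative TLBs):

```python
from collections import OrderedDict

PAGE_SIZE = 4096

def translate(vaddr, tlb, page_table, stats, tlb_entries=4):
    # TLB lookup first; on a miss, "walk" the page table (extra memory accesses)
    # and install the mapping, evicting the least recently used entry if full.
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:
        stats["hits"] += 1
        tlb.move_to_end(vpn)
    else:
        stats["misses"] += 1
        tlb[vpn] = page_table[vpn]
        if len(tlb) > tlb_entries:
            tlb.popitem(last=False)
    return tlb[vpn] * PAGE_SIZE + offset

pt = {0: 7, 1: 3}                         # vpn -> pfn
tlb, stats = OrderedDict(), {"hits": 0, "misses": 0}
assert translate(4100, tlb, pt, stats) == 3 * PAGE_SIZE + 4   # miss: table walk
translate(4200, tlb, pt, stats)                               # same page: TLB hit
assert stats == {"hits": 1, "misses": 1}
```

Because the TLB is small, workloads with large or sparse footprints take the miss path often, which motivates larger page sizes and the size limitation noted in the abstract.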
Upcoming high-performance computing (HPC) platforms will have more complex memory hierarchies, with high-bandwidth on-package memory and, in the future, also non-volatile memory. How to use such deep hierarchies effectively remains an open research question. In this paper we evaluate the performance implications of a scheme based on a software-managed scratchpad with coarse-grained memory-copy operations for migrating application data structures between hierarchy levels. We expect that it can, under specific circumstances,...
Over the last three decades, innovations in the memory subsystem have primarily targeted overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities of future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a memory-oblivious method to gauge the upper-bound performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of a hypothetical LARge Cache...
With the growing prevalence of cloud computing and the increasing number of CPU cores in modern processors, symmetric multiprocessing (SMP) Virtual Machines (VM), i.e., virtual machines with multiple virtual CPUs, are gaining significance. However, accommodating SMP VMs with high availability at low overhead is still an open problem. Checkpoint-recovery based VM replication is an emerging approach, but it comes at the price of significant performance degradation of the application executed in the VM, due to the large amount of state that needs to be...
Many-core processors are gathering attention in the area of embedded systems due to their power-performance ratios. To utilize the cores of a many-core processor in parallel, programmers build multi-task applications that use task models provided by operating systems. However, conventional task models cause some scalability problems when executed on many-core processors. In this paper, a new task model named Partitioned Virtual Address Space (PVAS), which solves these problems, is proposed. PVAS enhances inter-task communication and...