- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Metaheuristic Optimization Algorithms Research
- Evolutionary Algorithms and Applications
- Interconnection Networks and Systems
- Advanced Numerical Methods in Computational Mathematics
- Matrix Theory and Algorithms
- Quantum Computing Algorithms and Architecture
- Embedded Systems Design Techniques
- Cloud Computing and Resource Management
- Nuclear reactor physics and engineering
- Model Reduction and Neural Networks
- Advanced Multi-Objective Optimization Algorithms
- Semiconductor materials and devices
- Nuclear Physics and Applications
- Data Mining Algorithms and Applications
- Electromagnetic Simulation and Numerical Methods
- Software System Performance and Reliability
- Electromagnetic Scattering and Analysis
- Quantum and electron transport phenomena
- Numerical methods for differential equations
- Genomics and Phylogenetic Studies
- Multi-Criteria Decision Making
- Particle physics theoretical and experimental studies
RIKEN Center for Computational Science
2013-2024
RIKEN
2022
Hokkaido University
1998-2015
University of Tsukuba
2009-2014
Institut Lavoisier de Versailles
2013
Muroran Institute of Technology
2008
Fujitsu (Japan)
2002
Osaka University
1998
Enamine (Ukraine)
1948
We have been carrying out the FLAGSHIP 2020 Project to develop the Japanese next-generation flagship supercomputer, Post-K, recently named "Fugaku". We designed an original many-core processor based on Armv8 instruction sets with the Scalable Vector Extension (SVE), the A64FX processor, as well as a system including an interconnect and a storage subsystem, with our industry partner, Fujitsu. The "co-design" of the system and applications is key to making it power efficient and high performance. We determined architectural parameters by reflecting...
Real space DFT (RSDFT) is a simulation technique most suitable for massively-parallel architectures to perform first-principles electronic-structure calculations based on density functional theory. We here report unprecedented simulations of the electron states of silicon nanowires with up to 107,292 atoms, carried out during the initial performance evaluation phase of the K computer being developed at RIKEN.
RIKEN Center for Computational Science has been installing the supercomputer Fugaku. The Fujitsu A64FX, based on the Armv8.2-A+SVE architecture, is used in the system. In this paper, we evaluated seven HPC applications and benchmarks on the A64FX. In a performance comparison with the Marvell (Cavium) ThunderX2 processor and the Intel Xeon Skylake processor, the A64FX achieved higher performance in memory bandwidth-intensive applications thanks to its high memory bandwidth. However, we confirmed that performance decreased from a lack of out-of-order resources. To...
Silicon nanowires are potentially useful in next-generation field-effect transistors, and it is important to clarify the electron states of silicon nanowires to know the behavior of new devices. Computer simulations are promising tools for calculating electron states. Real-space density functional theory (RSDFT) code performs first-principles electronic structure calculations. To obtain higher performance, we applied various optimization techniques to the code: multi-level parallelization, load balance management, sub-mesh/torus...
We use a random search technique to find quantum gate sequences that implement perfect quantum state preparation or unitary operator synthesis with arbitrary targets. This approach is based on the recent discovery that there is a large multiplicity of quantum circuits that achieve unit fidelity in performing a given target operation, even at the minimum number of single-qubit and two-qubit gates needed to achieve unit fidelity. We show that the fraction of perfect-fidelity quantum circuits increases rapidly as soon as the circuit size exceeds the minimum size required for achieving unit fidelity. This result implies...
Modern high performance processors are equipped with a very wide SIMD instruction set. SVE (Scalable Vector Extension) is an ARM® SIMD technology that supports vector lengths from 128 bits to 2048 bits. One of its promising features is to offer "vector-length agnostic" programming, which allows the same code to run on hardware of any vector length without modification of the code. This feature would be useful to explore the most appropriate resources in the space of various combinations of parameters in order to make more efficient use of resources, since...
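The "vector-length agnostic" idea described above can be illustrated with a small emulation. This is only a sketch in plain Python, not SVE code: the helper names `whilelt` and `vla_axpy` are hypothetical, the predicate is loosely modelled on the way a loop-bound predicate masks inactive lanes, and 64-bit elements are assumed when converting a vector length in bits to a lane count.

```python
def whilelt(i, n, lanes):
    """Predicate mask: lane k is active while i + k < n (hypothetical helper)."""
    return [i + k < n for k in range(lanes)]


def vla_axpy(a, x, y, lanes):
    """Compute a*x + y in chunks of `lanes` elements, masking the loop tail.

    The loop body never hard-codes the vector length, so the same code
    works for any `lanes` value.
    """
    n = len(x)
    out = [0.0] * n
    i = 0
    while i < n:
        pred = whilelt(i, n, lanes)
        for k, active in enumerate(pred):
            if active:  # inactive lanes past the end of the array do nothing
                out[i + k] = a * x[i + k] + y[i + k]
        i += lanes
    return out


x = [float(v) for v in range(10)]
y = [1.0] * 10

# Assumption: 64-bit elements, so lanes = vector length in bits // 64.
results = {bits: vla_axpy(2.0, x, y, bits // 64) for bits in (128, 256, 512, 2048)}
```

Every entry of `results` holds the same values regardless of the emulated vector length, which is the property the abstract attributes to vector-length agnostic code.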
The supercomputer "Fugaku" is an exascale manycore-based parallel system developed as a Japanese national flagship supercomputer in the FLAGSHIP 2020 Project. While Fugaku was ranked first for several benchmarks such as TOP500, HPCG, HPL-AI, and Graph500 in 2020, the major design concept is application-first, by co-design for power efficiency and high performance. We have designed an original manycore processor based on Armv8 instruction sets with the scalable vector extension, the A64FX processor, with Fujitsu, our industry partner. Fugaku consists of...
Future HPC systems, including post-exascale supercomputers, will face severe problems such as the slowing-down of Moore's law and the limitation of power supply. To achieve the desired system performance improvement while counteracting these issues, hardware design optimization is a key factor. In this paper, we investigate the future directions of SIMD-based processor architectures by using the A64FX chip and a customized version of power/performance/area simulators, i.e., Gem5 and McPAT. More specifically, based on the A64FX chip,...
In this paper, we propose a new development and execution environment based on workflow and PGAS methodologies for parallel programming in post-petascale systems. It is expected that post-petascale systems will have a huge and highly hierarchical architecture with nodes of many-core processors and accelerators. For current programs using MPI, MPI/OpenMP hybrid, and so on, it would sometimes be difficult to exploit the systems efficiently. The proposed environment, called FP2C (Framework for Post-Petascale Computing), supports multi-program...
We have used data learning and low-precision computation to develop an implicit solver that demonstrates high performance up to 152,352 computer nodes (609,408 MPI processes × 12 OpenMP threads = 7,312,896 parallel computation) and conducted an unprecedented ultra-large-scale analysis of ultra-high-fidelity fault-structure systems using nonlinear dynamic finite element analysis on three-dimensional low-order unstructured elements. The developed solver achieved a 25.45-fold speedup from the state of the art on Fugaku and attained...
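The parallelism figures quoted above can be checked with a few lines of arithmetic. The variable names are illustrative only; the per-node values are derived here from the quoted numbers and are not stated in the abstract itself.

```python
nodes = 152_352          # compute nodes, as quoted
mpi_processes = 609_408  # MPI processes, as quoted
omp_threads = 12         # OpenMP threads per MPI process, as quoted

# Total degree of parallelism: processes x threads.
total = mpi_processes * omp_threads

# Derived (not stated in the abstract): processes and threads per node.
procs_per_node = mpi_processes // nodes
threads_per_node = procs_per_node * omp_threads
```

The product matches the quoted 7,312,896, and the division is exact, which implies 4 MPI processes and 48 threads per node.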
Genetic Algorithms perform crossovers effectively when linkage sets - sets of variables tightly linked to form building blocks - are identified. Several methods have been proposed to detect the linkage sets. Perturbation methods (PMs) investigate fitness differences by perturbations of gene values, and estimation of distribution algorithms (EDAs) estimate the distribution of promising strings. In this paper, we propose a novel approach combining both of them, which detects dependencies of variables by estimating the distribution of strings clustered according to fitness differences. The...
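As a rough illustration of the perturbation idea mentioned above, the sketch below flags two loci as linked when the fitness change from flipping both differs from the sum of the changes from flipping each alone. This is a generic pairwise nonlinearity check, not the combined PM/EDA approach the abstract proposes; the trap-function fitness, the helper names, and the exhaustive sampling are all assumptions made for the example.

```python
from itertools import combinations, product


def trap(bits):
    """Deceptive trap function on one block (assumed test problem)."""
    u, k = sum(bits), len(bits)
    return k if u == k else k - 1 - u


def fitness(s):
    # Two concatenated 3-bit trap blocks: loci {0, 1, 2} and {3, 4, 5}.
    return trap(s[0:3]) + trap(s[3:6])


def flip(s, i):
    """Return a copy of s with bit i inverted."""
    t = list(s)
    t[i] = 1 - t[i]
    return t


def linked(f, s, i, j):
    """True if perturbing i and j together is non-additive at string s."""
    base = f(s)
    df_i = f(flip(s, i)) - base
    df_j = f(flip(s, j)) - base
    df_ij = f(flip(flip(s, i), j)) - base
    return df_ij != df_i + df_j


def detect_linkage(f, samples, n):
    """Collect every pair of loci found non-additive on at least one sample."""
    pairs = set()
    for s in samples:
        for i, j in combinations(range(n), 2):
            if linked(f, s, i, j):
                pairs.add((i, j))
    return pairs


# Exhaustive sampling is feasible only because the example has 6 bits.
samples = [list(b) for b in product((0, 1), repeat=6)]
detected = detect_linkage(fitness, samples, 6)
```

On this toy problem the detected pairs fall entirely within the two 3-bit blocks, because the fitness is additive across blocks and non-additive within each block.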
The adoption of ARM processor architectures is on the rise in the HPC ecosystem. The Fugaku supercomputer is a homogeneous ARM-based machine, and one of the most powerful machines in the world. In the programming world, dependent task-based programming models are gaining traction due to their many advantages: dynamic load balancing, implicit expression of communication/computation overlap, early-bird communication posting,... MPI and OpenMP are two widespread standards that make task-based programming possible at the distributed memory level. Despite its...
We propose a crossover method to combine complexly overlapping building blocks (BBs). Although there have been several techniques to identify linkage sets of loci to form a BB [4, 6, 7, 10, 11], the way to realize effective crossover from such linkage information has not been studied enough. Especially for problems with overlapping BBs, the crossover method proposed by Yu et al. [13] is the first and only known research; however, it cannot perform well for problems with complexly overlapping BBs due to insufficient variety of crossover sites. In this paper, we propose a crossover method which examines values of given parental strings minutely...
The effect of initial bubble conditions on the transient dynamics of a small (d = 1.5 mm) bubble rising in water is computationally considered by fully three‐dimensional direct numerical simulation. The algorithm is based on the coupled level set/volume‐of‐fluid (CLSVOF) method for representing and updating the air‐water interface, and a sharp interface approach is used to treat the interfacial boundary conditions. The effects are investigated using bubbles with five different kinds of initial shapes as the initial conditions. It is shown that the states of shape, trajectory, terminal...
In previous reports, the authors showed the results of experiments on growth inhibition tests of 12 kinds of acylamino acid against Staphylococci (Terashima strain), and in the present experiment, the same tests were made with 17 kinds of acylamino acids. Results were as follows: (1) Growth inhibition increases in the following order. Lauryl-dl-alanine (1,000-2,000); lauryl-dl-α-amino-n-butyric acid (2, lauryl-dl-valine (4, lauryl-dl-phenylalanine (16,000-32,000). (2) Laurination of amino acids is more effective than caprinylation. Benzoylation is ineffective. (3)...
Non-uniform memory access (NUMA) systems, where each processor has its own memory, have been a popular platform in high-end computing. While some early studies had reported that a flat-MPI programming model outperformed an OpenMP/MPI hybrid programming model on SMP clusters, the hybrid of shared-memory, thread-based programming and distributed-memory, message passing programming is considered to be a promising programming model on multi-core multi-socket NUMA clusters. We explore the performance of a large scale NUMA cluster called T2K Open Supercomputer. Both benchmark (NPB,...
A technique is presented for solving neutron diffusion equations with the boundary element method (BEM) based on a hierarchical domain decomposition technique. In this method, the reactor domain is decomposed into homogeneous regions, and the boundary condition on the common boundary of the regions is initially assumed. The neutron diffusion equation is solved iteratively at two levels of the hierarchical structure: First, the BEM is applied to solve the equation for each region under the given assumed boundary conditions and an assumed multiplication factor. Then, these assumed values are modified to satisfy the continuity of the neutron flux and current. The proposed...
Widely used benchmarks, such as High Performance Linpack (HPL), do not always provide direct insights into the actual application performance of systems, and there have been criticisms indicating that simplified benchmarks such as HPL no longer strongly correlate to real application performance. In contrast, evaluations based on real applications or mini applications may give a direct estimation of the application performance. The Sustained System Performance (SSP) metric, which is used to evaluate systems at scale with various applications, has...
In this paper, we focus on a distributed and parallel programming paradigm for massively multicore supercomputers. We introduce YML, a development and execution environment for parallel and distributed applications based on a graph of task components scheduled at runtime and optimized for several middlewares. Then we show why YML may be well adapted to applications running on a lot of cores. The tasks are developed with the PGAS language XMP, based on directives. We use YML/XMP to implement the block-wise Gaussian elimination to solve linear systems. We also implemented it with MPI without...
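The block-wise Gaussian elimination named above can be sketched serially in plain Python. This is not the YML/XMP implementation; it only shows the block-level data flow that a task graph would schedule. The helper names, the 2 x 2 block partitioning, and the assumption that every diagonal block stays nonsingular (no pivoting between blocks) are choices made for the example.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matsub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def solve_dense(A, B):
    """Solve A X = B for one small block by Gauss-Jordan with partial pivoting."""
    n = len(A)
    M = [list(ra) + list(rb) for ra, rb in zip(A, B)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def block_gauss_solve(A, b):
    """A: p x p grid of square blocks; b: list of p column blocks.

    Assumes each diagonal block is nonsingular when it becomes the pivot.
    """
    p = len(A)
    A = [[[row[:] for row in blk] for blk in brow] for brow in A]  # work on copies
    b = [[row[:] for row in blk] for blk in b]
    for k in range(p):
        pivot = A[k][k]
        # Normalise the pivot block row: A[k][j] <- inv(pivot) * A[k][j].
        for j in range(k + 1, p):
            A[k][j] = solve_dense(pivot, A[k][j])
        b[k] = solve_dense(pivot, b[k])
        # Eliminate block column k from the block rows below.
        for i in range(k + 1, p):
            for j in range(k + 1, p):
                A[i][j] = matsub(A[i][j], matmul(A[i][k], A[k][j]))
            b[i] = matsub(b[i], matmul(A[i][k], b[k]))
    # Block back substitution.
    x = [None] * p
    for k in reversed(range(p)):
        x[k] = b[k]
        for j in range(k + 1, p):
            x[k] = matsub(x[k], matmul(A[k][j], x[j]))
    return x


# 4 x 4 system split into 2 x 2 blocks; the right-hand side equals A times (1, 2, 3, 4).
A_blocks = [
    [[[4.0, 1.0], [1.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]],
    [[[0.0, 1.0], [1.0, 0.0]], [[4.0, 1.0], [1.0, 4.0]]],
]
b_blocks = [[[10.0], [12.0]], [[18.0], [20.0]]]
solution = block_gauss_solve(A_blocks, b_blocks)
```

Each block update inside the `i`, `j` loops depends only on the pivot row and one other block, so the updates are independent of each other. That independence is what makes the method a natural fit for a graph of tasks.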