- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Interconnection Networks and Systems
- Embedded Systems Design Techniques
- Advanced Chemical Physics Studies
- Opportunistic and Delay-Tolerant Networks
- Vehicular Ad Hoc Networks (VANETs)
- Magnetic confinement fusion research
- Distributed systems and fault tolerance
- Spectroscopy and Quantum Chemical Studies
- Mobile Ad Hoc Networks
- Quantum and electron transport phenomena
- Scientific Computing and Data Management
- Particle accelerators and beam dynamics
- Physics of Superconductivity and Magnetism
- Fluid Dynamics and Vibration Analysis
- Theoretical and Computational Physics
- Ionosphere and magnetosphere dynamics
- Fluid Dynamics and Turbulent Flows
- Quantum many-body systems
- Machine Learning in Materials Science
- Quantum Computing Algorithms and Architecture
- Graph Theory and Algorithms
Lawrence Berkeley National Laboratory
2016-2025
Silicon Austria Labs (Austria)
2023-2024
Graz University of Technology
2024
South Valley University
2024
New Valley University
2024
Cairo University
2008-2023
Microsoft Research (United Kingdom)
2023
University of California, Riverside
2023
Central Metallurgical Research and Development Institute
2022
FH Kärnten
2021
Live migration is a widely used technique for resource consolidation and fault tolerance. KVM and Xen use iterative pre-copy approaches, which work well in practice for commercial applications. In this paper, we study live migration of MPI and OpenMP scientific applications and present a detailed performance analysis of the migration process. We show that, due to the high rate of memory changes, current rate control and target downtime heuristics do not cope with HPC applications: statically choosing rate limits and downtimes is infeasible and current mechanisms...
We report our experiences porting Spark to large production HPC systems. While performance in a data center installation (with local disks) is dominated by the network, our results show that file system metadata access latency can dominate when using Lustre: it makes single-node performance up to 4x slower than on a typical workstation. We evaluate a combination of software techniques and hardware configurations designed to address this problem. For example, on the software side we develop a pooling layer able to improve per-node performance by up to 2.8x. On the hardware side, with...
We present a method for accurate aggregation of highway traffic information in vehicular ad hoc networks (VANETs). Highway congestion notification applications need to disseminate information about traffic conditions to distant vehicles. In dense traffic, aggregation is needed to allow a single frame to carry information about a large number of vehicles. Our technique, CASCADE, uses compression to provide aggregation without losing accuracy. We show that CASCADE makes efficient use of the wireless channel while providing each vehicle with data that is highly accurate and represents the area in front...
In this paper, we study the effect of dense vehicular networks on data dissemination. When using intelligent broadcasting techniques, such as Inter-Vehicle Geocast, we have discovered the spatial broadcast storm problem, in which multiple vehicles will be chosen to re-broadcast frames at nearly the same time, resulting in channel contention and collisions. We present a probabilistic version of IVG (p-IVG) to address this problem. In p-IVG, re-broadcast is probabilistic, depending upon the traffic density of the surrounding vehicles. We show that p-IVG solves...
The gyrokinetic Particle-in-Cell (PIC) method is a critical computational tool enabling petascale fusion simulation research. In this work, we present novel multi- and manycore-centric optimizations to enhance the performance of GTC, a PIC-based production code for studying plasma microturbulence in tokamak devices. Our optimizations encompass all six GTC sub-routines and include multi-level particle and grid decompositions designed to improve multi-node parallel scaling, particle binning for improved load balance, GPU acceleration...
We present the first investigation of unusual nonlinear Hall effects in twisted multilayer 2D materials. Contrary to expectations, our study shows that these effects are not merely extensions of their monolayer counterparts. Instead, we find that stacking order and pairwise interactions between neighboring layers, mediated by Berry curvatures, play a pivotal role in shaping the collective nonlinear optical response. By combining large-scale Real-Time Time-Dependent Density Functional Theory (RT-TDDFT) simulations with model...
The Hartree-Fock generalized Kadanoff-Baym ansatz (HF-GKBA) offers an approximate numerical procedure for propagating the two-time nonequilibrium Green's function (NEGF). Here, using the $GW$ self-energy, we compare the HF-GKBA to exact results for a variety of systems with long- and short-range interactions, different two-body interaction strengths, and various preparations. We find excellent agreement between the HF-GKBA and exact time evolution in models when more realistic long-range exponentially decaying interactions are...
Calculations of excited states in the Green's function formalism often invoke the diagonal approximation, in which the quasiparticle states are taken from a mean-field calculation. In this paper, we extend the stochastic approaches applied in many-body perturbation theory and overcome this limitation for large systems in which we are interested in a small subset of states. We separate the problem into a core subspace, whose coupling to the remainder of the system environment is stochastically sampled. This method is exemplified on computing hole injection energies...
Dynamic (temporal) graphs are a convenient mathematical abstraction for many practical complex systems including social contacts, business transactions, and computer communications. Community discovery is an extensively used graph analysis kernel with rich literature for static graphs. However, community discovery in a dynamic setting is challenging for two specific reasons. Firstly, the notion of temporal community lacks a widely accepted formalization, and only limited work exists on understanding how communities...
In this paper we characterize the behavior, with respect to memory locality management, of scientific computing applications running in virtualized environments. NUMA locality on current solutions (KVM and Xen) is enforced by pinning virtual machines to CPUs and providing NUMA-aware allocation in hypervisors. Our analysis shows that, due to two-level memory management and lack of integration with page reclamation mechanisms, applications running on warm VMs suffer from a "leakage" of locality. Our results using MPI, UPC and OpenMP implementations of the NAS Parallel Benchmarks, running on Intel and AMD...
Alkaline water electrolysis is considered to be a basic technique for hydrogen production. Many researchers have investigated alkaline water electrolysis in order to promote the electrochemical reaction. In the present paper, the effects of voltage, electrolyte concentration, and the space between the pair of electrodes on the amount of hydrogen produced, and consequently on the overall efficiency, are experimentally investigated. The experimental measurements are carried out by the authors at the fluid mechanics laboratory of Menoufiya University. The different potassium...
The power and procurement cost of bandwidth in system-wide networks has forced a steady drop in the byte/flop ratio. This trend of computation becoming faster relative to the network is expected to hold. In this paper, we explore how cost-oriented task placement enables reducing this cost by enabling high performance even on tapered topologies where more bandwidth is provisioned at lower levels. We describe APHiD, an efficient hierarchical placement algorithm that uses new techniques to improve the quality of heuristic solutions and reduces the demand...
Most VANET simulators do not allow for feedback between the vehicle mobility model and the network simulator. This limits the realistic simulation of safety and traffic information applications that might cause drivers to change their routes. To address this issue, we developed extensions to the SWANS wireless network simulator. Our modules, which we collectively call ASH (Application-aware SWANS with Highway mobility), make several contributions. ASH allows the needed two-way communication between the mobility model and the networking model. To support highway scenarios, we built in a...
Reliable predictive simulation capability addressing confinement properties in magnetically confined fusion plasmas is critically important for ITER, a 20 billion dollar international burning plasma device under construction in France. The complex study of kinetic turbulence, which can severely limit the energy confinement and impact the economic viability of fusion systems, requires simulations at extreme scale for such an unprecedented device size. Our newly optimized, global, ab initio particle-in-cell code solving the nonlinear...
Efficient communication is a requirement for application scalability on High Performance Computing systems. In this paper we argue for incorporating proactive congestion avoidance mechanisms into the design of communication layers on manycore systems. This is in contrast with the status quo, which employs a reactive approach, e.g., congestion control mechanisms are activated only when resources have been exhausted. We present a core stateless optimization approach based on open loop end-point throttling, implemented for two UPC runtimes (Cray and Berkeley...
The gyrokinetic toroidal code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5-D Vlasov–Poisson equation, featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P’s multiple levels of parallelism,...
We present a new velocity-gauge real-time, time-dependent density functional tight-binding (VG-rtTDDFTB) implementation in the open-source DFTB+ software package (https://dftbplus.org) for probing electronic excitations in large, condensed matter systems. Our VG-rtTDDFTB approach enables real-time electron dynamics simulations of large, periodic, condensed matter systems containing thousands of atoms with a favorable computational scaling as a function of system size. We provide computational details and benchmark calculations to demonstrate...
We present an open-source software package, TRAVOLTA (Terrific Refinements to Accelerate, Validate, and Optimize Large Time-dependent Algorithms), for carrying out massively parallelized quantum optimal control calculations on GPUs. The TRAVOLTA package is a significant overhaul of our previous NIC-CAGE algorithm and also includes algorithmic improvements to the gradient ascent procedure to enable faster convergence. We examine three different variants of GPU parallelization to assess their performance in constructing...
Data aggregation is an important issue for vehicular ad-hoc networks (VANETs). Congestion notification applications are built to warn drivers of traffic slowdowns far enough in advance that the drivers may take alternate routes. Data broadcast should be self-contained and fit into a single MAC-layer frame. With dense traffic, aggregation is needed to represent a large number of vehicles in a relatively small frame. We present a new technique for aggregating vehicles' data without losing accuracy. Vehicles build a local view based on speed...
The Gyrokinetic Toroidal Code (GTC) uses the particle-in-cell method to efficiently simulate plasma microturbulence. This work presents novel analysis and optimization techniques to enhance the performance of GTC on large-scale machines. We introduce cell access analysis to better manage locality vs. synchronization tradeoffs on CPU and GPU-based architectures. Our optimized hybrid parallel implementation of GTC uses MPI, OpenMP, and NVIDIA CUDA, achieves up to a 2× speedup over the reference Fortran version on multiple parallel systems, and scales to tens...
Libtensor is a framework designed to implement the tensor contractions arising from the coupled cluster and equations of motion computational quantum chemistry equations. It has been optimized for symmetry and sparsity to be memory efficient. This allows it to run efficiently on the ubiquitous and cost-effective SMP architectures. Unfortunately, the movement of memory controllers on chip has endowed these systems with strong NUMA properties. Moreover, the many core trend in processor architecture demands that the implementation be extremely...