- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Distributed systems and fault tolerance
- Software System Performance and Reliability
- Cloud Computing and Resource Management
- Interconnection Networks and Systems
- Medical Imaging Techniques and Applications
- Particle Detector Development and Performance
- Gene Regulatory Network Analysis
- Scientific Computing and Data Management
- Evolutionary Algorithms and Applications
- Advanced Memory and Neural Computing
- Lattice Boltzmann Simulation Studies
- Traffic Prediction and Management Techniques
- Advanced MEMS and NEMS Technologies
- Radiation Detection and Scintillator Technologies
- Simulation Techniques and Applications
- Industrial Vision Systems and Defect Detection
- Neural dynamics and brain function
- Embedded Systems Design Techniques
- Neuroscience and Neural Engineering
- Genetics, Bioinformatics, and Biomedical Research
- Animal Genetics and Reproduction
- Semiconductor materials and devices
Forschungszentrum Jülich
2011-2024
Ernst Ruska Centre
2022
San Francisco Department of Public Health
2022
John von Neumann Institute for Computing
2006-2007
Sandia National Laboratories California
2007
Oracle (United States)
2003
CSCS - Swiss National Supercomputing Centre
1996-2002
Organizzazione Sociopsichiatrica Cantonale
1996-2002
Swisscom (Switzerland)
2002
University of Wisconsin–Madison
2000-2002
Scalasca is a performance toolset that has been specifically designed to analyze parallel application execution behavior on large-scale systems with many thousands of processors. It offers an incremental performance-analysis procedure that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, adopting a strategy of successively refined measurement configurations. Distinctive features are its ability to identify wait states in applications with very large numbers of processes and...
Although memory performance is often a limiting factor in application performance, most tools only show performance data relating to the instructions in the program, not to its data. In this paper, we describe a technique for directly measuring the memory profile of an application. We describe the tools and their user model, and then discuss a particular code, the MCF benchmark from SPEC CPU 2000. We show performance data for the data structures and elements, and discuss the use of the data to improve program performance. Finally, we discuss extensions to the work to provide feedback to the compiler for prefetching and to generate additional reports.
Drug targeting promises to substantially enhance future therapies, for example through the focussing of chemotherapeutic drugs at the site of a tumor, thus reducing the exposure of healthy tissue to unwanted damage. Promising work on the steering of medication in the human body employs magnetic fields acting on nanoparticles made of paramagnetic materials. We develop a computational tool to aid in the optimization of the physical parameters of these particles and the magnetic configuration, estimating the fraction of particles reaching a given target site in a large patient-specific...
The processing power and memory capacity of independent heterogeneous parallel machines can be combined to form a single system that is more powerful than any of its constituents. However, achieving satisfactory application performance on such a metacomputer is hard because the high latency of inter-machine communication as well as differences in the hardware of the constituent machines may introduce various types of wait states. In our earlier work, we have demonstrated that an automatic pattern search in event traces can identify the sources...
In message-passing applications, the temporal or spatial distance between the cause and the symptom of a performance problem constitutes a major difficulty in deriving helpful conclusions from performance data. Just knowing the locations of wait states in the program is often insufficient to understand the reason for their occurrence. We present a method for verifying hypotheses on the causality between temporally or spatially distant performance phenomena in message-passing applications without altering the application itself. The verification is accomplished by modifying MPI event...
Many scientific and medical researchers are working towards the creation of a virtual human, a personalized digital copy of an individual, that will assist in a patient’s diagnosis, treatment and recovery. The complex nature of living systems means that the development of this remains a major challenge. We describe progress in enabling the HemeLB lattice Boltzmann code to simulate 3D macroscopic blood flow on a full human scale. Significant developments in memory management and load balancing allow near linear scaling performance...
Simulation is a third pillar next to experiment and theory in the study of complex dynamic systems such as biological neural networks. Contemporary brain-scale networks correspond to directed random graphs of a few million nodes, each with an in-degree and out-degree of several thousands of edges, where nodes and edges correspond to the fundamental biological units, neurons and synapses, respectively. The activity in neuronal networks is also sparse. Each neuron occasionally transmits a brief signal, called a spike, via its outgoing synapses to the corresponding target...
PARAMICS is a PARAllel MICroscopic Traffic Simulator which is, to our knowledge, the most powerful of its type in the world. The simulator can model around 200,000 vehicles on 7,000 roads (taken from real road traffic network data) at faster than 'real-time' rates, making use of the 16 K processor TMC Connection Machine CM-200 for the simulation aspect. The project aims to make available to traffic planners a new range of tools, and demonstrates that high performance computing applications are possible and worthwhile, while yielding...
Developers of applications with large-scale computing requirements are currently presented with a variety of high-performance systems optimised for message-passing; however, effectively exploiting the available computing resources remains a major challenge. In addition to fundamental application scalability characteristics, application and system peculiarities often only manifest at extreme scales, requiring highly scalable performance measurement and analysis tools that are convenient to incorporate in application development and tuning activities....
The performance behavior of parallel simulations often changes considerably as the simulation progresses, with potentially process-dependent variations of temporal patterns. While call-path profiling is an established method of linking a performance problem to the context in which it occurs, call paths reveal only little information about the temporal evolution of performance phenomena. However, generating call-path profiles separately for thousands of iterations may exceed the available buffer space, especially when the call tree is large and more than one metric...
Cray XT and IBM Blue Gene systems present current alternative approaches to constructing leadership computer systems relying on applications being able to exploit very large configurations of processor cores, and associated analysis tools must also scale commensurately to isolate and quantify performance issues that manifest at the largest scales. In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on XT5 and BG/P systems, we investigated a progressive execution performance deterioration of the well-known ASCI...
Generic simulation code for spiking neuronal networks spends the major part of the time in the phase where spikes have arrived at a compute node and need to be delivered to their target neurons. These spikes were emitted over the last interval between communication steps by source neurons distributed across many compute nodes and are inherently irregular and unsorted with respect to their targets. For finding those targets, the spikes need to be dispatched to a three-dimensional data structure with decisions on target thread and synapse type to be made on the way. With growing network size,...
We can profile the performance behavior of parallel programs at the level of individual call paths through sampling or direct instrumentation. While we can easily control measurement dilation by adjusting the sampling frequency, the statistical nature of sampling and the difficulty of accessing the parameters of sampled events make it unsuitable for obtaining certain communication metrics, such as the size of message payloads. Alternatively, direct instrumentation, which is preferable for capturing message-passing events, can excessively dilate measurements,...
In studying the scalability of the Scalasca performance analysis toolset to several hundred thousand MPI processes on IBM Blue Gene/P, we investigated a progressive execution performance deterioration of the well-known ASCI Sweep3D compact application. Scalasca runtime summarization analysis quantified MPI communication time that correlated with computational imbalance, and automated trace analysis confirmed growing amounts of MPI waiting times. Further instrumentation, measurement and analyses pinpointed a conditional section of highly imbalanced computation...