- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Neural Networks and Applications
- Matrix Theory and Algorithms
- T-cell and B-cell Immunology
- Artificial Immune Systems Applications
- Low-power high-performance VLSI design
- Evolutionary Algorithms and Applications
- Interconnection Networks and Systems
- Advanced Memory and Neural Computing
- Numerical Methods and Algorithms
- Algorithms and Data Compression
- Embedded Systems Design Techniques
- Scientific Computing and Data Management
- Neural dynamics and brain function
- Animal and Plant Science Education
- Flowering Plant Growth and Cultivation
- Gene Regulatory Network Analysis
- Cloud Computing and Resource Management
- Greenhouse Technology and Climate Control
- Seedling growth and survival studies
- Advanced Vision and Imaging
- Scientific Research and Discoveries
- Quantum Computing Algorithms and Architecture
CSCS - Swiss National Supercomputing Centre
2017-2021
Swisscom (Switzerland)
2017-2021
ETH Zurich
2018-2019
National Technical University of Athens
2007-2013
National and Kapodistrian University of Athens
2006-2012
Institute of Communication and Computer Systems
2012
The Sparse Matrix-Vector multiplication (SpMV) kernel scales poorly on shared memory systems with multiple processing units due to the streaming nature of its data access pattern. Previous research has demonstrated that an effective strategy to improve the kernel's performance is to drastically reduce the data volume involved in the computations. Since the storage formats for sparse matrices include metadata describing the structure of the non-zero elements within the matrix, we propose a generalized approach to compress the metadata by exploiting...
In this paper we revisit the performance issues of the widely used sparse matrix-vector multiplication (SpMxV) kernel on modern microarchitectures. Previous scientific work reports a number of different factors that may significantly reduce performance. However, the interaction of these factors with the underlying architectural characteristics is not clearly understood, a fact that may lead to misguided and thus unsuccessful attempts for optimization. In order to gain an insight into the details of SpMxV performance, we conduct a suite of experiments...
Sparse matrix-vector multiplication (SpM × V) has been characterized as one of the most significant computational scientific kernels. The key algorithmic characteristic of the SpM × V kernel, that inhibits it from achieving high performance, is its very low flop:byte ratio. In this paper, we present a compressed storage format, called Compressed Sparse eXtended (CSX), that is able to detect and encode simultaneously multiple commonly encountered substructures inside a sparse matrix. Relying on aggressive compression...
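To make the low flop:byte ratio concrete, here is a minimal sketch of SpMV over the classic Compressed Sparse Row (CSR) layout. It does not implement CSX. The function names, the 8-byte value and 4-byte index sizes, and the choice to count only matrix traffic (ignoring the input and output vectors) are assumptions made for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Non-zeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_flop_byte_ratio(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Rough estimate: one multiply and one add per non-zero, divided by
    the bytes of matrix data streamed (values, column indices, row pointers).
    Traffic for x and y is ignored here (assumption)."""
    flops = 2 * nnz
    matrix_bytes = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes
    return flops / matrix_bytes


# Hypothetical 3x3 example: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
ratio = csr_flop_byte_ratio(n_rows=3, nnz=5)
```

Under these assumptions each non-zero costs 12 bytes of matrix data for 2 floating-point operations, which is why shrinking the index metadata is an attractive optimization target.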
Sparse matrix-vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architecture. The main problem of SpMV is its high demands on memory bandwidth, which cannot yet be abundantly offered from modern commodity architectures. One of the most promising optimization techniques for SpMV is blocking, which can reduce the indexing structures for storing a sparse matrix, and therefore alleviate the pressure to the memory subsystem. However, blocking methods...
A hybrid evolutionary technique is proposed for data mining tasks, which combines a principle inspired by the immune system, namely the clonal selection principle, with a more common, though very efficient, evolutionary technique, gene expression programming (GEP). The clonal selection principle regulates the immune response in order to successfully recognize and confront any foreign antigen, and at the same time allows the amelioration of the immune response across successive appearances of the same antigen. On the other hand, GEP, a descendant of genetic algorithms, eliminates their main disadvantages,...
Sparse matrix-vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architecture. The main problem of SpMV is its high demands on memory bandwidth, which cannot yet be abundantly offered from modern commodity architectures. One of the most promising optimization techniques for SpMV is blocking, which can reduce the indexing structures for storing a sparse matrix, and therefore alleviate the pressure to the memory subsystem. In this paper, we study...
The Sparse Matrix-Vector Multiplication (SpMV) kernel ranks among the most important and thoroughly studied linear algebra operations, as it lies at the heart of many iterative methods for the solution of sparse linear systems, and often constitutes a severe performance bottleneck. Its optimization, which is intimately associated with the data structures used to store the sparse matrix, has always been of particular interest to the applied mathematics and computer science communities and has attracted further attention since the advent of multicore...
We introduce Arbor, a performance portable library for simulation of large networks of multi-compartment neurons on HPC systems. Arbor is open source software, developed under the auspices of the HBP. The performance portability is by virtue of back-end specific optimizations for x86 multicore, Intel KNL, and NVIDIA GPUs. When coupled with low memory overheads, these optimizations make Arbor an order of magnitude faster than the most widely-used comparable simulation software. The single-node performance can be scaled out to run very large models at extreme scale with efficient weak...
A hybrid evolutionary technique is proposed for data mining tasks, which combines the Clonal Selection Principle with Gene Expression Programming (GEP). The proposed algorithm introduces the notion of Data Class Antigens, which is used to represent a class of data. The produced rules are evolved by a clonal selection algorithm, which extends the recently proposed CLONALG algorithm. In the present algorithm, among other new features, a receptor editing step has been incorporated. Moreover, the rules themselves are represented as antibodies, which are coded as GEP chromosomes, in order...
Symmetric sparse matrices arise often in the solution of sparse linear systems. Exploiting the non-zero element symmetry in order to reduce the overall matrix size is very tempting for optimizing the symmetric Sparse Matrix-Vector Multiplication kernel (SpMxV) for multicore architectures. Despite being very beneficial for the single-threaded execution, not storing the upper or lower triangular part of a symmetric sparse matrix complicates the multithreaded SpMxV version, since it introduces an undesirable dependency on the output vector elements. The most common...
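The output-vector dependency mentioned above can be seen in a minimal single-threaded sketch. It assumes only the lower triangle (including the diagonal) is stored in CSR form. The function name and the example matrix are hypothetical, and this is not the scheme proposed in the paper.

```python
def sym_csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for symmetric A, given only its lower triangle in CSR."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = values[k]
            y[i] += a * x[j]
            if j != i:
                # Mirror contribution of the unstored upper-triangle element.
                # This write targets row j, which another thread may own if
                # rows are partitioned across threads: the dependency at issue.
                y[j] += a * x[i]
    return y


# Hypothetical symmetric matrix [[2, 1, 0], [1, 3, 4], [0, 4, 5]],
# stored as its lower triangle only.
row_ptr = [0, 1, 3, 5]
col_idx = [0, 0, 1, 1, 2]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = sym_csr_spmv(row_ptr, col_idx, values, x)
```

In a row-partitioned multithreaded version the `y[j]` update would need synchronization or per-thread buffers followed by a reduction, which is the kind of overhead the abstract alludes to.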
NestMC is a new multicompartment neural network simulator currently under development as a collaboration between the Simulation Lab Neuroscience at the Forschungszentrum Jülich, the Barcelona Supercomputing Center and the Swiss National Supercomputing Center. NestMC will enable new scales and classes of morphologically detailed neuronal network simulations on current and future supercomputing architectures. A number of "many-core" architectures such as GPU and Intel Xeon Phi based systems are currently available. To optimally use these emerging architecture...
In this paper we explore the impact of the block shape on blocked and vectorized versions of the Sparse Matrix-Vector Multiplication (SpMV) kernel and build upon previous work by performing an extensive experimental evaluation of the most widespread blocking storage format, namely Block Compressed Sparse Row (BCSR), on a set of modern commodity microarchitectures. We evaluate the merit of vectorization on the memory-bound blocked SpMV kernel and report the results for single- and multithreaded (both SMP and NUMA) configurations. The performance of blocked SpMV can significantly vary...
Artificial immune systems (AIS) constitute an emerging and promising field, and have been applied to pattern recognition and classification tasks to a limited extent so far. This work is a first attempt of applying the clonal selection principle to the training of multi-layer perceptrons (MLPs). The clonal selection based neural classifier (CSNC) uses the basic concepts of clonal selection to evolve MLPs, which are represented as real-valued linear antibodies. The proposed system is actually a multi-classifier, consisting of multiple sets, each one devoted to a different...
We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include threading parallelism, a change of the data layout into Structure of Arrays (SoA), auto-vectorisation and algorithmic improvements in the particle...
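The Structure of Arrays (SoA) layout change can be illustrated with a small sketch. The particle fields (`mass`, `vx`, `vy`, `vz`) and the function names are hypothetical and are not taken from Gadget. Python is used only to show the layout; the vectorisation benefit described in the abstract applies to compiled code.

```python
from array import array


def aos_to_soa(particles, fields=("mass", "vx", "vy", "vz")):
    """Convert an Array of Structures (a list of per-particle dicts) into a
    Structure of Arrays (one contiguous array of doubles per field)."""
    return {f: array("d", (p[f] for p in particles)) for f in fields}


def kinetic_energy_soa(soa):
    """Sum 0.5 * m * |v|^2, streaming each field array in lockstep."""
    return sum(
        0.5 * m * (vx * vx + vy * vy + vz * vz)
        for m, vx, vy, vz in zip(soa["mass"], soa["vx"], soa["vy"], soa["vz"])
    )


# Hypothetical particle data in AoS form.
particles = [
    {"mass": 2.0, "vx": 1.0, "vy": 0.0, "vz": 0.0},
    {"mass": 1.0, "vx": 0.0, "vy": 2.0, "vz": 0.0},
]

soa = aos_to_soa(particles)
energy = kinetic_energy_soa(soa)
```

Keeping each field contiguous means a loop over one quantity touches unit-stride memory, which is generally what compilers need in order to auto-vectorise such loops.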
Collective communication, namely the pattern allreduce in message-passing systems, is optimised based on measurements at the installation time of the library. The algorithms used are set up in an initialisation phase as so-called persistent collective communication, introduced in the message-passing interface (MPI) standard. Part of our allreduce algorithms are the patterns reduce_scatter and allgatherv, which are also considered standalone. For short messages, the existing cyclic shift algorithm (Bruck's algorithm) is applied with a prefix operation. For long messages and allgatherv, where...
This paper presents a new approach for the execution of coarse-grain (tiled) parallel SPMD code for applications derived from the explicit discretization of 1-dimensional PDE problems with finite-differencing schemes. Tiling transformation is an efficient loop transformation to achieve coarse-grain parallelism in such algorithms, while rectangular tile shapes are the only feasible shapes that can be manually applied by program developers. However, rectangular tiling transformations are not always valid due to data dependencies, and thus requiring the application...
In this paper, we apply a method for extracting a running power estimate of applications from hardware performance counters, producing power/time curves which can be integrated over particular intervals to estimate the energy consumption of individual application stages. We use this method to instrument executions of a conjugate gradient solver, and examine the energy and performance impacts of applying the Compressed Sparse eXtended (CSX) and classic Compressed Sparse Row (CSR) matrix compression methods to sparse linear systems from different application areas. The CSX format requires...
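Integrating a power/time curve over an interval to obtain energy can be sketched with the trapezoidal rule. The sample values, the function name, and the rule of counting only sample segments lying fully inside the interval (no interpolation at the boundaries) are assumptions for illustration, not the paper's instrumentation.

```python
def energy_joules(times, powers, t_start, t_end):
    """Trapezoidal integral of sampled power (watts) over [t_start, t_end].

    Only segments whose two endpoints both lie inside the interval are
    counted; boundaries are not interpolated (simplifying assumption).
    """
    total = 0.0
    samples = list(zip(times, powers))
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t0 >= t_start and t1 <= t_end:
            total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Hypothetical power samples: time in seconds, power in watts.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
powers = [10.0, 20.0, 20.0, 30.0, 10.0]

stage_energy = energy_joules(times, powers, 1.0, 3.0)
total_energy = energy_joules(times, powers, 0.0, 4.0)
```

Choosing the interval bounds to match the start and end of an application stage gives a per-stage energy estimate, as in the approach the abstract describes.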
We present a methodology to enable the complete software development life cycle on Cray XC systems within a container that can hold any version of the Cray Programming Environment (CPE). The installation of the CPE inside a container facilitates many aspects of the typical HPC support and operation workloads of managing such systems, such as testing new CPEs, comparing performances, or keeping applications built with an old CPE running on updated systems. The procedure for creating the container consists of three steps: the creation of a container holding the targeted CPE, the compilation of the desired...