- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed and Parallel Computing Systems
- Interconnection Networks and Systems
- Cloud Computing and Resource Management
- Distributed systems and fault tolerance
- Embedded Systems Design Techniques
- Advanced Neural Network Applications
- Cellular Automata and Applications
- Computational Fluid Dynamics and Aerodynamics
- Real-Time Systems Scheduling
- Ferroelectric and Negative Capacitance Devices
- Software Engineering Research
- Data Management and Algorithms
- Marine and fisheries research
- Caching and Content Delivery
- Advanced Memory and Neural Computing
- Stochastic Gradient Optimization Techniques
- Metaheuristic Optimization Algorithms Research
- Meteorological Phenomena and Simulations
- Machine Learning and Data Classification
- Tensor decomposition and applications
- Advanced Database Systems and Queries
- Low-power high-performance VLSI design
- Matrix Theory and Algorithms
Universidade da Coruña
2016-2025
Universidade de Santiago de Compostela
2023
University of Illinois Urbana-Champaign
2007
Tiling has proven to be an effective mechanism to develop high performance implementations of algorithms. Tiling can be used to organize computations so that communication costs in parallel programs are reduced and locality in sequential codes or components is enhanced. In this paper, a data type, Hierarchically Tiled Arrays (HTAs), that facilitates the direct manipulation of tiles is introduced. HTA operations are overloaded array operations. We argue that the implementation of HTAs in OO languages transforms these languages into powerful tools for...
Efficient memory hierarchy design is critical due to the increasing gap between the speed of processors and memory. One of the sources of inefficiency in current caches is the non-uniform distribution of memory accesses on the cache sets. Its consequence is that while some cache sets may have working sets that are far from fitting in them, other sets may be underutilized because their working set has fewer lines than the set. In this paper we present a technique that aims to balance the pressure on the cache sets by detecting when it may be beneficial to associate sets, displacing lines from stressed sets to underutilized ones. This new...
Caches play a very important role in the performance of modern computer systems due to the gap between memory and processor speed. Among the methods for studying their behavior, the most widely used by now has been trace-driven simulation. Nevertheless, analytical modeling gives more information and requires smaller computation times that allow it to be used in the compilation step to drive automatic optimizations on the code. The traditional drawback of analytical modeling has been its limited precision and the lack of techniques to apply it systematically without user...
The importance of tiles or blocks in scientific computing cannot be overstated. Many algorithms, both iterative and recursive, can be expressed naturally if tiles are represented explicitly. From the point of view of performance, tiling, either as a code or data layout transformation, is one of the most effective ways to exploit locality, which is a must to achieve good performance in current computers because of the significant difference in speed between processor and memory. Furthermore, tiles are also useful to express data distribution in parallel...
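The tiling transformation this abstract refers to can be illustrated with a minimal sketch of a tiled (blocked) matrix multiplication. This is a generic textbook formulation, not the authors' implementation; the function name `tiled_matmul` and the `tile` parameter are hypothetical names chosen for the example. In pure Python the locality benefit is not realized, so the sketch only shows the loop structure.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply matrices a (n x m) and b (m x p), given as lists of rows.

    The three outer loops walk over tiles; the three inner loops do the
    work inside one tile, so each tile of a, b and c is reused while it
    is still "hot". The tile size is an assumption of this sketch.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):          # tile row of c
        for kk in range(0, m, tile):      # tile along the shared dimension
            for jj in range(0, p, tile):  # tile column of c
                # min(...) handles matrices whose size is not a
                # multiple of the tile size.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

The result is the same as that of the untiled triple loop; only the order in which the partial products are accumulated changes.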
The growing complexity in computer system hierarchies due to the increase in the number of cores per processor, levels of cache (some of them shared) and the number of processors per node, as well as high-speed interconnects, demands the use of new optimization techniques and libraries that take advantage of their features. In this paper Servet, a suite of benchmarks focused on detecting a set of parameters with high influence on the overall performance of multicore systems, is presented. These benchmarks are able to detect the cache hierarchy, including their size and which caches are shared...
Analytical models have been used to estimate optimal values for parameters such as tile sizes in the context of loop nests. However, important algorithms such as fast Fourier transforms (FFTs) present a far more complex search space consisting of many thousands of different implementations with very different access patterns and nesting and code structures. As a result, some of the best available FFT implementations use heuristic search based on runtime measurements. In this paper we present the first analytical model that can successfully replace the measurement...
The divide-and-conquer pattern of parallelism is a powerful approach to organize parallelism on problems that are expressed naturally in a recursive way. In fact, recent tools such as Intel Threading Building Blocks (TBB), which has received much attention, go further and make extensive usage of this pattern to parallelize problems that other approaches parallelize following other strategies. In this paper we discuss the limitations to express divide-and-conquer parallelism with the algorithm templates provided by TBB. Based on our observations, we propose a new algorithm template implemented on top of TBB that improves...
This work presents cost-effective multi-graphics processing unit (GPU) parallel implementations of a finite-volume numerical scheme for solving pollutant transport problems in bidimensional domains. The fluid is modeled by the 2D shallow-water equations, whereas the transport of the pollutant is modeled by a transport equation. The 2D domain is discretized using a first-order Roe finite-volume scheme. Specifically, this paper presents multi-GPU implementations of both a solution that exploits recomputation on the GPU and an optimized solution that is based on a ghost cell decoupling approach. Our multi-GPU implementations have been optimized using nonblocking...
Caches play a very important role in the performance of modern computer systems due to the gap between memory and processor speed. Among the methods for studying their behaviour, the most widely used has been trace-driven simulation. Nevertheless, analytical modeling gives more information and requires smaller computation times that allow it to be used in the compilation step to drive automatic optimizations on the code. The traditional drawback of analytical modeling has been its limited precision and the lack of techniques to apply it systematically without user...
Many problems of industrial and scientific interest require the solving of tridiagonal linear systems. This paper presents several implementations for the parallel solving of large tridiagonal systems on multi-core architectures, using the OmpSs programming model. The strategy used for the parallelization is based on the combination of two different existing algorithms, PCR and Thomas. The Thomas algorithm, which cannot be parallelized, requires the fewest number of floating point operations. The PCR algorithm is the most popular parallel method, but it is more computationally...
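For reference, the sequential Thomas algorithm named in this abstract can be sketched as follows. This is the standard textbook formulation, not the paper's OmpSs implementation; the function name `thomas_solve` and its argument layout are assumptions made for the example, and no pivoting is performed, so the matrix is assumed to be well conditioned (for instance, diagonally dominant).

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system with the Thomas algorithm.

    lower[i]: sub-diagonal entry of row i (lower[0] is unused)
    diag[i]:  diagonal entry of row i
    upper[i]: super-diagonal entry of row i (upper[-1] is unused)
    rhs[i]:   right-hand side of row i
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination. Step i needs the result of step i - 1,
    # which is why this sweep is inherently sequential.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution, also sequential, from the last row upwards.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Both sweeps touch each row once, which gives the linear operation count the abstract alludes to, at the price of a loop-carried dependence in each sweep.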
The increasing gap between processor and main memory speeds makes the role of the memory hierarchy behavior in the system performance essential. Both hardware and software techniques to improve this behavior require good analysis tools that help predict and understand such behavior. Analytical modeling arises as a good choice in this field due to its high speed if its traditional limited precision is overcome. We present a modular analytical modeling strategy for arbitrary set-associative caches with LRU replacement policy. The model differs from all...
In an intelligent memory architecture, the main memory of a computer is enhanced with many simple processors. The result is a highly-parallel, heterogeneous machine that is able to exploit computation in the main memory. While several instantiations of this architecture have been proposed, the question of how to effectively program them with little effort has remained a major challenge. In this paper, we show how to effectively hand-program an intelligent memory architecture at a high level and with very modest effort. We use FlexRAM as a prototype architecture. To program it, we propose a family of high-level...
In this paper, we show our initial experience with a class of objects, called Hierarchically Tiled Arrays (HTAs), that encapsulate parallelism. HTAs allow the construction of single-threaded parallel programs where a master process distributes tasks to be executed by a collection of servers holding the components (tiles) of the HTAs. The tiled and recursive nature of HTAs facilitates the adaptation of the programs that use them to varying machine configurations, and eases the mapping of data and tasks to parallel computers with a hierarchical organization. We have implemented HTAs as a MATLAB™...
The memory hierarchy plays an essential role in the performance of current computers, so good analysis tools that help in predicting and understanding its behavior are required. Analytical modeling is the ideal base for such tools if its traditional limitations in accuracy and scope of application can be overcome. While there has been extensive research on the modeling of codes with regular access patterns, less attention has been paid to codes with irregular patterns due to the increased difficulty in analyzing them. Nevertheless, many important...
The study of a language in terms of programmability is a very interesting issue in parallel programming. Traditional approaches in this field have studied different methods, such as the number of Lines of Code or the analysis of programs, in order to prove the benefits of using a paradigm compared to another. Nevertheless, these methods usually focus only on code analysis, without giving much importance to the conditions of the development process and even the learning stage, or the benefits and disadvantages reported by the programmers. In this paper we present a methodology...
The use of heterogeneous devices is becoming increasingly widespread. Their main drawback is their low programmability due to the large amount of details that must be handled. Another important problem is the reduced code portability, as most of the tools to program them are vendor or device-specific. The exception to this observation is OpenCL, which largely suffers from the reduced programmability problem mentioned, particularly in the host side. The Heterogeneous Programming Library (HPL) is a recent proposal to improve this situation, as it couples portability with good...
Multicore machines are becoming common. There are many languages, language extensions and libraries devoted to improving the programmability and performance of these machines. In this paper we compare two libraries that face the problem of programming multi-cores from two different perspectives, task parallelism and data parallelism. The Intel threading building blocks (TBB) library separates logical task patterns, which are easy to understand, from physical threads, and delegates the scheduling of the tasks to the system. On the other hand, the hierarchically...