- Cloud Computing and Resource Management
- Advanced Data Storage Technologies
- Algorithms and Data Compression
- Advanced Neural Network Applications
- IoT and Edge/Fog Computing
- Video Surveillance and Tracking Methods
- Parallel Computing and Optimization Techniques
- Genomics and Phylogenetic Studies
- Distributed and Parallel Computing Systems
- DNA and Biological Computing
- Advanced Image and Video Retrieval Techniques
- Evolutionary Algorithms and Applications
- Software-Defined Networks and 5G
- Machine Learning in Materials Science
- Graph Theory and Algorithms
- Gene Expression and Cancer Classification
- Visual Attention and Saliency Detection
- Distributed Systems and Fault Tolerance
- Software System Performance and Reliability
- Topic Modeling
- Nuclear Materials and Properties
- Clinical Laboratory Practices and Quality Control
- CRISPR and Genetic Engineering
- Image Enhancement Techniques
- Generative Adversarial Networks and Image Synthesis
Universitat Politècnica de Catalunya
2013-2022
Barcelona Supercomputing Center
2010-2022
MapReduce is a data-driven programming model proposed by Google in 2004 that is especially well suited for distributed data analytics applications. We consider the management of MapReduce applications in an environment where multiple applications share the same physical resources. Such sharing is in line with recent trends in data center management that aim to consolidate workloads in order to achieve cost and energy savings. In a shared environment, it is necessary to predict and manage the performance of applications given the set of goals defined for them. In this paper, we address the problem...
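For readers unfamiliar with the model, the following is a minimal single-process sketch of the MapReduce pattern: user-defined map and reduce functions, with the framework handling the grouping ("shuffle") between them. It illustrates the programming model only, not the paper's scheduling system.

```python
from itertools import groupby

# Minimal sketch of the MapReduce model: user-defined map() and reduce()
# functions, with the framework handling grouping (the "shuffle" phase).

def map_fn(document):
    # Emit (word, 1) for every word in the input record.
    for word in document.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Aggregate all values emitted for the same key.
    yield word, sum(counts)

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: apply map_fn to every input record.
    intermediate = [kv for rec in records for kv in map_fn(rec)]
    # Shuffle phase: group intermediate pairs by key.
    intermediate.sort(key=lambda kv: kv[0])
    grouped = groupby(intermediate, key=lambda kv: kv[0])
    # Reduce phase: apply reduce_fn to each key group.
    return [out for key, group in grouped
                for out in reduce_fn(key, (v for _, v in group))]

print(run_mapreduce(["the cat", "the dog"], map_fn, reduce_fn))
# [('cat', 1), ('dog', 1), ('the', 2)]
```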
Microservices architecture has started a new trend in application development for a number of reasons: (1) to reduce complexity by using tiny services; (2) to scale, remove, and deploy parts of the system easily; (3) to improve flexibility by using different frameworks and tools; (4) to increase overall scalability; and (5) to improve the resilience of the system. Containers have empowered the usage of microservices architectures by being lightweight, providing fast start-up times, and having low overhead. Containers can be used to develop applications based on...
Autoscaling methods are used by cloud-hosted applications to dynamically scale the allocated resources while guaranteeing Quality-of-Service (QoS). Public-facing applications serve dynamic workloads that contain bursts, which pose challenges for autoscaling methods to ensure performance. Existing state-of-the-art methods are burst-oblivious when determining how to provision appropriate resources, and it is hard for them to detect and handle bursts online while maintaining performance. In this article, we propose a novel burst-aware method that detects bursts in dynamic workloads using...
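The article's actual detection mechanism is truncated above; as a generic illustration of online burst detection, here is a sketch using a sliding window and a z-score threshold. The window size and threshold are assumptions, not values from the article.

```python
import statistics
from collections import deque

# Illustrative sketch of online burst detection over a request-rate stream.
# A sample is flagged as a burst when its z-score against the recent window
# exceeds a threshold; both parameters are assumed for illustration.

class BurstDetector:
    def __init__(self, window=60, threshold=3.0):
        self.history = deque(maxlen=window)  # recent request rates
        self.threshold = threshold           # z-score that flags a burst

    def observe(self, rate):
        is_burst = False
        if len(self.history) >= 10:          # need enough samples first
            mean = statistics.mean(self.history)
            std = statistics.pstdev(self.history) or 1e-9
            is_burst = (rate - mean) / std > self.threshold
        self.history.append(rate)
        return is_burst

detector = BurstDetector()
workload = [100, 105, 98, 102, 99, 101, 103, 97, 100, 104, 100, 900]
flags = [detector.observe(r) for r in workload]
print(flags[-1])  # True: the final spike is flagged as a burst
```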
Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains, including health care, autonomous vehicles, and the Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments.
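To make the topology-awareness concrete, here is a small sketch of one way a scheduler could pick GPUs: given a pairwise communication-cost matrix, choose the free-GPU set that minimizes total pairwise cost. The cost values (1 for an NVLink-like pair, 4 for a PCIe hop through the CPU) are assumptions for illustration, not the paper's model.

```python
from itertools import combinations

# Illustrative sketch of topology-aware GPU selection: given a pairwise
# communication-cost matrix (values are assumptions), pick the set of
# free GPUs that minimizes the total pairwise communication cost.

COST = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]

def pick_gpus(free_gpus, n):
    def total_cost(gpus):
        return sum(COST[a][b] for a, b in combinations(gpus, 2))
    return min(combinations(free_gpus, n), key=total_cost)

# Request 2 GPUs out of free GPUs 0, 1, 3: the tightly coupled pair wins.
print(pick_gpus([0, 1, 3], 2))  # (0, 1)
```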
Next-generation data centers will be composed of thousands of hybrid systems in an attempt to increase overall cluster performance and minimize energy consumption. New programming models, such as MapReduce, specifically designed to make the most of very large infrastructures, will be leveraged to develop massively distributed services. At the same time, data centers will bring an unprecedented degree of workload consolidation, hosting in the same infrastructure services from many different users. In this paper we present our advancements in leveraging...
This paper presents a scheduling technique for multi-job MapReduce workloads that is able to dynamically build performance models of the executing workloads, and then use these models for scheduling purposes. This ability is leveraged to adaptively manage workload performance while observing and taking advantage of the particulars of the execution environment of modern data analytics applications, such as hardware heterogeneity and distributed storage. The technique targets highly dynamic workloads in which new jobs can be submitted at any time, and which share physical resources with...
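As a rough illustration of the idea behind deadline-driven slot allocation, the following sketch estimates how many concurrent task slots a job needs from the per-task cost observed so far. The formula and parameter names are simplified assumptions, not the paper's actual model.

```python
import math

# Illustrative sketch: derive the parallelism a MapReduce job needs to
# meet its completion goal, from observed average task duration.

def slots_needed(pending_tasks, avg_task_seconds, seconds_to_deadline,
                 running_slots=0):
    # Total remaining work, measured in task-seconds.
    remaining_work = pending_tasks * avg_task_seconds
    if seconds_to_deadline <= 0:
        return pending_tasks  # deadline passed: run everything at once
    # Minimum extra parallelism needed to finish the remaining work in time.
    return max(math.ceil(remaining_work / seconds_to_deadline) - running_slots, 0)

# A job with 120 pending map tasks averaging 30 s each and 15 min left
# needs roughly four concurrent slots.
print(slots_needed(pending_tasks=120, avg_task_seconds=30,
                   seconds_to_deadline=900))  # 4
```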
The recent upsurge in the available amount of health data and advances in next-generation sequencing are setting the ground for long-awaited precision medicine. To process this deluge of data, bioinformatics workloads are becoming more complex and more computationally demanding. For these reasons, they have been extended to support different computing architectures, such as GPUs and FPGAs, to leverage the form of parallelism typical of each of these architectures. This paper describes how a genomic workload for k-mer frequency counting that takes...
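For reference, the computational kernel named here is simple to state: slide a window of length k over each read and count how often each substring occurs. A minimal sketch follows; production tools add compact 2-bit nucleotide encodings and GPU/FPGA parallelism.

```python
from collections import Counter

# Minimal sketch of k-mer frequency counting: count every length-k
# substring across a collection of reads.

def count_kmers(reads, k):
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

reads = ["GATTACA", "ATTACAG"]
print(count_kmers(reads, k=3).most_common(3))
# [('ATT', 2), ('TTA', 2), ('TAC', 2)]
```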
Modern applications demand resources at an unprecedented level. In this sense, data centers are required to scale efficiently to cope with such demand. Resource disaggregation has the potential to improve resource efficiency by allowing the deployment of workloads in more flexible ways. Therefore, the industry is shifting towards disaggregated architectures, which enable new ways to structure hardware in data centers. However, determining the best-performing resource provisioning is a complicated task. The...
The emergence of Next Generation Sequencing (NGS) platforms has increased the throughput of genomic sequencing and, in turn, the amount of data that needs to be processed, requiring highly efficient computation for its analysis. In this context, modern architectures, including accelerators and non-volatile memory, are essential to enable the mass exploitation of these bioinformatics workloads. This paper presents a redesign of the main component of the state-of-the-art reference-free method for variant calling, SMUFIN, which has been...
We present our work on developing and training scalable graph foundation models (GFM) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural networks (GNN) in both scale and data diversity. It abstracts over message passing algorithms, allowing reproduction of and comparison across algorithmic innovations that define convolution in GNNs. This work discusses a series of optimizations that have allowed scaling up the GFM training to tens of thousands of GPUs, on datasets that consist of hundreds of millions...
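The message-passing pattern that GNN convolutions share, and that HydraGNN abstracts over, can be sketched in a few lines: compute per-edge messages, aggregate them per node, then apply a node update. This is a generic illustration of the pattern, not HydraGNN's API.

```python
import numpy as np

# Minimal sketch of message passing: per-edge messages, sum aggregation
# per destination node, then a learned node update.

def message_passing(node_feats, edges, weight):
    # node_feats: (num_nodes, dim); edges: list of (src, dst) pairs.
    num_nodes, dim = node_feats.shape
    aggregated = np.zeros((num_nodes, dim))
    for src, dst in edges:
        aggregated[dst] += node_feats[src]      # message + sum aggregation
    return np.tanh((node_feats + aggregated) @ weight)  # node update

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                     # 4 nodes, 8 features
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]        # a directed cycle
w = rng.normal(size=(8, 8)) / np.sqrt(8)
print(message_passing(x, edges, w).shape)       # (4, 8)
```

Different choices of message, aggregation, and update functions recover different GNN convolutions, which is what makes this abstraction useful for comparing algorithmic variants.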
The Conditional Restricted Boltzmann Machine (CRBM) is a promising candidate for multidimensional system modeling that can learn a probability distribution over a set of data. It is a specific type of artificial neural network with one input (visible) and one output (hidden) layer. Recently published works demonstrate that the CRBM is a suitable mechanism for modeling time series such as human motion, workload characterization, and city traffic analysis. The process of learning and inference in these systems relies on linear algebra functions like...
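The linear-algebra core alluded to here can be illustrated with the hidden-layer activation of a CRBM, which combines the current visible input and a conditioning history through dense matrix products. Dimensions and initialization below are assumptions for illustration.

```python
import numpy as np

# Illustrative sketch of CRBM inference: hidden activations come from the
# current visible frame plus a conditioning history, one matrix product
# per connection type.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def crbm_hidden(visible, history, W, B, b_hid):
    # W: visible-to-hidden weights; B: history-to-hidden (conditioning)
    # weights; b_hid: hidden bias.
    return sigmoid(visible @ W + history @ B + b_hid)

rng = np.random.default_rng(0)
n_vis, n_hist, n_hid = 10, 30, 20
v = rng.normal(size=(1, n_vis))        # current frame
h = rng.normal(size=(1, n_hist))       # concatenated past frames
W = rng.normal(size=(n_vis, n_hid)) * 0.01
B = rng.normal(size=(n_hist, n_hid)) * 0.01
print(crbm_hidden(v, h, W, B, np.zeros(n_hid)).shape)  # (1, 20)
```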
In this paper we present a MapReduce task scheduler for shared environments in which MapReduce is executed along with other resource-consuming workloads, such as transactional applications. All workloads may potentially share the same data store, some of them consuming data for analytics purposes while others act as data generators. This kind of scenario is becoming increasingly important in data centers where improved resource utilization can be achieved through workload consolidation, and it is especially challenging due to...
As the adoption of Big Data technologies becomes the norm in an increasing number of scenarios, there is also a growing need to optimize them for modern processors. Spark has gained momentum over the last few years among companies looking for high-performance solutions that can scale out across different cluster sizes. At the same time, modern processors can be connected to large amounts of physical memory, in the range of up to terabytes. This opens enormous opportunities for runtimes and applications that aim to improve their performance by leveraging low...
The Smith-Waterman algorithm is primarily used in DNA and protein sequencing, where it performs local sequence alignment to determine similarities between biomolecule sequences. However, the inefficient performance of this algorithm limits its applications in the real world. In this perspective, this work presents two-fold contributions. It develops and evaluates a mathematical model for the algorithm targeting a distributed processing system. This model can be helpful to estimate how larger sequences can be aligned with thread-level parallelism, using a large set...
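For context, the algorithm itself is a dynamic program over a scoring matrix; its quadratic cost in sequence length is what motivates the parallelization studied here. Below is a minimal score-only sketch; the match/mismatch/gap parameters are conventional choices, not taken from the paper.

```python
# Minimal sketch of Smith-Waterman local alignment: fill a dynamic
# programming matrix and track the best local score.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0,                 # local alignment: never negative
                          diag,              # match or mismatch
                          H[i-1][j] + gap,   # gap in b
                          H[i][j-1] + gap)   # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCU"))  # best local alignment score
```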
Powered by deep learning, video analytics applications process millions of camera feeds in real time to extract meaningful information from their surroundings, and this number grows by the minute. To avoid saturating the backhaul network and to provide lower latencies, a distributed heterogeneous edge cloud is postulated as the key enabler for widespread video analytics. This article provides a complete characterization of end-to-end video analytics across a set of hardware platforms and different neural network architectures. Each...
Serverless computing is a cloud-based execution paradigm that allows provisioning resources on demand, freeing developers from infrastructure management and operational concerns. It typically involves deploying workloads as stateless functions that take no resources when not in use, and that are meant to scale transparently. To make serverless effective, providers impose limits at the per-function level, such as a maximum duration, a fixed amount of memory, and no persistent local storage. These constraints make it challenging for...
Current distributed key-value stores generally provide greater scalability at the expense of weaker consistency and isolation. However, additional isolation support is becoming increasingly important in the environments in which these stores are deployed, where different kinds of applications with different needs are executed, from transactional workloads to data analytics. While fully-fledged ACID support may not be feasible, it is still possible to take advantage of the design of these stores, which often include a notion of multiversion concurrency control,...
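The multiversion mechanism mentioned here works by keeping timestamped versions per key, so a reader with a snapshot timestamp sees only versions no newer than its snapshot. The sketch below illustrates that idea in miniature; the API and timestamp scheme are assumptions, not any particular store's interface.

```python
import bisect
from collections import defaultdict

# Illustrative sketch of multiversion concurrency control in a key-value
# store: writes append (timestamp, value) versions; reads at a snapshot
# timestamp return the latest version not newer than the snapshot.

class MVCCStore:
    def __init__(self):
        self.versions = defaultdict(list)  # key -> sorted [(ts, value), ...]
        self.clock = 0                     # simple logical clock

    def put(self, key, value):
        self.clock += 1
        self.versions[key].append((self.clock, value))
        return self.clock

    def get(self, key, snapshot_ts):
        chain = self.versions[key]
        # Index of the first version with ts > snapshot_ts.
        i = bisect.bisect_left(chain, (snapshot_ts + 1,))
        return chain[i - 1][1] if i else None

store = MVCCStore()
ts1 = store.put("x", "v1")
store.put("x", "v2")
print(store.get("x", ts1))  # 'v1': the reader's snapshot ignores 'v2'
```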
Disaggregation of resources is a datacenter strategy that aims to decouple the physical location of resources from the place where they are accessed, as opposed to physically attached devices connected through the Peripheral Component Interconnect Express (PCIe) bus. By attaching and detaching resources through a fast interconnection network, it is possible to increase the flexibility to manage infrastructures while keeping the performance of pooled and disaggregated devices. This article introduces workload scheduling and placement policies for such environments...
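As a concrete illustration of what a placement policy over disaggregated devices might look like, here is a best-fit sketch: among resource pools reachable over the fabric, pick the one that satisfies the request with the least stranded capacity. The pool layout and tie-breaking rule are assumptions for illustration, not the article's policies.

```python
# Illustrative sketch of a best-fit placement policy for disaggregated
# devices: choose the pool that can host the request while leaving the
# fewest devices stranded.

def place(request_gpus, pools):
    # pools: {pool_name: free_gpu_count}
    candidates = {name: free for name, free in pools.items()
                  if free >= request_gpus}
    if not candidates:
        return None  # no single pool can host the request
    # Best fit: minimize leftover capacity in the chosen pool.
    return min(candidates, key=lambda name: candidates[name] - request_gpus)

pools = {"rack-a": 3, "rack-b": 8, "rack-c": 4}
print(place(4, pools))  # 'rack-c': exact fit, no stranded GPUs
```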