Yolanda Becerra

ORCID: 0000-0003-2357-7796
Research Areas
  • Cloud Computing and Resource Management
  • Advanced Data Storage Technologies
  • Parallel Computing and Optimization Techniques
  • Distributed and Parallel Computing Systems
  • Scientific Computing and Data Management
  • Software System Performance and Reliability
  • Advanced Database Systems and Queries
  • Distributed systems and fault tolerance
  • Algorithms and Data Compression
  • Data Quality and Management
  • Data Management and Algorithms
  • Ruminant Nutrition and Digestive Physiology
  • Advanced Optical Network Technologies
  • Digital Transformation in Industry
  • Big Data and Business Intelligence
  • IoT and Edge/Fog Computing
  • Data Stream Mining Techniques
  • Agroforestry and silvopastoral systems
  • Software-Defined Networks and 5G
  • Interconnection Networks and Systems
  • Agriculture and Rural Development Research
  • Data Mining Algorithms and Applications
  • RNA and protein synthesis mechanisms
  • DNA and Nucleic Acid Chemistry
  • Time Series Analysis and Forecasting

National University of Cajamarca
2024

Universitat Politècnica de Catalunya
2009-2022

Barcelona Supercomputing Center
2009-2022

MapReduce is a data-driven programming model proposed by Google in 2004 that is especially well suited for distributed data analytics applications. We consider the management of MapReduce applications in an environment where multiple applications share the same physical resources. Such sharing is in line with recent trends in data center management that aim to consolidate workloads in order to achieve cost and energy savings. In a shared environment, it is necessary to predict and manage the performance of applications given a set of performance goals defined for them. In this paper, we address this problem...

10.1109/noms.2010.5488494 article EN 2010-01-01
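The map/shuffle/reduce structure of the programming model referenced above can be sketched in a few lines of single-process Python (a conceptual illustration only; actual frameworks distribute these phases across a cluster):

```python
from collections import defaultdict
from itertools import chain

def map_phase(documents):
    # Map: emit (word, 1) pairs from each input document.
    return chain.from_iterable(((w, 1) for w in doc.split()) for doc in documents)

def shuffle(pairs):
    # Shuffle: group emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b c"])))
# counts == {"a": 2, "b": 2, "c": 1}
```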

The use of the Python programming language for scientific computing has been gaining momentum in recent years. The fact that it is compact and readable, and its complete set of libraries, are two important characteristics that favour its adoption. Nevertheless, Python still lacks a solution for easily parallelizing generic scripts on distributed infrastructures, since current alternatives mostly require the use of APIs for message passing or are restricted to embarrassingly parallel computations. In this sense, this paper presents PyCOMPSs,...

10.1177/1094342015594678 article EN The International Journal of High Performance Computing Applications 2015-08-21
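PyCOMPSs lets users mark ordinary Python functions as tasks that a runtime schedules across distributed resources. As a rough single-machine analogy only (this is the standard library's `concurrent.futures`, not the PyCOMPSs API), the same pattern of submitting independent function invocations and collecting their results looks like this:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(x):
    # Stand-in for a compute-heavy scientific kernel.
    return x * x

# Submit independent invocations; the executor runs them concurrently,
# much as a task-based runtime would schedule them across a cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(simulate, range(8)))
# results == [0, 1, 4, 9, 16, 25, 36, 49]
```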

Applications running inside data centers are enabled through the cooperation of thousands of servers arranged in racks and interconnected together by the data center network. Current DCN architectures based on electronic devices are neither scalable to face the massive growth of DCs, nor flexible enough to efficiently and cost-effectively support highly dynamic application traffic profiles. The FP7 European Project LIGHTNESS foresees extending the capabilities of today's electrical DCNs through the introduction of optical packet switching...

10.1109/mnet.2013.6678922 article EN IEEE Network 2013-11-01

Molecular dynamics simulation (MD) is, just behind genomics, the bioinformatics tool that generates the largest amounts of data, and is using the largest amount of CPU time in supercomputing centres. MD trajectories are obtained after months of calculations, analysed in situ, and in practice forgotten. Several projects to generate stable trajectory databases have been developed for proteins, but no equivalence exists in the nucleic acids world. We present here a novel database system to store trajectories and analyses of nucleic acids. The initial data set...

10.1093/nar/gkv1301 article EN cc-by-nc Nucleic Acids Research 2015-11-26

Next generation data centers will be composed of thousands of hybrid systems in an attempt to increase overall cluster performance and to minimize energy consumption. New programming models, such as MapReduce, specifically designed to make the most of very large infrastructures, will be leveraged to develop massively distributed services. At the same time, data centers will bring an unprecedented degree of workload consolidation, hosting on the same infrastructure services from many different users. In this paper we present our advancements in leveraging...

10.1109/icpp.2010.73 article EN 2010-09-01

This paper presents a scheduling technique for multi-job MapReduce workloads that is able to dynamically build performance models of the executing workloads, and then use these models for scheduling purposes. This ability is leveraged to adaptively manage workload performance while observing and taking advantage of the particulars of the execution environment of modern data analytics applications, such as hardware heterogeneity and distributed storage. The technique targets highly dynamic environments in which new jobs can be submitted at any time, and in which MapReduce workloads share physical resources with...

10.1109/tnsm.2012.122112.110163 article EN IEEE Transactions on Network and Service Management 2013-01-09
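A performance model of the kind described can, in its simplest form, estimate a job's remaining work from observed task statistics and derive a slot allocation that meets a completion goal. The sketch below is a hypothetical simplification for illustration only, not the paper's actual model:

```python
import math

def slots_needed(pending_tasks, avg_task_seconds, seconds_to_deadline):
    # Estimate how many parallel slots a MapReduce job needs so that its
    # remaining tasks finish before the deadline, assuming tasks are
    # uniform and perfectly parallelizable (a deliberate simplification).
    total_work = pending_tasks * avg_task_seconds
    return max(1, math.ceil(total_work / seconds_to_deadline))

# A job with 120 pending tasks averaging 30 s each, due in 600 s,
# needs 120 * 30 / 600 = 6 slots.
slots_needed(120, 30.0, 600.0)  # -> 6
```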

Virtualized infrastructure providers demand new methods to increase the accuracy of the accounting models used to charge their customers. Future data centers will be composed of many-core systems that host a large number of virtual machines (VMs) each. While resource utilization accounting can be achieved with existing system tools, energy accounting is a complex task when per-VM granularity is the goal. In this paper, we propose a methodology that brings new opportunities to energy accounting by adding an unprecedented degree of accuracy on per-VM measurements. We present...

10.1109/grid.2010.5697889 article EN 2010-10-01

In this paper we present a framework to enable data-intensive Spark workloads on MareNostrum, a petascale supercomputer designed mainly for compute-intensive applications. As far as we know, this is the first attempt to investigate optimized deployment configurations of Spark on an HPC setup. We detail the design of the framework and some benchmark data to provide insights into the scalability of the system. We examine the impact of different configurations, including parallelism, storage and networking alternatives, and we discuss several aspects in executing Big Data computing...

10.1109/bigdata.2015.7363768 article EN 2015 IEEE International Conference on Big Data (Big Data) 2015-10-01

This article presents the ALOJA project, an initiative to produce mechanisms for an automated characterization of the cost-effectiveness of Hadoop deployments, and reports its initial results. It is the latest phase of a long-term collaborative engagement between BSC and Microsoft which, over the past 6 years, has explored a range of different aspects of computing systems, software technologies and performance profiling. While during the last 5 years Hadoop has become the de-facto platform for Big Data deployments, it is still little understood how the layers...

10.1109/bigdata.2014.7004322 article EN 2014 IEEE International Conference on Big Data (Big Data) 2014-10-01

Current over-provisioned and multi-tier data centre networks (DCN) deploy rigid control and management platforms, which are not able to accommodate the ever-growing workload driven by the increasing demand of high-performance data centre (DC) and cloud applications. In response to this, the EC FP7 project LIGHTNESS (Low Latency High Throughput Dynamic Network Infrastructures for High Performance Datacentre Interconnects) is proposing a new flattened optical DCN architecture capable of providing dynamic, programmable, highly...

10.1109/eucnc.2014.6882622 article EN 2014-06-01

In an attempt to increase the performance/cost ratio, large compute clusters are becoming heterogeneous at multiple levels: from asymmetric processors, to different system architectures, operating systems and networks. Exploiting the intrinsic multi-level parallelism present in such a complex execution environment has become a challenging task using traditional parallel and distributed programming models. As a result, an increasing need for novel approaches to exploiting parallelism has arisen in these environments. MapReduce is...

10.1109/icpp.2009.59 article EN International Conference on Parallel Processing 2009-09-01

The performance of memory-intensive applications tends to be poor due to the high overhead added by the swapping mechanism. The same problem may be found in highly-loaded multi-programming systems, where all the running applications have to use the swap space in order to be able to execute at the same time. In this paper, we present a solution to these problems. The idea consists of compressing the swapped pages and keeping them in a cache whenever possible. The compressed swap was proposed a few years ago, but it did not achieve the expected results due to hardware limitations. As...

10.1002/(sici)1097-024x(20000425)30:5<567::aid-spe312>3.3.co;2-q article EN Software Practice and Experience 2000-04-25
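The core idea above — compress evicted pages and hold them in an in-memory cache before resorting to the swap device — can be sketched as follows. This is an illustrative toy, not the paper's implementation; the page size and cache policy are made-up parameters:

```python
import zlib

class CompressedSwapCache:
    """Keep compressed copies of evicted pages in RAM; overflow to 'disk'."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.cache = {}   # page id -> compressed bytes (fast path, in RAM)
        self.disk = {}    # overflow store, standing in for the swap device

    def swap_out(self, page_id, data):
        compressed = zlib.compress(data)
        if self.used + len(compressed) <= self.capacity:
            self.cache[page_id] = compressed     # keep in the RAM cache
            self.used += len(compressed)
        else:
            self.disk[page_id] = compressed      # slow path: swap device

    def swap_in(self, page_id):
        compressed = self.cache.pop(page_id, None)
        if compressed is not None:
            self.used -= len(compressed)         # served from the RAM cache
        else:
            compressed = self.disk.pop(page_id)  # fetched from the device
        return zlib.decompress(compressed)

swap = CompressedSwapCache(capacity_bytes=4096)
page = b"A" * 4096                 # a highly compressible 4 KiB page
swap.swap_out(7, page)
assert swap.swap_in(7) == page     # round-trips via the compressed cache
```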

Data center management is driven by high-level performance goals, and it is the responsibility of a middleware to ensure that those goals are met using dynamic resource allocation. The performance delivered by the heterogeneous set of applications running in a virtualized enterprise data center must be predicted to make allocation decisions. For some of these applications, it is required to produce accurate profiles based on previous executions: the case of batch jobs. In this paper we propose a methodology to predict resource consumption for applications running inside virtual...

10.1109/pdp.2009.55 article EN 2009-01-01

In this paper we present a MapReduce task scheduler for shared environments in which MapReduce is executed along with other resource-consuming workloads, such as transactional applications. All workloads may potentially share the same data store, some of them consuming data for analytics purposes while others act as data generators. This kind of scenario is becoming increasingly important in data centers, where improved resource utilization can be achieved through workload consolidation, and is specially challenging due to...

10.1109/ccgrid.2014.65 article EN 2014-05-01

Non-relational databases have recently been the preferred choice when it comes to dealing with Big Data challenges, but their performance is very sensitive to the chosen data organisations. We have seen differences of over 70 times in response time for the same query on different data models. This brings users the need to be fully conscious of the queries they intend to serve in order to design their data model. The common practice, then, is to replicate data into different models designed to fit each query's requirements. In this scenario, the user is in charge of the code implementation required...

10.1016/j.procs.2015.05.441 article EN Procedia Computer Science 2015-01-01
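The practice described — replicating the same data into several models, each shaped for one query — can be illustrated with two in-memory "models" built from the same records. This is a toy analogy of the design problem, not the paper's system:

```python
records = [
    {"user": "ana", "city": "BCN", "score": 9},
    {"user": "pau", "city": "BCN", "score": 7},
    {"user": "eva", "city": "MAD", "score": 8},
]

# Model A: keyed by user, answers "look up one user" in O(1).
by_user = {r["user"]: r for r in records}

# Model B: keyed by city, answers "all users in a city" in O(1),
# at the cost of storing the same data a second time.
by_city = {}
for r in records:
    by_city.setdefault(r["city"], []).append(r)

by_user["eva"]["score"]               # -> 8
[r["user"] for r in by_city["BCN"]]   # -> ['ana', 'pau']
```

A query that only one model serves efficiently forces either a full scan of the other model or yet another replica, which is exactly the productivity burden the paper targets.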

The proliferation of Big Data applications puts pressure on improving and optimizing the handling of diverse datasets across different domains. Among several challenges, major difficulties arise in data-sensitive domains like banking, telecommunications, etc., where strict regulations make it very difficult to upload and experiment with real data on external cloud resources. In addition, most research and development efforts aim to address the needs of IT experts, while data analytics tools remain unavailable to non-expert...

10.1109/services.2019.00120 article EN 2019-07-01

Progress in science is deeply bound to the effective use of high-performance computing infrastructures and to the efficient extraction of knowledge from vast amounts of data. Such data comes from different sources that follow a cycle composed of pre-processing steps for curation and preparation for subsequent steps, with analysis and analytics later applied to the results. However, scientific workflows are currently fragmented in multiple components, with different processes for computing and data management, and with gaps between the viewpoints of the user profiles involved. Our vision is that future...

10.1109/icdcs.2019.00171 preprint EN 2019-07-01

In response to the requirements of applications that work with large amounts of data, various NoSQL databases have appeared to deal specifically with these challenges. These systems have become popular in environments such as data analytics and OLTP; however, they are not the only data-intensive applications that can benefit from such databases. In the life sciences domain, many applications still use flat files as a medium to store their data, and they see themselves very limited in terms of scalability and performance, as well as code complexity. We present an analysis on the viability of using...

10.1109/pdp.2015.43 article EN 2015-03-01

The shift to more parallel and distributed computer architectures has changed how data is managed, consequently giving birth to a new generation of software products, namely NoSQL. These products offer a scalable and reliable solution for "Big Data", but none of them solves the problem of analyzing and visualizing multidimensional data. There are solutions for scaling analytic workloads, for creating scalable databases and for indexing data, but there is no single solution that addresses all three goals together.

10.1145/2833312.2833314 article EN 2016-01-04

Non-relational databases arise as a solution to solve the scalability problems of relational databases when dealing with big data applications. However, they are highly configurable and prone to user decisions that can heavily affect their performance. In order to maximize performance, different data models and queries should be analyzed to choose the best fit. This may involve a wide range of tests and result in productivity issues. We present Aeneas, a tool to support the design and management of code for applications using non-relational...

10.1016/j.procs.2013.05.441 article EN Procedia Computer Science 2013-01-01