M-S. Barisits

ORCID: 0000-0003-0253-106X
Research Areas
  • Particle physics theoretical and experimental studies
  • High-Energy Particle Collisions Research
  • Particle Detector Development and Performance
  • Quantum Chromodynamics and Particle Interactions
  • Dark Matter and Cosmic Phenomena
  • Computational Physics and Python Applications
  • Cosmology and Gravitation Theories
  • Neutrino Physics Research
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • Scientific Computing and Data Management
  • Big Data Technologies and Applications
  • Radiation Detection and Scintillator Technologies
  • Medical Imaging Techniques and Applications
  • Parallel Computing and Optimization Techniques
  • Astrophysics and Cosmic Phenomena
  • Black Holes and Theoretical Physics
  • Distributed systems and fault tolerance
  • Atomic and Subatomic Physics Research
  • Particle Accelerators and Free-Electron Lasers
  • Superconducting Materials and Applications
  • Structural Analysis of Composite Materials
  • Digital Radiography and Breast Imaging
  • Caching and Content Delivery
  • Cloud Computing and Remote Desktop Technologies

European Organization for Nuclear Research
2016-2025

Government of Catalonia
2024

A. Alikhanyan National Laboratory
2024

Institute of High Energy Physics
2024

SR Research (Canada)
2024

Federación Española de Enfermedades Raras
2024

Atlas Scientific (United States)
2024

The University of Adelaide
2016-2023

Max Planck Institute for Physics
2019-2023

Brandeis University
2019-2020

Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires a commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper...

10.1007/s41781-018-0018-8 article EN cc-by Computing and Software for Big Science 2019-03-20

Rucio is an open-source software framework that provides scientific collaborations with the functionality to organize, manage, and access their data at scale. The data can be distributed across heterogeneous data centers at widely distributed locations. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and is now continuously extended to support LHC experiments and other diverse scientific communities. In this article, we detail the fundamental concepts of Rucio and describe its architecture along with implementation details,...

10.1007/s41781-019-0026-3 article EN cc-by Computing and Software for Big Science 2019-08-09

Rucio is the next-generation Distributed Data Management (DDM) system, benefiting from recent advances in cloud and "Big Data" computing to address the scaling requirements of HEP experiments. It is an evolution of the ATLAS DDM system Don Quijote 2 (DQ2), which has demonstrated very large scale data management capabilities, with more than 140 petabytes spread worldwide across 130 sites, and accesses from 1,000 active users. However, DQ2 is reaching its limits in terms of scalability, requiring a large number of support staff to operate and being...

10.1088/1742-6596/513/4/042021 article EN Journal of Physics Conference Series 2014-06-11

Power consumption has become a critical issue in large scale clusters. Existing solutions for addressing the servers' energy consumption suggest "shrinking" the set of active machines, at least until more power-proportional hardware devices become available. This paper demonstrates that leveraging the sleeping state, however, may lead to unacceptably poor performance and low data availability if distributed services are not aware of the power management's actions. Therefore, we present an architecture for cluster services in which deployed...

10.1145/1555271.1555281 article EN 2009-06-19

ATLAS has recorded more than 8 petabytes (PB) of RAW data since the LHC started running at the end of 2009. Many derived data products and complementary simulation data have also been produced by the collaboration and, in total, 90PB are currently stored in the Worldwide LHC Computing Grid by ATLAS. All of these data are managed by the ATLAS Distributed Data Management system, called Don Quijote 2 (DQ2). DQ2 has evolved rapidly to help ATLAS Computing operations manage these large quantities of data across the many grid sites at which ATLAS runs, and to help ATLAS physicists get access to these data.

10.1088/1742-6596/396/3/032045 article EN Journal of Physics Conference Series 2012-12-13

Rucio is a software framework designed to facilitate scientific collaborations in efficiently organising, managing, and accessing extensive volumes of data through customizable policies. The framework enables data distribution across globally distributed locations and heterogeneous data centres, integrating various storage and network technologies into a unified federated entity. Rucio offers advanced features such as data recovery and adaptive replication, and it exhibits high scalability, modularity, and extensibility. Originally developed to meet...

10.1051/epjconf/202429501030 article EN cc-by EPJ Web of Conferences 2024-01-01

The ATLAS experiment at CERN’s LHC stores detector and simulation data in raw and derived data formats across more than 150 Grid sites world-wide, currently in total about 200PB on disk and 250PB on tape. Data have different access characteristics due to various computational workflows, and can be accessed from different media, such as remote I/O, disk cache on hard disk drives, or SSDs. Also, larger data centers provide the majority of offline storage capability via tape systems. For the High-Luminosity LHC (HL-LHC), the estimated data storage requirements are...

10.1051/epjconf/202024504035 article EN cc-by EPJ Web of Conferences 2020-01-01

Rucio is the next-generation Distributed Data Management (DDM) system, benefiting from recent advances in cloud and "Big Data" computing to address the scaling requirements of HEP experiments. It is an evolution of the ATLAS DDM system Don Quijote 2 (DQ2), which has demonstrated very large scale data management capabilities, with more than 160 petabytes spread worldwide across 130 sites, and accesses from 1,000 active users. However, DQ2 is reaching its limits in terms of scalability, requiring a large number of support staff to operate and being...

10.1016/j.nuclphysbps.2015.09.151 article EN cc-by Nuclear and Particle Physics Proceedings 2016-04-01

The ATLAS experiment's data management system is constantly tracing file movement operations that occur on the Worldwide LHC Computing Grid (WLCG). Due to the large scale of the WLCG, statistical analysis of the traces is infeasible in real-time. Factors that contribute to the scalability problems include the capability for users to initiate on-demand queries, the high dimensionality of the tracer entries combined with the very low cardinality of their parameters, and the size of the namespace. These issues are alleviated through the adoption of an incremental model...

10.1088/1742-6596/331/6/062018 article EN Journal of Physics Conference Series 2011-12-23

The ATLAS Distributed Data Management system stores more than 150PB of physics data across 120 sites globally. To cope with the anticipated workload of the coming decade, Rucio, the next-generation data management system, has been developed. Replica management, as one of the key aspects of the system, has to satisfy critical performance requirements in order to keep pace with the experiment's high rate of continual data generation. The challenge lies in meeting these objectives while still giving users and applications a powerful toolkit to control their...

10.1088/1742-6596/513/4/042003 article EN Journal of Physics Conference Series 2014-06-11

With this contribution we present some recent developments made to Rucio, the data management system of the High-Energy Physics Experiment ATLAS. Already managing 300 Petabytes of both official and user data, Rucio has seen incremental improvements throughout LHC Run-2, and is currently laying the groundwork for HEP computing in the HL-LHC era. The focus areas are (a) the automations that have been put in place, such as data rebalancing or dynamic replication, as well as their supporting infrastructures for real-time networking metrics...

10.1088/1742-6596/1085/3/032030 article EN Journal of Physics Conference Series 2018-09-01

This paper describes a popularity prediction tool for data-intensive data management systems, such as the ATLAS distributed data management system (DDM). It is fed by the tracing infrastructure of the DDM system, which produces historical reports about data usage, providing information about files, datasets, users, and the sites where data was accessed. The tool described in this contribution uses this information to make predictions about the future popularity of data. It finds trends in data usage using a set of neural networks and input parameters, and predicts the number of accesses in the near-term future. This prediction can then be used in a second step to improve the distribution...

10.1088/1742-6596/513/4/042004 article EN Journal of Physics Conference Series 2014-06-11

Performance evaluations of large-scale systems require the use of representative workloads with certifiably similar or dissimilar characteristics. To quantify the similarity of workload characteristics, we describe a novel measure comprising two efficient methods that are suitable for large workloads. One method uses the discrete wavelet transform to assess periodic time and frequency characteristics in the workload. The second method evaluates dependencies between descriptive attributes via association rule learning. Both methods are evaluated to find...

10.1145/2063384.2063441 article EN 2011-11-08

Transparent use of commercial cloud resources for scientific experiments is a hard problem. In this article, we describe the first steps of the Data Ocean R&D collaboration between the high-energy physics experiment ATLAS and Google Cloud Platform, to allow the seamless use of Google Compute Engine and Google Cloud Storage for physics analysis. We start by describing the three preliminary use cases that were identified at the beginning of the project. The following sections then detail the work that was done in the data management system Rucio and the workflow system PanDA...

10.1051/epjconf/201921404020 article EN cc-by EPJ Web of Conferences 2019-01-01

This paper describes a monitoring framework for large scale data management systems with frequent data access. It allows meaningful information to be generated from collected tracing data, and can be queried on demand for specific user usage patterns with respect to source and destination locations, time period intervals, and other searchable parameters. The feasibility of such a system at the petabyte scale is demonstrated by describing the implementation and operational experience of a real world example, the ATLAS experiment, employing the proposed framework. Our...

10.1088/1742-6596/396/5/052055 article EN Journal of Physics Conference Series 2012-12-13

The ATLAS Distributed Data Management (DDM) system has evolved drastically in the last two years, with the Rucio software fully replacing the previous system before the start of LHC Run-2. The DDM system now manages more than 250 petabytes spread over 130 storage sites and can handle file transfer rates of up to 30Hz. In this paper, we discuss the experience acquired in developing, commissioning, running, and maintaining such a large system. First, we describe the general architecture of the system and the integration with external services like the WLCG File...

10.1088/1742-6596/898/6/062019 article EN Journal of Physics Conference Series 2017-10-01

This contribution details the deployment of Rucio, the ATLAS Distributed Data Management system. The main complication is that Rucio interacts with a wide variety of external services, and connects globally distributed data centres under different technological and administrative control, at an unprecedented data volume. It is therefore not possible to create a duplicate instance for testing or integration. Every software upgrade or configuration change is thus potentially disruptive and requires fail-safe software and automatic error...

10.1088/1742-6596/664/6/062027 article EN Journal of Physics Conference Series 2015-12-23

For many scientific projects, data management is an increasingly complicated challenge. The number of data-intensive instruments generating unprecedented volumes of data is growing, and their accompanying workflows are becoming more complex. Their storage and computing resources are heterogeneous and are distributed at numerous geographical locations belonging to different administrative domains and organisations. These locations do not necessarily coincide with the places where the data is produced, nor where it is stored, analysed by researchers, or...

10.1051/epjconf/202024511006 article EN cc-by EPJ Web of Conferences 2020-01-01

ATLAS has recorded almost 5PB of RAW data since the LHC started running at the end of 2009. Many more derived data products and complementary simulation data have also been produced by the collaboration and, in total, 70PB is currently stored in the Worldwide LHC Computing Grid by ATLAS. All of this data is managed by the ATLAS Distributed Data Management system, called Don Quijote 2 (DQ2). DQ2 has evolved rapidly to help ATLAS Computing operations manage these large quantities of data across the many grid sites at which ATLAS runs, and to help ATLAS physicists get access to these data. In this paper we describe new...

10.1088/1742-6596/368/1/012005 article EN Journal of Physics Conference Series 2012-06-21

Data grids are used in large scale scientific experiments to access and store nontrivial amounts of data by combining the storage resources of multiple data centers into one system. This enables users and automated services to use the storage in a common and efficient way. However, as data grids grow, it becomes a hard problem for developers and operators to estimate how modifications to policy, hardware, and software affect the performance metrics of the grid. In this paper we address the modeling of operational data grids. We first analyze the data grid middleware system of the ATLAS...

10.1109/ccgrid.2016.36 article EN 2016-05-01

Rucio is the successor of the current Don Quijote 2 (DQ2) system for distributed data management (DDM) in the ATLAS experiment. The reasons for replacing DQ2 are manifold, but besides high maintenance costs and architectural limitations, scalability concerns are at the top of the list. Current expectations are that the amount of data will be three to four times as much as today by the end of 2014. Furthermore, the availability of more powerful computing resources is putting additional pressure on the DDM system, as it increases the demands on data provisioning. Although DQ2 is capable of handling...

10.1088/1742-6596/513/4/042048 article EN Journal of Physics Conference Series 2014-06-11

To prepare the migration to the new ATLAS Data Management system called Rucio, a renaming campaign of all the physical files produced by ATLAS is needed. It represents around 300 million files split between ∼120 sites with 6 different storage technologies. It must be done in a transparent way in order not to disrupt the ongoing computing activities. An infrastructure to perform this renaming has been developed and is presented in this paper, as well as its performance.

10.1088/1742-6596/513/4/042008 article EN Journal of Physics Conference Series 2014-06-11