- Theoretical and Experimental Particle Physics Studies
- High-Energy Particle Collisions Research
- Particle Detector Development and Performance
- Quantum Chromodynamics and Particle Interactions
- Dark Matter and Cosmic Phenomena
- Computational Physics and Python Applications
- Cosmology and Gravitation Theories
- Neutrino Physics Research
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Scientific Computing and Data Management
- Big Data Technologies and Applications
- Radiation Detection and Scintillator Technologies
- Medical Imaging Techniques and Applications
- Parallel Computing and Optimization Techniques
- Astrophysics and Cosmic Phenomena
- Black Holes and Theoretical Physics
- Distributed Systems and Fault Tolerance
- Atomic and Subatomic Physics Research
- Particle Accelerators and Free-Electron Lasers
- Superconducting Materials and Applications
- Structural Analysis of Composite Materials
- Digital Radiography and Breast Imaging
- Caching and Content Delivery
- Cloud Computing and Remote Desktop Technologies
- European Organization for Nuclear Research, 2016-2025
- Government of Catalonia, 2024
- A. Alikhanyan National Laboratory, 2024
- Institute of High Energy Physics, 2024
- SR Research (Canada), 2024
- Federación Española de Enfermedades Raras, 2024
- Atlas Scientific (United States), 2024
- The University of Adelaide, 2016-2023
- Max Planck Institute for Physics, 2019-2023
- Brandeis University, 2019-2020
Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper...
Rucio is an open-source software framework that provides scientific collaborations with the functionality to organize, manage, and access their data at scale. The data can be distributed across heterogeneous data centers at widely distributed locations. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and it is continuously extended to support the LHC experiments and other diverse scientific communities. In this article, we detail the fundamental concepts of Rucio, describe the architecture along with implementation details,...
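Rucio's central abstraction is the declarative replication rule: users state how many copies of a dataset must exist on which class of storage, and the system works out placement and repair. Below is a minimal sketch, assuming a reachable Rucio server and the `rucio` client package; the scope, dataset name, and RSE expression are hypothetical placeholders, and method signatures should be checked against the installed client version.

```python
# Minimal sketch of declaring a replication rule via the Rucio client.
# Assumes a configured Rucio server and account; the scope, name, and
# RSE expression are hypothetical placeholders.
from rucio.client import Client

client = Client()

# Ask Rucio to keep two copies of a dataset on any Tier-1 disk endpoint;
# the rule engine decides where the replicas go and repairs losses.
rule_ids = client.add_replication_rule(
    dids=[{'scope': 'user.jdoe', 'name': 'jdoe.analysis.2024'}],
    copies=2,
    rse_expression='tier=1&type=DISK',
    lifetime=30 * 24 * 3600,  # seconds; the rule expires after 30 days
)
print(rule_ids)
```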
Rucio is the next-generation Distributed Data Management (DDM) system, benefiting from recent advances in cloud and "Big Data" computing to address the scaling requirements of HEP experiments. It is an evolution of the ATLAS DDM system Don Quijote 2 (DQ2), which has demonstrated very large scale data management capabilities, with more than 140 petabytes spread worldwide across 130 sites and accesses from 1,000 active users. However, DQ2 is reaching its limits in terms of scalability, requiring a large number of support staff to operate and being...
Power consumption has become a critical issue in large scale clusters. Existing solutions for addressing the servers' energy consumption suggest "shrinking" the set of active machines, at least until more power-proportional hardware devices become available. This paper demonstrates, however, that leveraging the sleeping state may lead to unacceptably poor performance and low data availability if distributed services are not aware of the power management's actions. Therefore, we present an architecture for cluster services in which deployed...
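The underlying invariant is easy to state: a node may only be put to sleep if every data block it hosts keeps at least one replica on a node that stays awake. A minimal sketch of that check, with a hypothetical replica map (not the paper's actual architecture):

```python
# Sketch: only allow a node to sleep if every block it hosts still has
# at least one replica on a node that remains awake. The replica map is
# a hypothetical illustration.

def can_sleep(candidate, replicas, awake_nodes):
    """replicas: dict block_id -> set of nodes holding a copy."""
    remaining = awake_nodes - {candidate}
    for hosts in replicas.values():
        if candidate in hosts and not (hosts & remaining):
            return False  # candidate holds the only awake copy
    return True

replicas = {'b1': {'n1', 'n2'}, 'b2': {'n3'}, 'b3': {'n2', 'n3'}}
awake = {'n1', 'n2', 'n3'}
print(can_sleep('n1', replicas, awake))  # True: b1 survives on n2
print(can_sleep('n3', replicas, awake))  # False: b2 would become unavailable
```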
ATLAS has recorded more than 8 petabytes (PB) of RAW data since the LHC started running at the end of 2009. Many derived data products and complementary simulation data have also been produced by the collaboration and, in total, 90PB are currently stored in the Worldwide LHC Computing Grid by ATLAS. All of these data are managed by the ATLAS Distributed Data Management system, called Don Quijote 2 (DQ2). DQ2 has evolved rapidly to help ATLAS Computing operations manage these large quantities of data across the many grid sites at which ATLAS runs, and to help physicists get access to the data.
Rucio is a software framework designed to facilitate scientific collaborations in efficiently organising, managing, and accessing extensive volumes of data through customizable policies. Data can be distributed across globally distributed locations and heterogeneous data centres, integrating various storage and network technologies into a single federated entity. Rucio offers advanced features such as data recovery and adaptive replication, and it exhibits high scalability, modularity, and extensibility. Originally developed to meet...
The ATLAS experiment at CERN's LHC stores detector and simulation data in raw and derived data formats across more than 150 Grid sites world-wide, currently in total about 200PB on disk and 250PB on tape. Data have different access characteristics due to various computational workflows, and can be accessed from different media, such as remote I/O, disk cache on hard drives or SSDs. Also, larger data centers provide the majority of offline storage capability via tape systems. For the High-Luminosity LHC (HL-LHC), the estimated data storage requirements are...
Rucio is the next-generation Distributed Data Management (DDM) system, benefiting from recent advances in cloud and "Big Data" computing to address the scaling requirements of HEP experiments. It is an evolution of the ATLAS DDM system Don Quijote 2 (DQ2), which has demonstrated very large scale data management capabilities, with more than 160 petabytes spread worldwide across 130 sites and accesses from 1,000 active users. However, DQ2 is reaching its limits in terms of scalability, requiring a large number of support staff to operate and being...
The ATLAS experiment's data management system is constantly tracing the file movement operations that occur on the Worldwide LHC Computing Grid (WLCG). Due to the large scale of the WLCG, statistical analysis of the traces is infeasible in real-time. Factors that contribute to the scalability problems include the capability for users to initiate on-demand queries, the high dimensionality of tracer entries combined with the very low cardinality of their parameters, and the size of the namespace. These issues are alleviated through the adoption of an incremental model...
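The incremental idea can be illustrated compactly: rather than re-scanning the full trace archive for each query, running aggregates keyed on the queried dimensions are updated once per incoming trace. A sketch under assumed trace fields (`site`, `dataset`, `timestamp` are illustrative, not the real schema):

```python
# Sketch of incremental aggregation over a trace stream: keep running
# counters keyed by the dimensions users query on, updated per trace.
# Field names are illustrative.
from collections import defaultdict

class IncrementalTraceStats:
    def __init__(self):
        # (site, dataset, hour-bucket) -> access count
        self.counts = defaultdict(int)

    def ingest(self, trace):
        hour = trace['timestamp'] // 3600  # bucket to the hour
        self.counts[(trace['site'], trace['dataset'], hour)] += 1

    def accesses(self, site=None, dataset=None):
        return sum(
            n for (s, d, _), n in self.counts.items()
            if (site is None or s == site) and (dataset is None or d == dataset)
        )

stats = IncrementalTraceStats()
stats.ingest({'site': 'CERN-PROD', 'dataset': 'data15.raw', 'timestamp': 7200})
stats.ingest({'site': 'CERN-PROD', 'dataset': 'data15.raw', 'timestamp': 7300})
print(stats.accesses(site='CERN-PROD'))  # 2, without re-reading any archive
```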
The ATLAS Distributed Data Management system stores more than 150PB of physics data across 120 sites globally. To cope with the anticipated workload of the coming decade, Rucio, the next-generation data management system, has been developed. Replica management, as one of the key aspects of the system, has to satisfy critical performance requirements in order to keep pace with the experiment's high rate of continual data generation. The challenge lies in meeting these objectives while still giving users and applications a powerful toolkit to control their...
With this contribution we present some recent developments made to Rucio, the data management system of the High-Energy Physics Experiment ATLAS. Already managing 300 petabytes of both official and user data, Rucio has seen incremental improvements throughout LHC Run-2 and is currently laying the groundwork for HEP computing in the HL-LHC era. The focus of this contribution is (a) the automations that have been put in place, such as data rebalancing or dynamic replication, as well as their supporting infrastructures, such as real-time networking metrics...
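To give a flavour of the rebalancing automation, here is a deliberately simplified sketch, not the production algorithm: datasets are drained from storage elements above a fullness threshold toward the least-full one.

```python
# Simplified sketch of automated rebalancing (not ATLAS's production
# algorithm): drain datasets from over-full storage elements toward the
# least-full one until usage falls below a threshold.

def rebalance(rses, datasets, threshold=0.85):
    """rses: dict name -> {'used': bytes, 'total': bytes}
    datasets: dict name -> {'rse': name, 'bytes': size}
    Returns a list of (dataset, src, dst) move decisions."""
    moves = []
    for src, info in rses.items():
        for ds, meta in sorted(datasets.items(), key=lambda kv: -kv[1]['bytes']):
            if info['used'] / info['total'] <= threshold:
                break  # source is no longer over-full
            if meta['rse'] != src:
                continue
            dst = min(rses, key=lambda r: rses[r]['used'] / rses[r]['total'])
            if dst == src:
                break
            moves.append((ds, src, dst))
            info['used'] -= meta['bytes']
            rses[dst]['used'] += meta['bytes']
            meta['rse'] = dst
    return moves

rses = {'RSE_A': {'used': 95, 'total': 100}, 'RSE_B': {'used': 20, 'total': 100}}
datasets = {'ds1': {'rse': 'RSE_A', 'bytes': 30}, 'ds2': {'rse': 'RSE_A', 'bytes': 10}}
print(rebalance(rses, datasets))  # [('ds1', 'RSE_A', 'RSE_B')]
```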
This paper describes a popularity prediction tool for data-intensive data management systems, such as the ATLAS distributed data management system (DDM). It is fed by the tracer infrastructure of the DDM system, which produces historical reports about data usage, providing information about files, datasets, users, and the sites where data was accessed. The tool described in this contribution uses this information to make predictions about the future popularity of data. It finds trends in data usage using a set of neural networks and a set of input parameters, and predicts the number of accesses in the near-term future. This prediction can then be used in a second step to improve the distribution...
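The paper's actual networks are not reproduced here; as an illustrative stand-in, a small feed-forward regressor can map a sliding window of past weekly access counts to the next week's count (synthetic data, scikit-learn assumed):

```python
# Illustrative stand-in for the prediction step (not the paper's actual
# networks): learn next-week dataset accesses from a sliding window of
# past weekly access counts, using synthetic data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical weekly access series for one dataset.
weeks = np.clip(100 + 30 * np.sin(np.arange(120) / 4) + rng.normal(0, 5, 120), 0, None)

window = 8
X = np.array([weeks[i:i + window] for i in range(len(weeks) - window)])
y = weeks[window:]

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X[:-10], y[:-10])    # train on all but the last 10 weeks

pred = model.predict(X[-10:])  # predict the held-out weeks
print(np.round(pred, 1))
```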
Performance evaluations of large-scale systems require the use of representative workloads with certifiably similar or dissimilar characteristics. To quantify the similarity of workload characteristics, we describe a novel measure comprising two efficient methods that are suitable for large workloads. One method uses the discrete wavelet transform to assess periodic time and frequency characteristics in the workload. The second method evaluates the dependencies of descriptive workload attributes via association rule learning. Both methods are evaluated to find...
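A minimal sketch of the wavelet-based half of such a measure, assuming PyWavelets (`pywt`): each workload series is decomposed, summarised by its per-level energy shares, and the signatures are compared. The paper's exact signature construction may differ.

```python
# Sketch: compare two workloads' time/frequency behaviour by decomposing
# each series with a discrete wavelet transform and comparing per-level
# energy distributions. Illustrative, not the paper's exact measure.
import numpy as np
import pywt

def dwt_signature(series, wavelet='haar', level=4):
    coeffs = pywt.wavedec(series, wavelet, level=level)
    energy = np.array([np.sum(c ** 2) for c in coeffs])
    return energy / energy.sum()  # energy share per decomposition level

def signature_distance(a, b):
    return float(np.linalg.norm(dwt_signature(a) - dwt_signature(b)))

t = np.arange(256)
periodic = np.sin(2 * np.pi * t / 24)                  # daily-like pattern
aperiodic = np.random.default_rng(1).normal(size=256)  # no periodicity
print(signature_distance(periodic, periodic + 0.1))  # small: similar
print(signature_distance(periodic, aperiodic))       # larger: dissimilar
```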
Transparent use of commercial cloud resources for scientific experiments is a hard problem. In this article, we describe the first steps of the Data Ocean R&D collaboration between the high-energy physics experiment ATLAS and the Google Cloud Platform, to allow the seamless use of Google Compute Engine and Google Cloud Storage for physics analysis. We start by describing the three preliminary use cases that were identified at the beginning of the project. The following sections then detail the work that was done in the data management system Rucio and the workflow management system PanDA...
This paper describes a monitoring framework for large scale data management systems with frequent access. It generates meaningful information from the collected tracing data and can be queried on demand for specific usage patterns with respect to source and destination locations, time period intervals, and other searchable parameters. The feasibility of such a system at the petabyte scale is demonstrated by describing the implementation and operational experience gained in the real-world ATLAS experiment employing the proposed framework. Our...
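A sketch of the kind of on-demand query such a framework serves, over hypothetical trace records (field names are illustrative):

```python
# Sketch of an on-demand usage query over collected trace records:
# count transfers per (source, destination) pair within a time window.
# Record fields are hypothetical illustrations.
from collections import Counter

def usage_by_route(traces, t_start, t_end):
    routes = Counter()
    for tr in traces:
        if t_start <= tr['time'] < t_end:
            routes[(tr['src'], tr['dst'])] += 1
    return routes

traces = [
    {'src': 'CERN-PROD', 'dst': 'BNL-ATLAS', 'time': 100},
    {'src': 'CERN-PROD', 'dst': 'BNL-ATLAS', 'time': 150},
    {'src': 'BNL-ATLAS', 'dst': 'TRIUMF-LCG2', 'time': 400},
]
print(usage_by_route(traces, 0, 200))
# Counter({('CERN-PROD', 'BNL-ATLAS'): 2})
```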
The ATLAS Distributed Data Management (DDM) system has evolved drastically in the last two years, with the Rucio software fully replacing the previous system before the start of LHC Run-2. The ATLAS DDM system now manages more than 250 petabytes spread over 130 storage sites and can handle file transfer rates of up to 30Hz. In this paper, we discuss the experience acquired in developing, commissioning, running, and maintaining such a large system. First, we describe the general architecture of the system and its integration with external services like the WLCG File...
This contribution details the deployment of Rucio, the ATLAS Distributed Data Management system. The main complication is that Rucio interacts with a wide variety of external services, and connects globally distributed data centres under different technological and administrative control, at an unprecedented data volume. It is therefore not possible to create a duplicate instance for testing or integration. Every software upgrade or configuration change is thus potentially disruptive and requires fail-safe automatic error...
For many scientific projects, data management is an increasingly complicated challenge. The number of data-intensive instruments generating unprecedented volumes of data is growing, and their accompanying workflows are becoming more complex. Their storage and computing resources are heterogeneous and are distributed at numerous geographical locations belonging to different administrative domains and organisations. These resources do not necessarily coincide with the places where data is produced, nor where it is stored, analysed by researchers, or...
ATLAS has recorded almost 5PB of RAW data since the LHC started running at the end of 2009. Many more derived data products and complementary simulation data have also been produced by the collaboration and, in total, 70PB is currently stored in the Worldwide LHC Computing Grid by ATLAS. All of this data is managed by the ATLAS Distributed Data Management system, called Don Quijote 2 (DQ2). DQ2 has evolved rapidly to help ATLAS Computing operations manage these large quantities of data across the many grid sites at which ATLAS runs, and to help physicists get access to the data. In this paper we describe new...
Data grids are used in large-scale scientific experiments to access and store nontrivial amounts of data, combining the storage resources of multiple data centers into one system. This enables users and automated services to use the storage in a common and efficient way. However, as data grids grow, it becomes a hard problem for developers and operators to estimate how modifications to policy, hardware, or software affect the performance metrics of the grid. In this paper we address the modeling of operational data grids. We first analyze the data grid middleware system of the ATLAS...
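To give a flavour of such modeling, here is a toy discrete-event simulation of a transfer queue, useful for what-if questions like how extra transfer slots change waiting times. All parameters are hypothetical, and this is not the paper's model.

```python
# Toy discrete-event model of a grid transfer queue: transfers arrive at
# random intervals and occupy one of a fixed number of slots. Comparing
# slot counts illustrates the kind of what-if analysis grid models enable.
import heapq, random

def simulate(n_transfers=1000, slots=10, mean_arrival=1.0, mean_service=8.0, seed=0):
    rng = random.Random(seed)
    free_at = [0.0] * slots          # time at which each slot becomes free
    heapq.heapify(free_at)
    t, total_wait = 0.0, 0.0
    for _ in range(n_transfers):
        t += rng.expovariate(1.0 / mean_arrival)   # next transfer arrives
        slot_free = heapq.heappop(free_at)
        start = max(t, slot_free)                  # wait if all slots busy
        total_wait += start - t
        heapq.heappush(free_at, start + rng.expovariate(1.0 / mean_service))
    return total_wait / n_transfers

print(f"mean wait, 10 slots: {simulate(slots=10):.2f}")
print(f"mean wait, 16 slots: {simulate(slots=16):.2f}")
```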
Rucio is the successor of the current Don Quijote 2 (DQ2) system for the distributed data management (DDM) of the ATLAS experiment. The reasons for replacing DQ2 are manifold, but besides high maintenance costs and architectural limitations, scalability concerns are at the top of the list. Current expectations are that the amount of data will be three to four times as much as it is today by the end of 2014. Furthermore, the availability of more powerful computing resources is pushing additional pressure onto the DDM system, as it increases the demands on data provisioning. Although DQ2 is capable of handling...
To prepare the migration to the new ATLAS Data Management system, called Rucio, a renaming campaign of all the physical files produced by ATLAS is needed. It represents around 300 million files split between ∼120 sites with 6 different storage technologies. It must be done in a transparent way in order not to disrupt the ongoing computing activities. An infrastructure to perform this renaming has been developed and is presented in this paper, along with its performance.
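Rucio's documented deterministic layout derives each file's physical path from an MD5 hash of its `scope:name` identifier, so the campaign amounts to computing the new path for every file and renaming it in place on the storage. A sketch of that mapping, omitting site-specific prefixes and other storage details:

```python
# Sketch of Rucio's documented deterministic path convention: the
# physical path is derived from an MD5 hash of "scope:name". Real
# deployments prepend site prefixes and may handle scopes differently.
import hashlib

def rucio_path(scope, name):
    digest = hashlib.md5(f"{scope}:{name}".encode()).hexdigest()
    return f"{scope}/{digest[0:2]}/{digest[2:4]}/{name}"

# Hypothetical file, for illustration only.
print(rucio_path('data12_8TeV', 'RAW.01234._000001.data'))
# e.g. data12_8TeV/ab/cd/RAW.01234._000001.data
```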