Maria Grigorieva

ORCID: 0000-0002-8851-2187
Research Areas
  • Distributed and Parallel Computing Systems
  • Scientific Computing and Data Management
  • Advanced Data Storage Technologies
  • Software System Performance and Reliability
  • Big Data Technologies and Applications
  • Cloud Computing and Resource Management
  • Peer-to-Peer Network Technologies
  • Research Data Management Practices
  • Advanced Database Systems and Queries
  • Data Visualization and Analytics
  • Particle physics theoretical and experimental studies
  • Particle Detector Development and Performance
  • Complex Network Analysis Techniques
  • Software Testing and Debugging Techniques
  • Video Analysis and Summarization
  • Ionosphere and magnetosphere dynamics
  • Network Traffic and Congestion Control
  • Computational and Text Analysis Methods
  • Linguistics and Cultural Studies
  • Service-Oriented Architecture and Web Services
  • Data Stream Mining Techniques
  • Atmospheric Ozone and Climate
  • Caching and Content Delivery
  • Solar and Space Plasma Dynamics
  • Advanced Text Analysis Techniques

European Organization for Nuclear Research
2025

Lomonosov Moscow State University
2019-2023

Gorky Institute of World Literature
2023

Institute of Mathematical Problems of Biology
2020-2021

Plekhanov Russian University of Economics
2017-2021

Moscow Center For Continuous Mathematical Education
2020-2021

Moscow State University
2021

Kurchatov Institute
2015-2018

Tomsk Polytechnic University
2017-2018

National Research Tomsk State University
2018

Large-scale distributed computing infrastructures ensure the operation and maintenance of scientific experiments at the LHC: more than 160 centers all over the world execute tens of millions of jobs per day. ATLAS, the largest experiment at the LHC, creates an enormous flow of data which has to be recorded and analyzed by a complex, heterogeneous environment. Statistically, about 10–12% of jobs end in failure: network faults, service failures, authorization failures and other error conditions trigger messages that provide detailed information...
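A first step toward analyzing such failure messages is to collapse their variable parts (paths, IDs, numbers) into templates, so that messages produced by the same fault can be counted together. The sketch below is illustrative only; the regexes and sample log lines are assumptions, not the production rules:

```python
import re
from collections import Counter

def mask(message: str) -> str:
    """Collapse variable parts (paths, hex IDs, numbers) into placeholders,
    so messages produced by the same fault share one template."""
    message = re.sub(r"/[\w./-]+", "<PATH>", message)       # file paths
    message = re.sub(r"\b0x[0-9a-fA-F]+\b", "<HEX>", message)
    message = re.sub(r"\d+", "<NUM>", message)
    return message

def top_failure_templates(messages, n=3):
    """Count how often each masked template occurs across failed jobs."""
    return Counter(mask(m) for m in messages).most_common(n)

# Hypothetical error messages from failed transfer jobs.
logs = [
    "Transfer of file /data/run123/f1.root failed with code 28",
    "Transfer of file /data/run124/f7.root failed with code 28",
    "Authorization denied for user 4711",
]
print(top_failure_templates(logs))
```

Grouping on templates rather than raw strings is what makes the 10–12% failure stream tractable for a human operator.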

10.1142/s0217751x21500706 article EN International Journal of Modern Physics A 2021-03-04

The ATLAS Experiment at the LHC generates petabytes of data that are distributed among 160 computing sites all over the world and processed continuously by various central production and user analysis tasks. Data popularity, typically measured as the number of accesses, plays an important role in resolving data management issues: deleting, replicating, and moving data between tapes, disks and caches. These procedures were still carried out in a semi-manual mode; now we have focused our efforts on automating them, making use of historical...
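The core of a popularity-driven policy can be reduced to mapping an access count onto a target replica count. The following is a minimal sketch of that idea; the thresholds and dataset names are invented for illustration and do not reflect the actual ATLAS policy:

```python
from collections import Counter

def target_replicas(accesses: int, min_rep=1, max_rep=5) -> int:
    """Map a dataset's recent access count to a desired replica count:
    unused data falls back to one custodial copy, hot data is spread wider.
    Thresholds here are illustrative, not the production policy."""
    if accesses == 0:
        return min_rep
    # one extra replica per order of magnitude of accesses
    return min(max_rep, 1 + len(str(accesses)))

# Hypothetical access log: each entry is one access to a dataset.
access_log = ["dsA", "dsA", "dsA", "dsB"] + ["dsC"] * 42
counts = Counter(access_log)
plan = {ds: target_replicas(n) for ds, n in counts.items()}
plan["dsD"] = target_replicas(0)   # never accessed: keep a single copy
print(plan)
```

Feeding such a plan into the deletion/replication machinery is what replaces the semi-manual decisions described above.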

10.1051/epjconf/202125102013 article EN cc-by EPJ Web of Conferences 2021-01-01

In this contribution we discuss the various aspects of the computing resource needs of experiments in High Energy and Nuclear Physics, in particular at the Large Hadron Collider. These needs will evolve in the future when moving from the LHC to the HL-LHC ten years from now: data volumes are already at exascale levels, and the processing could increase by a further order of magnitude. The distributed computing environment has been a great success, and the inclusion of new super-computing facilities, cloud and volunteer computing for the future is a big challenge, which we are successfully mastering with...

10.1088/1748-0221/12/06/c06044 article EN Journal of Instrumentation 2017-06-29

In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte datasets requires a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting the experiment goals, allowing timely delivery of results. However, a substantial amount of intervention from software developers, shifters and operational teams is needed to efficiently manage such heterogeneous infrastructures. A wealth of data can...

10.1051/epjconf/202024503017 article EN cc-by EPJ Web of Conferences 2020-01-01

The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. It currently distributes jobs among more than 100,000 cores at well over 120 Grid sites, supercomputing centers, and commercial and academic clouds. Physicists submit 1.5 M data processing, simulation and analysis jobs per day, and PanDA keeps all meta-information about job submissions and execution events in an Oracle RDBMS. The above information is used...

10.1016/j.procs.2015.11.051 article EN Procedia Computer Science 2015-01-01

In recent years the concepts of Big Data have become well established in IT. Systems managing large data volumes produce metadata that describe data and workflows. These metadata are used to obtain information about the current system state and for statistical and trend analysis of the processes these systems drive. Over time the amount of stored metadata can grow dramatically. In this article we present our studies, which demonstrate how storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.
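The hybrid idea can be sketched in miniature: fixed, frequently queried attributes get real relational columns, while the variable, schema-free remainder of each record is stored as a document. Here the "NoSQL" side is stood in for by a JSON column in SQLite; the table layout and field names are assumptions for illustration:

```python
import json
import sqlite3

# Relational part: fixed attributes as columns, indexed and queryable;
# document part: a JSON blob holding the flexible per-record metadata.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE jobs (
    id     INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    site   TEXT NOT NULL,
    doc    TEXT NOT NULL)""")

def insert_job(job_id, status, site, extra):
    db.execute("INSERT INTO jobs VALUES (?, ?, ?, ?)",
               (job_id, status, site, json.dumps(extra)))

insert_job(1, "failed", "CERN", {"error": "timeout", "attempt": 3})
insert_job(2, "done", "BNL", {"walltime_s": 1200})

# Structured query runs on the relational columns...
rows = db.execute("SELECT id, doc FROM jobs WHERE status='failed'").fetchall()
# ...while per-record detail comes from the document side.
for job_id, doc in rows:
    print(job_id, json.loads(doc)["error"])
```

The point of the split is that the hot query path stays on the relational engine while the document store absorbs schema growth.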

10.1088/1742-6596/664/4/042023 article EN Journal of Physics Conference Series 2015-12-23

Large-scale scientific experiments produce vast volumes of data. These data are stored, processed and analyzed in a distributed computing environment. The life cycle of an experiment is managed by specialized software like Distributed Data Management and Workload Management Systems. In order to be interpreted and mined, experimental data must be accompanied by auxiliary metadata, which are recorded at each processing step. Metadata describes and represents data objects or results of experiments, allowing them to be shared by various applications,...

10.1088/1742-6596/762/1/012017 article EN Journal of Physics Conference Series 2016-10-01

One of the most significant and rapidly developing fields of data analysis is information flow management. In its course, targeted and stochastic dissemination patterns are studied. Solving such problems is daunting due to the global growth of the amount of data and its availability to a wide range of users. The paper presents a study of information flows in open networks on the example of COVID-19. The study was conducted with the use of web scraping and methods of linguistic and visual analytics. A variety of sources were used, such as the largest world and Russian news services, social networks and instant...

10.26583/sv.13.4.11 article EN Scientific Visualization 2021-01-01

Contemporary scientific experiments produce a significant amount of data as well as publications based on this data. Since the volumes of both are constantly increasing, it becomes more and more problematic to establish a connection between a given paper and the underlying data. However, such an association is one of the crucial pieces of information for performing various tasks: validating results presented in a paper, comparing different approaches to deal with a problem, or even simply understanding the situation in some area of science....

10.1088/1742-6596/1085/3/032013 article EN Journal of Physics Conference Series 2018-09-01

As a joint effort of the various communities involved in the Worldwide LHC Computing Grid, the Operational Intelligence project aims at increasing the level of automation of computing operations and reducing human interventions. The distributed computing systems currently deployed by the experiments have proven to be mature and capable of meeting the experimental goals, allowing timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters, and operational teams is needed to efficiently...

10.3389/fdata.2021.753409 article EN cc-by Frontiers in Big Data 2022-01-07

The amount of scientific data generated by the LHC experiments has hit the exabyte scale. These data are transferred, processed and analyzed in hundreds of computing centers. Data popularity among individual physicists and university groups has become one of the key factors of efficient data management and processing. It was actively used during Run 1 and Run 2 for central processing and allowed the optimization of placement policies to spread the workload more evenly over existing resources. Besides providing storage resources for the physics analysis of thousands...

10.1109/ivmem51402.2020.00010 article EN 2020-09-01

The experiments at the Large Hadron Collider (LHC) rely upon a complex distributed computing infrastructure (WLCG) consisting of hundreds of individual sites worldwide at universities and national laboratories, providing about half a billion job slots and an exabyte of storage interconnected through high-speed networks. Wide Area Networking (WAN) is one of the three pillars (together with computational resources and storage) of LHC computing. More than 5 PB/day are transferred between WLCG sites. Monitoring is crucial...

10.1142/s0217751x21300052 article EN International Journal of Modern Physics A 2021-02-02

The framework for the clustering of error messages, ClusterLogs, was developed as a flexible and modular tool for the needs of large-scale distributed computing infrastructures. Various types of failures are constantly being registered during the execution of millions of operations daily. Monitoring systems are faced with the challenging task of analyzing a considerable amount of multi-sourced error messages. It is critical to present information about errors to human experts in a way that makes them able to analyze it. The ClusterLogs pipeline...
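A clustering pipeline of this kind rests on two pieces: turning each message into a comparable token representation, and grouping messages whose representations are similar. The sketch below uses Jaccard set similarity with greedy single-pass grouping as a stand-in; the tokenizer, threshold, and sample messages are assumptions, not the actual ClusterLogs internals:

```python
import re

def tokens(msg):
    """Lowercased word tokens with numbers masked: the unit of comparison."""
    return set(re.sub(r"\d+", "#", msg.lower()).split())

def jaccard(a, b):
    """Overlap of two token sets, 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b)

def group_messages(messages, threshold=0.6):
    """Greedy single-pass grouping: each message joins the first existing
    group whose representative is similar enough, else starts a new group."""
    groups = []   # list of (representative_tokens, [messages])
    for msg in messages:
        t = tokens(msg)
        for rep, members in groups:
            if jaccard(rep, t) >= threshold:
                members.append(msg)
                break
        else:
            groups.append((t, [msg]))
    return [members for _, members in groups]

msgs = [
    "connection to server 10 lost",
    "connection to server 12 lost",
    "disk quota exceeded on pool 3",
]
print(group_messages(msgs))
```

Presenting one representative per group instead of every raw message is what makes the error stream digestible for a human expert.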

10.1016/j.procs.2021.06.037 article EN Procedia Computer Science 2021-01-01

Modern scientific experiments involve the production of huge volumes of data that require new approaches to processing and storage. These data themselves, as well as their storage, are accompanied by a valuable amount of additional information, called metadata, distributed over multiple informational systems and repositories and having a complicated, heterogeneous structure. Gathering these metadata for the field of high energy and nuclear physics (HENP) is a complex issue, requiring the quest for solutions outside the box. One of the tasks is to...

10.1088/1742-6596/1015/3/032055 article EN Journal of Physics Conference Series 2018-05-01

The Interactive Visual Explorer (InVEx) application is designed as a visual analytics tool for Big Data analysis. It takes an integral approach to data analysis, combining methods of intellectual data analysis with advanced interactive visualization. One of the main objectives of InVEx is to process large data samples by decreasing their level of detail (LoD). The proposed approach includes clustering as well as flexible grouping by different parameters, providing data exploration from the lowest to the highest level of detail. The results of clusterization...
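The simplest form of such level-of-detail reduction is spatial aggregation: raw points are grouped into coarse cells, and each cell is reported once with its centroid and member count. This is a minimal sketch of the general idea, not the actual InVEx clustering; the grid-cell scheme and cell size are assumptions:

```python
def reduce_lod(points, cell=10.0):
    """Aggregate raw (x, y) points into grid cells: each cell is reported
    once with its centroid and member count, cutting the level of detail."""
    cells = {}
    for x, y in points:
        key = (int(x // cell), int(y // cell))
        cells.setdefault(key, []).append((x, y))
    summary = []
    for members in cells.values():
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        summary.append((round(cx, 2), round(cy, 2), len(members)))
    return summary

raw = [(1, 1), (2, 3), (3, 2), (55, 57), (56, 58)]
print(reduce_lod(raw))   # five raw points collapse to two aggregated cells
```

Drilling down then amounts to re-running the aggregation with a smaller cell size over the selected region.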

10.1051/epjconf/202022603011 article EN cc-by EPJ Web of Conferences 2020-01-01

Modern large-scale distributed computing systems, processing large volumes of data, require mature monitoring systems able to control and track resources, networks, tasks, queues and other components. In recent years, the ELK stack has become very popular for this environment, largely due to the efficiency and flexibility of ElasticSearch storage and the wide variety of Kibana visualization tools. The analysis of infrastructure metadata often requires the visual exploration of multiple parameters simultaneously on one graphical...

10.51130/graphicon-2020-2-3-10 article EN cc-by Proceedings of the 30th International Conference on Computer Graphics and Machine Vision (GraphiCon 2020) Part 1 2020-12-17

ClusterLogs is a framework for the automatic categorization of computing jobs and resources by error messages in distributed systems. Initially, it was developed for high-energy physics experiments, but it can be applied in other areas. The first prototype was limited to sequential execution and did not allow processing large amounts of data in an acceptable time. In the next prototype, the system was significantly improved by parallelizing several preprocessing stages. In this paper, we focus on the DBSCAN algorithm, the main method used...
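DBSCAN groups points that are density-reachable (at least `min_pts` neighbors within radius `eps`) and marks isolated points as noise, which is why it suits error messages: the number of clusters need not be known in advance, and one-off errors are naturally set aside. Below is a minimal textbook implementation on 2-D points to show the mechanics; it is not the parallelized version discussed in the paper, and the sample data and parameters are invented:

```python
def dbscan(points, eps=1.5, min_pts=2):
    """Minimal DBSCAN on 2-D points: density-reachable points share a
    cluster label; sparse points are marked as noise (-1)."""
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seed = neighbors(i)
        if len(seed) < min_pts:
            labels[i] = -1           # noise (may be claimed as border later)
            continue
        cluster += 1                 # new cluster grown from a core point
        labels[i] = cluster
        queue = list(seed)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:   # expand only through core points
                queue.extend(nb)
    return labels

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (10, 11), (5, 5)]
print(dbscan(pts))   # two dense groups plus one noise point
```

In the error-message setting the Euclidean distance would be replaced by a distance between message vectors, but the labeling logic is the same.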

10.1142/s0217751x2150247x article EN International Journal of Modern Physics A 2021-11-24

Having information such as an estimation of the processing time or the possibility of a system outage (abnormal behaviour) helps to monitor system performance and predict its next state. The current cyber-infrastructure of the ATLAS Production System presents computing conditions in which contention for resources among high-priority data analyses happens routinely, which might lead to significant workload handling interruptions. The lack of analysis of processing behaviour (its duration) and of the system's state itself provides...
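A baseline for spotting such abnormal behaviour is to flag a job whose duration deviates strongly from the recent history. The rolling mean/standard-deviation rule below is a generic sketch of that idea, not the method of the paper; the window size, threshold, and durations are assumptions:

```python
import statistics

def flag_anomalies(durations, window=5, k=3.0):
    """Flag durations more than k standard deviations away from the mean
    of the preceding window: a cheap proxy for abnormal behaviour."""
    flags = []
    for i, d in enumerate(durations):
        history = durations[max(0, i - window):i]
        if len(history) < 2:
            flags.append(False)      # not enough history to judge
            continue
        mu = statistics.mean(history)
        sigma = statistics.stdev(history)
        flags.append(sigma > 0 and abs(d - mu) > k * sigma)
    return flags

runs = [100, 102, 98, 101, 99, 100, 400]   # last job is a clear outlier
print(flag_anomalies(runs))
```

Such a flag is only a trigger for closer inspection; distinguishing resource contention from genuine outages needs the richer state analysis the paper argues for.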

10.1088/1742-6596/1085/3/032051 article EN Journal of Physics Conference Series 2018-09-01