- Distributed and Parallel Computing Systems
- Scientific Computing and Data Management
- Advanced Data Storage Technologies
- Software System Performance and Reliability
- Big Data Technologies and Applications
- Cloud Computing and Resource Management
- Peer-to-Peer Network Technologies
- Research Data Management Practices
- Advanced Database Systems and Queries
- Data Visualization and Analytics
- Particle Physics Theoretical and Experimental Studies
- Particle Detector Development and Performance
- Complex Network Analysis Techniques
- Software Testing and Debugging Techniques
- Video Analysis and Summarization
- Ionosphere and Magnetosphere Dynamics
- Network Traffic and Congestion Control
- Computational and Text Analysis Methods
- Linguistics and Cultural Studies
- Service-Oriented Architecture and Web Services
- Data Stream Mining Techniques
- Atmospheric Ozone and Climate
- Caching and Content Delivery
- Solar and Space Plasma Dynamics
- Advanced Text Analysis Techniques
European Organization for Nuclear Research
2025
Lomonosov Moscow State University
2019-2023
Gorky Institute of World Literature
2023
Institute of Mathematical Problems of Biology
2020-2021
Plekhanov Russian University of Economics
2017-2021
Moscow Center For Continuous Mathematical Education
2020-2021
Moscow State University
2021
Kurchatov Institute
2015-2018
Tomsk Polytechnic University
2017-2018
National Research Tomsk State University
2018
Large-scale distributed computing infrastructures ensure the operation and maintenance of scientific experiments at the LHC: more than 160 centers all over the world execute tens of millions of jobs per day. ATLAS, the largest experiment at the LHC, creates an enormous flow of data which has to be recorded and analyzed by a complex, heterogeneous computing environment. Statistically, about 10–12% of jobs end in failure: network faults, service failures, authorization problems and other error conditions trigger messages that provide detailed information...
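A common first step in making such error messages analyzable is to collapse them into templates by masking volatile tokens. The sketch below illustrates the idea with hypothetical masking rules and made-up messages; it is not the actual ATLAS error-handling code.

```python
import re
from collections import Counter

def normalize(message: str) -> str:
    """Reduce an error message to a template by masking volatile tokens.
    The masking rules are illustrative, not those used in production."""
    msg = re.sub(r"(/[\w.-]+)+", "<PATH>", message)   # file paths
    msg = re.sub(r"\b[0-9a-f]{8,}\b", "<HEX>", msg)   # hashes, GUIDs
    msg = re.sub(r"\b\d+\b", "<NUM>", msg)            # ids, ports, sizes
    return msg

def top_error_templates(messages, n=3):
    """Count how often each template occurs across a stream of job errors."""
    return Counter(normalize(m) for m in messages).most_common(n)

errors = [
    "Transfer failed from /data/file001 after 30 s",
    "Transfer failed from /data/file002 after 45 s",
    "Authorization denied for user 4521",
]
print(top_error_templates(errors))
```

Grouping by template turns millions of distinct strings into a short ranked list that an operator can actually review.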
The ATLAS Experiment at the LHC generates petabytes of data that are distributed among 160 computing sites all over the world and processed continuously by various central production and user analysis tasks. Data popularity, typically measured as the number of accesses, plays an important role in resolving data management issues: deleting, replicating, and moving data between tapes, disks and caches. These procedures were until recently carried out in a semi-manual mode; we have now focused our efforts on automating them, making use of historical...
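The core of such automation can be reduced to a threshold rule over access counts. The following is a minimal sketch under assumed thresholds; the real policies are derived from historical access data, not the constants shown here.

```python
from collections import Counter

# Hypothetical thresholds; actual values would come from historical analysis.
HOT_ACCESSES, COLD_ACCESSES = 10, 1

def plan_actions(access_log):
    """Map each dataset to a storage action based on its access count."""
    actions = {}
    for dataset, n in Counter(access_log).items():
        if n >= HOT_ACCESSES:
            actions[dataset] = "replicate-to-disk"   # popular: add disk replicas
        elif n <= COLD_ACCESSES:
            actions[dataset] = "move-to-tape"        # unpopular: archive
        else:
            actions[dataset] = "keep"
    return actions

log = ["ds1"] * 12 + ["ds2"] * 3 + ["ds3"]
print(plan_actions(log))
```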
In this contribution we discuss the various aspects of the computing resource needs of experiments in High Energy and Nuclear Physics, in particular at the Large Hadron Collider. These needs will evolve in the future when moving from the LHC to the HL-LHC ten years from now: data processing, already at exascale levels, could increase by a further order of magnitude. The distributed computing environment has been a great success, and the inclusion of new super-computing facilities, cloud and volunteer computing is a big challenge, which is being successfully mastered with...
In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte-scale datasets requires a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting experiment goals, allowing the timely delivery of scientific results. However, a substantial amount of intervention from software developers, shifters and operational teams is needed to efficiently manage such heterogeneous infrastructures. A wealth of data can...
The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. It currently distributes jobs among more than 100,000 cores at well over 120 Grid sites, supercomputing centers, and commercial and academic clouds. Physicists submit about 1.5 M data processing, simulation and analysis jobs per day, and PanDA keeps all meta-information about job submission and execution events in an Oracle RDBMS. The above information is used...
In recent years the concept of Big Data has become well established in IT. Systems managing large data volumes produce metadata that describe the data and workflows. These metadata are used to obtain information about the current system state and for statistical and trend analysis of the processes these systems drive. Over time the amount of stored metadata can grow dramatically. In this article we present our studies demonstrating how storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.
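The essence of a hybrid layout is to keep frequently queried, structured attributes in the relational store and push bulky schema-free documents to a key-value store. The sketch below uses SQLite as the RDBMS and a plain dict standing in for a NoSQL store; the table layout and field names are illustrative assumptions, not the schema from the paper.

```python
import sqlite3
import json

# Structured "hot" attributes live in the RDBMS...
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, site TEXT)")
# ...while arbitrary nested payloads go to a key-value store
# (a dict here stands in for e.g. a document or column store).
document_store = {}

def record_job(job_id, status, site, full_metadata):
    rdbms.execute("INSERT INTO jobs VALUES (?, ?, ?)", (job_id, status, site))
    document_store[job_id] = json.dumps(full_metadata)

def failed_jobs_at(site):
    """Fast relational query touching only the hot columns."""
    rows = rdbms.execute(
        "SELECT id FROM jobs WHERE status = 'failed' AND site = ?", (site,))
    return [r[0] for r in rows]

record_job(1, "failed", "CERN", {"error": "timeout", "attempts": [1, 2, 3]})
record_job(2, "done", "CERN", {"wall_time_s": 420})
print(failed_jobs_at("CERN"))          # relational lookup
print(json.loads(document_store[1]))   # full document fetched by key
```

The relational side stays small and indexable no matter how large the per-job documents grow, which is where the scalability gain comes from.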
Large-scale scientific experiments produce vast volumes of data. These data are stored, processed and analyzed in a distributed computing environment. The life cycle of an experiment is managed by specialized software such as Distributed Data Management and Workload Management Systems. In order to be interpreted and mined, experimental data must be accompanied by auxiliary metadata, which are recorded at each processing step. Metadata describe and represent the objects or results of experiments, allowing them to be shared by various applications,...
One of the most significant and rapidly developing fields of data analysis is information flow management. In its course, targeted and stochastic dissemination patterns are studied. Solving such problems is daunting due to the global growth in the amount of information and its availability to a wide range of users. The paper presents a study of information flows in open networks using the example of COVID-19. The study was conducted with the use of web scraping and methods of linguistic and visual analytics. As sources, a variety of media were used, such as the largest world and Russian news services, social networks and instant...
Contemporary scientific experiments produce a significant amount of data as well as publications based on these data. Since the volumes of both are constantly increasing, it becomes more and more problematic to establish a connection between a given paper and the underlying data. However, such an association is one of the crucial pieces of information for performing various tasks: validating the results presented in a paper, comparing different approaches to dealing with a problem, or even simply understanding the situation in some area of science....
As a joint effort from various communities involved in the Worldwide LHC Computing Grid, the Operational Intelligence project aims at increasing the level of automation in computing operations and reducing human interventions. The distributed computing systems currently deployed by the experiments have proven to be mature and capable of meeting the experimental goals, allowing the timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters, and operational teams is needed to efficiently...
The amount of scientific data generated by the LHC experiments has hit the exabyte scale. These data are transferred, processed and analyzed in hundreds of computing centers. Data popularity among individual physicists and university groups has become one of the key factors in efficient data management and processing. It was actively used during Run 1 and Run 2 for central processing, and allowed the optimization of data placement policies to spread the workload more evenly over existing resources. Besides providing storage resources for physics analysis by thousands...
The experiments at the Large Hadron Collider (LHC) rely upon a complex distributed computing infrastructure (WLCG) consisting of hundreds of individual sites worldwide at universities and national laboratories, providing about half a billion job slots and an exabyte of storage interconnected through high-speed networks. Wide Area Networking (WAN) is one of the three pillars (together with computational resources and storage) of LHC computing. More than 5 PB/day are transferred between WLCG sites. Monitoring is crucial...
The framework for the clustering of error messages, ClusterLogs, was developed as a flexible and modular tool for the needs of large-scale distributed computing infrastructures. Various types of failures are constantly being registered during the execution of millions of operations daily. Monitoring systems are faced with the challenging task of analyzing a considerable amount of multi-sourced messages. It is critical to present information about errors to human experts in a way that enables them to analyze it. The ClusterLogs pipeline...
Modern scientific experiments involve the production of huge volumes of data that require new approaches to processing and storage. These data, as well as their storage, are accompanied by a valuable amount of additional information, called metadata, which is distributed over multiple information systems and repositories and has a complicated, heterogeneous structure. Gathering these metadata for the field of high energy and nuclear physics (HENP) is a complex issue, requiring a quest for solutions outside the box. One of the tasks is to...
The Interactive Visual Explorer (InVEx) application is designed as a visual analytics tool for Big Data analysis. It takes an integral approach to data analysis, combining methods of intellectual analysis with advanced interactive visualization. One of the main objectives of InVEx is to process large data samples by decreasing their level of detail (LoD). The proposed approach includes clustering as well as flexible grouping by different parameters, providing exploration from the lowest to the highest level of detail. The results of clusterization...
Modern large-scale distributed computing systems, processing large volumes of data, require mature monitoring systems able to control and track resources, networks, tasks, queues and other components. In recent years, the ELK stack has become very popular in this environment, largely due to the efficiency and flexibility of ElasticSearch storage and the wide variety of Kibana visualization tools. The analysis of infrastructure metadata often requires the visual exploration of multiple parameters simultaneously on one graphical...
ClusterLogs is a framework for the automatic categorization of computing jobs and resources by error messages in distributed systems. Initially, it was developed for high-energy physics experiments, but it can be applied in other areas. The first prototype was limited to sequential execution and did not allow processing a large amount of data in an acceptable time. In the next prototype, the system was significantly improved by the parallelization of several preprocessing stages. In this paper, we focus on the DBSCAN algorithm, the main method used...
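For reference, the textbook DBSCAN procedure that ClusterLogs builds on can be sketched in a few lines. This toy version works on plain 2-D points with Euclidean distance; ClusterLogs itself clusters vectorized error messages, which this sketch does not reproduce.

```python
def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: returns a cluster label per point, -1 for noise."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def neighbors(i):
        # Neighborhood includes the point itself, as in the original paper.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1               # noise (may become a border point later)
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise re-labelled as border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:       # j is also a core point: expand further
                queue.extend(jn)
    return labels

pts = [(0, 0), (0.2, 0), (0.1, 0.1), (5, 5), (5.1, 5), (10, 10)]
print(dbscan(pts, eps=0.5, min_pts=2))
```

The quadratic neighborhood scan in this toy version is exactly the bottleneck that motivates the parallelization discussed in the paper.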
Having information such as an estimation of the processing time or the possibility of a system outage (abnormal behaviour) helps to monitor performance and predict the system's next state. The current cyber-infrastructure of the ATLAS Production System presents computing conditions in which contention for resources among high-priority data analyses happens routinely, which might lead to significant workload handling interruptions. The lack of analysis of process behaviour (its duration) and of the system's state itself provides...
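As a minimal illustration of duration prediction, one can fit a simple least-squares line from an input feature to observed task durations. This stand-in model and its made-up training data are assumptions for illustration; the paper's actual predictive models are more elaborate.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, as a minimal stand-in
    for a task-duration prediction model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical training data: input size (GB) vs. observed duration (min).
sizes = [1, 2, 4, 8]
durations = [12, 22, 41, 80]
a, b = fit_line(sizes, durations)

def predict(size):
    return a * size + b

print(predict(6))   # estimated duration for a 6 GB task
```

Even such a crude estimate lets a monitoring system flag tasks whose running time deviates strongly from the prediction as abnormal behaviour.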