- Distributed and Parallel Computing Systems
- Scientific Computing and Data Management
- Advanced Data Storage Technologies
- Software System Performance and Reliability
- Big Data Technologies and Applications
- Cloud Computing and Resource Management
- Peer-to-Peer Network Technologies
- Research Data Management Practices
- Advanced Database Systems and Queries
- Data Visualization and Analytics
- Particle Physics Theoretical and Experimental Studies
- Particle Detector Development and Performance
- Complex Network Analysis Techniques
- Software Testing and Debugging Techniques
- Video Analysis and Summarization
- Ionosphere and Magnetosphere Dynamics
- Network Traffic and Congestion Control
- Computational and Text Analysis Methods
- Linguistics and Cultural Studies
- Service-Oriented Architecture and Web Services
- Data Stream Mining Techniques
- Atmospheric Ozone and Climate
- Caching and Content Delivery
- Solar and Space Plasma Dynamics
- Advanced Text Analysis Techniques
European Organization for Nuclear Research
2025
Lomonosov Moscow State University
2019-2023
Gorky Institute of World Literature
2023
Institute of Mathematical Problems of Biology
2020-2021
Plekhanov Russian University of Economics
2017-2021
Moscow Center For Continuous Mathematical Education
2020-2021
Moscow State University
2021
Kurchatov Institute
2015-2018
Tomsk Polytechnic University
2017-2018
National Research Tomsk State University
2018
Large-scale distributed computing infrastructures ensure the operation and maintenance of scientific experiments at the LHC: more than 160 centers all over the world execute tens of millions of jobs per day. ATLAS, the largest experiment at the LHC, creates an enormous flow of data which has to be recorded and analyzed by a complex, heterogeneous computing environment. Statistically, about 10–12% of jobs end in failure: network faults, service failures, authorization problems and other error conditions trigger messages that provide detailed information...
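A common first step in making such error messages analyzable is to collapse them into templates by masking volatile tokens. The sketch below illustrates the idea with hypothetical masking rules and made-up messages; it is not the actual ATLAS error-handling code.

```python
import re
from collections import Counter

def normalize(message: str) -> str:
    """Reduce an error message to a template by masking volatile tokens.
    The masking rules are illustrative, not those used in production."""
    msg = re.sub(r"(/[\w.-]+)+", "<PATH>", message)   # file paths
    msg = re.sub(r"\b[0-9a-f]{8,}\b", "<HEX>", msg)   # hashes, GUIDs
    msg = re.sub(r"\b\d+\b", "<NUM>", msg)            # ids, ports, sizes
    return msg

def top_error_templates(messages, n=3):
    """Count how often each template occurs across a stream of job errors."""
    return Counter(normalize(m) for m in messages).most_common(n)

errors = [
    "Transfer failed from /data/file001 after 30 s",
    "Transfer failed from /data/file002 after 45 s",
    "Authorization denied for user 4521",
]
print(top_error_templates(errors))
```

Grouping by template turns millions of distinct strings into a short ranked list that an operator can actually review.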
The ATLAS Experiment at the LHC generates petabytes of data that are distributed among 160 computing sites all over the world and processed continuously by various central production and user analysis tasks. Data popularity, typically measured as the number of accesses, plays an important role in resolving data management issues: deleting, replicating, and moving data between tapes, disks and caches. These procedures were until recently carried out in a semi-manual mode; we have now focused our efforts on automating them, making use of historical...
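The core of such automation can be reduced to a threshold rule over access counts. The following is a minimal sketch under assumed thresholds; the real policies are derived from historical access data, not the constants shown here.

```python
from collections import Counter

# Hypothetical thresholds; actual values would come from historical analysis.
HOT_ACCESSES, COLD_ACCESSES = 10, 1

def plan_actions(access_log):
    """Map each dataset to a storage action based on its access count."""
    actions = {}
    for dataset, n in Counter(access_log).items():
        if n >= HOT_ACCESSES:
            actions[dataset] = "replicate-to-disk"   # popular: add disk replicas
        elif n <= COLD_ACCESSES:
            actions[dataset] = "move-to-tape"        # unpopular: archive
        else:
            actions[dataset] = "keep"
    return actions

log = ["ds1"] * 12 + ["ds2"] * 3 + ["ds3"]
print(plan_actions(log))
```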
In this contribution we discuss the various aspects of the computing resource needs of experiments in High Energy and Nuclear Physics, in particular at the Large Hadron Collider. These needs will evolve in the future when moving from the LHC to the HL-LHC ten years from now: data processing, already at exascale levels, could increase by a further order of magnitude. The distributed computing environment has been a great success, and the inclusion of new super-computing facilities, cloud and volunteer computing is a big challenge, which is being successfully mastered with...
In the near future, large scientific collaborations will face unprecedented computing challenges. Processing and storing exabyte-scale datasets requires a federated infrastructure of distributed computing resources. The current systems have proven to be mature and capable of meeting experiment goals, allowing the timely delivery of scientific results. However, a substantial amount of intervention from software developers, shifters and operational teams is needed to efficiently manage such heterogeneous infrastructures. A wealth of data can...
The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. It currently distributes jobs among more than 100,000 cores at well over 120 Grid sites, supercomputing centers, and commercial and academic clouds. Physicists submit about 1.5 M data processing, simulation and analysis jobs per day, and PanDA keeps all meta-information about job submission and execution events in an Oracle RDBMS. The above information is used...
In recent years the concept of Big Data has become well established in IT. Systems managing large data volumes produce metadata that describe the data and workflows. These metadata are used to obtain information about the current system state and for statistical and trend analysis of the processes these systems drive. Over time the amount of stored metadata can grow dramatically. In this article we present our studies demonstrating how storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.
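The essence of a hybrid layout is to keep frequently queried, structured attributes in the relational store and push bulky schema-free documents to a key-value store. The sketch below uses SQLite as the RDBMS and a plain dict standing in for a NoSQL store; the table layout and field names are illustrative assumptions, not the schema from the paper.

```python
import sqlite3
import json

# Structured "hot" attributes live in the RDBMS...
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, site TEXT)")
# ...while arbitrary nested payloads go to a key-value store
# (a dict here stands in for e.g. a document or column store).
document_store = {}

def record_job(job_id, status, site, full_metadata):
    rdbms.execute("INSERT INTO jobs VALUES (?, ?, ?)", (job_id, status, site))
    document_store[job_id] = json.dumps(full_metadata)

def failed_jobs_at(site):
    """Fast relational query touching only the hot columns."""
    rows = rdbms.execute(
        "SELECT id FROM jobs WHERE status = 'failed' AND site = ?", (site,))
    return [r[0] for r in rows]

record_job(1, "failed", "CERN", {"error": "timeout", "attempts": [1, 2, 3]})
record_job(2, "done", "CERN", {"wall_time_s": 420})
print(failed_jobs_at("CERN"))          # relational lookup
print(json.loads(document_store[1]))   # full document fetched by key
```

The relational side stays small and indexable no matter how large the per-job documents grow, which is where the scalability gain comes from.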
Large-scale scientific experiments produce vast volumes of data. These data are stored, processed and analyzed in a distributed computing environment. The life cycle of an experiment is managed by specialized software such as Distributed Data Management and Workload Management Systems. In order to be interpreted and mined, experimental data must be accompanied by auxiliary metadata, which are recorded at each processing step. Metadata describe and represent the objects or results of experiments, allowing them to be shared by various applications,...
One of the most significant and rapidly developing fields of data analysis is information flow management. In its course, targeted and stochastic dissemination patterns are studied. Solving such problems is daunting due to the global growth in the amount of information and its availability to a wide range of users. The paper presents a study of information flows in open networks using the example of COVID-19. The study was conducted with the use of web scraping and methods of linguistic and visual analytics. As sources, a variety of media were used, such as the largest world and Russian news services, social networks and instant...
Contemporary scientific experiments produce a significant amount of data as well as publications based on these data. Since the volumes of both are constantly increasing, it becomes more and more problematic to establish a connection between a given paper and the underlying data. However, such an association is one of the crucial pieces of information for performing various tasks: validating the results presented in a paper, comparing different approaches to dealing with a problem, or even simply understanding the situation in some area of science....
As a joint effort from various communities involved in the Worldwide LHC Computing Grid, the Operational Intelligence project aims at increasing the level of automation in computing operations and reducing human interventions. The distributed computing systems currently deployed by the experiments have proven to be mature and capable of meeting the experimental goals, allowing the timely delivery of scientific results. However, a substantial number of interventions from software developers, shifters, and operational teams is needed to efficiently...
The amount of scientific data generated by the LHC experiments has hit the exabyte scale. These data are transferred, processed and analyzed in hundreds of computing centers. Data popularity among individual physicists and university groups has become one of the key factors in efficient data management and processing. It was actively used during Run 1 and Run 2 for central processing, and allowed the optimization of data placement policies to spread the workload more evenly over existing resources. Besides providing storage resources for physics analysis by thousands...
The experiments at the Large Hadron Collider (LHC) rely upon a complex distributed computing infrastructure (WLCG) consisting of hundreds of individual sites worldwide at universities and national laboratories, providing about half a billion job slots and an exabyte of storage interconnected through high-speed networks. Wide Area Networking (WAN) is one of the three pillars (together with computational resources and storage) of LHC computing. More than 5 PB/day are transferred between WLCG sites. Monitoring is crucial...
The framework for the clustering of error messages, ClusterLogs, was developed as a flexible and modular tool for the needs of large-scale distributed computing infrastructures. Various types of failures are constantly being registered during the execution of millions of operations daily. Monitoring systems are faced with the challenging task of analyzing a considerable amount of multi-sourced messages. It is critical to present information about errors to human experts in a way that enables them to analyze it. The ClusterLogs pipeline...
Modern scientific experiments involve the production of huge volumes of data that require new approaches to processing and storage. These data, as well as their storage, are accompanied by a valuable amount of additional information, called metadata, which is distributed over multiple information systems and repositories and has a complicated, heterogeneous structure. Gathering these metadata for the field of high energy and nuclear physics (HENP) is a complex issue, requiring a quest for solutions outside the box. One of the tasks is to...
The Interactive Visual Explorer (InVEx) application is designed as a visual analytics tool for Big Data analysis. It takes an integral approach to data analysis, combining methods of intellectual analysis with advanced interactive visualization. One of the main objectives of InVEx is to process large data samples by decreasing their level of detail (LoD). The proposed approach includes clustering as well as flexible grouping by different parameters, providing exploration from the lowest to the highest level of detail. The results of clusterization...
Modern large-scale distributed computing systems, processing large volumes of data, require mature monitoring systems able to control and track resources, networks, tasks, queues and other components. In recent years, the ELK stack has become very popular in this environment, largely due to the efficiency and flexibility of ElasticSearch storage and the wide variety of Kibana visualization tools. The analysis of infrastructure metadata often requires the visual exploration of multiple parameters simultaneously on one graphical...
ClusterLogs is a framework for the automatic categorization of computing jobs and resources by error messages in distributed systems. Initially, it was developed for high-energy physics experiments, but it can be applied in other areas. The first prototype was limited to sequential execution and did not allow processing a large amount of data in an acceptable time. In the next prototype, the system was significantly improved by the parallelization of several preprocessing stages. In this paper, we focus on the DBSCAN algorithm, the main method used...
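For reference, the textbook DBSCAN procedure that ClusterLogs builds on can be sketched in a few lines. This toy version works on plain 2-D points with Euclidean distance; ClusterLogs itself clusters vectorized error messages, which this sketch does not reproduce.

```python
def dbscan(points, eps, min_pts):
    """Textbook DBSCAN: returns a cluster label per point, -1 for noise."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def neighbors(i):
        # Neighborhood includes the point itself, as in the original paper.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1               # noise (may become a border point later)
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise re-labelled as border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:       # j is also a core point: expand further
                queue.extend(jn)
    return labels

pts = [(0, 0), (0.2, 0), (0.1, 0.1), (5, 5), (5.1, 5), (10, 10)]
print(dbscan(pts, eps=0.5, min_pts=2))
```

The quadratic neighborhood scan in this toy version is exactly the bottleneck that motivates the parallelization discussed in the paper.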
Having information such as an estimation of the processing time or the possibility of a system outage (abnormal behaviour) helps to monitor performance and predict the system's next state. The current cyber-infrastructure of the ATLAS Production System presents computing conditions in which contention for resources among high-priority data analyses happens routinely, which might lead to significant workload handling interruptions. The lack of analysis of process behaviour (its duration) and of the system's state itself provides...
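As a minimal illustration of duration prediction, one can fit a simple least-squares line from an input feature to observed task durations. This stand-in model and its made-up training data are assumptions for illustration; the paper's actual predictive models are more elaborate.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, as a minimal stand-in
    for a task-duration prediction model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical training data: input size (GB) vs. observed duration (min).
sizes = [1, 2, 4, 8]
durations = [12, 22, 41, 80]
a, b = fit_line(sizes, durations)

def predict(size):
    return a * size + b

print(predict(6))   # estimated duration for a 6 GB task
```

Even such a crude estimate lets a monitoring system flag tasks whose running time deviates strongly from the prediction as abnormal behaviour.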