NFDI4DS | UHH-SEMS - Publication Details

Liudmila Prokhorenkova

ORCID: 0000-0002-1520-4167

About

Contact & Profiles

OpenAlex author ID: A5038638877

Research Areas

Complex Network Analysis Techniques
Advanced Clustering Algorithms Research
Opinion Dynamics and Social Influence
Graph theory and applications
Advanced Graph Neural Networks
Web Data Mining and Analysis
Stochastic processes and statistical mechanics
Random Matrices and Applications
Anomaly Detection Techniques and Applications
Data Management and Algorithms
Peer-to-Peer Network Technologies
Advanced Graph Theory Research
Data Visualization and Analytics
Attachment and Relationship Dynamics
Graph Theory and Algorithms
Advanced Image and Video Retrieval Techniques
Machine Learning and Algorithms
Neural Networks and Applications
Caching and Content Delivery
Stochastic Gradient Optimization Techniques
Machine Learning and Data Classification
Text and Document Classification Technologies
Imbalanced Data Classification Techniques
Mental Health Research Topics
Markov Chains and Monte Carlo Methods

Affiliations

Yandex (Russia)
2013-2022

National Research University Higher School of Economics
2019-2022

Moscow Institute of Physics and Technology
2014-2020

Lomonosov Moscow State University
2013-2015

Prediction of retweet cascade size over time

OPENALEX - Publications

Andrey Kupavskii, Liudmila Prokhorenkova, Alexey Umnov, Svyatoslav Usachev, Pavel Serdyukov, and 2 more

Retweet cascades play an essential role in information diffusion on Twitter. Popular tweets reflect the current trends on Twitter, while Twitter itself is one of the most important online media. Thus, understanding the reasons why a tweet becomes popular is of great interest for sociologists, marketers, and social media researchers. Even more interesting is the possibility to make a prognosis of a tweet's future popularity. Besides the scientific significance of such a possibility, this sort of prediction has lots of practical applications, such as...

DOI: 10.1145/2396761.2398634 | Article | English | 2012-10-29

CatBoost: unbiased boosting with categorical features

Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin

This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available implementations in terms of quality on a variety of datasets. The two critical advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage...

DOI: 10.48550/arxiv.1706.09516 | Preprint | English | arXiv (Cornell University) | License: other-oa | 2017-01-01
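The categorical-feature processing mentioned above rests on computing target statistics in an order-dependent way, so that an example's own label cannot leak into its encoding. The sketch below is a simplified, hypothetical illustration of that idea, not CatBoost's actual code or API: each example is encoded using only the targets of examples that precede it in a random permutation. The function name and the `prior` and `prior_weight` parameters are assumptions made for illustration.

```python
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior, prior_weight=1.0, seed=0):
    """Hypothetical sketch of ordered target statistics for one categorical feature.

    Each example is encoded as a smoothed mean of the targets of *earlier*
    examples (in a random permutation) that share its category, so its own
    target is never used in its own encoding.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # random processing order ("artificial time")

    target_sum = defaultdict(float)  # running sum of targets seen so far, per category
    seen = defaultdict(int)          # running count of examples seen so far, per category
    encoded = [0.0] * len(categories)

    for i in order:
        c = categories[i]
        # Encode using statistics from preceding examples only, smoothed toward the prior.
        encoded[i] = (target_sum[c] + prior_weight * prior) / (seen[c] + prior_weight)
        # Only after encoding is this example's target added to the statistics.
        target_sum[c] += targets[i]
        seen[c] += 1

    return encoded


if __name__ == "__main__":
    cats = ["red", "blue", "red", "red", "blue"]
    ys = [1, 0, 1, 0, 1]
    print(ordered_target_statistics(cats, ys, prior=0.5))
```

Because an example is encoded before its target is added to the running sums, changing its label never changes its own encoded value, which is the property that prevents this kind of leakage.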

Shifts: A Dataset of Real Distributional Shift Across Multiple Large-Scale Tasks

Andrey Malinin, Neil Band, Alexander Ganshin, German Chesnokov, and 13 more

There has been significant research done on developing methods for improving robustness to distributional shift and uncertainty estimation. In contrast, only limited work has examined developing standard datasets and benchmarks for assessing these approaches. Additionally, most work on uncertainty estimation has developed new techniques based on small-scale regression or image classification tasks. However, many tasks of practical interest have different modalities, such as tabular data, audio, text, or sensor data, which offer challenges involving...

DOI: 10.48550/arxiv.2107.07455 | Preprint | English | arXiv (Cornell University) | License: other-oa | 2021-01-01

A critical look at the evaluation of GNNs under heterophily: are we really making progress?

Oleg Platonov, Denis Kuznedelev, M.G. Diskin, Artem Babenko, Liudmila Prokhorenkova

Node classification is a classical graph machine learning task on which Graph Neural Networks (GNNs) have recently achieved strong results. However, it is often believed that standard GNNs only work well for homophilous graphs, i.e., graphs where edges tend to connect nodes of the same class. Graphs without this property are called heterophilous, and it is typically assumed that specialized methods are required to achieve strong performance on such graphs. In this work, we challenge this assumption. First, we show that the standard datasets used...

DOI: 10.48550/arxiv.2302.11640 | Preprint | English | arXiv (Cornell University) | License: other-oa | 2023-01-01
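The homophily property described above (edges tend to connect nodes of the same class) is often quantified by edge homophily: the fraction of edges whose endpoints share a label. The sketch below computes that one measure as an illustration; it is an assumption here, and the paper may discuss or prefer other homophily measures.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints have the same class label.

    edges: iterable of (u, v) pairs; labels: mapping from node to class.
    Values near 1 indicate a homophilous graph, values near 0 a heterophilous one.
    """
    edges = list(edges)
    if not edges:
        raise ValueError("edge homophily is undefined for a graph without edges")
    same_class = sum(1 for u, v in edges if labels[u] == labels[v])
    return same_class / len(edges)


if __name__ == "__main__":
    # Path 0-1-2-3 where nodes 0, 1 are class "A" and nodes 2, 3 are class "B".
    edges = [(0, 1), (1, 2), (2, 3)]
    labels = {0: "A", 1: "A", 2: "B", 3: "B"}
    print(edge_homophily(edges, labels))  # 2 of 3 edges are intra-class
```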

Good Classification Measures and How to Find Them

Martijn Gösgens, Anton Zhiyanov, Alexei Tikhonov, Liudmila Prokhorenkova

Several performance measures can be used for evaluating classification results: accuracy, F-measure, and many others. Can we say that some of them are better than others, or, ideally, choose one measure that is best in all situations? To answer this question, we conduct a systematic analysis of classification performance measures: we formally define a list of desirable properties and theoretically analyze which measures satisfy which properties. We also prove an impossibility theorem: some desirable properties cannot be simultaneously satisfied. Finally, we propose a new family of measures satisfying...

DOI: 10.48550/arxiv.2201.09044 | Preprint | English | arXiv (Cornell University) | License: other-oa | 2022-01-01
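Measures such as accuracy and F-measure can rank the same classifiers differently. A small hypothetical example of that disagreement: on an imbalanced binary task, a classifier that always predicts the negative class wins on accuracy, while one that always predicts the positive class wins on F1. The helper names below are illustrative and not taken from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(y_true, y_pred, beta=1.0):
    """F_beta = (1 + beta^2) * tp / ((1 + beta^2) * tp + beta^2 * fn + fp)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Assumed convention: return 0.0 when there are no positives in either the
    # labels or the predictions (F is otherwise undefined).
    return (1 + b2) * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Imbalanced data: 8 negatives, 2 positives.
    y_true = [0] * 8 + [1] * 2
    all_negative = [0] * 10
    all_positive = [1] * 10
    print(accuracy(y_true, all_negative), f_measure(y_true, all_negative))  # 0.8 0.0
    print(accuracy(y_true, all_positive), f_measure(y_true, all_positive))
```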

Community Detection through Likelihood Optimization: In Search of a Sound Model

Liudmila Prokhorenkova, Alexey Tikhonov

Community detection is one of the most important problems in network analysis. Among the many algorithms proposed for this task, methods based on statistical inference are of particular interest: they are mathematically sound and were shown to provide partitions of good quality. Statistical inference is based on fitting some random graph model (a.k.a. null model) to the observed network by maximizing the likelihood. The choice of this model is extremely important and is the main focus of the current study. We perform an extensive theoretical and empirical analysis to compare several models: the widely used...

DOI: 10.1145/3308558.3313429 | Article | English | 2019-05-13

Uncertainty in Gradient Boosting via Ensembles

Aleksei Ustimenko, Liudmila Prokhorenkova, Andrey Malinin

For many practical, high-risk applications, it is essential to quantify uncertainty in a model's predictions to avoid costly mistakes. While predictive uncertainty is widely studied for neural networks, the topic seems to be under-explored for models based on gradient boosting. However, gradient boosting often achieves state-of-the-art results on tabular data. This work examines a probabilistic ensemble-based framework for deriving uncertainty estimates in the predictions of gradient boosting classification and regression models. We conducted experiments on a range of synthetic and real...

DOI: 10.48550/arxiv.2006.10562 | Preprint | English | arXiv (Cornell University) | License: other-oa | 2020-01-01
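An ensemble-based framework derives uncertainty from the spread of predictions across ensemble members. A common way to do this for classification, assumed here rather than quoted from the paper, splits the total predictive entropy into data uncertainty (the average entropy of individual members) and knowledge uncertainty (the difference, i.e. the mutual information). The sketch works on plain probability vectors, so it does not depend on any boosting library.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def ensemble_uncertainty(member_probs):
    """Entropy-based uncertainty decomposition for an ensemble of classifiers.

    member_probs: one class-probability vector per ensemble member, all for
    the same input. Returns (total, data, knowledge), where
      total     = entropy of the averaged prediction,
      data      = average entropy of the individual predictions,
      knowledge = total - data (mutual information, i.e. member disagreement).
    """
    k = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / k for c in range(n_classes)]
    total = entropy(mean)
    data = sum(entropy(p) for p in member_probs) / k
    return total, data, total - data


if __name__ == "__main__":
    # Members agree on an uncertain prediction: high data uncertainty, no disagreement.
    print(ensemble_uncertainty([[0.5, 0.5], [0.5, 0.5]]))
    # Members are individually confident but disagree: high knowledge uncertainty.
    print(ensemble_uncertainty([[1.0, 0.0], [0.0, 1.0]]))
```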

Modularity in several random graph models

Liudmila Prokhorenkova, Paweł Prałat, А. М. Райгородский (A. M. Raigorodskii)

DOI: 10.1016/j.endm.2017.07.058 | Article | English | Electronic Notes in Discrete Mathematics | 2017-08-01

Quick Detection of High-Degree Entities in Large Directed Networks

Konstantin Avrachenkov, Nelly Litvak, Liudmila Prokhorenkova, E. Suyargulova

In this paper, we address the problem of quick detection of high-degree entities in large online social networks. The practical importance of this problem is attested by the number of companies that continuously collect and update statistics about popular entities, usually using the degree of an entity as an approximation of its popularity. We suggest a simple, efficient, and easy-to-implement two-stage randomized algorithm that provides highly accurate solutions to this problem. For instance, our algorithm needs only one thousand API requests in order to find...

DOI: 10.1109/icdm.2014.95 | Preprint | English | 2014-12-01

Modularity of complex networks models

Liudmila Prokhorenkova, А. М. Райгородский (A. M. Raigorodskii), Paweł Prałat

Modularity is designed to measure the strength of the division of a network into clusters (known also as communities). Networks with high modularity have dense connections between vertices within clusters but sparse connections between vertices of different clusters. As a result, modularity is often used in optimization methods for detecting community structure in networks, and so it is an important graph parameter from a practical point of view. Unfortunately, many existing non-spatial models of complex networks do not generate graphs with high modularity; on the other hand,...

DOI: 10.24166/im.12.2017 | Article | English | Internet Mathematics | 2017-07-18
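The notion of modularity described above has a standard closed form (Newman-Girvan modularity): Q = sum over communities c of e_c/m - (d_c/(2m))^2, where m is the number of edges, e_c the number of edges inside c, and d_c the total degree of the vertices in c. The sketch below, an illustration rather than code from the paper, computes it for a partition of an undirected simple graph and shows that splitting two loosely connected triangles scores higher than putting all vertices in one cluster.

```python
from collections import Counter


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected simple graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: mapping node -> community id.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    inside = Counter()        # e_c: number of edges with both endpoints in c
    total_degree = Counter()  # d_c: sum of degrees of the vertices in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        total_degree[cu] += 1
        total_degree[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (total_degree[c] / (2 * m)) ** 2 for c in total_degree)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    together = {v: "A" for v in range(6)}
    print(modularity(edges, split))     # 5/14, about 0.357
    print(modularity(edges, together))  # 0.0
```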

Boost then Convolve: Gradient Boosting Meets Graph Neural Networks

Sergei Ivanov, Liudmila Prokhorenkova

Graph neural networks (GNNs) are powerful models that have been successful in various graph representation learning tasks, whereas gradient boosted decision trees (GBDT) often outperform other machine learning methods when faced with heterogeneous tabular data. But what approach should be used for graphs with tabular node features? Previous GNN models have mostly focused on networks with homogeneous sparse features and, as we show, are suboptimal in the heterogeneous setting. In this work, we propose a novel architecture that trains GBDT and GNN jointly to get the best of...

DOI: 10.48550/arxiv.2101.08543 | Preprint | English | arXiv (Cornell University) | License: other-oa | 2021-01-01

Learning Clusters through Information Diffusion

Liudmila Prokhorenkova, Alexey Tikhonov, Nelly Litvak

When information or infectious diseases spread over a network, in many practical cases one can observe when nodes adopt information or become infected, but the underlying network is hidden. In this paper, we analyze the problem of finding communities of highly interconnected nodes, given only the infection times of nodes. We propose, analyze, and empirically compare several algorithms for this task. The most stable performance, which improves on the current state of the art, is obtained by our proposed heuristic approaches, which are...

DOI: 10.1145/3308558.3313560 | Article | English | 2019-05-13

Graph-based Nearest Neighbor Search: From Practice to Theory

Liudmila Prokhorenkova, Aleksandr Shekhovtsov

Graph-based approaches are empirically shown to be very successful for nearest neighbor search (NNS). However, there has been little research on their theoretical guarantees. We fill this gap and rigorously analyze the performance of graph-based NNS algorithms, specifically focusing on the low-dimensional (d ≪ log n) regime. In addition to the basic greedy algorithm on nearest neighbor graphs, we also analyze the heuristics most commonly used in practice: speeding up via adding shortcut edges and improving accuracy via maintaining a...

DOI: 10.48550/arxiv.1907.00845 | Preprint | English | arXiv (Cornell University) | License: other-oa | 2019-01-01
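The basic greedy algorithm on graphs can be sketched as follows. This is a generic, illustrative version, not the paper's code: it starts at a given vertex and keeps moving to the neighbor closest to the query until no neighbor improves the distance. The heuristics mentioned in the abstract (shortcut edges and accuracy improvements) are not included, and the graph and point representations are assumptions.

```python
import math


def greedy_search(graph, points, query, start):
    """Basic greedy nearest neighbor search on a graph.

    graph: mapping node -> list of neighbor nodes.
    points: mapping node -> coordinate tuple.
    From `start`, repeatedly move to the neighbor closest to `query`; stop when
    no neighbor is closer than the current node. The result is a local minimum,
    which need not be the true nearest neighbor in general.
    """
    current = start
    current_dist = math.dist(points[current], query)
    while True:
        best = min(graph[current], key=lambda v: math.dist(points[v], query), default=None)
        if best is None:
            return current, current_dist
        best_dist = math.dist(points[best], query)
        if best_dist >= current_dist:
            return current, current_dist
        current, current_dist = best, best_dist


if __name__ == "__main__":
    # Ten points on a line, each connected to its immediate neighbors.
    points = {i: (float(i),) for i in range(10)}
    graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
    print(greedy_search(graph, points, query=(7.2,), start=0))
```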

Local clustering coefficient of spatial preferential attachment model

Lenar Iskhakov, Bogumił Kamiński, Maksim Mironov, Paweł Prałat, Liudmila Prokhorenkova

In this article, we study the clustering properties of the spatial preferential attachment (SPA) model. This model naturally combines geometry and preferential attachment using the notion of spheres of influence. It was previously shown in several research papers that graphs generated by the SPA model are similar to real-world networks in many aspects. Also, the model was successfully used for practical applications. However, its clustering properties were not fully analysed. The clustering coefficient is an important characteristic of a complex network, tightly connected with its community...

DOI: 10.1093/comnet/cnz019 | Article | English | Journal of Complex Networks | 2019-05-03

Timely crawling of high-quality ephemeral new content

Damien Lefortier, Liudmila Prokhorenkova, Egor Samosvat, Pavel Serdyukov

In this paper, we study the problem of timely finding and crawling ephemeral new pages, i.e., pages for which user traffic grows really quickly right after they appear but lasts only several days (e.g., news, blog, and forum posts). Traditional crawling policies do not give any particular priority to such pages and may thus crawl them too late, or even crawl content that is already obsolete. We propose a new metric, well thought out for this task, that takes into account the decrease of user interest in ephemeral pages over time.

DOI: 10.1145/2505515.2505641 | Article | English | 2013-10-27

Global Clustering Coefficient in Scale-Free Weighted and Unweighted Networks

Liudmila Prokhorenkova

In this article, we present a detailed analysis of the global clustering coefficient in scale-free graphs. Many observed real-world networks of diverse nature have a power-law degree distribution. Moreover, the observed degree distribution usually has an infinite variance. Therefore, we are especially interested in such degree distributions. In addition, we analyze the clustering coefficient for both weighted and unweighted graphs. There are two well-known definitions of the clustering coefficient of a graph: the global and the average local clustering coefficients. There are several models proposed in the literature for which the average local clustering coefficient tends to a positive constant...

DOI: 10.1080/15427951.2015.1092482 | Article | English | Internet Mathematics | 2015-09-18
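The two definitions of the clustering coefficient, global and average local, can differ noticeably on the same graph. The sketch below computes both for a small unweighted graph; the weighted variants analyzed in the article are not covered. Treating vertices of degree below 2 as having local coefficient 0 is an assumed convention, since some authors exclude such vertices instead.

```python
from itertools import combinations


def clustering_coefficients(adj):
    """Global and average local clustering coefficients of an undirected simple graph.

    adj: mapping node -> set of neighbors.
    global  = 3 * (number of triangles) / (number of wedges, i.e. paths of length 2)
    average = mean over vertices of (edges among neighbors) / C(deg, 2);
              vertices of degree < 2 contribute 0 (assumed convention).
    """
    closed_wedges = 0  # equals 3 * number of triangles after the loop
    wedges = 0
    local_sum = 0.0
    for nbrs in adj.values():
        deg = len(nbrs)
        pairs = deg * (deg - 1) // 2
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        wedges += pairs
        closed_wedges += closed
        if pairs:
            local_sum += closed / pairs
    global_cc = closed_wedges / wedges if wedges else 0.0
    average_local_cc = local_sum / len(adj) if adj else 0.0
    return global_cc, average_local_cc


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(clustering_coefficients(adj))  # global 3/5, average local 7/12
```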

General results on preferential attachment and clustering coefficient

Liudmila Prokhorenkova

DOI: 10.1007/s11590-016-1030-8 | Article | English | Optimization Letters | 2016-04-06
