Liudmila Prokhorenkova

ORCID: 0000-0002-1520-4167
Research Areas
  • Complex Network Analysis Techniques
  • Advanced Clustering Algorithms Research
  • Opinion Dynamics and Social Influence
  • Graph Theory and Applications
  • Advanced Graph Neural Networks
  • Web Data Mining and Analysis
  • Stochastic Processes and Statistical Mechanics
  • Random Matrices and Applications
  • Anomaly Detection Techniques and Applications
  • Data Management and Algorithms
  • Peer-to-Peer Network Technologies
  • Advanced Graph Theory Research
  • Data Visualization and Analytics
  • Attachment and Relationship Dynamics
  • Graph Theory and Algorithms
  • Advanced Image and Video Retrieval Techniques
  • Machine Learning and Algorithms
  • Neural Networks and Applications
  • Caching and Content Delivery
  • Stochastic Gradient Optimization Techniques
  • Machine Learning and Data Classification
  • Text and Document Classification Technologies
  • Imbalanced Data Classification Techniques
  • Mental Health Research Topics
  • Markov Chains and Monte Carlo Methods

Yandex (Russia)
2013-2022

National Research University Higher School of Economics
2019-2022

Moscow Institute of Physics and Technology
2014-2020

Lomonosov Moscow State University
2013-2015

Retweet cascades play an essential role in information diffusion in Twitter. Popular tweets reflect the current trends in Twitter, while Twitter itself is one of the most important online media. Thus, understanding the reasons why a tweet becomes popular is of great interest for sociologists, marketers and social media researchers. What is even more important is the possibility to make a prognosis of a tweet's future popularity. Besides the scientific significance of such a possibility, this sort of prediction has lots of practical applications, such as...

10.1145/2396761.2398634 article EN 2012-10-29

This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage...

10.48550/arxiv.1706.09516 preprint EN other-oa arXiv (Cornell University) 2017-01-01
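The categorical-feature idea summarized above can be illustrated with ordered target statistics: each example's category is replaced by a smoothed average of the targets of examples that come *before* it in a random permutation, so its own label never leaks into its encoding. The following is a minimal Python sketch, assuming a numeric target, a single permutation, and hypothetical smoothing parameters `prior` and `a`; it is not CatBoost's actual implementation or API.

```python
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior=0.5, a=1.0, seed=0):
    """Encode categorical values with ordered target statistics.

    Each example is encoded using only the targets of examples that precede
    it in one random permutation. `prior` and `a` (smoothing weight) are
    illustrative parameters, not CatBoost's actual option names.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)

    sums = defaultdict(float)  # running target sum per category
    counts = defaultdict(int)  # running example count per category
    encoded = [0.0] * len(categories)
    for idx in order:
        cat = categories[idx]
        # Only earlier examples contribute, so the example's own label
        # never enters its own encoding (no target leakage).
        encoded[idx] = (sums[cat] + a * prior) / (counts[cat] + a)
        sums[cat] += targets[idx]
        counts[cat] += 1
    return encoded


print(ordered_target_statistics(["red", "blue", "red", "red", "blue"], [1, 0, 1, 0, 1]))
```

Because this sketch uses a single permutation, examples that appear early in the order receive encodings close to the prior.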

There has been significant research done on developing methods for improving robustness to distributional shift and uncertainty estimation. In contrast, only limited work has examined developing standard datasets and benchmarks for assessing these approaches. Additionally, most work on uncertainty estimation has developed new techniques based on small-scale regression or image classification tasks. However, many tasks of practical interest have different modalities, such as tabular data, audio, text, or sensor data, which offer challenges involving...

10.48550/arxiv.2107.07455 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Node classification is a classical graph machine learning task on which Graph Neural Networks (GNNs) have recently achieved strong results. However, it is often believed that standard GNNs only work well for homophilous graphs, i.e., graphs where edges tend to connect nodes of the same class. Graphs without this property are called heterophilous, and it is typically assumed that specialized methods are required to achieve strong performance on such graphs. In this work, we challenge this assumption. First, we show that the standard datasets used...

10.48550/arxiv.2302.11640 preprint EN other-oa arXiv (Cornell University) 2023-01-01
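To make the homophilous/heterophilous distinction concrete, one standard measure is edge homophily: the fraction of edges whose endpoints share a class label. The Python sketch below computes it for a graph given as an edge list plus a list of node labels (an assumed representation); it is shown only as an illustration and is not necessarily the measure used in the paper.

```python
def edge_homophily(edges, labels):
    """Fraction of (undirected) edges connecting nodes of the same class."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# A 4-cycle with alternating labels is perfectly heterophilous,
# while the same cycle with a single label is perfectly homophilous.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(edge_homophily(cycle, [0, 1, 0, 1]))  # 0.0
print(edge_homophily(cycle, [0, 0, 0, 0]))  # 1.0
```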

Several performance measures can be used for evaluating classification results: accuracy, F-measure, and many others. Can we say that some of them are better than others, or, ideally, choose one measure that is best in all situations? To answer this question, we conduct a systematic analysis of classification performance measures: we formally define a list of desirable properties and theoretically analyze which measures satisfy which properties. We also prove an impossibility theorem: some desirable properties cannot be simultaneously satisfied. Finally, we propose a new family of measures satisfying...

10.48550/arxiv.2201.09044 preprint EN other-oa arXiv (Cornell University) 2022-01-01
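The following example shows why the choice of measure matters: accuracy and the F1 measure (the F-measure with β = 1), both computed from binary confusion-matrix counts, can rank the same two classifiers in opposite orders. The two classifiers and their counts below are hypothetical and serve only as an illustration.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of correctly classified examples."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1(tp, fp, fn, tn):
    """F1 = 2TP / (2TP + FP + FN); tn is unused but kept for a uniform signature.

    Returns 0.0 by convention when the denominator is 0.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical test set: 50 positives, 950 negatives.
clf_a = dict(tp=0, fp=0, fn=50, tn=950)    # always predicts "negative"
clf_b = dict(tp=40, fp=60, fn=10, tn=890)  # finds most positives, some false alarms
print(accuracy(**clf_a), accuracy(**clf_b))  # 0.95 0.93
print(f1(**clf_a), round(f1(**clf_b), 3))    # 0.0 0.533
```

Accuracy prefers the trivial classifier A, while F1 prefers B.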

Community detection is one of the most important problems in network analysis. Among many algorithms proposed for this task, methods based on statistical inference are of particular interest: they are mathematically sound and were shown to provide partitions of good quality. Statistical inference is based on fitting some random graph model (a.k.a. null model) to the observed graph by maximizing the likelihood. The choice of this model is extremely important and is the main focus of the current study. We provide an extensive theoretical and empirical analysis to compare several models: the widely used...

10.1145/3308558.3313429 article EN 2019-05-13

For many practical, high-risk applications, it is essential to quantify uncertainty in a model's predictions to avoid costly mistakes. While predictive uncertainty is widely studied for neural networks, the topic seems to be under-explored for models based on gradient boosting. However, gradient boosting often achieves state-of-the-art results on tabular data. This work examines a probabilistic ensemble-based framework for deriving uncertainty estimates in the predictions of gradient boosting classification and regression models. We conducted experiments on a range of synthetic and real...

10.48550/arxiv.2006.10562 preprint EN other-oa arXiv (Cornell University) 2020-01-01
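For an ensemble of probabilistic classifiers, predictive uncertainty is commonly decomposed as follows. Total uncertainty is the entropy of the averaged prediction. Data uncertainty is the average entropy of the members' predictions. Knowledge uncertainty is the difference between the two, which equals the mutual information. The sketch below assumes the members' class probabilities are already available as plain lists; it does not show how an ensemble of gradient boosting models is built, and the function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def uncertainty_decomposition(member_probs):
    """Split ensemble uncertainty into (total, data, knowledge) parts.

    member_probs: list of class-probability vectors, one per ensemble member.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    total = entropy(mean)
    data = sum(entropy(p) for p in member_probs) / n_members
    knowledge = total - data  # mutual information between prediction and member
    return total, data, knowledge


# Members agree on a 50/50 prediction: the uncertainty is all "data" uncertainty.
print(uncertainty_decomposition([[0.5, 0.5], [0.5, 0.5]]))
# Members confidently disagree: the uncertainty is mostly "knowledge" uncertainty.
print(uncertainty_decomposition([[0.99, 0.01], [0.01, 0.99]]))
```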

10.1016/j.endm.2017.07.058 article EN Electronic Notes in Discrete Mathematics 2017-08-01

In this paper we address the problem of quick detection of high-degree entities in large online social networks. The practical importance of this problem is attested by a number of companies that continuously collect and update statistics about popular entities, usually using the degree of an entity as an approximation of its popularity. We suggest a simple, efficient, and easy-to-implement two-stage randomized algorithm that provides highly accurate solutions to this problem. For instance, our algorithm needs only one thousand API requests in order to find...

10.1109/icdm.2014.95 preprint EN 2014-12-01
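A two-stage randomized scheme of the kind described above can be sketched as follows. In the first stage, sample random nodes and count how often each entity appears among their out-neighbors; high-degree entities tend to show up often. In the second stage, spend the remaining requests on fetching exact degrees for the most frequent candidates. This is an illustrative assumption and not necessarily the paper's exact algorithm. `get_neighbors` and `get_in_degree` are hypothetical stand-ins for API requests.

```python
import random
from collections import Counter


def top_k_high_degree(nodes, get_neighbors, get_in_degree, n1, n2, k, seed=0):
    """Two-stage randomized search for k entities with the highest in-degree.

    get_neighbors(u) and get_in_degree(v) are hypothetical stand-ins for API
    requests; the budget is about n1 + n2 such requests.
    """
    rng = random.Random(seed)
    # Stage 1: sample n1 random nodes and count how often each entity
    # appears among their out-neighbors.
    hits = Counter()
    for u in rng.sample(nodes, n1):
        hits.update(get_neighbors(u))
    # Stage 2: query exact degrees only for the n2 most frequent candidates.
    candidates = [v for v, _ in hits.most_common(n2)]
    degrees = {v: get_in_degree(v) for v in candidates}
    return sorted(candidates, key=degrees.get, reverse=True)[:k]


# Toy directed graph: every user follows "hub"; even-numbered users also follow "celebrity".
users = [f"u{i}" for i in range(200)]
out_edges = {u: ["hub"] + (["celebrity"] if i % 2 == 0 else []) for i, u in enumerate(users)}
in_degree = Counter(v for targets in out_edges.values() for v in targets)
result = top_k_high_degree(users, out_edges.__getitem__, in_degree.__getitem__, n1=20, n2=5, k=2)
print(result)
```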

Modularity is designed to measure the strength of a division of a network into clusters (known also as communities). Networks with high modularity have dense connections between the vertices within clusters but sparse connections between vertices of different clusters. As a result, modularity is often used in optimization methods for detecting community structure in networks, and so it is an important graph parameter from a practical point of view. Unfortunately, many existing non-spatial models of complex networks do not generate graphs with high modularity; on the other hand,...

10.24166/im.12.2017 article EN Internet Mathematics 2017-07-18
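For an undirected graph with m edges, the modularity of a partition is commonly written as Q = Σ_c [e_c / m − (d_c / 2m)²]. Here e_c is the number of edges inside community c, and d_c is the total degree of its vertices. The Python sketch below computes this quantity for a simple graph given as an edge list and a node-to-community dict (an assumed input format).

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected simple graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community id.
    """
    m = len(edges)
    inside = defaultdict(int)      # e_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by one edge, partitioned into the two triangles.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
parts = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, parts), 4))  # 0.3571
```

Putting every vertex into one community gives Q = 0, so a value well above zero indicates that the partition captures real cluster structure.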

Graph neural networks (GNNs) are powerful models that have been successful in various graph representation learning tasks. Meanwhile, gradient boosted decision trees (GBDT) often outperform other machine learning methods when faced with heterogeneous tabular data. But what approach should be used for graphs with tabular node features? Previous GNN models have mostly focused on networks with homogeneous sparse features and, as we show, are suboptimal in the heterogeneous setting. In this work, we propose a novel architecture that trains GBDT and GNN jointly to get the best of...

10.48550/arxiv.2101.08543 preprint EN other-oa arXiv (Cornell University) 2021-01-01

When information or infectious diseases spread over a network, in many practical cases, one can observe when nodes adopt information or become infected, but the underlying network is hidden. In this paper, we analyze the problem of finding communities of highly interconnected nodes, given only the infection times of nodes. We propose, analyze, and empirically compare several algorithms for this task. The most stable performance, which improves on the current state of the art, is obtained by our proposed heuristic approaches, which are...

10.1145/3308558.3313560 article EN 2019-05-13

Graph-based approaches are empirically shown to be very successful for nearest neighbor search (NNS). However, there has been little research on their theoretical guarantees. We fill this gap and rigorously analyze the performance of graph-based NNS algorithms, specifically focusing on the low-dimensional (d ≪ log n) regime. In addition to the basic greedy algorithm on nearest neighbor graphs, we also analyze the most successful heuristics commonly used in practice: speeding up via adding shortcut edges and improving accuracy via maintaining a...

10.48550/arxiv.1907.00845 preprint EN other-oa arXiv (Cornell University) 2019-01-01
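The basic greedy algorithm mentioned above starts at some vertex of a proximity graph. It repeatedly moves to the neighbor closest to the query and stops when no neighbor is closer. The Python sketch below assumes Euclidean points and an adjacency-list graph. The brute-force k-NN graph builder is a hypothetical toy helper for the example, not the construction analyzed in the paper.

```python
import math


def greedy_search(points, graph, query, start):
    """Greedy walk on a proximity graph toward `query`.

    points: list of coordinate tuples; graph: dict vertex -> list of neighbors.
    Returns a local minimum of the distance to `query`, which is not always
    the true nearest neighbor.
    """
    current = start
    current_dist = math.dist(points[current], query)
    while True:
        best, best_dist = current, current_dist
        for v in graph[current]:
            d = math.dist(points[v], query)
            if d < best_dist:
                best, best_dist = v, d
        if best == current:  # no neighbor is closer: stop
            return current
        current, current_dist = best, best_dist


def knn_graph(points, k):
    """Toy brute-force k-nearest-neighbor graph (quadratic time, illustration only)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:k]
    return graph


pts = [(float(x), 0.0) for x in range(10)]  # ten points on a line
g = knn_graph(pts, k=2)
print(greedy_search(pts, g, query=(7.2, 0.0), start=0))  # 7
```

On such a path-like graph the greedy walk needs many steps. Shortcut edges, one of the heuristics named in the abstract, are meant to reduce exactly this cost.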

In this article, we study the clustering properties of the spatial preferential attachment (SPA) model. This model naturally combines geometry and preferential attachment using the notion of spheres of influence. It was previously shown in several research papers that graphs generated by the SPA model are similar to real-world networks in many aspects. Also, this model was successfully used for several practical applications. However, the clustering properties of the SPA model were not fully analyzed. The clustering coefficient is an important characteristic of complex networks, which is tightly connected with their community...

10.1093/comnet/cnz019 article EN Journal of Complex Networks 2019-05-03

In this paper, we study the problem of timely finding and crawling ephemeral new pages, i.e., pages for which user traffic grows really quickly right after they appear but lasts only for several days (e.g., news, blog and forum posts). Traditional crawling policies do not give any particular priority to such pages and may thus crawl them too late, or even crawl them only when their content is already obsolete. We propose a new metric, well thought out for this task, which takes into account the decrease of user interest in ephemeral pages over time.

10.1145/2505515.2505641 article EN 2013-10-27

In this article, we present a detailed analysis of the global clustering coefficient in scale-free graphs. Many observed real-world networks of diverse nature have a power-law degree distribution. Moreover, the degree distribution usually has an infinite variance. Therefore, we are especially interested in such degree distributions. In addition, we analyze the clustering coefficient for both weighted and unweighted graphs. There are two well-known definitions of the clustering coefficient of a graph: the global and the average local clustering coefficients. There are several models proposed in the literature for which the average local clustering coefficient tends to a positive constant...

10.1080/15427951.2015.1092482 article EN Internet Mathematics 2015-09-18
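The two definitions of the clustering coefficient can give quite different values on the same graph. The global clustering coefficient is 3 × (number of triangles) / (number of paths of length two). The average local coefficient averages, over vertices, the fraction of pairs of neighbors that are themselves adjacent. The following Python sketch handles an unweighted undirected graph given as a dict of neighbor sets. Assigning local coefficient 0 to vertices of degree below 2 is an assumed convention, since conventions vary.

```python
from itertools import combinations


def clustering_coefficients(adj):
    """Return (global, average local) clustering of an undirected simple graph.

    adj: dict mapping vertex -> set of neighbors (must be symmetric).
    Vertices of degree < 2 get local coefficient 0 (one common convention).
    """
    total_links = 0    # adjacent neighbor pairs summed over vertices (= 3 x triangles)
    total_triples = 0  # neighbor pairs summed over vertices (= paths of length two)
    local = []
    for v, nbrs in adj.items():
        d = len(nbrs)
        pairs = d * (d - 1) // 2
        links = sum(1 for u, w in combinations(nbrs, 2) if w in adj[u])
        total_links += links
        total_triples += pairs
        local.append(links / pairs if pairs else 0.0)
    global_cc = total_links / total_triples if total_triples else 0.0
    return global_cc, sum(local) / len(local)


# A triangle 0-1-2 where vertex 0 also has three pendant leaves 3, 4, 5.
adj = {0: {1, 2, 3, 4, 5}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}, 5: {0}}
g_cc, l_cc = clustering_coefficients(adj)
print(round(g_cc, 3), round(l_cc, 3))  # 0.25 0.35
```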