NFDI4DS | UHH-SEMS - Publication Details

Sergei Koltcov

ORCID: 0000-0002-2932-2746

About

Contact & Profiles

OpenAlex ID: A5063666310

Research Areas

Computational and Text Analysis Methods
Opinion Dynamics and Social Influence
Topic Modeling
Complex Network Analysis Techniques
Bayesian Methods and Mixture Models
Advanced Text Analysis Techniques
Misinformation and Its Impacts
Stock Market Forecasting Methods
Social Media and Politics
Hate Speech and Cyberbullying Detection
Sentiment Analysis and Opinion Mining
Neural Networks and Applications
Digital Marketing and Social Media
Statistical Mechanics and Entropy
Authorship Attribution and Profiling
Media Influence and Politics
Open Source Software Innovations
Knowledge Management and Sharing
Forecasting Techniques and Applications
Media Studies and Communication
Complex Systems and Time Series Analysis
Impact of Technology on Adolescents
Mental Health via Writing
Mathematical Biology Tumor Growth
Advanced Statistical Methods and Models

Affiliations

National Research University Higher School of Economics
2014-2024

Moscow Power Engineering Institute
2016

OPENALEX - Publications

Latent Dirichlet allocation

Sergei Koltcov, Olessia Koltsova, Sergey Nikolenko

Topic modeling, in particular the Latent Dirichlet Allocation (LDA) model, has recently emerged as an important tool for understanding large datasets, in particular user-generated datasets in social studies of the Web. In this work, we investigate the instability of LDA inference and propose a new metric of similarity between topics and a criterion of vocabulary reduction. We show the limitations of the LDA approach for the purposes of qualitative analysis in social science and sketch some ways for improvement.

10.1145/2615569.2615680 article EN 2014-06-23
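
The abstract does not spell out the proposed topic similarity metric. As a hedged illustration of how instability between two LDA runs can be quantified in general, the sketch below compares runs by the Jaccard overlap of each topic's top words and counts how many topics reappear. The function names, the top-N cutoff and the 0.5 threshold are assumptions for illustration, not the metric from the paper.

```python
from typing import Dict, List, Set, Tuple

Topic = Dict[str, float]  # word -> probability within one topic (hypothetical format)


def top_words(topic: Topic, n: int) -> Set[str]:
    """The n most probable words of a topic (ties broken alphabetically)."""
    ranked = sorted(topic.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:n]}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two word sets: 0 = disjoint, 1 = identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match_topics(run_a: List[Topic], run_b: List[Topic], n: int = 10) -> List[Tuple[int, int, float]]:
    """For each topic of run_a, the index of its most similar topic in run_b and the score."""
    matches = []
    for i, topic_a in enumerate(run_a):
        words_a = top_words(topic_a, n)
        scores = [jaccard(words_a, top_words(topic_b, n)) for topic_b in run_b]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((i, best, scores[best]))
    return matches


def stable_share(run_a: List[Topic], run_b: List[Topic], n: int = 10, threshold: float = 0.5) -> float:
    """Share of run_a topics that reappear in run_b with similarity >= threshold."""
    matches = match_topics(run_a, run_b, n)
    return sum(score >= threshold for _, _, score in matches) / len(matches)
```

In practice run_a and run_b would be two fits of the same model on the same corpus with different random seeds; a low share suggests unstable topics. The matching here is greedy (one topic of run_b may match several topics of run_a); a one-to-one assignment is a stricter alternative.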

Topic modelling for qualitative studies

Sergey Nikolenko, Sergei Koltcov, Olessia Koltsova

Qualitative studies, such as sociological research, opinion analysis and media studies, can benefit greatly from automated topic mining provided by topic models such as latent Dirichlet allocation (LDA). However, examples of qualitative studies that employ topic modelling as a tool are currently few and far between. In this work, we identify two important problems along the way to using topic models in qualitative studies: the lack of a good quality metric that closely matches human judgement in understanding topics, and the need to indicate specific subtopics that a specific qualitative study may be most...

10.1177/0165551515617393 article EN Journal of Information Science 2015-12-12

Random forests with parametric entropy-based information gains for classification and regression problems

Vera Ignatenko, Anton Surkov, Sergei Koltcov

The random forest algorithm is one of the most popular and commonly used algorithms for classification and regression tasks. It combines the output of multiple decision trees to form a single result. Random forests demonstrate the highest accuracy on tabular data compared to other algorithms in various applications. However, random forests and, more precisely, decision trees are usually built with the application of classic Shannon entropy. In this article, we consider the potential of deformed entropies, which are successfully used in the field of complex systems, to increase...

10.7717/peerj-cs.1775 article EN cc-by PeerJ Computer Science 2024-01-03
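
The abstract is cut off before naming the deformed entropies used. As one well-known example of a parametric entropy, the sketch below computes a decision-tree split's information gain with Tsallis entropy; the paper may use other entropies or parameterizations, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def tsallis_entropy(labels: Sequence[Hashable], q: float = 2.0) -> float:
    """Tsallis entropy of the class distribution in `labels`:
    S_q = (1 - sum_i p_i**q) / (q - 1).
    q = 1 is treated as the Shannon limit (natural log); q = 2 equals Gini impurity."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [count / n for count in Counter(labels).values()]
    if math.isclose(q, 1.0):
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def information_gain(parent: Sequence[Hashable], left: Sequence[Hashable],
                     right: Sequence[Hashable], q: float = 2.0) -> float:
    """Parent entropy minus the size-weighted entropy of the two child nodes."""
    n = len(parent)
    children = (len(left) / n) * tsallis_entropy(left, q) + (len(right) / n) * tsallis_entropy(right, q)
    return tsallis_entropy(parent, q) - children
```

Since q = 1 recovers Shannon entropy and q = 2 recovers Gini impurity, the two standard split criteria become special cases of a single tunable parameter, which a tree builder could treat as one more hyperparameter.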

Mapping the public agenda with topic modeling: The case of the Russian LiveJournal

Olessia Koltsova, Sergei Koltcov

This article describes agendas as “packages” of topics of varying salience, set by Russian Internet users on Russia's leading blog platform LiveJournal. The research involved modeling LiveJournal's topic structure, viewed as an important component of what is termed here self‐generated public opinion. Topic modeling was performed automatically with the LDA algorithm and complemented by hand labeling of topics. Data were collected by software created by the authors to generate a relational database storing all posts...

10.1002/1944-2866.poi331 article EN Policy & Internet 2013-06-01

Estimating Topic Modeling Performance with Sharma–Mittal Entropy

Sergei Koltcov, Vera Ignatenko, Olessia Koltsova

Topic modeling is a popular approach for clustering text documents. However, current tools have a number of unsolved problems, such as instability and a lack of criteria for selecting the values of model parameters. In this work, we propose a method to partially solve the problem of optimizing model parameters while simultaneously accounting for semantic stability. Our method is inspired by concepts from statistical physics and is based on Sharma–Mittal entropy. We test our approach on two models: probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation...

10.3390/e21070660 article EN cc-by Entropy 2019-07-05
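
The abstract is truncated before describing how the entropy is computed from topic-model output, so the sketch below does not reproduce the paper's method. It only evaluates the standard two-parameter Sharma–Mittal entropy of a discrete distribution and its known limiting cases; the function names and the parameter names q and r are assumptions.

```python
import math
from typing import Sequence


def renyi_entropy(p: Sequence[float], q: float) -> float:
    """Renyi entropy of order q (natural log); q = 1 falls back to Shannon entropy."""
    nz = [x for x in p if x > 0]
    if math.isclose(q, 1.0):
        return -sum(x * math.log(x) for x in nz)
    return math.log(sum(x ** q for x in nz)) / (1.0 - q)


def sharma_mittal_entropy(p: Sequence[float], q: float, r: float) -> float:
    """Sharma-Mittal entropy H_{q,r}(p) = ((sum_i p_i**q) ** ((1 - r) / (1 - q)) - 1) / (1 - r).

    Uses the identity (sum_i p_i**q) ** (1 / (1 - q)) = exp(Renyi_q(p)), so
    r = 1 gives Renyi, r = q gives Tsallis, and q = r = 1 gives Shannon."""
    h = renyi_entropy(p, q)
    if math.isclose(r, 1.0):
        return h
    return math.expm1((1.0 - r) * h) / (1.0 - r)
```

Routing the computation through the Rényi entropy avoids a separate branch for every limiting case: the q = 1 limit is handled once inside `renyi_entropy`, and `math.expm1` keeps the result accurate when (1 - r) * h is small.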

Application of Rényi and Tsallis entropies to topic modeling optimization

Sergei Koltcov

10.1016/j.physa.2018.08.050 article EN Physica A Statistical Mechanics and its Applications 2018-08-18

Topic models with elements of neural networks: investigation of stability, coherence, and determining the optimal number of topics

Sergei Koltcov, Anton Surkov, V. A. Filippov, Vera Ignatenko

Topic modeling is a widely used instrument for the analysis of large text collections. In the last few years, neural topic models and models with word embeddings have been proposed to increase the quality of topic solutions. However, these models were not extensively tested in terms of stability and interpretability. Moreover, the question of selecting the number of topics (a model parameter) remains a challenging task. We aim to partially fill this gap by testing four well-known topic models available to a wide range of users, such as the embedded topic model (ETM), Gaussian Softmax...

10.7717/peerj-cs.1758 article EN cc-by PeerJ Computer Science 2024-01-03

Mining Ethnic Content Online with Additively Regularized Topic Models

Murat Apishev, Sergei Koltcov, Olessia Koltsova, Sergey Nikolenko, Konstantin Vorontsov

Social studies of the Internet have adopted large-scale text mining for unsupervised discovery of topics related to specific subjects. A recently developed approach to topic modeling, additive regularization of topic models (ARTM), provides fast inference and more control over the topics, with a wider variety of possible regularizers than developing LDA extensions. We apply ARTM to mining ethnic-related content from the Russian-language blogosphere, introduce a new combined regularizer, and compare the derived topics with LDA. We show with human evaluations that ARTM is...

10.13053/cys-20-3-2473 article EN Computación y Sistemas 2016-09-30

Communities of co-commenting in the Russian LiveJournal and their topical coherence

Olessia Koltsova, Sergei Koltcov, Sergey Nikolenko

Purpose – The paper addresses the problem of what drives the formation of latent discussion communities, if any, in the blogosphere: the topical composition of posts or their authorship? The purpose of this paper is to contribute to knowledge about the structure of co-commenting. Design/methodology/approach – The research is based on a dataset of 17,386 full-text posts written by the top 2,000 LiveJournal bloggers and over 520,000 comments that result in 4.5 million edges in the network of co-commenting, where posts are vertices. The Louvain algorithm is used to detect communities...

10.1108/intr-03-2014-0079 article EN Internet Research 2016-05-17
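
The truncated abstract does not define how co-commenting edges are weighted. A minimal sketch, assuming posts are vertices and two posts are linked when the same user commented on both, with the weight equal to the number of such users; the input format and function name are hypothetical. Community detection (Louvain in the paper) would then run on the resulting weighted graph, for example with NetworkX's `louvain_communities`, which is not shown here to keep the example dependency-free.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple


def co_commenting_edges(comments: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Build a weighted co-commenting network from (commenter_id, post_id) pairs.

    Two posts are linked when at least one user commented on both; the edge weight
    is the number of such distinct users. Keys are (post_a, post_b) with post_a < post_b."""
    posts_by_user = defaultdict(set)
    for user, post in comments:
        posts_by_user[user].add(post)  # repeated comments by one user on a post count once
    edges = defaultdict(int)
    for posts in posts_by_user.values():
        for a, b in combinations(sorted(posts), 2):
            edges[(a, b)] += 1
    return dict(edges)
```

A user who comments on k posts contributes k(k-1)/2 pairs, which is how a modest number of comments can yield millions of edges; very active commenters dominate the pair count.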

Analyzing the Influence of Hyper-parameters and Regularizers of Topic Modeling in Terms of Renyi Entropy

Sergei Koltcov, Vera Ignatenko, Zeyd Boukhers, Steffen Staab

Topic modeling is a popular technique for clustering large collections of text documents. A variety of different types of regularization is implemented in topic modeling. In this paper, we propose a novel approach for analyzing the influence of different regularization types on the results of topic modeling. Based on Renyi entropy, this approach is inspired by concepts from statistical physics, where an inferred topical structure of a collection can be considered an information system residing in a non-equilibrium state. By testing our approach on four models: Probabilistic Latent Semantic Analysis (pLSA),...

10.3390/e22040394 article EN cc-by Entropy 2020-03-30

Stable topic modeling for web science

Sergei Koltcov, Sergey Nikolenko, Olessia Koltsova, Svetlana S. Bodrunova

Topic modeling is a powerful tool for analyzing large collections of user-generated web content, but it still suffers from problems with topic stability, which are especially important for the social sciences. We evaluate the stability of different topic models and propose a new model, granulated LDA (gLDA), that samples short sequences of neighboring words at once. We show that gLDA exhibits very stable results.

10.1145/2908131.2908184 article EN 2016-05-18

Analysis and tuning of hierarchical topic models based on Renyi entropy approach

Sergei Koltcov, Vera Ignatenko, Maxim Terpilovskii, Paolo Rosso

Hierarchical topic modeling is a potentially powerful instrument for determining the topical structures of text collections that additionally allows constructing a hierarchy representing the levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to the above problem. First, we introduce an entropy-based...

10.7717/peerj-cs.608 article EN cc-by PeerJ Computer Science 2021-07-29

Fractal approach for determining the optimal number of topics in the field of topic modeling

Vera Ignatenko, Sergei Koltcov, Steffen Staab, Zeyd Boukhers

In this paper we apply the multifractal formalism to the analysis of the statistical behaviour of topic models under the condition of a varying number of topics. Our analysis reveals the existence of two self-similar regions and one transition region in the density-of-states function depending on the number of topics. As a function that can be expressed through the density-of-states function was earlier successfully used to determine the optimal number of topics, we test the applicability of the multifractal formalism for the same purpose. We provide numerical results for three models (PLSA, ARTM, and LDA with Gibbs sampling) on marked-up collections containing texts...

10.1088/1742-6596/1163/1/012025 article EN Journal of Physics Conference Series 2019-02-01

Renormalization Analysis of Topic Models

Sergei Koltcov, Vera Ignatenko

In practice, to build a machine learning model of big data, one needs to tune model parameters. The process of parameter tuning involves an extremely time-consuming and computationally expensive grid search. However, the theory of statistical physics provides techniques allowing us to optimize this process. The paper shows that a function of the output of topic modeling demonstrates self-similar behavior under variation of the number of clusters. Such behavior allows using a renormalization technique. A combination of the renormalization procedure with the Renyi entropy...

10.3390/e22050556 article EN cc-by Entropy 2020-05-16

A thermodynamic approach to selecting a number of clusters based on topic modeling

Sergei Koltcov

10.1134/s1063785017060207 article EN Technical Physics Letters 2017-06-01

Gibbs sampler optimization for analysis of a granulated medium

Sergei Koltcov, Sergey Nikolenko, E. Yu. Koltsova

10.1134/s1063785016080241 article EN Technical Physics Letters 2016-08-01

Using large language models for extracting and pre-annotating texts on mental health from noisy data in a low-resource language

Sergei Koltcov, Anton Surkov, Olessia Koltsova, Vera Ignatenko

Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs for low-resource languages. A potential solution is to create human-AI systems that utilize extensive public-domain user-to-user and user-to-professional discussions on social media. These discussions,...

10.7717/peerj-cs.2395 article EN cc-by PeerJ Computer Science 2024-11-28

Changes in the Topical Structure of Russian-Language LiveJournal: The Impact of Elections 2011

Kirill Maslinsky, Sergei Koltcov, Olessia Koltsova

This study investigates the topical structure of the Russian-language blog-publishing service LiveJournal and the change in it that occurred in the course of public activity after the State Duma elections of December 2011, as compared to a previous "control" period (November 27-December 27 and August 15-September 15, respectively). The data for both periods were automatically obtained from the 2,000 top-rated blogs on the basis of ratings published by LiveJournal. Unsupervised topic modelling of the sampled posts was done using Latent...

10.2139/ssrn.2209802 article EN SSRN Electronic Journal 2013-01-01

Do ordinary bloggers really differ from blog celebrities?

Olessia Koltsova, Sergei Koltcov, Svetlana Alexeeva

In this paper we describe the structural and topical properties of "ordinary" blogs versus "popular" blogs. Using the complete directory of the Russian-language LiveJournal and a sample of both groups, we show that the main difference between them lies in the volume of posting activity and commenting feedback and in the skewness of the respective distributions. No substantial differences in the topical structure obtained with the LDA algorithm are found, which suggests that ordinary bloggers do not hold a specific vision of topic salience and do not set their own "grassroots" agendas.

10.1145/2615569.2615675 article EN 2014-06-23