Sergei Koltcov

ORCID: 0000-0002-2932-2746
Research Areas
  • Computational and Text Analysis Methods
  • Opinion Dynamics and Social Influence
  • Topic Modeling
  • Complex Network Analysis Techniques
  • Bayesian Methods and Mixture Models
  • Advanced Text Analysis Techniques
  • Misinformation and Its Impacts
  • Stock Market Forecasting Methods
  • Social Media and Politics
  • Hate Speech and Cyberbullying Detection
  • Sentiment Analysis and Opinion Mining
  • Neural Networks and Applications
  • Digital Marketing and Social Media
  • Statistical Mechanics and Entropy
  • Authorship Attribution and Profiling
  • Media Influence and Politics
  • Open Source Software Innovations
  • Knowledge Management and Sharing
  • Forecasting Techniques and Applications
  • Media Studies and Communication
  • Complex Systems and Time Series Analysis
  • Impact of Technology on Adolescents
  • Mental Health via Writing
  • Mathematical Biology Tumor Growth
  • Advanced Statistical Methods and Models

National Research University Higher School of Economics
2014-2024

Moscow Power Engineering Institute
2016

Topic modeling, in particular the Latent Dirichlet Allocation (LDA) model, has recently emerged as an important tool for understanding large datasets, in particular user-generated datasets in social studies of the Web. In this work, we investigate the instability of LDA inference, propose a new metric of similarity between topics, and introduce a criterion for vocabulary reduction. We show the limitations of the approach for the purposes of qualitative analysis in social science and sketch some ways for improvement.

10.1145/2615569.2615680 article EN 2014-06-23
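The abstract above mentions a metric of similarity between topics for assessing LDA instability; the paper's own metric is not reproduced here, but a common baseline for comparing topics across runs is the Jaccard overlap of their top-word sets. A minimal sketch (function names and toy data are illustrative, not from the paper):

```python
def top_words(topic_word_probs, n=10):
    """Return the n highest-probability words of one topic."""
    return {w for w, _ in sorted(topic_word_probs.items(),
                                 key=lambda kv: kv[1], reverse=True)[:n]}

def jaccard_similarity(topic_a, topic_b, n=10):
    """Jaccard overlap of the two topics' top-n word sets (0..1)."""
    a, b = top_words(topic_a, n), top_words(topic_b, n)
    return len(a & b) / len(a | b)

# Two runs of a topic model may produce slightly different topics:
run1 = {"court": 0.30, "law": 0.25, "judge": 0.20, "trial": 0.15, "city": 0.10}
run2 = {"court": 0.28, "law": 0.27, "judge": 0.18, "appeal": 0.17, "city": 0.10}
print(jaccard_similarity(run1, run2, n=5))  # 4 shared words of 6 -> 0.666...
```

An unstable model yields low run-to-run similarity for matched topics; a stable one stays close to 1.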

Qualitative studies, such as sociological research, opinion analysis and media studies, can benefit greatly from the automated topic mining provided by models such as latent Dirichlet allocation (LDA). However, examples of qualitative studies that employ topic modelling as a tool are currently few and far between. In this work, we identify two important problems along the way to using topic modelling in such studies: the lack of a good quality metric that closely matches human judgement in understanding topics, and the need to indicate specific subtopics that a study may be most...

10.1177/0165551515617393 article EN Journal of Information Science 2015-12-12

The random forest algorithm is one of the most popular and commonly used algorithms for classification and regression tasks. It combines the output of multiple decision trees to form a single result. Random forests demonstrate the highest accuracy on tabular data compared to other algorithms in various applications. However, random forests and, more precisely, decision trees are usually built with the application of the classic Shannon entropy. In this article, we consider the potential of deformed entropies, which have been successfully used in the field of complex systems, to increase...

10.7717/peerj-cs.1775 article EN cc-by PeerJ Computer Science 2024-01-03
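As a sketch of what replacing Shannon entropy with a deformed entropy looks like at the level of a tree-node impurity measure, one widely used deformation is the Tsallis entropy S_q = (1 - Σ p_i^q)/(q - 1), which recovers a Gini-like impurity at q = 2 and Shannon entropy in the limit q → 1. This is a toy illustration, not the article's exact criterion:

```python
import math

def shannon_entropy(probs):
    """Classic Shannon entropy in nats: -sum p*ln(p)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def tsallis_entropy(probs, q):
    """Deformed (Tsallis) entropy S_q = (1 - sum p^q) / (q - 1)."""
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

labels = [0.5, 0.3, 0.2]                # class proportions at a tree node
print(shannon_entropy(labels))           # impurity under Shannon
print(tsallis_entropy(labels, 2))        # q=2: 1 - sum p^2, the Gini impurity
print(tsallis_entropy(labels, 1.000001)) # q -> 1 approaches Shannon (nats)
```

A deformed-entropy forest would simply plug `tsallis_entropy` (with a tuned q) into the split-gain computation where Shannon entropy normally sits.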

Abstract This article describes agendas as "packages" of topics of varying salience, set by Russian Internet users on Russia's leading blog platform, LiveJournal. The research involved modeling LiveJournal's topic structure, viewed as an important component of what is termed here self-generated public opinion. Topic modeling was performed automatically with the LDA algorithm and complemented with hand labeling of topics. Data were collected with software created by the authors to generate a relational database storing all posts...

10.1002/1944-2866.poi331 article EN Policy & Internet 2013-06-01

Topic modeling is a popular approach for clustering text documents. However, current tools have a number of unsolved problems, such as instability and the lack of criteria for selecting the values of model parameters. In this work, we propose a method to solve these problems partially by optimizing the model parameters while simultaneously accounting for semantic stability. Our approach is inspired by concepts from statistical physics and is based on Sharma–Mittal entropy. We test it on two models: probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation...

10.3390/e21070660 article EN cc-by Entropy 2019-07-05
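The entropic approach described in this and the related papers selects the number of topics T by minimizing an entropy computed from the fitted model. A simplified illustration using Renyi entropy (the papers use Sharma–Mittal/Renyi entropy computed from the thresholded topic-word matrix; `pick_num_topics` and the toy distributions below are hypothetical stand-ins for one model fit per candidate T):

```python
import math

def renyi_entropy(probs, q=2.0):
    """Renyi entropy H_q = ln(sum p_i^q) / (1 - q); q -> 1 recovers Shannon."""
    return math.log(sum(p ** q for p in probs)) / (1.0 - q)

def pick_num_topics(fits):
    """fits: {T: flattened word probabilities of a model fitted with T topics}.
    The entropic criterion picks the T whose fit minimizes the entropy."""
    return min(fits, key=lambda T: renyi_entropy(fits[T]))

toy_fits = {
    2: [0.4, 0.3, 0.2, 0.1],               # flat -> high entropy
    3: [0.7, 0.1, 0.1, 0.05, 0.05],        # peaked -> low entropy
    4: [0.3, 0.3, 0.2, 0.1, 0.05, 0.05],   # flat again
}
print(pick_num_topics(toy_fits))  # -> 3
```

The intuition: a well-chosen T concentrates probability mass on informative words, which shows up as an entropy minimum between the degenerate extremes of too few and too many topics.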

10.1016/j.physa.2018.08.050 article EN Physica A Statistical Mechanics and its Applications 2018-08-18

Topic modeling is a widely used instrument for the analysis of large text collections. In the last few years, neural topic models and topic models with word embeddings have been proposed to increase the quality of topic solutions. However, these models have not been extensively tested in terms of stability and interpretability. Moreover, the question of selecting the number of topics (a model parameter) remains a challenging task. We aim to partially fill this gap by testing four well-known models available to a wide range of users, such as the embedded topic model (ETM), Gaussian Softmax...

10.7717/peerj-cs.1758 article EN cc-by PeerJ Computer Science 2024-01-03

Social studies of the Internet have adopted large-scale text mining for the unsupervised discovery of topics related to specific subjects. A recently developed approach to topic modeling, additive regularization of topic models (ARTM), provides fast inference and more control over the topics, with a wide variety of possible regularizers, than developing LDA extensions. We apply ARTM to ethnic-related content from the Russian-language blogosphere, introduce a new combined regularizer, and compare the derived topics with those of LDA. We show with human evaluations that...

10.13053/cys-20-3-2473 article EN Computación y Sistemas 2016-09-30

Purpose – The paper addresses the problem of what drives the formation of latent discussion communities, if any, in the blogosphere: the topical composition of posts or their authorship? The purpose of this paper is to contribute to knowledge about the structure of co-commenting. Design/methodology/approach – The research is based on a dataset of 17,386 full-text posts written by the top 2,000 LiveJournal bloggers and over 520,000 comments that result in a 4.5-million-edge network of co-commenting, where commenters are the vertices. The Louvain algorithm is used to detect communities...

10.1108/intr-03-2014-0079 article EN Internet Research 2016-05-17
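The Louvain algorithm mentioned above greedily maximizes Newman modularity. As a self-contained illustration of the quantity being optimized (not of the Louvain algorithm itself), modularity can be computed per community as the intra-community edge fraction minus the fraction expected under a degree-preserving null model; the graph and partition below are toy examples:

```python
def modularity(edges, community):
    """Newman modularity Q = sum over communities c of
    (intra-edge fraction of c) - (total degree of c / 2m)^2."""
    m = len(edges)
    intra, deg = {}, {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
        if community[u] == community[v]:
            intra[community[u]] = intra.get(community[u], 0) + 1
    total_deg = {}
    for node, d in deg.items():
        c = community[node]
        total_deg[c] = total_deg.get(c, 0) + d
    return sum(intra.get(c, 0) / m - (total_deg[c] / (2 * m)) ** 2
               for c in total_deg)

# Two triangles joined by one bridge edge: the natural split scores well.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, community))  # -> 5/14 = 0.3571...
```

Louvain repeatedly moves vertices between communities whenever the move increases this Q, then contracts communities and repeats.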

Topic modeling is a popular technique for clustering large collections of text documents, and a variety of different types of regularization are implemented in topic modeling. In this paper, we propose a novel approach to analyzing the influence of regularization on the results of topic modeling. It is based on Renyi entropy and inspired by concepts from statistical physics, where an inferred topical structure of a collection can be considered an information system residing in a non-equilibrium state. By testing our approach on four models - Probabilistic Latent Semantic Analysis (pLSA),...

10.3390/e22040394 article EN cc-by Entropy 2020-03-30

Topic modeling is a powerful tool for analyzing large collections of user-generated web content, but it still suffers from problems with topic stability, which are especially important for the social sciences. We evaluate the stability of different models and propose a new model, granulated LDA (gLDA), that samples short sequences of neighboring words at once. We show that gLDA exhibits very stable results.

10.1145/2908131.2908184 article EN 2016-05-18
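The "granulated" idea is that neighboring words receive one topic jointly rather than each word independently. A toy sketch of that sampling pattern (topics are drawn uniformly here for simplicity rather than from the true Gibbs conditionals, and all names are illustrative, not the paper's implementation):

```python
import random

def granulated_assign(tokens, num_topics, window=3, seed=0):
    """Toy sketch of granulated sampling: draw one topic for an anchor
    position, then copy it to the whole window of neighbouring words,
    so nearby words always share a topic."""
    rng = random.Random(seed)
    assignments = [None] * len(tokens)
    i = 0
    while i < len(tokens):
        topic = rng.randrange(num_topics)          # one draw per window
        for j in range(i, min(i + window, len(tokens))):
            assignments[j] = topic                 # neighbours inherit it
        i += window
    return assignments

tokens = "the court ruled on the appeal after a long trial".split()
print(granulated_assign(tokens, num_topics=3))
```

Tying neighboring words to a shared topic is what damps the word-level sampling noise that makes plain LDA runs diverge from one another.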

Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections that additionally allows constructing a hierarchy representing the levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to the above problem. First, we introduce a Renyi entropy-based...

10.7717/peerj-cs.608 article EN cc-by PeerJ Computer Science 2021-07-29

In this paper we apply the multifractal formalism to the analysis of the statistical behaviour of topic models under the condition of a varying number of topics. Our analysis reveals the existence of two self-similar regions and one transition region in the density-of-states function depending on the number of topics. As a quantity that can be expressed through the density of states was earlier successfully used to determine the optimal number of topics, we test the applicability of the multifractal formalism for the same purpose. We provide numerical results for three models (pLSA, ARTM, LDA with Gibbs sampling) on marked-up collections containing texts...

10.1088/1742-6596/1163/1/012025 article EN Journal of Physics Conference Series 2019-02-01

In practice, to build a machine learning model of big data, one needs to tune the model parameters. The process of parameter tuning involves an extremely time-consuming and computationally expensive grid search. However, the theory of statistical physics provides techniques allowing us to optimize this process. The paper shows that the entropy function of the output of topic modeling demonstrates self-similar behavior under variation of the number of clusters. Such behavior allows using a renormalization technique. A combination of the renormalization procedure with the Renyi entropy...

10.3390/e22050556 article EN cc-by Entropy 2020-05-16

Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs in low-resource languages. A potential solution is to create human-AI systems that utilize the extensive public-domain user-to-user and user-to-professional discussions on social media. These discussions,...

10.7717/peerj-cs.2395 article EN cc-by PeerJ Computer Science 2024-11-28

This study investigates the topical structure of the Russian-language blog-publishing service LiveJournal and the change in it that occurred in the course of public activity after the State Duma elections of December 2011, as compared to a previous "control" period (November 27-December 27 and August 15-September 15, respectively). The data for both periods have been automatically obtained from the 2,000 top-rated blogs on the basis of ratings published by LiveJournal. Unsupervised topic modelling of the sampled posts was done using Latent...

10.2139/ssrn.2209802 article EN SSRN Electronic Journal 2013-01-01

In this paper we describe structural and topical properties of "ordinary" blogs versus "popular" blogs. Using the complete directory of the Russian-language LiveJournal, we sample both groups and show that the main difference between them is in the volume of posting activity and commenting feedback and in the skewedness of the respective distributions. No substantial differences in the topical structure obtained with the LDA algorithm are found, which suggests that ordinary bloggers do not hold a specific vision of topic salience and do not set their own "grassroots" agendas.

10.1145/2615569.2615675 article EN 2014-06-23