- Computational and Text Analysis Methods
- Bayesian Methods and Mixture Models
- Topic Modeling
- Advanced Text Analysis Techniques
- Neural Networks and Applications
- Engineering Technology and Methodologies
- Opinion Dynamics and Social Influence
- Complex Network Analysis Techniques
- Astrophysics and Cosmic Phenomena
- Mental Health via Writing
- Technical Engine Diagnostics and Monitoring
- Engineering Diagnostics and Reliability
- Dark Matter and Cosmic Phenomena
- Expert finding and Q&A systems
- Environmental Sustainability and Technology
- Sociopolitical Dynamics in Russia
- Numerical methods for differential equations
- Digital Mental Health Interventions
- Mechanical and Thermal Properties Analysis
- Forecasting Techniques and Applications
- Authorship Attribution and Profiling
- Nonlinear Dynamics and Pattern Formation
- Advanced Clustering Algorithms Research
- Cosmology and Gravitation Theories
- Differential Equations and Numerical Methods
National Research University Higher School of Economics
2019-2024
The random forest algorithm is one of the most popular and commonly used algorithms for classification and regression tasks. It combines the output of multiple decision trees to form a single result. Random forest algorithms demonstrate the highest accuracy on tabular data compared to other algorithms in various applications. However, random forests and, more precisely, decision trees, are usually built with the application of classic Shannon entropy. In this article, we consider the potential of deformed entropies, which are successfully used in the field of complex systems, to increase...
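To make the contrast between classic and deformed entropies concrete, the sketch below compares Shannon entropy with Tsallis entropy as the impurity measure of a decision-tree split. Tsallis entropy is chosen here only as one well-known deformed entropy; the abstract does not say which entropies the article studies, and the function names (`shannon_entropy`, `tsallis_entropy`, `split_gain`) are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1).

    Assumption: used here as an example of a deformed entropy.
    It tends to Shannon entropy as q -> 1 and equals Gini impurity at q = 2.
    """
    if abs(q - 1.0) < 1e-12:
        return shannon_entropy(probs)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def class_probs(labels):
    """Empirical class frequencies of a list of labels."""
    n = len(labels)
    return [labels.count(c) / n for c in set(labels)]


def split_gain(parent, left, right, impurity):
    """Impurity reduction obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(class_probs(left)) \
        + (len(right) / n) * impurity(class_probs(right))
    return impurity(class_probs(parent)) - weighted


if __name__ == "__main__":
    parent = [0, 0, 0, 1, 1, 1]
    left, right = [0, 0, 1], [0, 1, 1]
    print("Shannon gain:", split_gain(parent, left, right, shannon_entropy))
    print("Tsallis gain (q=2):",
          split_gain(parent, left, right, lambda p: tsallis_entropy(p, 2.0)))
```

The deformation parameter `q` becomes one more hyperparameter of the tree; how it should be chosen is a question the abstract leaves to the article itself.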
Topic modeling is a popular approach for clustering text documents. However, current tools have a number of unsolved problems such as instability and a lack of criteria for selecting the values of model parameters. In this work, we propose a method to partially solve the problems of optimizing model parameters, simultaneously accounting for semantic stability. Our method is inspired by the concepts from statistical physics and is based on Sharma–Mittal entropy. We test our method on two models: probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation...
Topic modeling is a widely used instrument for the analysis of large text collections. In the last few years, neural topic models and models with word embeddings have been proposed to increase the quality of topic solutions. However, these models were not extensively tested in terms of stability and interpretability. Moreover, the question of selecting the number of topics (a model parameter) remains a challenging task. We aim to partially fill this gap by testing four well-known topic models available to a wide range of users, such as the embedded topic model (ETM), Gaussian Softmax...
Topic modeling is a popular technique for clustering large collections of text documents. A variety of different types of regularization is implemented in topic modeling. In this paper, we propose a novel approach for analyzing the influence of different regularization types on the results of topic modeling. Based on Renyi entropy, this approach is inspired by the concepts from statistical physics, where an inferred topical structure of a collection can be considered an information statistical system residing in a non-equilibrium state. By testing our approach on four models: Probabilistic Latent Semantic Analysis (pLSA),...
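For orientation, the standard textbook definition of Renyi entropy for a discrete distribution is given below. This is the general definition only; how the paper maps a topic model's output onto the probabilities $p_i$ and chooses the order $q$ is not stated in the abstract and is not assumed here.

```latex
% Renyi entropy of order q (q > 0, q \neq 1) for probabilities p_1, ..., p_N
S_q^{R} = \frac{1}{1-q} \ln \sum_{i=1}^{N} p_i^{\,q},
\qquad
\lim_{q \to 1} S_q^{R} = -\sum_{i=1}^{N} p_i \ln p_i
\quad \text{(Shannon entropy)}.
```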
Hierarchical topic modeling is a potentially powerful instrument for determining topical structures of text collections that additionally allows constructing a hierarchy representing the levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to the above problem. First, we introduce an entropy-based...
In this paper, we apply multifractal formalism to the analysis of the statistical behaviour of topic models under the condition of a varying number of topics. Our analysis reveals the existence of two self-similar regions and one transition region in the function of density-of-states depending on the number of topics. As a function that can be expressed through the density-of-states function was earlier successfully used to determine the optimal number of topics, we test the applicability of the multifractal formalism for the same purpose. We provide numerical results for three topic models (PLSA, ARTM, and LDA Gibbs sampling) on marked-up collections containing texts...
In practice, to build a machine learning model of big data, one needs to tune model parameters. The process of parameter tuning involves an extremely time-consuming and computationally expensive grid search. However, the theory of statistical physics provides techniques allowing us to optimize this process. The paper shows that a function of the output of topic modeling demonstrates self-similar behavior under variation of the number of clusters. Such behavior allows using a renormalization technique. A combination of the renormalization procedure with the Renyi entropy...
Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs for low-resource languages. A potential solution is to create human-AI systems that utilize extensive public domain user-to-user and user-to-professional discussions on social media. These discussions,...
The basic methods for approximating the elementary functions of the probability distribution density of a random sample from the general set of statistical material used in the field of the operational reliability of cars are analyzed. It was proposed to use the Johnson and Pearson systems to describe non-Gaussian experimental data, which allow us to describe practically any unimodal distribution. The effectiveness of these methods is investigated by statistical modeling. Results of testing the models on real data are presented.
A one-parameter family of Mackey-Glass type differential delay equations is considered. The existence of a homoclinic solution for a suitable parameter value is proved. As a consequence, one obtains stable periodic solutions for nearby parameter values. An example of nonlinear functions is given, for which all sufficient conditions of our theoretical results can be verified numerically. Numerically computed solutions are shown.
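As a rough illustration of how solutions of a delay equation of this type can be computed numerically, the sketch below integrates the classic Mackey-Glass equation x'(t) = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t) with a fixed-step Euler scheme. The classic nonlinearity, the parameter values, and the function name `simulate_mackey_glass` are assumptions made for illustration; the abstract does not specify the paper's one-parameter family or its numerical method.

```python
def simulate_mackey_glass(beta=0.2, gamma=0.1, n=10, tau=17.0,
                          dt=0.1, steps=5000, x0=1.2):
    """Euler integration of the classic Mackey-Glass delay equation.

    Assumption: constant initial function x(t) = x0 on [-tau, 0].
    Returns the trajectory sampled every `dt`, starting at t = 0.
    """
    delay = int(round(tau / dt))        # delay expressed in time steps
    history = [x0] * (delay + 1)        # values on [-tau, 0]
    for _ in range(steps):
        x_now = history[-1]
        x_delayed = history[-1 - delay]  # x(t - tau)
        dx = beta * x_delayed / (1.0 + x_delayed ** n) - gamma * x_now
        history.append(x_now + dt * dx)
    return history[delay:]


if __name__ == "__main__":
    traj = simulate_mackey_glass()
    print("samples:", len(traj), "min:", min(traj), "max:", max(traj))
```

A fixed-step Euler scheme is the simplest choice and is adequate only for a qualitative picture; a higher-order method with interpolated history would be preferable for checking the sufficient conditions the abstract mentions.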
In practice, the critical step in building machine learning models of big data (BD) is the procedure of parameter tuning with a grid search, which is costly in terms of time and computing resources. Due to the size, BD are comparable to mesoscopic physical systems. Hence, methods of statistical physics could be applied to BD. The paper shows that topic modeling demonstrates self-similar behavior under the condition of a varying number of clusters. Such behavior allows using a renormalization technique. The combination of a renormalization technique with the Rényi entropy approach allows for fast...
It is shown that a drastic modification of black hole (BH) properties in the presence of the extra dimension makes the method of extra‐dimensional search by possible celestial body absorption much more effective and realistic. First, the decrease of Hawking radiation intensity allows BHs with masses many orders less than the 4D mass to survive till the present. Second, strong gravity these decelerate fast become captured white dwarfs or other bodies. And third, same complete dwarf light for cosmological time. The possibility of detection...
Belarusian State Technological University, Republic of Belarus
Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections that allows constructing a topical hierarchy representing the levels of topical abstraction. However, tuning of parameters of the hierarchical models, including the number of topics on each hierarchical level, remains a challenging task and an open issue. In this paper, we propose a Renyi entropy-based approach for a partial solution to the above problem. First, we propose a Renyi entropy-based metric of quality for hierarchical models. Second, we propose a practical concept of hierarchical topic model tuning tested on datasets with...