- Advanced Clustering Algorithms Research
- Complex Network Analysis Techniques
- Data Mining Algorithms and Applications
- Data Management and Algorithms
- Rough Sets and Fuzzy Logic
- Face and Expression Recognition
- Advanced Text Analysis Techniques
- Bioinformatics and Genomic Networks
- Bayesian Methods and Mixture Models
- Text and Document Classification Technologies
- Genomics and Phylogenetic Studies
- Machine Learning in Bioinformatics
- Multi-Criteria Decision Making
- Gene expression and cancer classification
- Algorithms and Data Compression
- Advanced Graph Neural Networks
- RNA and protein synthesis mechanisms
- Remote-Sensing Image Classification
- Advanced Scientific Research Methods
- Data Visualization and Analytics
- Sensory Analysis and Statistical Methods
- Advanced Statistical Methods and Models
- DNA and Biological Computing
- Spam and Phishing Detection
- Logic, programming, and type systems
Birkbeck, University of London
2013-2024
National Research University Higher School of Economics
2015-2024
Technion – Israel Institute of Technology
2011-2014
University of London
2006-2011
University of Trento
2011
Lancaster University
2011
Carnegie Mellon University
2011
University of Surrey
2011
Cornell University
2011
University of California, Irvine
2011
Lactic acid-producing bacteria are associated with various plant and animal niches and play a key role in the production of fermented foods and beverages. We report nine genome sequences representing the phylogenetic and functional diversity of these bacteria. The small genomes of lactic acid bacteria encode a broad repertoire of transporters for efficient carbon and nitrogen acquisition from the nutritionally rich environments they inhabit and reflect a limited range of biosynthetic capabilities that indicate both prototrophic and auxotrophic...
Comparative analysis of sequenced genomes reveals numerous instances of apparent horizontal gene transfer (HGT), at least in prokaryotes, and indicates that lineage-specific gene loss might have been even more common in evolution. This complicates the notion of a species tree, which needs to be re-interpreted as a prevailing evolutionary trend rather than the full depiction of evolution, and makes reconstruction of ancestral genomes a non-trivial task. We addressed the problem of constructing parsimonious scenarios for individual sets...
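To make the idea of a parsimonious scenario concrete, here is a minimal sketch of weighted gain/loss parsimony for a single gene family on a rooted species tree, written as a Sankoff-style dynamic program. The cost model (an adjustable loss cost and an adjustable gain penalty standing in for HGT) and the names `parsimony_cost` and `min_scenario_cost` are assumptions for illustration, not the authors' published algorithm.

```python
import math
from typing import Set, Tuple, Union

Tree = Union[str, tuple]  # leaf = species name, internal node = tuple of children


def parsimony_cost(tree: Tree, present: Set[str], gain: float, loss: float) -> Tuple[float, float]:
    """Minimal scenario cost below `tree`, given the gene is (absent, present) at its root."""
    if isinstance(tree, str):
        # Leaf states are observed, so the state contradicting the data costs infinity.
        return (math.inf, 0.0) if tree in present else (0.0, math.inf)
    cost_absent = cost_present = 0.0
    for child in tree:
        child_absent, child_present = parsimony_cost(child, present, gain, loss)
        # Parent lacks the gene: the child also lacks it, or it is gained (e.g. by HGT) on the edge.
        cost_absent += min(child_absent, child_present + gain)
        # Parent has the gene: the child inherits it, or it is lost on the edge.
        cost_present += min(child_present, child_absent + loss)
    return cost_absent, cost_present


def min_scenario_cost(tree: Tree, present: Set[str], gain: float = 1.0, loss: float = 1.0) -> float:
    """Cheapest gain/loss scenario over both possible root states (hypothetical helper)."""
    return min(parsimony_cost(tree, present, gain, loss))
```

For example, with the tree `(("A", "B"), ("C", "D"))` and the gene observed only in A and B, a cheap gain penalty favours a single acquisition on the (A, B) branch, while a gain penalty above the loss cost favours an ancestral gene lost once on the (C, D) branch.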
Gene duplication is a crucial mechanism of evolutionary innovation. A substantial fraction of eukaryotic genomes consists of paralogous gene families. We assess the extent of ancestral paralogy, which dates back to the last common ancestor of all eukaryotes, and examine the origins of these paralogs and their potential roles in the emergence of cell complexity. A parsimonious reconstruction of ancestral gene repertoires shows that 4137 orthologous gene sets in the last eukaryotic common ancestor (LECA) map to 2150 sets in the hypothetical first eukaryotic common ancestor (FECA) [paralogy quotient (PQ) of 1.92]. Analogous reconstructions...
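Reading the paralogy quotient as the number of LECA gene sets per FECA gene set, which is how the two counts quoted above fit together, the figure can be checked directly:

```latex
\mathrm{PQ} = \frac{N_{\mathrm{LECA}}}{N_{\mathrm{FECA}}} = \frac{4137}{2150} \approx 1.92
```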
This paper gives an experimentally supported review and comparison of several indices, based on the conventional K-means inertia criterion, for determining the number of clusters, K, in datasets, using the popular Silhouette width index as a benchmark. Our experiments involve a novel version of the Elbow index, defined using values two or three steps apart. We also discuss alternative ways of computing and summarizing its...
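The K-means inertia criterion and an Elbow-type index built on it can be sketched in a few lines. The abstract does not spell out the paper's Elbow formula, so the score below (the inertia drop just before K divided by the drop just after K, with an adjustable `step` between the compared values) is an assumed, illustrative variant; `inertia`, `elbow_index`, and `choose_k` are hypothetical names.

```python
import math
from typing import Dict, Hashable, List, Sequence


def inertia(points: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """K-means criterion: total squared Euclidean distance of points to their cluster means."""
    clusters: Dict[Hashable, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        mean = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(sum((a - b) ** 2 for a, b in zip(p, mean)) for p in members)
    return total


def elbow_index(w: Dict[int, float], k: int, step: int = 1) -> float:
    """Assumed elbow score: inertia drop from K-step to K over the drop from K to K+step."""
    before = w[k - step] - w[k]
    after = w[k] - w[k + step]
    # A flat (or rising) curve after K makes K an infinitely sharp elbow.
    return math.inf if after <= 0 else before / after


def choose_k(w: Dict[int, float], step: int = 1) -> int:
    """Pick the K with the largest elbow score among Ks that have both neighbours."""
    candidates = [k for k in sorted(w) if k - step in w and k + step in w]
    return max(candidates, key=lambda k: elbow_index(w, k, step))
```

In practice `w` would map each K to the inertia of a K-means run; a curve such as `{1: 100.0, 2: 40.0, 3: 10.0, 4: 8.0, 5: 7.0, 6: 6.5}` bends sharply at K = 3, and both `step=1` and `step=2` select that value.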
The issue of determining ‘the right number of clusters’ is attracting ever-growing interest. The paper reviews published work on the issue with respect to mixture distributions, partitions (especially in k-means clustering), and hierarchical cluster structures. Some promising directions for further development are outlined. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 1 252–260 DOI: 10.1002/widm.15 This article is categorized under: Algorithmic Development > Structure...
In the framework of the problem of combining different gene trees into a unique species phylogeny, a model for duplication/speciation/loss events along the evolutionary tree is introduced. The model employs an embedding of one phylogeny into another via the so-called duplication/speciation principle, requiring that a duplicated gene evolves in such a way that any contemporary species involved bears only one of the diverged copies. The number of biologically meaningful elements in the result (duplications, losses, information gaps) is considered as an (asymmetric) dissimilarity...
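Counting duplications and losses when one tree is embedded in another can be illustrated with the classic least-common-ancestor (LCA) mapping of a binary gene tree into a species tree. This is a simpler, well-known reconciliation offered as a sketch under that assumption; it does not count information gaps or compute the paper's asymmetric dissimilarity, and the names `index_tree`, `lca`, and `reconcile` are hypothetical.

```python
from typing import Dict, Optional, Tuple, Union

Tree = Union[str, tuple]  # leaf = species name, internal node = (left, right)


def index_tree(tree: Tree) -> Tuple[Dict[Tree, Optional[Tree]], Dict[Tree, int]]:
    """Record the parent and depth of every species-tree node (species names must be unique)."""
    parent: Dict[Tree, Optional[Tree]] = {}
    depth: Dict[Tree, int] = {}

    def walk(node: Tree, par: Optional[Tree], d: int) -> None:
        parent[node], depth[node] = par, d
        if isinstance(node, tuple):
            for child in node:
                walk(child, node, d + 1)

    walk(tree, None, 0)
    return parent, depth


def lca(a: Tree, b: Tree, parent: Dict, depth: Dict) -> Tree:
    """Lowest common ancestor of two species-tree nodes."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconcile(gene_tree: Tree, species_tree: Tree) -> Tuple[int, int]:
    """Return (duplications, losses) for a binary gene tree whose leaves are species names."""
    parent, depth = index_tree(species_tree)
    duplications = losses = 0

    def visit(node: Tree) -> Tree:
        nonlocal duplications, losses
        if isinstance(node, str):
            return node  # a gene leaf maps to the species it was sampled from
        left, right = [visit(child) for child in node]
        mapped = lca(left, right, parent, depth)
        # A node mapped to the same species node as one of its children is a duplication.
        is_dup = mapped == left or mapped == right
        if is_dup:
            duplications += 1
        for child_map in (left, right):
            # Species-tree levels skipped on this edge are lineages where a copy was lost.
            losses += depth[child_map] - depth[mapped] - (0 if is_dup else 1)
        return mapped

    visit(gene_tree)
    return duplications, losses
```

For the species tree `(("A", "B"), "C")`, the gene tree `(("A", "B"), ("A", "C"))` yields one duplication at its root and two losses (one copy missing in C, the other missing in B).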
The multiple prototype fuzzy clustering model (FCMP), introduced by Nascimento, Mirkin and Moura-Pires (1999), proposes a framework for partitional fuzzy clustering that suggests a model of how the data are generated from the cluster structure to be identified. In the model, it is assumed that the membership of each entity in a cluster expresses the part of the cluster prototype reflected in the entity. In this paper we extend the FCMP framework to a number of clustering criteria and study the properties of FCMP in fitting the underlying model from which the data are generated. A comparative study with the fuzzy c-means algorithm is also presented.
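For reference, the fuzzy c-means baseline used in the comparison alternates two updates: memberships computed from distances to the prototypes, then prototypes recomputed as membership-weighted means. The sketch below shows both updates with the common fuzzifier m = 2; it is standard fuzzy c-means, not the FCMP model itself, and `fcm_memberships` and `fcm_update_centers` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def fcm_memberships(x: Point, centers: Sequence[Point], m: float = 2.0) -> List[float]:
    """Fuzzy c-means membership of point x in each cluster (fuzzifier m > 1)."""
    d = [math.dist(x, c) for c in centers]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:
        # x coincides with one or more prototypes: split the membership among them.
        return [1.0 / len(zeros) if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    # u_k = 1 / sum_j (d_k / d_j)^(2 / (m - 1)); the memberships of x sum to 1.
    return [1.0 / sum((dk / dj) ** p for dj in d) for dk in d]


def fcm_update_centers(points: Sequence[Point], memberships: Sequence[Sequence[float]],
                       m: float = 2.0) -> List[Tuple[float, ...]]:
    """New prototypes: means of the points weighted by membership ** m."""
    n_clusters, dim = len(memberships[0]), len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)  # assumed nonzero, i.e. no empty cluster
        centers.append(tuple(
            sum(w * p[t] for w, p in zip(weights, points)) / total for t in range(dim)
        ))
    return centers
```

A point at distance 1 from one prototype and 2 from another receives memberships 0.8 and 0.2 with m = 2, so nearer prototypes dominate without taking all of the membership.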
The prediction of a biological activity using a Quantitative Structure–Activity Relationship (QSAR) model is valid only if the compound in question is inside the model's domain of applicability. Existing methods for determining the domain of applicability in descriptor space suffer from problems including poor handling of nonconvex training sets and computational inefficiency. In this paper, we propose a cluster-based approach to modelling the domain of applicability, which may overcome some of the shortcomings of the approaches described. We...
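One simple reading of a cluster-based domain of applicability: summarize the training compounds' descriptor vectors as clusters, give each cluster a radius, and accept a query compound only if it falls within some cluster's radius. Here the clusters are taken as given, and the max-distance radius and the names `cluster_domain` and `in_domain` are illustrative assumptions, not the measure defined in the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> Tuple[float, ...]:
    """Coordinate-wise mean of a cluster's descriptor vectors."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def cluster_domain(clusters: Sequence[Sequence[Point]]) -> List[Tuple[Tuple[float, ...], float]]:
    """Describe each training cluster by its centroid and its largest member distance."""
    domain = []
    for members in clusters:
        center = centroid(members)
        radius = max(math.dist(p, center) for p in members)
        domain.append((center, radius))
    return domain


def in_domain(x: Point, domain: Sequence[Tuple[Tuple[float, ...], float]],
              tolerance: float = 0.0) -> bool:
    """A query is inside the domain if it lies within (radius + tolerance) of some centroid."""
    return any(math.dist(x, center) <= radius + tolerance for center, radius in domain)
```

Because the domain is a union of per-cluster balls, it can follow a nonconvex training set: with clusters `[(0, 0), (2, 0)]` and `[(10, 0), (10, 2)]`, the query `(5, 0.5)` lies inside the convex hull of the four training points yet is rejected, since it sits in the gap between the clusters.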