- Bayesian Modeling and Causal Inference
- Machine Learning and Data Classification
- Gene expression and cancer classification
- AI-based Problem Solving and Planning
- Bioinformatics and Genomic Networks
- Computational Drug Discovery Methods
- Statistical Methods and Inference
- Data Quality and Management
- Rough Sets and Fuzzy Logic
- Fault Detection and Control Systems
- Metal-Organic Frameworks: Synthesis and Applications
- Machine Learning and Algorithms
- Face and Expression Recognition
- Machine Learning in Materials Science
- Statistical Methods in Clinical Trials
- Explainable Artificial Intelligence (XAI)
- Indoor and Outdoor Localization Technologies
- Click Chemistry and Applications
- Gene Regulatory Network Analysis
- Advanced Multi-Objective Optimization Algorithms
- Statistical Methods and Bayesian Inference
- Fuzzy Logic and Control Systems
- Underwater Acoustics Research
- X-ray Diffraction in Crystallography
- Hemodynamic Monitoring and Therapy
Science and Technology Park of Crete
2022
University of Crete
2012-2021
Crete University Press
2017
Foundation for Research and Technology Hellas
2010-2015
Czech Academy of Sciences, Institute of Computer Science
2010-2015
FORTH Institute of Computer Science
2012-2013
Laboratoire d'Informatique de Paris-Nord
2010
A novel computational methodology for large-scale screening of MOFs is applied to gas storage with the use of machine learning technologies. This approach is a promising trade-off between the accuracy of ab initio methods and the speed of classical approaches, strategically combined with chemical intuition. The results demonstrate that the properties of MOFs are indeed predictable (stochastically, not deterministically) using automated analysis protocols, with the accuracy of predictions increasing with sample size. Our initial results indicate that this...
Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for the bias, called Bootstrap Bias Corrected CV (BBC-CV). BBC-CV's main idea is to bootstrap the whole process...
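The optimism described in this abstract, and a bootstrap-style correction for it, can be illustrated with a minimal sketch. It assumes the out-of-sample losses of every configuration have already been pooled over the CV folds into a samples-by-configurations table. Each bootstrap round selects the best configuration on the in-bag rows and scores it on the out-of-bag rows. The function names, the toy data generator, and the exact resampling details are assumptions made for illustration; the truncated abstract does not confirm them.

```python
import random


def naive_cv_estimate(losses):
    """Mean loss of the configuration that looks best on all pooled predictions."""
    n, n_cfg = len(losses), len(losses[0])
    return min(sum(row[c] for row in losses) / n for c in range(n_cfg))


def bbc_cv_estimate(losses, n_boot=200, seed=0):
    """Bootstrap the 'select the best configuration' step (hypothetical sketch).

    losses[i][c] is the out-of-sample loss of configuration c on sample i.
    """
    rng = random.Random(seed)
    n, n_cfg = len(losses), len(losses[0])
    estimates = []
    for _ in range(n_boot):
        # Resample rows with replacement; unseen rows form the out-of-bag set.
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_bag_set = set(in_bag)
        out_of_bag = [i for i in range(n) if i not in in_bag_set]
        if not out_of_bag:
            continue
        # Select the winner on in-bag rows only ...
        best = min(range(n_cfg), key=lambda c: sum(losses[i][c] for i in in_bag))
        # ... and score that winner on rows it was not selected on.
        estimates.append(sum(losses[i][best] for i in out_of_bag) / len(out_of_bag))
    return sum(estimates) / len(estimates)


def toy_losses(n=200, n_cfg=30, seed=1):
    """Toy 0/1 losses: every configuration has the same true error rate of 0.5."""
    rng = random.Random(seed)
    return [[float(rng.random() < 0.5) for _ in range(n_cfg)] for _ in range(n)]
```

On the toy data every configuration is equally good, so the naive estimate falls below 0.5 purely because the minimum over many noisy means was taken. The bootstrap estimate should sit closer to the true error rate because selection and scoring use disjoint rows.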
Fully automated machine learning (AutoML) for predictive modeling is becoming a reality, giving rise to a whole new field. We present the basic ideas and principles of Just Add Data Bio (JADBio), an AutoML platform applicable to the low-sample, high-dimensional omics data that arise in translational medicine and bioinformatics applications. In addition to diagnostic models ready for clinical use, JADBio focuses on knowledge discovery by performing feature selection and identifying the corresponding biosignatures, i.e.,...
We present the Parallel, Forward–Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data of high dimensionality. PFBP partitions the data matrix both in terms of rows as well as columns. By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP relies only on computations local to a partition while minimizing communication costs, thus massively parallelizing computations. Similar combining techniques are also employed to create the final predictive model....
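As an illustration of how p-values computed locally on separate data partitions can be merged, here is a minimal sketch of Fisher's combined probability test, a standard meta-analysis technique. Whether PFBP uses exactly this rule is an assumption; the abstract only mentions "meta-analysis techniques". The function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function of a chi-square variable with even degrees of freedom.

    For df = 2k it has the closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Combine independent p-values, e.g. one per row partition (assumed setup)."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    # Under the null, -2 * sum(log p_i) follows a chi-square with 2k df.
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(stat, 2 * len(p_values))
```

A single p-value is returned unchanged, while several moderately small p-values combine into a smaller one, so each partition only needs to communicate one number per test.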
We address the problem of constraint-based causal discovery with mixed data types, such as (but not limited to) continuous, binary, multinomial, and ordinal variables. We use likelihood-ratio tests based on appropriate regression models and show how to derive symmetric conditional independence tests. Such tests can then be directly used by existing constraint-based methods with mixed data, such as the PC and FCI algorithms for learning Bayesian networks and maximal ancestral graphs, respectively. In experiments on simulated networks, we employ the PC algorithm...
Fully automated machine learning, statistical modelling, and artificial intelligence for predictive modeling is becoming a reality, giving rise to the field of Automated Machine Learning (AutoML). AutoML systems promise to democratize data analysis for non-experts, drastically increase productivity, improve the replicability of the analysis, facilitate the interpretation of results, and shield against common methodological pitfalls. We present the basic ideas and principles of Just Add Data Bio (JADBIO), an AutoML technology...
Most feature selection methods identify only a single solution. This is acceptable for predictive purposes, but is not sufficient for knowledge discovery if multiple solutions exist. We propose a strategy to extend a class of greedy methods to efficiently identify multiple solutions, and show under which conditions it identifies all solutions. We also introduce a taxonomy of features that takes the existence of multiple solutions into account. Furthermore, we explore different definitions of statistical equivalence of solutions, as well as methods for testing equivalence. A novel algorithm...
Could there be unexpected similarities between different studies, diseases, or treatments, on a molecular level due to common biological mechanisms involved? To answer this question, we develop a method for computing similarities between empirical, statistical distributions of high-dimensional, low-sample datasets, and apply it to hundreds of -omics studies. The similarities lead to dataset-to-dataset networks visualizing the landscape of a large portion of the data. Potentially interesting similarities connecting studies of different diseases are assembled in...
We consider the incorporation of causal knowledge about the presence or absence of (possibly indirect) causal relations into a causal model. Such causal relations correspond to directed paths in a causal model. This type of knowledge naturally arises from experimental data, among others. Specifically, we consider the formalisms of Causal Bayesian Networks and Maximal Ancestral Graphs and their Markov equivalence classes: Partially Directed Acyclic Graphs and Partially Oriented Ancestral Graphs. We introduce sound and complete procedures which are able to incorporate causal prior knowledge in such models. In simulated experiments, we show...
Causal discovery algorithms can induce some of the causal relations from the data, commonly in the form of a causal network such as a causal Bayesian network. Arguably, however, all such algorithms lag far behind what is necessary for a true business application. We develop an initial version of a new, general causal discovery algorithm called ETIO with many features suitable for business applications. These include (a) the ability to accept prior causal knowledge (e.g., taking senior driving courses improves driving skills), (b) admitting the presence of latent confounding factors, (c)...
In this paper, we consider the data association problem that arises when localizing multiple sound sources using direction of arrival (DOA) estimates from multiple microphone arrays. In such a scenario, the association of DOAs across the arrays that correspond to the same source is unknown and must be found for accurate localization. We present an algorithm that finds the correct DOA association based on features extracted for each source, which we propose. Our method results in high localization accuracy in scenarios with missed detections, reverberation, and noise, and outperforms...
Forward-backward selection is one of the most basic and commonly-used feature selection algorithms available. It is also general and conceptually applicable to many different types of data. In this paper, we propose a heuristic that significantly improves its running time, while preserving predictive accuracy. The idea is to temporarily discard the variables that are conditionally independent with the outcome given the selected variable set. Depending on how those variables are reconsidered and reintroduced, this heuristic gives rise to a family of algorithms with increasingly stronger...
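The early-dropping idea in this abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation. It replaces a proper conditional independence test with an absolute partial-correlation cutoff (`threshold`), which is an assumption, and the names `fbed_sketch`, `extra_runs`, and `make_toy_data` are hypothetical.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def _residual(v, basis):
    """Remove from v its projection onto each (mutually orthogonal) basis vector."""
    for b in basis:
        bb = _dot(b, b)
        if bb > 0:
            coef = _dot(v, b) / bb
            v = [x - coef * y for x, y in zip(v, b)]
    return v


def _orthobasis(vectors):
    basis = []
    for v in vectors:
        basis.append(_residual(v, basis))
    return basis


def _partial_corr(x, y, basis):
    rx, ry = _residual(x, basis), _residual(y, basis)
    denom = math.sqrt(_dot(rx, rx) * _dot(ry, ry))
    return _dot(rx, ry) / denom if denom > 1e-12 else 0.0


def fbed_sketch(columns, y, threshold=0.3, extra_runs=0):
    """Forward selection with early dropping, then a simple backward phase."""
    cols = {name: _center(v) for name, v in columns.items()}
    yc = _center(y)
    selected, basis = [], []
    for _ in range(extra_runs + 1):
        # Each run reconsiders every variable that is not yet selected.
        remaining = sorted(set(cols) - set(selected))
        while remaining:
            scores = {n: abs(_partial_corr(cols[n], yc, basis)) for n in remaining}
            # Early dropping: discard variables that look conditionally
            # independent of the outcome given the current selection.
            remaining = [n for n in remaining if scores[n] >= threshold]
            if not remaining:
                break
            best = max(remaining, key=lambda n: scores[n])
            selected.append(best)
            basis.append(_residual(cols[best], basis))
            remaining.remove(best)
    # Backward phase: drop variables that became redundant given the others.
    changed = True
    while changed:
        changed = False
        for n in list(selected):
            others = _orthobasis([cols[m] for m in selected if m != n])
            if abs(_partial_corr(cols[n], yc, others)) < threshold:
                selected.remove(n)
                changed = True
                break
    return selected


def make_toy_data(n=200, seed=1):
    """Toy data (assumption): y depends on x1 and x2 only; x3..x6 are noise."""
    rng = random.Random(seed)
    columns = {f"x{j}": [rng.gauss(0, 1) for _ in range(n)] for j in range(1, 7)}
    y = [2 * a - 1.5 * b + 0.1 * rng.gauss(0, 1)
         for a, b in zip(columns["x1"], columns["x2"])]
    return columns, y
```

With `extra_runs=0` each dropped variable is never tested again, which is where the speed-up comes from. Larger values of `extra_runs` reintroduce the dropped variables for additional passes, trading running time for a stronger search.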
A significant theoretical advantage of search-and-score methods for learning Bayesian Networks is that they can accept informative prior beliefs for each possible network, thus complementing the data. In this paper, a method is presented for assigning priors based on beliefs about the presence or absence of certain paths in the true network. Such beliefs correspond to knowledge about causal and associative relations between pairs of variables. This type of knowledge naturally arises from experimental and observational data, among others. In addition, a novel...
A correction to this article has been published and is linked from the HTML version of this article.
Any supervised machine learning analysis is required to provide an estimate of the out-of-sample predictive performance. However, it is imperative to also provide a quantification of the uncertainty of this performance in the form of a confidence or credible interval (CI) and not just a point estimate. In an AutoML setting, estimating the CI is challenging due to the "winner's curse", i.e., the bias of estimation due to cross-validating several pipelines and selecting the winning one. In this work, we perform a comparative evaluation of 9 state-of-the-art methods and variants...
The chemosensitivity of tumours to specific drugs can be predicted based on molecular quantities, such as gene expressions, miRNA expressions, and protein concentrations. This finding is important for improving drug efficacy and personalizing drug use. In this paper, the authors present an analysis strategy that, compared to prior work, retains more information in the data and may lead to improved prediction. The authors apply methods for estimating the GI50 value of a drug (an indicator of response to a drug), regression methods for constructing predictive models of the GI50 value,...