- Genomics and Phylogenetic Studies
- Genetic diversity and population structure
- Advanced Clustering Algorithms Research
- Bioinformatics and Genomic Networks
- Data Management and Algorithms
- Computational Drug Discovery Methods
- Chromosomal and Genetic Variations
- Imbalanced Data Classification Techniques
- RNA and protein synthesis mechanisms
- AI in cancer detection
- Cell Image Analysis Techniques
- Gene expression and cancer classification
- Machine Learning and Data Classification
- Algorithms and Data Compression
- Evolution and Paleontology Studies
- Complex Network Analysis Techniques
- Statistical Methods in Clinical Trials
- Metabolomics and Mass Spectrometry Studies
- COVID-19 diagnosis using AI
- Reinforcement Learning in Robotics
- Plant and animal studies
- Data Mining Algorithms and Applications
- Artificial Intelligence in Healthcare
- Machine Learning in Bioinformatics
- Cervical Cancer and HPV Research
Université du Québec à Montréal
2015-2024
Mila - Quebec Artificial Intelligence Institute
2024
Salam University
2022
Université de Montréal
1999-2013
McGill University
2009-2012
McGill University and Génome Québec Innovation Centre
2007
Centre d'Analyse et de Mathématique Sociales
1998
Motivation: Accurate detection of sequence similarity and homologous recombination are essential parts of many evolutionary analyses. Results: We have developed SimPlot++, an open-source multiplatform application implemented in Python, which can be used to produce publication-quality sequence similarity plots using 63 nucleotide and 20 amino acid distance models, to detect intergenic and intragenic recombination events using Φ, Max-χ2, NSS or proportion tests, and to generate and analyze interactive sequence similarity networks. SimPlot++ supports multicore data...
In the quest for clean and efficient energy solutions, lithium-ion batteries have emerged at the forefront of technological innovation. Accurate state-of-charge (SOC) estimation across a broad temperature range is essential for extending battery longevity and ensuring effective management of overcharge and over-discharge conditions. However, prevailing challenges persist in achieving precise SOC estimates and in generalizing across a wide temperature range, particularly at lower temperatures. Our comparative analysis reveals that, while...
Credit scoring (CS) is an effective and crucial approach used for risk management in banks and other financial institutions. It provides appropriate guidance on granting loans and reduces risks in the area. Hence, companies are trying to use novel automated solutions to deal with the CS challenge and to protect their own finances and customers. Nowadays, different machine learning (ML) and data mining (DM) algorithms have been used to improve various aspects of CS prediction. In this paper, we introduce a methodology, named Deep...
Coronary artery disease (CAD) is one of the main causes of cardiac death around the world. Due to its significant impact on society, early and accurate detection of CAD is essential. This study proposes a novel nested ensemble nu-Support Vector Classification (NE-nu-SVC) model, which combines several traditional machine learning methods and techniques for the effective diagnosis of CAD. We validated our model using two well-known CAD datasets (Z-Alizadeh Sani and Cleveland). To improve the performance of the model, we selected clinically...
Background: The SARS-CoV-2 pandemic is one of the greatest global medical and social challenges that have emerged in recent history. Human coronavirus strains discovered during previous SARS outbreaks have been hypothesized to pass from bats to humans using intermediate hosts, e.g. civets for SARS-CoV and camels for MERS-CoV. The discovery of an intermediate host of SARS-CoV-2 and the identification of the specific mechanism of its emergence are topics of primary evolutionary importance. In this study we investigate the patterns of 11 main genes of SARS-CoV-2....
This paper gives an experimentally supported review and comparison of several indices based on the conventional K-means inertia criterion for determining the number of clusters, K, in datasets, using the popular Silhouette width index as a benchmark. Our experiments involve a novel version of the Elbow index, defined on values two or three steps apart. We also discuss alternative ways of computing and summarizing its...
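As a rough illustration of the two quantities this abstract compares, the sketch below computes the K-means inertia (within-cluster sum of squared distances to centroids) and the mean Silhouette width for a given labeling. It uses the standard textbook definitions only; the paper's Elbow variants are not reproduced here, and the function names and the toy data are assumptions made for the example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def group_by_label(points, labels):
    """Map each cluster label to the list of points carrying it."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def inertia(points, labels):
    """Within-cluster sum of squared distances to the cluster centroids."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        centroid = tuple(sum(coord) / len(members) for coord in zip(*members))
        total += sum(euclidean(p, centroid) ** 2 for p in members)
    return total


def silhouette_width(points, labels):
    """Mean Silhouette width s = (b - a) / max(a, b), averaged over all points."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight, well-separated groups of three points.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
GOOD_LABELS = [0, 0, 0, 1, 1, 1]
BAD_LABELS = [0, 1, 0, 1, 0, 1]
```

On this toy data the labeling that matches the two groups yields both a lower inertia and a higher Silhouette width than the mixed labeling, which is the behavior both families of indices rely on when K is varied.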
Among the various forms of canonical analysis available in the statistical literature, RDA (redundancy analysis) and CCA (canonical correspondence analysis) have become the instruments of choice for ecological research because they recognize the different roles of the explanatory and response data tables. Data table Y contains the response variables (e.g., species data) while data table X contains the explanatory variables. RDA is an extension of multiple linear regression; it uses a linear model of the relationship between X and Y. In CCA, the response data are chi-square transformed as an initial step, but still assumed...
Multimodal emotion recognition is an emerging interdisciplinary field of research in the area of affective computing and sentiment analysis. It aims at exploiting the information carried by signals of different nature to make emotion recognition systems more accurate. This is achieved by employing a powerful multimodal fusion method. In this study, a hybrid multimodal data fusion method is proposed in which the audio and visual modalities are fused using a latent space linear map, and then their projected features in the cross-modal space are fused with the textual modality...
Recent years have seen a steep rise in the number of skin cancer detection applications. While modern advances in deep learning have made it possible to reach new heights in terms of classification accuracy, no publicly available software provides confidence estimates for these predictions. We present DUNEScan (Deep Uncertainty Estimation for Skin Cancer), a web server that performs an intuitive in-depth analysis of uncertainty in commonly used skin cancer classification models based on convolutional neural networks (CNNs). DUNEScan allows users...
Motivation: High-throughput screening (HTS) is an early-stage process in drug discovery which allows thousands of chemical compounds to be tested in a single study. We report a method for correcting HTS data prior to the hit selection process (i.e. the selection of active compounds). The proposed correction minimizes the impact of systematic errors which may affect HTS. The introduced method, called a well correction, proceeds by correcting the distribution of measurements within wells of a given assay. We use simulated and experimental data to illustrate the advantages...
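The idea of correcting the distribution of measurements within each well can be sketched, under a strong simplifying assumption, as a z-score normalization of every well position across all plates of an assay. The abstract does not spell out the procedure, so this is only one plausible reading and not the published method; the function name and the nested-list plate layout are hypothetical.

```python
from statistics import mean, pstdev


def well_zscore_correction(plates):
    """Z-score each well position across plates.

    `plates` is a list of plates; each plate is a list of rows of raw
    measurements, and all plates are assumed to share the same shape.
    Returns a new list of plates with the corrected values.
    """
    n_rows, n_cols = len(plates[0]), len(plates[0][0])
    corrected = [[[0.0] * n_cols for _ in range(n_rows)] for _ in plates]
    for r in range(n_rows):
        for c in range(n_cols):
            # All measurements taken at well (r, c), one per plate.
            values = [plate[r][c] for plate in plates]
            mu, sigma = mean(values), pstdev(values)
            for k, value in enumerate(values):
                # A constant well carries no signal; map it to zero.
                corrected[k][r][c] = (value - mu) / sigma if sigma > 0 else 0.0
    return corrected


# Hypothetical assay of three 2x2 plates; well (0, 0) carries a
# systematic offset of +100 relative to well (0, 1).
PLATES = [
    [[101, 1], [2, 3]],
    [[102, 2], [4, 3]],
    [[103, 3], [6, 3]],
]
CORRECTED = well_zscore_correction(PLATES)
```

After the correction, the offset well (0, 0) and its unbiased neighbor (0, 1) produce identical normalized values, which is the sense in which a per-well systematic error is removed in this sketch.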
Horizontal gene transfer (HGT) is one of the main mechanisms driving the evolution of microorganisms. Its accurate identification is one of the major challenges posed by reticulate evolution. In this article, we describe a new polynomial-time algorithm for inferring HGT events and compare 3 existing and 1 new tree comparison indices in the context of HGT identification. The proposed algorithm can rely on different optimization criteria, including least squares (LS), Robinson and Foulds (RF) distance, quartet distance (QD), bipartition...
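Of the criteria listed in this abstract, the Robinson and Foulds distance is the simplest to illustrate: it counts the nontrivial bipartitions (splits) of the leaf set that occur in exactly one of two trees. The sketch below uses that standard, unnormalized definition and is not the HGT inference algorithm itself; representing trees as nested tuples of leaf names is an assumption made for the example.

```python
def leaf_set(tree):
    """Return the frozenset of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the nontrivial splits induced by the internal edges of a tree.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the same split found from either side of an
    edge is recorded identically, whatever the rooting.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        if reference in side:
            side = all_leaves - side
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("both trees must have the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical example trees over the leaves A to E.
TREE_1 = (("A", "B"), ("C", ("D", "E")))
TREE_2 = (("A", "C"), ("B", ("D", "E")))
```

Here `robinson_foulds(TREE_1, TREE_2)` counts the split {A, B} | {C, D, E} from the first tree and {A, C} | {B, D, E} from the second, while the shared split {D, E} | {A, B, C} contributes nothing.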
Off-target predictions are crucial in gene editing research. Recently, significant progress has been made in the field of prediction of off-target mutations, particularly with CRISPR-Cas9 data, thanks to the use of deep learning. CRISPR-Cas9 is a gene editing technique which allows manipulation of DNA fragments. The sgRNA-DNA (single guide RNA-DNA) sequence encoding for deep neural networks has, however, a strong impact on the prediction accuracy. We propose a novel encoding of sgRNA-DNA sequences that aggregates sequence data with no loss of information. In our experiments, we compare the proposed...
Nowadays, k-means remains arguably the most popular clustering algorithm [1], [2]. Two of its main properties are simplicity and speed in practice. Here, our main claim is that the average number of iterations k-means takes to converge (τ̄) is in fact very informative. We find this to be particularly interesting because τ̄ is always known when applying k-means but has never been, to our knowledge, used in the data analysis process. By experimenting with Gaussian clusters, we show that τ̄ is related to the structure of a data set under study. Data sets containing...
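The quantity τ̄ discussed in this abstract can be obtained as a by-product of any k-means run. The sketch below is a plain Lloyd-style k-means that reports how many assignment passes were needed before the labels stopped changing, plus a helper that averages this count over random initializations. How the paper counts iterations and initializes centers is not stated in the abstract, so the counting convention (the final confirming pass is included), the function names, and the toy data are assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_iterations(points, initial_centers, max_iter=100):
    """Run k-means and return (number of passes, final labels).

    A pass assigns every point to its nearest center and then recomputes
    the centers. The count includes the last pass, which only confirms
    that the assignment no longer changes.
    """
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            return iteration, labels
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return max_iter, labels


def mean_iterations(points, k, runs=20, seed=0):
    """Average the pass count over `runs` random initializations."""
    rng = random.Random(seed)
    counts = [
        kmeans_iterations(points, rng.sample(points, k))[0] for _ in range(runs)
    ]
    return sum(counts) / runs


# Hypothetical toy data: two well-separated groups of three points.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
TAU, LABELS = kmeans_iterations(POINTS, [(0, 0), (0, 1)])
```

Starting from two centers placed inside the same group, the run needs one pass to pull a center toward the second group, one pass to reach the correct partition, and a final pass to confirm it.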
Understanding the evolution of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and its relationship to other coronaviruses in the wild is crucial for preventing future virus outbreaks. While the origin of the SARS-CoV-2 pandemic remains uncertain, mounting evidence suggests the direct involvement of bat and pangolin coronaviruses in the evolution of the SARS-CoV-2 genome. To unravel the early days of a probable zoonotic spillover event, we analyzed genomic data from various coronavirus strains from both human and wild hosts. Bayesian phylogenetic analysis was performed using...
A reticulogram is a general network capable of representing a reticulate evolutionary structure. It is particularly useful for portraying relationships among organisms that may be related in a nonunique way to their common ancestor, relationships that cannot be represented by a dendrogram or a phylogenetic tree. We propose a new method for constructing reticulograms that represent a given distance matrix. Reticulate evolution applies first to phylogenetic problems; it has been found in nature, for example, in the within-species microevolution...