Rabie Saidi
- Machine Learning in Bioinformatics
- Bioinformatics and Genomic Networks
- Genomics and Phylogenetic Studies
- Biomedical Text Mining and Ontologies
- Data Mining Algorithms and Applications
- Gene expression and cancer classification
- Protein Structure and Dynamics
- Scientific Computing and Data Management
- Enzyme Structure and Function
- Advanced Proteomics Techniques and Applications
- Genetics, Bioinformatics, and Biomedical Research
- Rough Sets and Fuzzy Logic
- Computational Drug Discovery Methods
- Molecular Biology Techniques and Applications
- Research Data Management Practices
- Data Management and Algorithms
- Microbial Natural Products and Biosynthesis
- Metabolomics and Mass Spectrometry Studies
- Natural Language Processing Techniques
- Semantic Web and Ontologies
- RNA and protein synthesis mechanisms
- Microbial Metabolic Engineering and Bioproduction
- Geological and Geochemical Analysis
- Geological and Geophysical Studies Worldwide
- Graph Theory and Algorithms
European Bioinformatics Institute
2014-2024
SIB Swiss Institute of Bioinformatics
2024
Universitat Politècnica de Catalunya
2020
Institut Supérieur d’Informatique, de Modélisation et de leurs Applications
2014
University of Sfax
2014
Centre National de la Recherche Scientifique
2007-2012
Clermont Université
2012
Université Clermont Auvergne
2012
Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes
2011-2012
Institut Pascal
2010
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this article, we describe significant updates that we have made over the last two years to the resource. The number of sequences in UniProtKB has risen to approximately 190 million, despite continued work to reduce sequence redundancy at the proteome level. We have adopted new methods of assessing proteome completeness and quality. We continue to extract detailed annotations from...
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature...
Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for...
Motivation: To provide high quality, computationally tractable annotation of binding sites for biologically relevant (cognate) ligands in UniProtKB using the chemical ontology ChEBI (Chemical Entities of Biological Interest), to better support efforts to study and predict functionally relevant interactions between protein sequences and structures and small molecule ligands. Results: We structured the data model for cognate ligand binding site annotations in UniProtKB and performed a complete reannotation of all cognate ligand binding sites using stable unique identifiers from...
Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for...
Motivation: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. Results: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by members...
Motivation: Similarity-based methods have been widely used in order to infer the properties of genes and gene products containing little or no experimental annotation. New approaches that overcome the limitations of methods that rely solely upon sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins. Results: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) by comparing their domain...
Background: This paper deals with the preprocessing of protein sequences for supervised classification. Motif extraction is one way to address that task. It has been largely used to encode biological sequences into feature vectors to enable using well-known machine-learning classifiers which require this format. However, designing a suitable feature space, for a set of proteins, is not a trivial task. For this purpose, we propose a novel encoding method that uses amino-acid substitution matrices to define similarity between motifs during...
The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Here we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000...
The use of raw amino acid sequences as input for deep learning models for protein functional prediction has gained popularity in recent years. This scheme requires handling proteins of different lengths, while deep learning models require same-shape input. To accomplish this, zeros are usually added to each sequence up to an established common length in a process called zero-padding. However, the effect of different padding strategies on model performance and data structure is yet unknown. We propose and implement four novel types...
Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms, and developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation...
The increasing growth of databases raises an urgent need for more accurate methods to better understand the stored data. In this scope, association rules were extensively used for the analysis and the comprehension of huge amounts of data. However, the number of generated rules is too large to be efficiently analyzed and explored in any further process. In order to bypass this hamper, an efficient selection of rules has to be performed. Since selection is necessarily based on evaluation, many interestingness measures have been proposed. However, the abundance of these measures gave rise to a new...
The huge number of association rules represents the main hamper that a decision maker faces. In order to bypass this hamper, an efficient selection of rules has to be performed. Since selection is necessarily based on evaluation, many interestingness measures have been proposed. However, the abundance of these measures gave rise to a new problem, namely the heterogeneity of the evaluation results, and this created confusion in the decision. In this respect, we propose a novel approach to discover interesting association rules without favoring or excluding any measure by adopting the notion...
Experimental testing and manual curation are the most precise ways for assigning Gene Ontology (GO) terms describing protein functions. However, they are expensive, time-consuming and cannot cope with the exponential growth of data generated by high-throughput sequencing methods. Hence, researchers need reliable computational systems to help fill the gap with automatic function prediction. The results of the last Critical Assessment of Function Annotation challenge revealed that GO-terms prediction remains a very...
The cherty rocks of the Chouabine Formation of the Gafsa-Metlaoui basin (south-western Tunisia), which are composed of biogenic silica, are treated using thermal treatment at 1000°C with the flux calcination method in order to prepare a specific filter aid for melting sulfur used for the production of sulfuric acid. This work presents the effect of heating on the granulometry of chert. The mineralogical composition of the natural starting chert is opal CT (cristobalite/tridymite) and a mineral mixture of quartz, smectite clay minerals,...
One of the most powerful techniques to study proteins is to look for recurrent fragments (also called substructures), then use them as patterns to characterize the proteins under study. Although protein sequences have been extensively studied in the literature, studying protein three-dimensional (3D) structures can reveal relevant structural and functional information that may not be derived from protein sequences alone. An emergent trend consists of parsing protein 3D structures into graphs of amino acids. Hence, the search for recurrent substructures is formulated as a process of frequent...
Recent advances in computing power and machine learning empower functional annotation of protein sequences and their transcript variations. Here, we present an automated prediction system, UniGOPred, for GO annotations and a database of GO term predictions for proteomes of several organisms in UniProt Knowledgebase (UniProtKB). UniGOPred provides function predictions for 514 molecular function (MF), 2909 biological process (BP), and 438 cellular component (CC) GO terms for each protein sequence. UniGOPred covers nearly the whole functionality spectrum of Gene...
The widening gap between known proteins and their functions has encouraged the development of methods to automatically infer annotations. Automatic functional annotation of proteins is expected to meet the conflicting requirements of maximizing annotation coverage, while minimizing erroneous functional assignments. This trade-off imposes a great challenge in designing intelligent systems to tackle the problem of automatic protein annotation. In this work, we present a system that utilizes rule mining techniques to predict metabolic pathways...
Feature extraction is an unavoidable task, especially in the critical step of preprocessing biological sequences. This step consists, for example, in transforming the biological sequences into vectors of motifs where each motif is a subsequence that can be seen as a property (or attribute) characterizing the sequence. Hence, we obtain an object-property table where objects are sequences and properties are motifs extracted from sequences. This output can be used to apply standard machine learning tools to perform data mining tasks such as classification. Several previous works have...
UniProt continues to support the ongoing process of making scientific data FAIR. Here we contribute to this process with a FAIRness assessment of our UniProtKB dataset followed by a critical reflection on the challenges and future directions of the adoption and validation of the FAIR principles and metrics.
Recently, the principles of graph theory are being adopted to address molecular and chemical structure investigations such as 3D protein structure prediction and spatial motif discovery. Proteins have been parsed into graphs according to several approaches and methods, then studied based on graph theory concepts and data mining tools. In this paper we make a brief survey of the most used graph-based representations and we propose a naïve method to help with making protein graphs, since a key step of a valuable protein graph mining process is to build concise and correct graphs holding reliable...