- Biomedical Text Mining and Ontologies
- Semantic Web and Ontologies
- Topic Modeling
- Bioinformatics and Genomic Networks
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Machine Learning in Bioinformatics
- Computational Drug Discovery Methods
- Autism Spectrum Disorder Research
- Recommender Systems and Techniques
- Genetics, Bioinformatics, and Biomedical Research
- Scientific Computing and Data Management
- Data Quality and Management
- Genomics and Rare Diseases
- Machine Learning in Healthcare
- Privacy-Preserving Technologies in Data
- Genomics and Phylogenetic Studies
- Scientometrics and Bibliometrics Research
- Gene Expression and Cancer Classification
- Machine Learning in Materials Science
- Service-Oriented Architecture and Web Services
- Advanced Graph Neural Networks
- Data-Driven Disease Surveillance
- Research Data Management Practices
- Cryptography and Data Security
University of Lisbon
2015-2024
Dalle Molle Institute for Artificial Intelligence Research
2022
East Stroudsburg University
2022
Brandeis University
2022
RMIT University
2022
Université d'Orléans
2022
Centre National de la Recherche Scientifique
2022
University of Zurich
2022
Mohamed bin Zayed University of Artificial Intelligence
2022
Universidade do Porto
2020
The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators,...
Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether electronic annotations should or should not be used in semantic similarity calculations. We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing...
Biological databases offer access to formalized facts about many aspects of biology: genes and gene products, protein structure, metabolic pathways, diseases, organisms, and so on. These databases are becoming increasingly important to researchers. The information that populates databases is generated by research teams and is usually published in peer-reviewed journals. As part of the publication process, some authors deposit data into a database but, more often, it is extracted from the literature and deposited into databases by human curators, a painstaking...
Identifying disease genes from a vast amount of genetic data is one of the most challenging tasks in the post-genomic era. Also, complex diseases present a highly heterogeneous genotype, which makes biological marker identification difficult. Machine learning methods are widely used to identify these markers, but their performance is dependent upon the size and quality of the available data. In this study, we demonstrated that machine learning classifiers trained on gene functional similarities, using Gene Ontology (GO), can...
Concept recognition tools rely on the availability of textual corpora to assess their performance and enable the identification of areas for improvement. Typically, corpora are developed for specific purposes, such as gene name recognition. Gene and protein name recognition are longstanding goals of biomedical text mining, and therefore a number of different corpora exist. However, phenotypes only recently became an entity of interest for specialized concept recognition systems, and hardly any annotated text is available for performance testing and training. Here, we present a unique corpus, capturing...
Recent studies have proposed deep learning techniques, namely recurrent neural networks, to improve biomedical text mining tasks. However, these techniques rarely take advantage of existing domain-specific resources, such as ontologies. In Life and Health Sciences there is a vast and valuable set of such resources publicly available, which are continuously being updated. Biomedical ontologies are nowadays a mainstream approach to formalize existing knowledge about entities, such as genes, chemicals, phenotypes, and disorders. These...
Many bioinformatics applications would benefit from comparing proteins based on their biological role rather than their sequence. In most biological databases, proteins are already annotated with ontology terms. Previous studies identified a correlation between the sequence similarity and the semantic similarity of proteins. The semantic similarity was computed based on GO terms. However, proteins sharing a biological role do not necessarily have a similar sequence. This paper introduces our study on the correlation between family similarity and semantic similarity. Family similarity overcomes some of the limitations of sequence similarity, thus we obtained a strong...
The large-scale effort in developing, maintaining and making biomedical ontologies available motivates the application of similarity measures to compare ontology concepts or, by extension, the entities described therein. A common approach, known as semantic similarity, compares ontology concepts through the information content they share in the ontology. However, different disjunctive ancestors in the ontology are frequently neglected, or not properly explored, by semantic similarity measures. This paper proposes a novel method, dubbed DiShIn, that effectively...
Ontology Matching aims at identifying a set of semantic correspondences, called an alignment, between related ontologies. In recent years, there has been a growing interest in efficient and effective matching methods for large ontologies. However, alignments produced for large ontologies are often logically incoherent. It was only recently that the use of repair techniques to improve the coherence of ontology alignments began to be explored. This paper presents a novel modularization technique for ontology alignment repair which extracts fragments of the input...
Biomedical Relation Extraction (RE) systems identify and classify relations between biomedical entities to enhance our knowledge of biological and medical processes. Most state-of-the-art systems use deep learning approaches, mainly to target relations between entities of the same type, such as proteins or pharmacological substances. However, these systems are mostly restricted to what they directly identify on the text and ignore specialized domain knowledge bases, such as ontologies, that formalize and integrate biomedical information typically structured as directed acyclic graphs. On the other hand,...
With the increasing amount of data made available in the chemical field, there is a strong need for systems capable of comparing and classifying chemical compounds in an efficient and effective way. The best approaches existing today are based on the structure-activity relationship premise, which states that the biological activity of a molecule is strongly related to its structural or physicochemical properties. This work presents a novel approach for the automatic classification of chemical compounds by integrating semantic similarity with existing structural comparison methods....
Many biomedical relation extraction approaches are based on supervised machine learning, requiring an annotated corpus. Distant supervision aims at training a classifier by combining a knowledge base with a corpus, reducing the amount of manual effort necessary. This is particularly useful for biomedicine because many databases and ontologies have been made available for many biological processes, while the availability of annotated corpora is still limited. We studied the extraction of microRNA-gene relations from text. MicroRNA regulation...
Biomedical ontologies pose several challenges to ontology matching due both to the complexity of the biomedical domain and to the characteristics of the ontologies themselves. The biomedical tracks in the Ontology Alignment Evaluation Initiative (OAEI) have spurred the development of matching systems able to tackle these challenges, and benchmarked their general performance. In this study, we dissect the strategies employed by matching systems to tackle these challenges and gauge the impact of the challenges themselves on matching performance, using the AgreementMakerLight (AML) system as the platform for this study. We demonstrate that the linear complexity of the hash-based...
The development of text mining systems that annotate biological entities with their properties using scientific literature is an important recent research topic. These systems need first to recognize the biological entities and properties in the text, and then decide which pairs represent valid annotations. This document introduces a novel unsupervised method for recognizing biological properties in unstructured text, involving the evidence content of their names. This document shows the results obtained by the application of our method to BioCreative tasks 2.1 and 2.2, where it identified Gene Ontology annotations...
Annotation of proteins with gene ontology (GO) terms is ongoing work and a complex task. Manual GO annotation is precise and precious, but it is time-consuming. Therefore, instead of curated annotations most of the proteins come with uncurated annotations, which have been generated automatically. Text-mining systems that use literature for automatic annotation have been proposed but they do not satisfy the high quality expectations of curators. In this paper we describe an approach that links uncurated annotations to text extracted from literature. The selection of the text is based on...
There is a prominent trend to augment and improve the formality of biomedical ontologies. For example, this is shown by the current effort on adding description logic axioms, such as disjointness. One of the key ontology applications that can take advantage of this effort is the conceptual (functional) similarity measurement. The presence of description logic axioms in biomedical ontologies makes the current structural or extensional approaches weaker and further away from providing sound semantics-based similarity measures. Although beneficial in small ontologies, the exploration of description logic axioms by semantics-based similarity measures...
Epidemiology is a data-intensive and multi-disciplinary subject, where data integration, curation and sharing are becoming increasingly relevant, given its global context and time constraints. The semantic annotation of epidemiology resources is a cornerstone to effectively support such activities. Although several ontologies cover some of the subdomains of epidemiology, we identified a lack of semantic resources for epidemiology-specific terms. This paper addresses this need by proposing the Epidemiology Ontology (EPO) and by describing its integration with...