- Biomedical Text Mining and Ontologies
- Semantic Web and Ontologies
- Bioinformatics and Genomic Networks
- Natural Language Processing Techniques
- Topic Modeling
- Gene expression and cancer classification
- Genetics, Bioinformatics, and Biomedical Research
- Scientific Computing and Data Management
- Machine Learning in Bioinformatics
- Genomics and Phylogenetic Studies
- Genomics and Rare Diseases
- Computational Drug Discovery Methods
- Advanced Text Analysis Techniques
- linguistics and terminology studies
- Research Data Management Practices
- Advanced Database Systems and Queries
- Radiomics and Machine Learning in Medical Imaging
- Service-Oriented Architecture and Web Services
- Data-Driven Disease Surveillance
- Artificial Intelligence in Healthcare and Education
- Machine Learning in Healthcare
- COVID-19 diagnosis using AI
- Data Quality and Management
- Cell Image Analysis Techniques
- Web Data Mining and Analysis
ZB MED - Information Centre for Life Sciences
2018-2024
University of Cologne
2019-2024
National Student Clearinghouse Research Center
2021
Vrije Universiteit Amsterdam
2020
Ollscoil na Gaillimhe – University of Galway
2015-2019
University of Zurich
2012-2015
European Bioinformatics Institute
2005-2014
Wellcome Trust
2005-2014
University of Lisbon
2013
Institute for Systems Engineering and Computers
2013
In this paper, we proposed an explainable deep neural network (DNN)-based method for automatic detection of COVID-19 symptoms from chest radiography (CXR) images, which we call 'DeepCOVIDExplainer'. We used 15,959 CXR images of 15,854 patients, covering normal, pneumonia, and COVID-19 cases. CXR images are first comprehensively preprocessed and augmented before classifying with an ensemble method, followed by...
Abstract Motivation: Text-mining (TM) solutions are developing into efficient services to researchers in the biomedical research community. Such have scale with growing number and size of resources (e.g. available controlled vocabularies), amount literature be processed about 17 million documents PubMed) demands user community different methods for fact extraction). These motivated development a server-based solution analysis. Whatizit is suite modules that analyse text contained...
Abstract Summary: To allow efficient and systematic retrieval of statements from Medline we have developed EBIMed, a service that combines document with co-occurrence-based analysis abstracts. Upon keyword query, EBIMed retrieves the abstracts EMBL-EBI's installation filters for sentences contain biomedical terminology maintained in public bioinformatics resources. The extracted are used to generate an overview table on proteins, Gene Ontology (GO) annotations, drugs species same biological...
Abstract Motivation: Scholarly biomedical publications report on the findings of a research investigation. Scientists use well-established discourse structure to relate their work state art, express own motivation and hypotheses methods, results conclusions. In previous work, we have proposed ways explicitly annotate scientific investigations in scholarly publications. Here present means facilitate automatic access articles by automating recognition 11 categories at sentence level, which...
Knee osteoarthritis (KOA) is a disease that impairs knee function and causes pain. A radiologist reviews X-ray images grades the severity level of impairments according to Kellgren Lawrence grading scheme; five-point ordinal scale (0-4). In this study, we used Elastic Net (EN) Random Forests (RF) build predictive models using patient assessment data (i.e. signs symptoms both knees medication use) convolution neural network (CNN) trained only. Linear mixed effect (LMM) were model within...
Biological databases offer access to formalized facts about many aspects of biology—genes and gene products, protein structure, metabolic pathways, diseases, organisms, so on. These are becoming increasingly important researchers. The information that populates is generated by research teams usually published in peer-reviewed journals. As part the publication process, some authors deposit data into a database but, more often, it extracted from literature deposited human curators, painstaking...
In recent years, the recognition of semantic types from biomedical scientific literature has been focused on named entities like protein and gene names (PGNs) ontology terms (GO terms). Other diseases have not received same level attention. Different solutions proposed to identify disease in literature. While matching terminology with language patterns suffers low recall (e.g., Whatizit) other make use morpho-syntactic features better cover full scope terminological variability MetaMap)....
The CALBC initiative aims to provide a large-scale biomedical text corpus that contains semantic annotations for named entities of different kinds. generation this requires the from automatic annotation systems be harmonized. In first phase, five participants (EMBL-EBI, EMC Rotterdam, NLM, JULIE Lab Jena, and Linguamatics) were gathered. All delivered in common format included concept identifiers boundary assignments enabled comparison alignment results. During harmonization results produced...
Abstract This article collects opinions from leading scientists about how text mining can provide better access to the biological literature, scientific community help with this process, what next steps are, and role future BioCreative evaluations play. The responses identify several broad themes, including possibility of fusing literature databases through mining; need for user interfaces tailored different classes users supporting community-based annotation; importance scaling technology...
Abstract Motivation: Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and Gene Ontology (GO) provide an efficient way of accessing organizing biomedical information by reducing ambiguity inherent to free-text data. Different methods automating assignment MeSH concepts have been proposed replace manual annotation, but they are either limited a small subset or only compared with number other systems. Results: We compare performance six classification systems...
Networks of molecular interactions explain complex biological processes, and all known information on molecular events is contained in a number of public repositories, including the scientific literature. Metabolic and signalling pathways are often viewed separately, even though both types are composed of interactions involving proteins and other chemical entities.
Annual suicide figures are critical in identifying trends and guiding research, yet challenges arising from significant lags reporting can delay complicate real-time interventions. In this paper, we utilized Google Trends search volumes for behavioral forecasting of national rates Ireland between 2004 2015. Official recorded by the Central Statistics Office Ireland. While similar investigations using data have been carried out other jurisdictions (e.g., United Kingdom, United States of America), such...
Abstract Motivation: Biological literature contains many abbreviations with one particular sense in each document. However, most do not have a unique across the literature. Furthermore, documents contain long forms of abbreviations. Resolving an abbreviation document consists retrieving its use. Abbreviation resolution improves accuracy retrieval engines and information extraction systems. Results: We combine automatic analysis Medline abstracts linguistic methods to build dictionary...
Competitions in text mining have been used to measure the performance of automatic processing solutions against a manually annotated gold standard corpus (GSC). The preparation GSC is time-consuming and costly final consists at most few thousand documents with limited set semantic groups. To overcome these shortcomings, CALBC project partners (PPs) produced large-scale biomedical four different groups through harmonisation annotations from solutions, first version Silver Standard Corpus...
Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This a necessary step towards unified access to biological sets, but this still requires solutions query multiple endpoints for their heterogeneous eventually retrieve all the meaningful information. Suggested are based on federation approaches, which require submission of SPARQL queries endpoints. Due size complexity these have be...
Phenotypes have gained increased notoriety in the clinical and biological domain owing to their application numerous areas such as discovery of disease genes drug targets, phylogenetics pharmacogenomics. Phenotypes, defined observable characteristics organisms, can be seen one bridges that lead a translation experimental findings into applications thereby support 'bench to bedside' efforts. However, build this translational bridge, common universal understanding phenotypes is required goes...
Amid the coronavirus disease (COVID-19) pandemic, humanity experiences a rapid increase in infection numbers across world. Challenge hospitals are faced with, fight against virus, is effective screening of incoming patients. One methodology assessment chest radiography (CXR) images, which usually requires expert radiologist's knowledge. In this paper, we propose an explainable deep neural networks (DNN)-based method for automatic detection COVID-19 symptoms from CXR call DeepCOVIDExplainer. We...
Osteoarthritis (OA) is a degenerative joint disease, which significantly affects middle-aged and elderly people. Although primarily identified via hyaline cartilage change based on medical images, technical bottlenecks like noise, artifacts, modality impose an enormous challenge high-precision, objective, efficient early quantification of OA. Owing to recent advancements, approaches neural networks (DNNs) have shown outstanding success in this application domain. However, due nested...