- Biomedical Text Mining and Ontologies
- Topic Modeling
- Natural Language Processing Techniques
- Semantic Web and Ontologies
- Data Quality and Management
- Misinformation and Its Impacts
- Advanced Text Analysis Techniques
- Machine Learning in Healthcare
- Electronic Health Records Systems
- Machine Learning in Materials Science
- Computational Drug Discovery Methods
- Statistical Methods in Clinical Trials
- Service-Oriented Architecture and Web Services
- COVID-19 diagnosis using AI
- Artificial Intelligence in Healthcare and Education
- Scientific Computing and Data Management
- Health, Environment, Cognitive Aging
- Text and Document Classification Technologies
- Advanced Graph Neural Networks
- Web Data Mining and Analysis
- Clinical practice guidelines implementation
- Meta-analysis and systematic reviews
- Advanced Database Systems and Queries
- Radiomics and Machine Learning in Medical Imaging
- Health Literacy and Information Accessibility
University of Geneva
2009-2025
SIB Swiss Institute of Bioinformatics
2017-2023
HES-SO University of Applied Sciences and Arts Western Switzerland
2019-2023
HES-SO Genève
2017-2022
HES-SO Arc
2021
Defence Scientific Information & Documentation Centre
2021
Universidade do Estado do Rio de Janeiro
2008-2018
University Hospital of Geneva
2011-2013
Geneva College
2012
Universidade Federal de Itajubá
2004
Elisa Terumi Rubel Schneider, João Vitor Andrioli de Souza, Julien Knafou, Lucas Emanuel Silva e Oliveira, Jenny Copara, Yohan Bonescki Gumiel, Ferro Antunes Emerson Cabrera Paraiso, Douglas Teodoro, Cláudia Maria Cabral Moro Barra. Proceedings of the 3rd Clinical Natural Language Processing Workshop. 2020.
Identifying and removing reference duplicates when conducting systematic reviews (SRs) remains a major, time-consuming issue for authors who check manually using built-in features in citation managers. To address issues related to manual deduplication, we developed an automated, efficient, and rapid artificial intelligence-based algorithm named Deduklick. Deduklick combines natural language processing algorithms with a set of rules created by expert information specialists.
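Combining text normalization with expert-written matching rules can be illustrated with a minimal Python sketch. The actual Deduklick rules are not given here. The two rules below are hypothetical assumptions: matching DOIs, or the same year plus near-identical normalized titles. The helper names `normalize_title`, `jaccard`, `is_duplicate` and `find_duplicates` and the 0.9 threshold are also illustrative.

```python
import re
from itertools import combinations


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two normalized strings."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_duplicate(r1: dict, r2: dict, threshold: float = 0.9) -> bool:
    """Hypothetical rule set for deciding whether two references match."""
    doi1, doi2 = r1.get("doi"), r2.get("doi")
    # Rule 1: when both records carry a DOI, the DOI alone decides.
    if doi1 and doi2:
        return doi1.strip().lower() == doi2.strip().lower()
    # Rule 2: otherwise require the same year and near-identical titles.
    if r1.get("year") != r2.get("year"):
        return False
    t1, t2 = normalize_title(r1["title"]), normalize_title(r2["title"])
    return jaccard(t1, t2) >= threshold


def find_duplicates(records: list) -> list:
    """Return index pairs of likely duplicates (pairwise, O(n^2))."""
    return [
        (i, j)
        for (i, r1), (j, r2) in combinations(enumerate(records), 2)
        if is_duplicate(r1, r2)
    ]


records = [
    {"title": "Deep learning for X.", "year": 2020, "doi": "10.1/abc"},
    {"title": "Deep Learning for X", "year": 2020, "doi": "10.1/ABC"},
    {"title": "Deep learning for X", "year": 2020},
    {"title": "Something else", "year": 2021},
]
print(find_duplicates(records))  # [(0, 1), (0, 2), (1, 2)]
```

The pairwise comparison keeps the sketch simple. A real tool would typically block candidates first, for example by year, to avoid comparing every pair.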
The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, neural networks initially designed for language translation have been used to accurately predict a wide range of reactions. Among the models suited to the translation task, the recently introduced molecular transformer reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis ... product,...
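Treating reaction prediction as translation means writing molecules as SMILES strings and tokenizing them like sentences. The sketch below assumes an atom-level regular-expression tokenizer of the kind commonly used with the Molecular Transformer. The abstract itself does not describe the preprocessing, and the function name `tokenize_smiles` is illustrative.

```python
import re

# Atom-level SMILES pattern of the kind used with the Molecular Transformer:
# bracket atoms, Br/Cl, aromatic atoms, bonds, branches, ring closures, ">".
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list:
    """Split a SMILES string into tokens, like words in a sentence."""
    tokens = SMILES_PATTERN.findall(smiles)
    # Sanity check: the tokens must reconstruct the input exactly.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
print(tokenize_smiles("[Na+].Cl"))  # ['[Na+]', '.', 'Cl']
```

Reactant and product token sequences would then serve as source and target for a sequence-to-sequence model. The reconstruction check guards against characters the pattern silently skips.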
Widespread misinformation in web resources can have serious implications for individuals seeking health advice. Despite that, information retrieval models are often focused only on the query-document relevance dimension to rank results.
Due to the complexity of the biomedical domain, the ability to capture semantically meaningful representations of terms in context is a long-standing challenge. Despite important progress in the past years, no evaluation benchmark has been developed to evaluate how well language models represent concepts according to their corresponding context. Inspired by the Word-in-Context (WiC) benchmark, in which word sense disambiguation is reformulated as a binary classification task, we propose a novel dataset, BioWiC, to encode...
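The WiC reformulation turns sense disambiguation into a yes/no question about a pair of contexts. A minimal Python sketch of that framing follows. The `WiCInstance` fields, the bag-of-words context vectors, the single-token terms and the 0.3 threshold are all illustrative assumptions. This is a toy baseline, not how models are actually evaluated on BioWiC.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class WiCInstance:
    term: str       # single-token target term (assumption)
    context1: str
    context2: str
    label: bool     # True if the term has the same sense in both contexts


def context_vector(sentence: str, term: str, window: int = 3) -> Counter:
    """Bag of words within `window` tokens on each side of the term."""
    tokens = re.findall(r"\w+", sentence.lower())
    idx = tokens.index(term.lower())  # raises ValueError if the term is absent
    left = tokens[max(0, idx - window):idx]
    right = tokens[idx + 1:idx + 1 + window]
    return Counter(left + right)


def cosine(c1: Counter, c2: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def predict_same_sense(inst: WiCInstance, threshold: float = 0.3) -> bool:
    """Binary decision: similar contexts are taken to mean the same sense."""
    v1 = context_vector(inst.context1, inst.term)
    v2 = context_vector(inst.context2, inst.term)
    return cosine(v1, v2) >= threshold


pair = WiCInstance("cold", "the patient caught a cold last winter",
                   "she caught a cold after the trip", label=True)
print(predict_same_sense(pair))  # True
```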
Background: Antimicrobial resistance has reached globally alarming levels and is becoming a major public health threat. Lack of efficacious antimicrobial surveillance systems was identified as one of the causes of increasing resistance, due to the lag time between new resistances and alerts to care providers. Several initiatives to track drug evolution have been developed. However, no effective real-time and source-independent monitoring system is publicly available. Objective: To design and implement an architecture that...
Interoperability in health information systems is crucial for accurate data exchange across environments such as electronic records, clinical notes, and medical research. The main challenge arises from the wide variation in biomedical concepts, their representation in different languages, and limited context, which complicate integration and standardization. Inspired by recent advances in large language models (LLMs), this study explores the potential role of LLMs as knowledge engineers to (semi-)automate multilingual...
Over the past few years, discriminative and generative large language models (LLMs) have emerged as the predominant approaches in natural language processing. However, despite significant advancements, there remains a gap in comparing the performance of LLMs on cross-lingual biomedical concept normalization. In this paper, we perform a comparative study across several LLMs on the challenging task of concept normalization via dense retrieval. We utilize the XL-BEL dataset, covering 10 languages, to evaluate the models' capacity to generalize across various...
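Concept normalization via dense retrieval reduces to a nearest-neighbour search. A mention and every concept name are embedded in the same vector space, and the concept with the highest similarity is returned. The sketch below assumes the embeddings have already been produced by some encoder (not shown). The concept identifiers and toy vectors are hypothetical placeholders, not entries from XL-BEL or any terminology.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def normalize_mention(mention_vec: list, concept_index: list, k: int = 1) -> list:
    """Return the IDs of the k concepts whose name embeddings are closest
    (by cosine similarity) to the mention embedding."""
    ranked = sorted(concept_index,
                    key=lambda item: cosine(mention_vec, item[1]),
                    reverse=True)
    return [concept_id for concept_id, _ in ranked[:k]]


# Hypothetical index: (concept ID, embedding of the concept's name).
concept_index = [
    ("CONCEPT_A", [1.0, 0.0, 0.0]),
    ("CONCEPT_B", [0.0, 1.0, 0.0]),
    ("CONCEPT_C", [0.7, 0.7, 0.0]),
]
print(normalize_mention([0.9, 0.1, 0.0], concept_index, k=2))
# ['CONCEPT_A', 'CONCEPT_C']
```

In a cross-lingual setting, the premise is that a multilingual encoder maps mentions in any language into the same space as the concept names, so the same search applies regardless of the mention's language.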
Equitable distribution of physicians across specialties is a significant public health challenge. While previous studies primarily relied on classic statistical models to estimate factors affecting medical students' career choices, this study explores the use of machine learning techniques to predict students' decisions early in their studies. We evaluated various supervised models, including support vector machines, artificial neural networks, extreme gradient boosting (XGBoost), and CatBoost, using data...
Adverse drug events (ADEs) are a major safety issue in clinical trials. Thus, predicting ADEs is key to developing safer medications and enhancing patient outcomes. To support this effort, we introduce CT-ADE, a dataset for multilabel ADE prediction in monopharmacy treatments. CT-ADE encompasses 2,497 drugs and 168,984 drug-ADE pairs from clinical trial results, annotated using the MedDRA ontology. Unlike existing resources, CT-ADE integrates treatment and target population data, enabling comparative analyses...
This study provides an experimental performance evaluation of population-based queries on NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six...
The success rate of clinical trials (CTs) is low, with the protocol design itself being considered a major risk factor. We aimed to investigate the use of deep learning methods to predict the risk of CTs based on their protocols. Considering changes and final status, a retrospective assignment method was proposed to label CTs according to low, medium, and high levels. Then, transformer and graph neural networks were designed and combined in an ensemble model to learn to infer the ternary categories. The ensemble achieved robust performance (area under receiving...
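A common way to combine a transformer and a graph neural network into an ensemble is soft voting over their class probabilities. The abstract does not say how the models were combined, so this Python sketch is an assumption. It averages (optionally weighted) probability vectors over three categories, hypothetically named `low`, `medium` and `high`, and picks the most probable one. The probabilities are toy values, not model outputs.

```python
LABELS = ("low", "medium", "high")  # hypothetical names for the three categories


def soft_vote(prob_lists, weights=None):
    """Weighted average of per-model class probabilities, then argmax."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    avg = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(len(LABELS))
    ]
    best = max(range(len(LABELS)), key=lambda i: avg[i])
    return LABELS[best], avg


transformer_probs = [0.2, 0.5, 0.3]  # toy outputs, not real model scores
gnn_probs = [0.1, 0.3, 0.6]
label, avg = soft_vote([transformer_probs, gnn_probs])
print(label)  # high
```

Weights let one model dominate when it is known to be more reliable. With `weights=[3.0, 1.0]` the same inputs yield `medium`.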
Large language models (LLMs) have the potential to enhance the verification of health claims. However, issues with hallucination and the comprehension of logical statements require these models to be closely scrutinized in healthcare applications. We introduce CliniFact, a scientific claim dataset created from hypothesis testing results in clinical research, covering 992 unique interventions for 22 disease categories. The dataset used study arms and interventions, primary outcome measures, and results from clinical trials to derive and label research claims. These...
Recently, machine learning methods have emerged to predict dental disease progression, often relying on costly annotated datasets and frequently exhibiting low generalization performance. This study evaluates the application of Siamese networks for detecting subtle changes in longitudinal X-rays and predicting the time span category between treatments, using periapical radiographs and patient demographic data. We assume that the ability of these models to detect intervals would ensure their capability to identify...
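The defining property of a Siamese network is that both inputs pass through one encoder with shared weights, and the prediction is based on how their embeddings compare. The minimal Python sketch below assumes a single linear layer with ReLU as the encoder, Euclidean distance, and a standard contrastive loss. The study's actual architecture, and how it maps image pairs to time span categories, are not described here.

```python
import math


def encode(x, weights):
    """Shared encoder: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def siamese_distance(x1, x2, weights):
    """Both inputs go through the SAME weights; that sharing is what makes it Siamese."""
    e1, e2 = encode(x1, weights), encode(x2, weights)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))


def contrastive_loss(distance, same, margin=1.0):
    """Pull matching pairs together; push non-matching pairs beyond `margin`."""
    if same:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


weights = [[1.0, 0.0], [0.0, 1.0]]  # toy identity encoder
d = siamese_distance([3.0, 4.0], [0.0, 0.0], weights)
print(d)                                # 5.0
print(contrastive_loss(d, same=False))  # 0.0
```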
Adverse drug event (ADE) detection in social media texts poses significant challenges due to the informal nature of the text and the limited availability of annotations. The scarcity of ADE named entity recognition (NER) datasets for social media hinders the development of robust models for this type of corpus. In this paper, we leveraged the generative capabilities of large language models (LLMs) to create synthetic data, addressing the dataset gap. Specifically, we generated 17,000 tweets with annotations and pre-trained NER models on these data. Our evaluations an...
Hospital-acquired infections (HAIs), particularly those caused by multidrug-resistant (MDR) bacteria, pose significant risks to vulnerable patients. Accurate predictive models are important for assessing infection dynamics and informing infection prevention and control (IPC) programmes. Graph-based methods, including graph neural networks (GNNs), offer a powerful approach to modelling complex relationships between patients and environments, but they often struggle with data sparsity, irregularity, and heterogeneity. We propose...
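GNNs model relationships by letting each node update its representation from its neighbours. The sketch below shows one round of mean-aggregation message passing on a toy patient and environment graph. The node names and features are hypothetical, and the plain averaging (no learned weights or nonlinearity) is a deliberate simplification. It is not the method proposed in the abstract.

```python
def mean_aggregate(features, edges):
    """One round of message passing: each node averages its own features
    with the mean of its neighbours' features (no learned weights).
    Isolated nodes are averaged with a zero vector."""
    neighbours = {node: [] for node in features}
    for u, v in edges:  # undirected edges
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated = {}
    for node, feat in features.items():
        nbrs = neighbours[node]
        if nbrs:
            agg = [sum(features[n][i] for n in nbrs) / len(nbrs)
                   for i in range(len(feat))]
        else:
            agg = [0.0] * len(feat)
        updated[node] = [(s + a) / 2 for s, a in zip(feat, agg)]
    return updated


# Hypothetical graph: two patients who both stayed in one ward.
features = {"patient_1": [1.0, 0.0], "patient_2": [0.0, 1.0], "ward_A": [0.0, 0.0]}
edges = [("patient_1", "ward_A"), ("patient_2", "ward_A")]
print(mean_aggregate(features, edges))
```

Applying the function twice lets information travel two hops, from `patient_1` through `ward_A` to `patient_2`. This is how a shared environment can link patients who were never directly connected.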