Douglas Teodoro

ORCID: 0000-0001-6238-4503
Research Areas
  • Biomedical Text Mining and Ontologies
  • Topic Modeling
  • Natural Language Processing Techniques
  • Semantic Web and Ontologies
  • Data Quality and Management
  • Misinformation and Its Impacts
  • Advanced Text Analysis Techniques
  • Machine Learning in Healthcare
  • Electronic Health Records Systems
  • Machine Learning in Materials Science
  • Computational Drug Discovery Methods
  • Statistical Methods in Clinical Trials
  • Service-Oriented Architecture and Web Services
  • COVID-19 diagnosis using AI
  • Artificial Intelligence in Healthcare and Education
  • Scientific Computing and Data Management
  • Health, Environment, Cognitive Aging
  • Text and Document Classification Technologies
  • Advanced Graph Neural Networks
  • Web Data Mining and Analysis
  • Clinical practice guidelines implementation
  • Meta-analysis and systematic reviews
  • Advanced Database Systems and Queries
  • Radiomics and Machine Learning in Medical Imaging
  • Health Literacy and Information Accessibility

University of Geneva
2009-2025

SIB Swiss Institute of Bioinformatics
2017-2023

HES-SO University of Applied Sciences and Arts Western Switzerland
2019-2023

HES-SO Genève
2017-2022

HES-SO Arc
2021

Defence Scientific Information & Documentation Centre
2021

Universidade do Estado do Rio de Janeiro
2008-2018

University Hospital of Geneva
2011-2013

Geneva College
2012

Universidade Federal de Itajubá
2004

Elisa Terumi Rubel Schneider, João Vitor Andrioli de Souza, Julien Knafou, Lucas Emanuel Silva e Oliveira, Jenny Copara, Yohan Bonescki Gumiel, Ferro Antunes Emerson Cabrera Paraiso, Douglas Teodoro, Cláudia Maria Cabral Moro Barra. Proceedings of the 3rd Clinical Natural Language Processing Workshop. 2020.

10.18653/v1/2020.clinicalnlp-1.7 article EN cc-by 2020-01-01

Identifying and removing reference duplicates when conducting systematic reviews (SRs) remains a major, time-consuming issue for authors who manually check for duplicates using built-in features in citation managers. To address issues related to manual deduplication, we developed an automated, efficient, and rapid artificial intelligence-based algorithm named Deduklick. Deduklick combines natural language processing algorithms with a set of rules created by expert information specialists.

10.1186/s13643-022-02045-9 article EN cc-by Systematic Reviews 2022-08-17
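
The idea of pairing text normalization with hand-written rules can be illustrated with a minimal sketch. This is not the Deduklick algorithm, whose rules are not given here; the field names (`title`, `year`, `doi`), the matching rule, and the sample records are assumptions made for illustration only.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def record_key(record: dict) -> tuple:
    """Hypothetical rule: prefer the DOI, else fall back to normalized title + year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title", normalize(record.get("title", "")), record.get("year"))


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Made-up records; the DOI below is a placeholder, not a real reference.
records = [
    {"title": "Deduplication in Systematic Reviews", "year": 2022, "doi": None},
    {"title": "deduplication in systematic reviews.", "year": 2022},
    {"title": "Another Study", "year": 2021, "doi": "10.1000/xyz"},
    {"title": "Another study (reprint)", "year": 2021, "doi": "10.1000/XYZ"},
]
unique_records = deduplicate(records)
```

A real system would also need to reconcile records where only one copy carries a DOI, which this sketch does not attempt.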

The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, neural networks initially designed for language translation have been used to accurately predict a wide range of reactions. Among the models suited for the task of language translation, the recently introduced molecular transformer reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis of the product,...

10.1021/acs.jcim.2c01407 article EN cc-by Journal of Chemical Information and Modeling 2023-03-23

Widespread misinformation in web resources can lead to serious implications for individuals seeking health advice. Despite that, information retrieval models are often focused only on the query-document relevance dimension to rank results.

10.2196/42630 article EN cc-by JMIR AI 2024-01-15

Due to the complexity of the biomedical domain, the ability to capture semantically meaningful representations of terms in context is a long-standing challenge. Despite important progress in the past years, no evaluation benchmark has been developed to evaluate how well language models represent concepts according to their corresponding context. Inspired by the Word-in-Context (WiC) benchmark, in which word sense disambiguation is reformulated as a binary classification task, we propose a novel dataset, BioWiC, to evaluate the ability of language models to encode...

10.1038/s41597-024-03317-w article EN cc-by Scientific Data 2024-05-04

Background: Antimicrobial resistance has reached globally alarming levels and is becoming a major public health threat. Lack of efficacious antimicrobial resistance surveillance systems was identified as one of the causes of increasing resistance, due to the lag time between new resistances and alerts to care providers. Several initiatives to track drug resistance evolution have been developed. However, no effective real-time and source-independent monitoring system is available publicly. Objective: To design and implement an architecture that...

10.2196/jmir.2043 article EN cc-by Journal of Medical Internet Research 2012-05-29

Interoperability in health information systems is crucial for accurate data exchange across environments such as electronic health records, clinical notes, and medical research. The main challenge arises from the wide variation in biomedical concepts, their representation in different languages, and limited context, complicating integration and standardization. Inspired by recent advances in large language models (LLMs), this study explores the potential role of LLMs as knowledge engineers to (semi-)automate multilingual...

10.1101/2025.01.15.25320579 preprint EN cc-by medRxiv (Cold Spring Harbor Laboratory) 2025-01-15

Over the past few years, discriminative and generative large language models (LLMs) have emerged as the predominant approaches in natural language processing. However, despite significant advancements, there remains a gap in comparing the performance of LLMs in cross-lingual biomedical concept normalization. In this paper, we perform a comparative study across several LLMs on the challenging task of cross-lingual biomedical concept normalization via dense retrieval. We utilize the XL-BEL dataset covering 10 languages to evaluate the model's capacity to generalize across various...

10.1101/2025.02.27.25323007 preprint EN cc-by medRxiv (Cold Spring Harbor Laboratory) 2025-02-27
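
Concept normalization via dense retrieval can be pictured as a nearest-neighbour search over embeddings. The sketch below is a toy illustration, not the method evaluated in the study: the two-dimensional vectors, the concept identifiers, and the function names are all hypothetical, and a real system would obtain the vectors from a language model encoder.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def normalize_mention(mention_vec, concept_index, top_k=1):
    """Rank concept IDs by similarity to the mention embedding, best first."""
    scored = sorted(
        ((cosine(mention_vec, vec), cid) for cid, vec in concept_index.items()),
        reverse=True,
    )
    return [cid for _, cid in scored[:top_k]]


# Hypothetical concept embeddings keyed by placeholder identifiers.
concept_index = {
    "CONCEPT_A": [1.0, 0.0],
    "CONCEPT_B": [0.0, 1.0],
}
best = normalize_mention([0.9, 0.1], concept_index)
```

In a cross-lingual setting, the mention and the concept names would be embedded by the same multilingual encoder so that mentions in different languages land near the same concept vector.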

Equitable distribution of physicians across specialties is a significant public health challenge. While previous studies primarily relied on classic statistics models to estimate factors affecting medical students' career choices, this study explores the use of machine learning techniques to predict career decisions early in their studies. We evaluated various supervised models, including support vector machines, artificial neural networks, extreme gradient boosting (XGBoost), and CatBoost using data...

10.1101/2025.03.06.25323485 preprint EN cc-by medRxiv (Cold Spring Harbor Laboratory) 2025-03-11

Adverse drug events (ADEs) are a major safety issue in clinical trials. Thus, predicting ADEs is key to developing safer medications and enhancing patient outcomes. To support this effort, we introduce CT-ADE, a dataset for multilabel ADE prediction in monopharmacy treatments. CT-ADE encompasses 2,497 drugs and 168,984 drug-ADE pairs from clinical trial results, annotated using the MedDRA ontology. Unlike existing resources, CT-ADE integrates treatment and target population data, enabling comparative analyses...

10.1038/s41597-025-04718-1 article EN cc-by Scientific Data 2025-03-11
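
In a multilabel setting, each drug can carry several ADE labels at once, so drug-ADE pairs are commonly turned into one multi-hot vector per drug. The sketch below shows that encoding only; the drug names, label names, and function name are placeholders and do not come from CT-ADE or MedDRA.

```python
def build_label_matrix(pairs, labels):
    """Map each drug to a multi-hot vector over the ordered label list."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = {}
    for drug, ade in pairs:
        # Create an all-zero row the first time a drug is seen.
        row = matrix.setdefault(drug, [0] * len(labels))
        row[index[ade]] = 1
    return matrix


# Placeholder drug-ADE pairs for illustration.
pairs = [
    ("drug_a", "nausea"),
    ("drug_a", "headache"),
    ("drug_b", "nausea"),
]
labels = ["headache", "nausea", "rash"]
label_matrix = build_label_matrix(pairs, labels)
```

A classifier trained on such targets predicts one independent yes/no decision per label, which is what distinguishes multilabel from multiclass prediction.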

This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six...

10.1371/journal.pone.0150069 article EN cc-by PLoS ONE 2016-03-09

The success rate of clinical trials (CTs) is low, with the protocol design itself being considered a major risk factor. We aimed to investigate the use of deep learning methods to predict the risk of CTs based on their protocols. Considering protocol changes and their final status, a retrospective risk assignment method was proposed to label CTs according to low, medium, and high risk levels. Then, transformer and graph neural networks were designed and combined in an ensemble model to learn to infer the ternary risk categories. The ensemble model achieved robust performance (area under the receiving...

10.1016/j.patter.2023.100689 article EN cc-by-nc-nd Patterns 2023-02-10

Large language models (LLMs) have the potential to enhance the verification of health claims. However, issues with hallucination and comprehension of logical statements require these models to be closely scrutinized in healthcare applications. We introduce CliniFact, a scientific claim dataset created from hypothesis testing results in clinical research, covering 992 unique interventions for 22 disease categories. The dataset used study arms and interventions, primary outcome measures, and results from clinical trials to derive and label clinical research claims. These...

10.1038/s41597-025-04417-x article EN cc-by Scientific Data 2025-01-16

Recently, machine learning methods have emerged to predict dental disease progression, often relying on costly annotated datasets and frequently exhibiting low generalization performance. This study evaluates the application of Siamese networks for detecting subtle changes in longitudinal x-rays and predicting the time span category between treatments using periapical radiographs and patient demographic data. We assume that the ability of these models to detect time intervals would ensure their capability...

10.1101/2025.03.04.25323305 preprint EN cc-by-nc medRxiv (Cold Spring Harbor Laboratory) 2025-03-04

Equitable distribution of physicians across specialties is a significant public health challenge. While previous studies primarily relied on classic statistics models to estimate factors affecting medical students' career choices, this study explores the use of machine learning techniques to predict career decisions early in their studies. We evaluated various supervised models, including support vector machines, artificial neural networks, extreme gradient boosting (XGBoost), and CatBoost using data...

10.3233/shti250543 article EN Studies in health technology and informatics 2025-05-15

Recently, machine learning methods have emerged to predict dental disease progression, often relying on costly annotated datasets and frequently exhibiting low generalization performance. This study evaluates the application of Siamese networks for detecting subtle changes in longitudinal x-rays and predicting the time span category between treatments using periapical radiographs and patient demographic data. We assume that the ability of these models to detect time intervals would ensure their capability to identify...

10.3233/shti250636 article EN Studies in health technology and informatics 2025-05-15

Over the past few years, discriminative and generative large language models (LLMs) have emerged as the predominant approaches in natural language processing. However, despite significant advancements, there remains a gap in comparing the performance of LLMs in cross-lingual biomedical concept normalization. In this paper, we perform a comparative study across several LLMs on the challenging task of cross-lingual biomedical concept normalization via dense retrieval. We utilize the XL-BEL dataset covering 10 languages to evaluate the model's capacity to generalize across various...

10.3233/shti250467 article EN Studies in health technology and informatics 2025-05-15

Adverse drug event (ADE) detection in social media texts poses significant challenges due to the informal nature of the text and the limited availability of annotations. The scarcity of ADE named entity recognition (NER) datasets for social media hinders the development of robust models for this type of corpus. In this paper, we leveraged the generative capabilities of large language models (LLMs) to create synthetic data, addressing the dataset gap. Specifically, we generated 17,000 tweets with ADE annotations and pre-trained NER models on the synthetic data. Our evaluations an...

10.3233/shti250465 article EN Studies in health technology and informatics 2025-05-15

Hospital-acquired infections (HAIs), particularly those caused by multidrug-resistant (MDR) bacteria, pose significant risks to vulnerable patients. Accurate predictive models are important for assessing infection dynamics and informing infection prediction and control (IPC) programmes. Graph-based methods, including graph neural networks (GNNs), offer a powerful approach to model complex relationships between patients and environments but often struggle with data sparsity, irregularity, and heterogeneity. We propose...

10.1101/2025.05.27.25327491 preprint EN cc-by 2025-05-28