NFDI4DS | UHH-SEMS - Publication Details

BioBERTpt - A Portuguese Neural Language Model for Clinical Named Entity Recognition

OPENALEX - Publications

Elisa Terumi Rubel Schneider, João Vitor Andrioli de Souza, Julien Knafou, Lucas Emanuel Silva e Oliveira, Jenny Copara, and 5 more

Elisa Terumi Rubel Schneider, João Vitor Andrioli de Souza, Julien Knafou, Lucas Emanuel Silva e Oliveira, Jenny Copara, Yohan Bonescki Gumiel, Ferro Antunes, Emerson Cabrera Paraiso, Douglas Teodoro, Cláudia Maria Cabral Moro Barra. Proceedings of the 3rd Clinical Natural Language Processing Workshop. 2020.

10.18653/v1/2020.clinicalnlp-1.7 article EN cc-by 2020-01-01

Reducing systematic review burden using Deduklick: a novel, automated, reliable, and explainable deduplication algorithm to foster medical research

Nikolay Borissov, Quentin Haas, Beatrice Minder, Doris Kopp‐Heim, Marc von Gernler, and 3 more

Identifying and removing reference duplicates when conducting systematic reviews (SRs) remains a major, time-consuming issue for authors who check for duplicates manually using the built-in features of citation managers. To address issues related to manual deduplication, we developed an automated, efficient, and rapid artificial intelligence-based algorithm named Deduklick. Deduklick combines natural language processing algorithms with a set of rules created by expert information specialists.

10.1186/s13643-022-02045-9 article EN cc-by Systematic Reviews 2022-08-17

Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios

Fernando Jaume-Santero, Alban Bornet, Alain Valery, Nona Naderi, David Vicente Alvarez, and 5 more

The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, neural networks initially designed for language translation have been used to accurately predict a wide range of chemical reactions. Among the models suited to the task of translation, the recently introduced molecular transformer reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis of product,...

10.1021/acs.jcim.2c01407 article EN cc-by Journal of Chemical Information and Modeling 2023-03-23

Online health search via multi-dimensional information quality assessment based on deep language models (Preprint)

Boya Zhang, Nona Naderi, Rahul Mishra, Douglas Teodoro

Widespread misinformation in web resources can have serious implications for individuals seeking health advice. Despite that, information retrieval models often focus only on the query-document relevance dimension to rank results.

10.2196/42630 article EN cc-by JMIR AI 2024-01-15

A Dataset for Evaluating Contextualized Representation of Biomedical Concepts in Language Models

Hossein Rouhizadeh, Irina Nikishina, Anthony Yazdani, Alban Bornet, Boya Zhang, and 4 more

Due to the complexity of the biomedical domain, the ability to capture semantically meaningful representations of terms in context is a long-standing challenge. Despite important progress in the past years, no evaluation benchmark has been developed to evaluate how well language models represent concepts according to their corresponding context. Inspired by the Word-in-Context (WiC) benchmark, in which word sense disambiguation is reformulated as a binary classification task, we propose a novel dataset, BioWiC, encode...

10.1038/s41597-024-03317-w article EN cc-by Scientific Data 2024-05-04

Building a Transnational Biosurveillance Network Using Semantic Web Technologies: Requirements, Design, and Preliminary Evaluation

Douglas Teodoro, Emilie Pasche, Julien Gobeill, Stéphane Emonet, Patrick Ruch, and 1 more

Background: Antimicrobial resistance has reached globally alarming levels and is becoming a major public health threat. The lack of efficacious antimicrobial surveillance systems was identified as one of the causes of increasing resistance, due to the lag time between new resistances and alerts to care providers. Several initiatives to track drug resistance evolution have been developed. However, no effective real-time and source-independent monitoring system is publicly available. Objective: To design and implement an architecture that...

10.2196/jmir.2043 article EN cc-by Journal of Medical Internet Research 2012-05-29

PRIMIS: Privacy-preserving medical image sharing via deep sparsifying transform learning with obfuscation

Isaac Shiri, Behrooz Razeghi, Sohrab Ferdowsi, Yazdan Salimi, Deniz Gündüz, and 3 more

10.1016/j.jbi.2024.104583 article EN publisher-specific-oa Journal of Biomedical Informatics 2024-01-07

Large Language Models Struggle to Encode Medical Concepts - A Multilingual Benchmarking and Comparative Analysis

Hossein Rouhizadeh, Anthony Yazdani, Boya Zhang, David Vicente Alvarez, Matthias Hüser, and 5 more

Interoperability in health information systems is crucial for accurate data exchange across environments such as electronic records, clinical notes, and medical research. The main challenge arises from the wide variation in biomedical concepts, their representation in different languages, and limited context, which complicate integration and standardization. Inspired by recent advances in large language models (LLMs), this study explores their potential role as knowledge engineers to (semi-)automate multilingual...

10.1101/2025.01.15.25320579 preprint EN cc-by medRxiv (Cold Spring Harbor Laboratory) 2025-01-15

Exploring Zero-Shot Cross-Lingual Biomedical Concept Normalization via Large Language Models

Hossein Rouhizadeh, Anthony Yazdani, Boya Zhang, Douglas Teodoro

Over the past few years, discriminative and generative large language models (LLMs) have emerged as the predominant approaches in natural language processing. However, despite significant advancements, there remains a gap in comparing the performance of LLMs in cross-lingual biomedical concept normalization. In this paper, we perform a comparative study across several LLMs on the challenging task of cross-lingual biomedical concept normalization via dense retrieval. We use the XL-BEL dataset, which covers 10 languages, to evaluate the models' capacity to generalize to various...

10.1101/2025.02.27.25323007 preprint EN cc-by medRxiv (Cold Spring Harbor Laboratory) 2025-02-27

Assessment of machine learning algorithms to predict medical specialty choice

David Vicente Alvarez, Milena Abbiati, Alban Bornet, Georges L. Savoldelli, Nadia M. Bajwa, and 1 more

Equitable distribution of physicians across specialties is a significant public health challenge. While previous studies primarily relied on classical statistical models to estimate factors affecting medical students' career choices, this study explores the use of machine learning techniques to predict career decisions early in their studies. We evaluated various supervised models, including support vector machines, artificial neural networks, extreme gradient boosting (XGBoost), and CatBoost, using data...

10.1101/2025.03.06.25323485 preprint EN cc-by medRxiv (Cold Spring Harbor Laboratory) 2025-03-11

An Evaluation Benchmark for Adverse Drug Event Prediction from Clinical Trial Results

Anthony Yazdani, Alban Bornet, Philipp Khlebnikov, Boya Zhang, Hossein Rouhizadeh, and 2 more

Adverse drug events (ADEs) are a major safety issue in clinical trials. Thus, predicting ADEs is key to developing safer medications and enhancing patient outcomes. To support this effort, we introduce CT-ADE, a dataset for multilabel ADE prediction in monopharmacy treatments. CT-ADE encompasses 2,497 drugs and 168,984 drug-ADE pairs from clinical trial results, annotated using the MedDRA ontology. Unlike existing resources, CT-ADE integrates treatment and target population data, enabling comparative analyses...

10.1038/s41597-025-04718-1 article EN cc-by Scientific Data 2025-03-11

Comparing the Performance of NoSQL Approaches for Managing Archetype-Based Electronic Health Record Data

Sérgio Miranda Freire, Douglas Teodoro, Fang Wei-Kleiner, Erik Sundvall, Daniel Karlsson, and 1 more

This study provides an experimental performance evaluation of population-based queries on NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six...

10.1371/journal.pone.0150069 article EN cc-by PLoS ONE 2016-03-09

Deep learning-based risk prediction for interventional clinical trials based on protocol design: A retrospective study

Sohrab Ferdowsi, Julien Knafou, Nikolay Borissov, David Vicente Alvarez, Rahul Mishra, and 2 more

The success rate of clinical trials (CTs) is low, and the protocol design itself is considered a major risk factor. We aimed to investigate the use of deep learning methods to predict the risk of CTs based on their protocols. Considering protocol changes and final status, a retrospective risk assignment method was proposed to label CTs according to low, medium, and high risk levels. Then, transformer and graph neural networks were designed and combined in an ensemble model to learn and infer the ternary risk categories. The model achieved robust performance (area under receiving...

10.1016/j.patter.2023.100689 article EN cc-by-nc-nd Patterns 2023-02-10

A dataset for evaluating clinical research claims in large language models

Boya Zhang, Alban Bornet, Anthony Yazdani, Philipp Khlebnikov, Marija Milutinovic, and 3 more

Large language models (LLMs) have the potential to enhance the verification of health claims. However, issues with hallucination and the comprehension of logical statements require these models to be closely scrutinized in healthcare applications. We introduce CliniFact, a scientific claim dataset created from hypothesis testing results in clinical research, covering 992 unique interventions for 22 disease categories. The dataset used study arms and interventions, primary outcome measures, and results from clinical trials to derive and label clinical research claims. These...

10.1038/s41597-025-04417-x article EN cc-by Scientific Data 2025-01-16

Assessment of Elapsed Time Between Dental Radiographs Using Siamese Network

Marija Milutinovic, René Daher, Julian Leprince, Douglas Teodoro

Recently, machine learning methods have emerged to predict dental disease progression, often relying on costly annotated datasets and frequently exhibiting low generalization performance. This study evaluates the application of Siamese networks for detecting subtle changes in longitudinal x-rays and predicting the time span category between treatments using periapical radiographs and patient demographic data. We assume that the ability of these models to detect time intervals would ensure their capability...

10.1101/2025.03.04.25323305 preprint EN cc-by-nc medRxiv (Cold Spring Harbor Laboratory) 2025-03-04

Comparing neural language models for medical concept representation and patient trajectory prediction

Alban Bornet, Dimitrios Proios, Anthony Yazdani, Fernando Jaume-Santero, Guy Haller, and 2 more

10.1016/j.artmed.2025.103108 article EN cc-by Artificial Intelligence in Medicine 2025-03-10

Assessment of Machine Learning Algorithms to Predict Medical Specialty Choice

David Vicente Alvarez, Milena Abbiati, Alban Bornet, Georges L. Savoldelli, Nadia M. Bajwa, and 1 more

Equitable distribution of physicians across specialties is a significant public health challenge. While previous studies primarily relied on classical statistical models to estimate factors affecting medical students' career choices, this study explores the use of machine learning techniques to predict career decisions early in their studies. We evaluated various supervised models, including support vector machines, artificial neural networks, extreme gradient boosting (XGBoost), and CatBoost, using data...

10.3233/shti250543 article EN Studies in health technology and informatics 2025-05-15

Assessment of Elapsed Time Between Dental Radiographs Using Siamese Network

Marija Milutinovic, René Daher, Julian Leprince, Douglas Teodoro

Recently, machine learning methods have emerged to predict dental disease progression, often relying on costly annotated datasets and frequently exhibiting low generalization performance. This study evaluates the application of Siamese networks for detecting subtle changes in longitudinal x-rays and predicting the time span category between treatments using periapical radiographs and patient demographic data. We assume that the ability of these models to detect time intervals would ensure their capability to identify...

10.3233/shti250636 article EN Studies in health technology and informatics 2025-05-15

Exploring Zero-Shot Cross-Lingual Biomedical Concept Normalization via Large Language Models

Hossein Rouhizadeh, Anthony Yazdani, Boya Zhang, Douglas Teodoro

Over the past few years, discriminative and generative large language models (LLMs) have emerged as the predominant approaches in natural language processing. However, despite significant advancements, there remains a gap in comparing the performance of LLMs in cross-lingual biomedical concept normalization. In this paper, we perform a comparative study across several LLMs on the challenging task of cross-lingual biomedical concept normalization via dense retrieval. We use the XL-BEL dataset, which covers 10 languages, to evaluate the models' capacity to generalize to various...

10.3233/shti250467 article EN Studies in health technology and informatics 2025-05-15

Leveraging Large Language Models for Synthetic Data Generation to Enhance Adverse Drug Event Detection in Tweets

Anthony Yazdani, Hossein Rouhizadeh, Alban Bornet, Douglas Teodoro

Adverse drug event (ADE) detection in social media texts poses significant challenges due to the informal nature of the text and the limited availability of annotations. The scarcity of ADE named entity recognition (NER) datasets for social media hinders the development of robust models for this type of corpus. In this paper, we leveraged the generative capabilities of large language models (LLMs) to create synthetic data, addressing this dataset gap. Specifically, we generated 17,000 tweets with ADE annotations and pre-trained NER models on these data. Our evaluations an...

10.3233/shti250465 article EN Studies in health technology and informatics 2025-05-15

STM-GNN: Space-Time-and-Memory Graph Neural Networks for Predicting Multi-Drug Resistance Risks in Dynamic Patient Networks

Damien Geissbuhler, Alban Bornet, Catarina Santos-Marques, André Anjos, Sónia Gonçalves Pereira, and 1 more

Hospital-acquired infections (HAIs), particularly those caused by multidrug-resistant (MDR) bacteria, pose significant risks to vulnerable patients. Accurate predictive models are important for assessing infection dynamics and informing infection prevention and control (IPC) programmes. Graph-based methods, including graph neural networks (GNNs), offer a powerful approach to modelling complex relationships between patients and environments but often struggle with data sparsity, irregularity, and heterogeneity. We propose...

10.1101/2025.05.27.25327491 preprint EN cc-by 2025-05-28