NFDI4DS | UHH-SEMS - Publication Details

Fabián Villena

ORCID: 0000-0002-8759-466X

About

Contact & Profiles

OpenAlex ID: A5084525042

Research Areas

Natural Language Processing Techniques
Biomedical Text Mining and Ontologies
Topic Modeling
Migration and Labor Dynamics
Insurance, Mortality, Demography, Risk Management
Demographic Modeling and Climate Adaptation
Text Readability and Simplification
Data Quality and Management
Healthcare Systems and Technology
Artificial Intelligence in Healthcare
Artificial Intelligence in Healthcare and Education
Privacy-Preserving Technologies in Data
Data-Driven Disease Surveillance
Interpreting and Communication in Healthcare
Dental Radiography and Imaging
Radiomics and Machine Learning in Medical Imaging
Semantic Web and Ontologies
Advanced Computational Techniques and Applications
Linguistics and Terminology Studies
Data Mining Algorithms and Applications
Authorship Attribution and Profiling
Medical Coding and Health Information
COVID-19 diagnosis using AI
Machine Learning in Healthcare
Healthcare Operations and Scheduling Optimization

Affiliations

University of Chile
2019-2025

Millennium Institute for Integrative Biology
2024-2025

Pontificia Universidad Católica de Chile
2023

Center for Mathematical Modeling
2021

Instituto de Neurociencia Biomédica
2021

Zimmer Biomet (Germany)
2020

Heidelberg University
2020

Publications

Predicting no-show appointments in a pediatric hospital in Chile using machine learning

OPENALEX - Publications

Jocelyn Dunstan, Fabián Villena, J S, Víctor Riquelme, M. Royer and 2 more

The Chilean public health system serves 74% of the country's population, and 19% of medical appointments are missed on average because of no-shows. The national goal is 15%, which coincides with the no-show rate reported in the private healthcare system. Our case study, Doctor Luis Calvo Mackenna Hospital, is a high-complexity pediatric hospital and teaching center in Santiago, Chile. Historically, it has had high no-show rates, up to 29% in certain specialties. Using machine learning algorithms to predict no-shows of patients in terms...

10.1007/s10729-022-09626-z article EN cc-by Health Care Management Science 2023-01-28

Automatic Extraction of Nested Entities in Clinical Referrals in Spanish

Pablo Báez, Felipe Bravo-Márquez, Jocelyn Dunstan, Matías Rojas, Fabián Villena

Here we describe a new clinical corpus rich in nested entities and a series of neural models to identify them. The corpus comprises de-identified referrals from the waiting list in Chilean public hospitals. A subset of 5,000 referrals (58.6% medical and 41.4% dental) was manually annotated with 10 types of entities, six attributes, and pairs of relations with clinical relevance. In total, there are 110,771 tokens. A trained doctor or dentist annotated these referrals, and then, together with three other researchers, consolidated each of the annotations. The corpus has 48.17%...

10.1145/3498324 article EN ACM Transactions on Computing for Healthcare 2022-04-07

Developing and Validating an Automatic Support System for Tumor Coding in Pathology Reports in Spanish

Fabián Villena, Pablo Báez, Sergio Peñafiel, Matías Rojas, Inti Paredes and 1 more

PURPOSE Pathology reports provide valuable information for cancer registries to understand, plan, and implement strategies to mitigate the impact of cancer. However, coding essential information from unstructured reports is performed by experts in a time-consuming manual process. We developed and validated a novel two-step automatic system that first recognizes tumor morphology and topography mentions in free text and then suggests codes from the International Classification of Diseases for Oncology (ICD-O) in Spanish. MATERIALS AND METHODS We created...

10.1200/cci.24.00124 article EN JCO Clinical Cancer Informatics 2025-02-01

NLP modeling recommendations for restricted data availability in clinical settings

Fabián Villena, Felipe Bravo-Márquez, Jocelyn Dunstan

Background: Clinical decision-making in healthcare often relies on unstructured text data, which can be challenging to analyze using traditional methods. Natural Language Processing (NLP) has emerged as a promising solution, but its application in clinical settings is hindered by restricted data availability and the need for domain-specific knowledge. Methods: We conducted an experimental analysis to evaluate the performance of various NLP modeling paradigms on multiple tasks in Spanish. These...

10.1186/s12911-025-02948-2 article EN cc-by BMC Medical Informatics and Decision Making 2025-03-07

The Chilean Waiting List Corpus: a new resource for clinical Named Entity Recognition in Spanish

Pablo Báez, Fabián Villena, Matías Rojas, Manuel Durán, Jocelyn Dunstan

In this work, we describe the Waiting List Corpus, consisting of de-identified referrals for several specialty consultations from the waiting list in Chilean public hospitals. A subset of 900 referrals was manually annotated with 9,029 entities, 385 attributes, and 284 pairs of relations with clinical relevance. A trained medical doctor annotated these referrals, and then, together with three other researchers, consolidated each of the annotations. The corpus has nested entities, with 32.2% of entities embedded in other entities. We use the corpus to obtain preliminary results for Named...

10.18653/v1/2020.clinicalnlp-1.32 article EN cc-by 2020-01-01

A Privacy-Preserving Corpus for Occupational Health in Spanish: Evaluation for NER and Classification Tasks

Claudio Aracena, Luis Giménez, Thomas Vakili, Fabián Villena, Tamara Quiroga and 3 more

10.18653/v1/2024.clinicalnlp-1.11 article EN 2024-01-01

Clinical Flair: A Pre-Trained Language Model for Spanish Clinical Natural Language Processing

Matías Rojas, Jocelyn Dunstan, Fabián Villena

Word embeddings have been widely used in Natural Language Processing (NLP) tasks. Although these representations can capture the semantic information of words, they cannot learn sequence-level semantics. This problem can be handled using contextual word embeddings derived from pre-trained language models, which have contributed to significant improvements in several NLP tasks. Further improvements are achieved when pre-training language models on domain-specific corpora. In this paper, we introduce Clinical Flair, a model trained on Spanish...

10.18653/v1/2022.clinicalnlp-1.9 article EN cc-by 2022-01-01

Supporting the classification of patients in public hospitals in Chile by designing, deploying and validating a system based on natural language processing

Fabián Villena, Jorge Eduardo Pérez Pérez, René Lagos, Jocelyn Dunstan

Background: In Chile, a patient needing a specialty consultation or surgery has to first be referred by a general practitioner, then placed on a waiting list. The Explicit Health Guarantees (GES in Spanish) ensure, by law, the maximum time to solve 85 health problems. Usually, a professional manually verifies whether each referral, written in natural language, corresponds to a GES-covered disease or not. An error in this classification is catastrophic for patients, as it puts them on a non-prioritized list,...

10.1186/s12911-021-01565-z article EN cc-by BMC Medical Informatics and Decision Making 2021-07-01

The incidence of psoriasis in Chile: an analysis of the National Waiting List Repository

Cristóbal Lecaros, Jocelyn Dunstan, Fabián Villena, Darren M. Ashcroft, Rosa Parisi and 4 more

Psoriasis is a serious and chronic noncommunicable disease. However, the fundamental measure of disease occurrence, incidence, has been scarcely reported globally. There are no previous studies of psoriasis incidence in Latin America. To estimate psoriasis incidence rates in Chile during 2016 and 2017 using an administrative database, the Waiting List Repository. We examined referrals at onset, made by physicians to dermatologists, evaluated agreement in diagnosis, and estimated incidence considering the eligible population at risk. In most cases,...

10.1111/ced.14713 article EN cc-by-nc Clinical and Experimental Dermatology 2021-04-29

Machine learning applications in health (Aplicaciones de aprendizaje automático en salud)

Claudio Aracena, Fabián Villena, Felipe Van Der Huck Arias, Jocelyn Dunstan

This work aims to present some recent applications of machine learning in the field of health. Machine learning is a branch of artificial intelligence that has achieved great advances in pattern extraction and predictive analysis, reaching the state of the art in several tasks. For this reason, this technology is used in several systems within hospitals and clinics. This work introduces the topic and some of its uses. Subsequently, applications are presented, grouped by the types of data they use. This...

10.1016/j.rmclc.2022.10.001 article ES cc-by-nc-nd Revista Médica Clínica Las Condes 2022-11-01

A Knowledge-Graph-Based Intrinsic Test for Benchmarking Medical Concept Embeddings and Pretrained Language Models

Claudio Aracena, Fabián Villena, Matías Rojas, Jocelyn Dunstan

Using language models created from large data sources has improved the performance of several deep learning-based architectures, obtaining state-of-the-art results in NLP extrinsic tasks. However, little research is related to creating intrinsic tests that allow us to compare the quality of different language models when obtaining contextualized embeddings. This gap increases even more when working on specific domains in languages other than English. This paper proposes a novel graph-based intrinsic test that allows us to measure the quality of clinical and biomedical...

10.18653/v1/2022.louhi-1.22 article EN cc-by 2022-01-01

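Automatic keyword extraction from clinical texts: an application of natural language processing to massive diagnostic suspicion data in Chile (Obtención automática de palabras clave en textos clínicos: una aplicación de procesamiento del lenguaje natural a datos masivos de sospecha diagnóstica en Chile)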

Fabián Villena, Jocelyn Dunstan

Free text poses a challenge in health data analysis, since the lack of structure makes the extraction and integration of information difficult, particularly in the case of massive data. An appropriate machine interpretation of electronic health records in Chile can unleash the knowledge contained in large volumes of clinical texts, expanding health management and national research capabilities. To illustrate the use of a weighted frequency algorithm to find keywords. This finding was carried out on the diagnostic suspicion field of the Chilean specialty...

10.4067/s0034-98872019001001229 article EN Revista médica de Chile 2019-10-01

Training and intrinsic evaluation of lightweight word embeddings for the clinical domain in Spanish

Carolina Chiu, Fabián Villena, Kinan Martin, Fredy Núñez Torres, Cecilia Besa and 1 more

Resources for Natural Language Processing (NLP) are less numerous for languages other than English. In the clinical domain, where these resources are vital for obtaining new knowledge about human health and diseases, creating resources for the Spanish language is imperative. One of the most common approaches in NLP is word embeddings, which are dense vector representations of a word that take the word's context into account. This representation is usually the first step in various NLP tasks, such as text classification or information extraction. Therefore,...

10.3389/frai.2022.970517 article EN cc-by Frontiers in Artificial Intelligence 2022-09-21

On the Construction of Multilingual Corpora for Clinical Text Mining

Fabián Villena, Urs Eisenmann, Petra Knaup, Jocelyn Dunstan, Matthias Ganzinger

The amount of digital data derived from healthcare processes has increased tremendously in recent years. This applies especially to unstructured data, which are often hard to analyze due to the lack of available tools to process and extract information. Natural language processing is used in medicine, but the majority of tools used by researchers are developed primarily for the English language. For developing and testing natural language processing methods, it is important to have a suitable corpus, specific to the medical domain, that covers the intended target language. To improve...

10.3233/shti200180 article EN Studies in health technology and informatics 2020-01-01

Automatic Support System for Tumor Coding in Pathology Reports in Spanish

Fabián Villena, Pablo Báez, Sergio Peñafiel, Matías Rojas, Inti Paredes and 1 more

Pathology reports provide valuable information for cancer registries to understand, plan and implement strategies to mitigate the impact of cancer. However, coding key information from unstructured reports is done by experts in a time-consuming manual process. Here we report an automatic deep learning-based system that recognizes tumor morphology and topography mentions in free text and suggests codes from the International Classification of Diseases for Oncology (ICD-O) in Spanish. This task was done by combining an in-house annotated corpus of mentions,...

10.2139/ssrn.3982259 article EN SSRN Electronic Journal 2021-01-01

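Construction of text resources for the automatic identification of clinical information in unstructured narratives (Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas)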

Pablo Báez, Fabián Villena, Karen Zúñiga, Natalia R. Jones, Gustavo Fernández and 2 more

A significant proportion of the clinical record is in free-text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify medical entities of interest. To build a linguistic resource annotated on clinical texts produced in Chilean hospitals. A corpus was constructed using 150 referrals in public hospitals. Three annotators identified six entities: findings, diagnoses,...

10.4067/s0034-98872021000701014 article EN Revista médica de Chile 2021-07-01

Supporting the Classification of Patients in Public Hospitals in Chile by Designing, Deploying and Validating a System Based on Natural Language Processing

Fabián Villena, Jorge Hernández Perez, René Lagos, Jocelyn Dunstan

Background: In Chile, a patient needing a specialty consultation or surgery has to first be referred by a general practitioner, then placed on a waiting list. The Explicit Health Guarantees (GES in Spanish) ensure, by law, the maximum time to solve an important set of health problems. Usually, a professional manually verifies whether each referral, written in natural language, corresponds to a GES-covered disease or not. An error in this classification is catastrophic for patients, as it puts them on a non-prioritized...

10.21203/rs.3.rs-108491/v1 preprint EN cc-by Research Square (Research Square) 2020-11-19

A pseudonymized corpus of occupational health narratives for clinical entity recognition in Spanish

Jocelyn Dunstan, Thomas Vakili, Luis Giménez, Fabián Villena, Claudio Aracena and 3 more

Despite the high creation cost, annotated corpora are indispensable for robust natural language processing systems. In the clinical field, apart from annotating medical entities, personally identifiable information (PII) must be removed, especially in the era of large language models, where unwanted memorization can occur. This paper presents a corpus to anonymize 1,787 anamneses of work-related accidents and diseases in Spanish. In addition, we applied a previously released model for Named Entity...

10.21203/rs.3.rs-3826527/v1 preprint EN cc-by Research Square (Research Square) 2024-01-09

llmNER: (Zero|Few)-Shot Named Entity Recognition, Exploiting the Power of Large Language Models

Fabián Villena, Luis Giménez, Claudio Aracena

Large language models (LLMs) allow us to generate high-quality human-like text. One interesting task in natural language processing (NLP) is named entity recognition (NER), which seeks to detect mentions of relevant information in documents. This paper presents llmNER, a Python library for implementing zero-shot and few-shot NER with LLMs; by providing an easy-to-use interface, llmNER can compose prompts, query the model, and parse the completion returned by the LLM. Also, llmNER enables the user to perform prompt engineering...

10.48550/arxiv.2406.04528 preprint EN arXiv (Cornell University) 2024-06-06

A pseudonymized corpus of occupational health narratives for clinical entity recognition in Spanish

Jocelyn Dunstan, Thomas Vakili, Luis Giménez, Fabián Villena, Claudio Aracena and 4 more

Despite the high creation cost, annotated corpora are indispensable for robust natural language processing systems. In the clinical field, in addition to annotating medical entities, corpus creators must also remove personally identifiable information (PII). This has become increasingly important in the era of large language models, where unwanted memorization can occur. This paper presents a corpus to anonymize 1,787 anamneses of work-related accidents and diseases in Spanish. Additionally, we applied a previously released model...

10.1186/s12911-024-02609-w article EN cc-by BMC Medical Informatics and Decision Making 2024-07-24

Generative artificial intelligence in dentistry: Current approaches and future challenges

Fabián Villena, Claudia Véliz, Rosario Garcia‐Huidobro, Sebastián Aguayo

Artificial intelligence (AI) has become a commodity for people because of the advent of generative AI (GenAI) models that bridge the usability gap by providing a natural language interface to interact with complex models. These GenAI models range from text generation, such as two-way chat systems, to image or video generation from textual descriptions input by the user. These advancements in GenAI have impacted Dentistry in multiple aspects. In dental education, the student now has the opportunity to solve a plethora of questions by only prompting a GenAI model and have the answer...

10.48550/arxiv.2407.17532 preprint EN arXiv (Cornell University) 2024-07-23

Clinical analogy resolution performance for foundation language models

Fabián Villena, Tamara Quiroga, Jocelyn Dunstan

Using extensive data sources to create foundation language models has revolutionized the performance of deep learning-based architectures. This remarkable improvement has led to state-of-the-art results for various downstream NLP tasks, including clinical tasks. However, more research is needed to measure model performance intrinsically, especially in the clinical domain. We revisit the use of analogy questions as an effective method to measure intrinsic performance in the clinical domain in English. We tested multiple Transformers-based language models over analogy questions constructed from the Unified Medical...

10.1145/3709155 article EN other-oa ACM Transactions on Computing for Healthcare 2024-12-21

A transcription and information extraction system to facilitate EHR documentation in Spanish

M. M. Rojas Fernández, Fabián Villena, Matías Rojas, Fredy Núñez Torres, Jorge F. Silva and 1 more

The large and diverse access to data sources in healthcare has boosted the application of novel computer techniques that can extract meaningful information to improve patients' prognoses and serve other important medical uses. However, most current systems require the professional to type information in a manual and time-consuming manner, increasing the risk of transcription errors and cross-contamination. One solution is to create an automated system that allows professionals to dictate clinical information to be transcribed and analyzed. Since...

10.21203/rs.3.rs-3175804/v1 preprint EN cc-by Research Square (Research Square) 2023-07-21

Automatic Coding at Scale: Design and Deployment of a Nationwide System for Normalizing Referrals in the Chilean Public Healthcare System

Fabián Villena, Matías Rojas, Felipe Van Der Huck Arias, Jorge E. Pacheco, Paulina Vera and 1 more

The disease coding task involves assigning a unique identifier from a controlled vocabulary to each disease mentioned in a clinical document. This task is relevant since it allows information extraction from unstructured data to perform, for example, epidemiological studies about the incidence and prevalence of diseases in a determined context. However, the manual coding process is subject to errors, as it requires medical personnel to be competent in coding rules and terminology. In addition, this process consumes a lot of time and energy, which could be allocated to more...

10.18653/v1/2023.clinicalnlp-1.37 article EN cc-by 2023-01-01

Diffusion magnetic resonance imaging for the diagnosis of intracranial lesions (Difusión por resonancia magnética para el diagnóstico de lesiones intracraneales)

Cláudio Galvão de Castro, Riccardo Velasco, Fabián Villena

10.20453/rnp.v65i1.1507 article cc-by Revista de Neuro-Psiquiatría 2013-03-09