- Natural Language Processing Techniques
- Biomedical Text Mining and Ontologies
- Topic Modeling
- Migration and Labor Dynamics
- Insurance, Mortality, Demography, Risk Management
- Demographic Modeling and Climate Adaptation
- Text Readability and Simplification
- Data Quality and Management
- Healthcare Systems and Technology
- Artificial Intelligence in Healthcare
- Artificial Intelligence in Healthcare and Education
- Privacy-Preserving Technologies in Data
- Data-Driven Disease Surveillance
- Interpreting and Communication in Healthcare
- Dental Radiography and Imaging
- Radiomics and Machine Learning in Medical Imaging
- Semantic Web and Ontologies
- Advanced Computational Techniques and Applications
- Linguistics and Terminology Studies
- Data Mining Algorithms and Applications
- Authorship Attribution and Profiling
- Medical Coding and Health Information
- COVID-19 diagnosis using AI
- Machine Learning in Healthcare
- Healthcare Operations and Scheduling Optimization
University of Chile
2019-2025
Millennium Institute for Integrative Biology
2024-2025
Pontificia Universidad Católica de Chile
2023
Center for Mathematical Modeling
2021
Instituto de Neurociencia Biomédica
2021
Zimmer Biomet (Germany)
2020
Heidelberg University
2020
The Chilean public health system serves 74% of the country's population, and 19% of medical appointments are missed on average because of no-shows. The national goal is 15%, which coincides with the no-show rate reported in the private healthcare system. Our case study, Doctor Luis Calvo Mackenna Hospital, is a high-complexity pediatric hospital and teaching center in Santiago, Chile. Historically, it has had high no-show rates, up to 29% in certain specialties. Using machine learning algorithms to predict no-shows of patients in terms...
Here we describe a new clinical corpus rich in nested entities and a series of neural models to identify them. The corpus comprises de-identified referrals from the waiting list in Chilean public hospitals. A subset of 5,000 referrals (58.6% medical and 41.4% dental) was manually annotated with 10 types of entities, six attributes, and pairs of relations with clinical relevance. In total, there are 110,771 tokens. A trained medical doctor or dentist annotated these referrals and then, together with three other researchers, consolidated each of the annotations. The corpus has 48.17%...
PURPOSE Pathology reports provide valuable information for cancer registries to understand, plan, and implement strategies to mitigate the impact of cancer. However, coding essential information from unstructured reports is performed by experts in a time-consuming manual process. We developed and validated a novel two-step automatic system that first recognizes tumor morphology and topography mentions in free text and then suggests codes from the International Classification of Diseases for Oncology (ICD-O) in Spanish. MATERIALS AND METHODS We created...
Background: Clinical decision-making in healthcare often relies on unstructured text data, which can be challenging to analyze using traditional methods. Natural Language Processing (NLP) has emerged as a promising solution, but its application in clinical settings is hindered by restricted data availability and the need for domain-specific knowledge. Methods: We conducted an experimental analysis to evaluate the performance of various NLP modeling paradigms on multiple tasks in Spanish. These...
In this work we describe the Waiting List Corpus, consisting of de-identified referrals for several specialty consultations from the waiting list in Chilean public hospitals. A subset of 900 referrals was manually annotated with 9,029 entities, 385 attributes, and 284 pairs of relations with clinical relevance. A trained medical doctor annotated these referrals and then, together with three other researchers, consolidated each of the annotations. The corpus has nested entities, with 32.2% of entities embedded in other entities. We use the corpus to obtain preliminary results for Named...
Word embeddings have been widely used in Natural Language Processing (NLP) tasks. Although these representations can capture the semantic information of words, they cannot learn sequence-level semantics. This problem can be handled using contextual word embeddings derived from pre-trained language models, which have contributed to significant improvements in several NLP tasks. Further improvements are achieved when pre-training these language models on domain-specific corpora. In this paper, we introduce Clinical Flair, a model trained on Spanish...
Background: In Chile, a patient needing a specialty consultation or surgery has to first be referred by a general practitioner, then placed on a waiting list. The Explicit Health Guarantees (GES in Spanish) ensures, by law, the maximum time to solve 85 health problems. Usually, a health professional manually verifies if each referral, written in natural language, corresponds or not to a GES-covered disease. An error in this classification is catastrophic for patients, as it puts them on a non-prioritized waiting list,...
Psoriasis is a serious and chronic noncommunicable disease. However, the fundamental measure of disease occurrence, incidence, has been scarcely reported globally. There are no previous studies of psoriasis incidence in Latin America. Objective: To estimate psoriasis incidence rates in Chile during 2016 and 2017 using an administrative database, the Waiting List Repository. Methods: We examined referrals at onset, made by physicians to dermatologists, evaluated the agreement of diagnosis, and estimated the incidence considering the eligible population at risk. In most cases,...
This work aims to show some recent applications of machine learning in the health field. Machine learning is a branch of artificial intelligence that has achieved major advances in pattern extraction and predictive analysis, reaching the state of the art in several tasks. For this reason, the technology is used in several systems within hospitals and clinics. This work introduces the topic and some of its uses. The applications are then presented, grouped by the type of data they use. This...
Using language models created from large data sources has improved the performance of several deep learning-based architectures, obtaining state-of-the-art results in several NLP extrinsic tasks. However, little research is related to creating intrinsic tests that allow us to compare the quality of different language models when using contextualized embeddings. This gap increases even more when working on specific domains in languages other than English. This paper proposes a novel graph-based intrinsic test that allows us to measure the quality of different language models in clinical and biomedical...
Free text imposes a challenge in health data analysis, since the lack of structure makes the extraction and integration of information difficult, particularly in the case of massive data. An appropriate machine interpretation of electronic health records in Chile can unleash the knowledge contained in large volumes of clinical texts, expanding management and national research capabilities. Objective: To illustrate the use of a weighted frequency algorithm to find keywords. This finding was carried out in the diagnostic suspicion field of the Chilean specialty...
Resources for Natural Language Processing (NLP) are less numerous for languages different from English. In the clinical domain, where these resources are vital for obtaining new knowledge about human health and diseases, creating new resources for the Spanish language is imperative. One of the most common approaches in NLP is word embeddings, which are dense vector representations of a word, considering the word's context. This vector representation is usually the first step in various NLP tasks, such as text classification or information extraction. Therefore,...
The amount of digital data derived from healthcare processes has increased tremendously in the last years. This applies especially to unstructured data, which are often hard to analyze due to the lack of available tools to process and extract information. Natural language processing is often used in medicine, but the majority of tools used by researchers are developed primarily for the English language. For developing and testing natural language processing methods, it is important to have a suitable corpus, specific to the medical domain, that covers the intended target language. To improve...
Pathology reports provide valuable information for cancer registries to understand, plan and implement strategies to mitigate the impact of cancer. However, coding key information from unstructured reports is done by experts in a time-consuming manual process. Here we report an automatic deep learning-based system that recognizes tumor morphology and topography mentions in free-text reports and suggests codes from the International Classification of Diseases for Oncology (ICD-O) in Spanish. This task was done by combining an in-house annotated corpus of mentions,...
A significant proportion of the clinical record is in free-text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify medical entities of interest. Objective: To build a linguistic resource of annotated medical entities on texts produced in Chilean hospitals. Methods: A corpus was constructed using 150 referrals in public hospitals. Three annotators identified six medical entities: findings, diagnoses,...
Background: In Chile, a patient needing a specialty consultation or surgery has to first be referred by a general practitioner, then placed on a waiting list. The Explicit Health Guarantees (GES in Spanish) ensure, by law, the maximum time to solve an important set of health problems. Usually, a health professional manually verifies if each referral, written in natural language, corresponds or not to a GES-covered disease. An error in this classification is catastrophic for patients, as it puts them on a non-prioritized...
Despite the high creation cost, annotated corpora are indispensable for robust natural language processing systems. In the clinical field, apart from annotating medical entities, personally identifiable information (PII) must be removed, especially in the era of large language models, where unwanted memorization can occur. This paper presents a corpus to anonymize 1,787 anamneses of work-related accidents and diseases in Spanish. In addition, we applied a previously released model for Named Entity...
Large language models (LLMs) allow us to generate high-quality human-like text. One interesting task in natural language processing (NLP) is named entity recognition (NER), which seeks to detect mentions of relevant information in documents. This paper presents llmNER, a Python library for implementing zero-shot and few-shot NER with LLMs; by providing an easy-to-use interface, llmNER can compose prompts, query the model, and parse the completion returned by the LLM. Also, the library enables the user to perform prompt engineering...
Despite the high creation cost, annotated corpora are indispensable for robust natural language processing systems. In the clinical field, in addition to annotating medical entities, corpus creators must also remove personally identifiable information (PII). This has become increasingly important in the era of large language models, where unwanted memorization can occur. This paper presents a corpus to anonymize 1,787 anamneses of work-related accidents and diseases in Spanish. Additionally, we applied a previously released model...
Artificial intelligence (AI) has become a commodity for people because of the advent of generative AI (GenAI) models that bridge the usability gap by providing a natural language interface to interact with complex models. These GenAI models range from text generation, such as two-way chat systems, to the generation of images or video from textual descriptions input by the user. These advancements in AI have impacted Dentistry in multiple aspects. In dental education, the student now has the opportunity to solve a plethora of questions by only prompting a model and have the answer...
Using extensive data sources to create foundation language models has revolutionized the performance of deep learning-based architectures. This remarkable improvement has led to state-of-the-art results for various downstream NLP tasks, including clinical tasks. However, more research is needed to measure model performance intrinsically, especially in the clinical domain. We revisit the use of analogy questions as an effective method for intrinsic evaluation of language models in the clinical domain in English. We tested multiple Transformers-based language models over analogy questions constructed from the Unified Medical...
The large and diverse access to data sources in healthcare has boosted the application of novel computer techniques that can extract meaningful information to improve patients' prognoses and other important medical uses. However, most current systems require the healthcare professional to type the information in a manual and time-consuming manner, increasing the risk of transcription errors and cross-contamination. One solution is to create an automated system that allows healthcare professionals to dictate clinical information to be transcribed and analyzed. Since...
The disease coding task involves assigning a unique identifier from a controlled vocabulary to each disease mentioned in a clinical document. This task is relevant since it allows information extraction from unstructured data to perform, for example, epidemiological studies about the incidence and prevalence of diseases in a determined context. However, the manual coding process is subject to errors, as it requires medical personnel to be competent in coding rules and terminology. In addition, this process consumes a lot of time and energy, which could be allocated to more...