Juan M. Banda
- Biomedical Text Mining and Ontologies
- Topic Modeling
- Social Media in Health Education
- Misinformation and Its Impacts
- Machine Learning in Healthcare
- Image Retrieval and Classification Techniques
- Semantic Web and Ontologies
- Sentiment Analysis and Opinion Mining
- Data-Driven Disease Surveillance
- COVID-19 epidemiological studies
- Pharmacovigilance and Adverse Drug Reactions
- Advanced Image and Video Retrieval Techniques
- Solar Radiation and Photovoltaics
- Solar and Space Plasma Dynamics
- COVID-19 Clinical Research Studies
- Artificial Intelligence in Healthcare and Education
- Hate Speech and Cyberbullying Detection
- Electronic Health Records Systems
- COVID-19 Pandemic Impacts
- COVID-19 diagnosis using AI
- Computational Drug Discovery Methods
- Data Management and Algorithms
- Artificial Intelligence in Healthcare
- Text Readability and Simplification
- Public Relations and Crisis Communication
Stanford Health Care
2024-2025
Georgia State University
2015-2024
Manchester University NHS Foundation Trust
2024
Stanford University
2015-2021
Adama (Israel)
2019
Stanford Medicine
2015-2018
Vrije Universiteit Amsterdam
2018
Montana State University
2009-2014
University of Montana
2010
As the COVID-19 pandemic continues its march around the world, an unprecedented amount of open data is being generated for genetics and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated in the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique world-wide event into biomedical, biological, and epidemiological analyses....
Identification of adverse drug reactions (ADRs) during the post-marketing phase is one of the most important goals of drug safety surveillance. Spontaneous reporting systems (SRS) data, which are the mainstay of traditional drug safety surveillance, are used for hypothesis generation and to validate the newer approaches. The publicly available US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) data requires substantial curation before they can be used appropriately, and applying different strategies for data cleaning...
With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected...
Hydroxychloroquine, a drug commonly used in the treatment of rheumatoid arthritis, has received much negative publicity for adverse events associated with its authorisation for emergency use to treat patients with COVID-19 pneumonia. We studied the safety of hydroxychloroquine, alone and in combination with azithromycin, to determine the risk associated with its use in routine care of patients with rheumatoid arthritis.
Objective: Traditionally, patient groups with a phenotype are selected through rule-based definitions whose creation and validation are time-consuming. Machine learning approaches to electronic phenotyping are limited by the paucity of labeled training datasets. We demonstrate the feasibility of utilizing semi-automatically labeled training sets to create phenotype models via machine learning, using a comprehensive representation of the patient medical record. Methods: We use a list of keywords specific to the phenotype of interest to generate noisy labeled training data. We train L1 penalized...
Comorbid conditions appear to be common among individuals hospitalised with coronavirus disease 2019 (COVID-19), but estimates of prevalence vary and little is known about the prior medication use of patients. Here, we describe the characteristics of adults hospitalised with COVID-19 and compare them with influenza patients. We include 34,128 (US: 8362, South Korea: 7341, Spain: 18,425) COVID-19 patients, summarising between 4811 and 11,643 unique aggregate characteristics. COVID-19 patients have been majority male in the US and Spain, but predominantly female...
Familial hypercholesterolemia (FH) is an underdiagnosed dominant genetic condition affecting approximately 0.4% of the population and has up to a 20-fold increased risk of coronary artery disease if untreated. Simple screening strategies have false positive rates greater than 95%. As part of the FH Foundation's FIND FH initiative, we developed a classifier to identify potential FH patients using electronic health record (EHR) data at Stanford Health Care. We trained a random forest classifier using data from known patients (n = 197) and matched...
Arjun Magge, Ari Klein, Antonio Miranda-Escalada, Mohammed Ali Al-Garadi, Ilseyar Alimova, Zulfat Miftahutdinov, Eulalia Farre, Salvador Lima López, Ivan Flores, Karen O’Connor, Davy Weissenbacher, Elena Tutubalina, Abeed Sarker, Juan Banda, Martin Krallinger, Graciela Gonzalez-Hernandez. Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task. 2021.
The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. In this paper, we present the annotated corpora, a technical summary of participants' systems, and the performance results.
Background: Cardiovascular outcomes for people with familial hypercholesterolaemia can be improved with diagnosis and medical management. However, 90% of individuals with familial hypercholesterolaemia remain undiagnosed in the USA. We aimed to accelerate early diagnosis and timely intervention for more than 1·3 million undiagnosed individuals with familial hypercholesterolaemia at high risk for early heart attacks and strokes by applying machine learning to large health-care encounter datasets. Methods: We trained the FIND FH machine learning model using deidentified health-care encounter data, including procedure and diagnostic codes, prescriptions, and laboratory findings,...
Consensus around an efficient second-line treatment option for type 2 diabetes (T2D) remains ambiguous. The availability of electronic medical records and insurance claims data, which capture routine medical practice, accessed via the Observational Health Data Sciences and Informatics network presents an opportunity to generate evidence for the effectiveness of second-line treatments. To identify which drug classes among sulfonylureas, dipeptidyl peptidase 4 (DPP-4) inhibitors, and thiazolidinediones are associated with reduced hemoglobin...
Background: Low testing rates and delays in reporting hinder the estimation of the mortality burden associated with the COVID-19 pandemic. During a public health emergency, estimating all cause excess deaths above an expected level of death can provide a more reliable picture of the mortality burden. Here, we aim to estimate the absolute and relative mortality impact of the COVID-19 pandemic in Mexico. Methods: We obtained weekly mortality time series due to all causes for Mexico, and by gender and geographic region, from 2015 to 2020. We also compiled surveillance data on COVID-19 cases...
Despite growing interest in using large language models (LLMs) in healthcare, current explorations do not assess the real-world utility and safety of LLMs in clinical settings. Our objective was to determine whether two LLMs can serve information needs submitted by physicians as questions to an informatics consultation service in a safe and concordant manner. Sixty-six questions from an informatics consult service were submitted to GPT-3.5 and GPT-4 via simple prompts. 12 physicians assessed the LLM responses' possibility of patient harm and concordance with existing reports...
Background: The integration of large language models (LLMs) in healthcare offers immense opportunity to streamline healthcare tasks, but also carries risks such as response accuracy and bias perpetration. To address this, we conducted a red-teaming exercise to assess LLMs in healthcare and developed a dataset of clinically relevant scenarios for future teams to use. Methods: We convened 80 multi-disciplinary experts to evaluate the performance of popular LLMs across multiple medical scenarios. Teams were composed of clinicians, medical and engineering...
Background: In this study we phenotyped individuals hospitalised with coronavirus disease 2019 (COVID-19) in depth, summarising entire medical histories, including medications, as captured in routinely collected data drawn from databases across three continents. We then compared individuals hospitalised with COVID-19 to those previously hospitalised with influenza. Methods: We report demographics, previously recorded conditions and medication use of patients hospitalised with COVID-19 in the US (Columbia University Irving Medical Center [CUIMC], Premier Healthcare Database...
There has been a dramatic increase in the popularity of utilizing social media data for research purposes within the biomedical community. In PubMed alone, there have been nearly 2,500 publication entries since 2014 that deal with analyzing social media data from Twitter and Reddit. However, the vast majority of those works do not share their code or data for replicating their studies. With minimal exceptions, the few that do place the burden on the researcher to figure out how to fetch the data, how to best format their data, and how to create automatic and manual annotations on the acquired data....
Concern has been raised in the rheumatology community regarding recent regulatory warnings that HCQ used in the coronavirus disease 2019 pandemic could cause acute psychiatric events. We aimed to study whether there is risk of incident depression, suicidal ideation or psychosis associated with HCQ as used for RA.
As the COVID-19 virus continues to infect people across the globe, there is little understanding of the long term implications for recovered patients. There have been reports of persistent symptoms after confirmed infections on patients even after three months of initial recovery. While some of these patients have documented follow-ups on clinical records, or participate in longitudinal surveys, these datasets are usually not publicly available or standardized to perform longitudinal analyses on them. Therefore, there is a need to use additional data sources...
Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) Characterizing Health Associated Risks and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD.
Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical towards improving equity and accuracy of large language models, but non-model creator-affiliated red teaming is scant in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants total) to stress-test models with real-world clinical cases and categorize inappropriate responses along axes of safety, privacy, hallucinations/accuracy, and bias. Six...
Recent adverse event reports have raised the question of increased angioedema risk associated with exposure to levetiracetam. To help address this question, the Observational Health Data Sciences and Informatics research network conducted a retrospective observational new‐user cohort study of seizure patients exposed to levetiracetam (n = 276,665) across 10 databases. With phenytoin users (n = 74,682) as a comparator group, propensity score‐matching was conducted and hazard ratios computed for angioedema events by...
This paper introduces a new public benchmark dataset of solar image data from the Solar Dynamics Observatory (SDO) mission. This is the first release, which contains over 15,000 images and nearly 24,000 events, spanning six months of 2012. It combines region-based event labels from automated detection modules, ten pre-computed image parameters for each cell over a grid-based segmentation of the full resolution images, and a lower resolution version of the images for further analysis and visualization. Together, these components serve as a standardized, ready-to-use,...