Ian Goethert
- Machine Learning in Healthcare
- Research Data Management Practices
- Data Quality and Management
- Biomedical Text Mining and Ontologies
- Genetic Associations and Epidemiology
- HIV, Drug Use, Sexual Risk
- COVID-19 Clinical Research Studies
- Bioinformatics and Genomic Networks
- Scientific Computing and Data Management
- Advanced X-ray and CT Imaging
- Artificial Intelligence in Healthcare
- Medical Imaging and Analysis
- Prostate Cancer Diagnosis and Treatment
- Genomics and Rare Diseases
- Topic Modeling
- Drug-Induced Hepatotoxicity and Protection
- Sepsis Diagnosis and Treatment
- AI in Cancer Detection
- Electronic Health Records Systems
- Nutrition, Genetics, and Disease
- Radiology Practices and Education
- Gene Expression and Cancer Classification
- Radiation Dose and Imaging
- COVID-19 Diagnosis Using AI
- Radiomics and Machine Learning in Medical Imaging
Oak Ridge National Laboratory
2021-2024
One of the justifiable criticisms of human genetic studies is the underrepresentation of participants from diverse populations. This lack of inclusion must be addressed at scale to identify causal disease factors and understand the causes of health disparities. We present genome-wide associations for 2068 traits in 635,969 participants in the Department of Veterans Affairs Million Veteran Program, a longitudinal study of United States Veterans. Systematic analysis revealed 13,672 genomic risk loci; 1608 were only significant after including...
Genome-wide association studies (GWAS) have underrepresented individuals from non-European populations, impeding progress in characterizing the genetic architecture and consequences of health and disease traits. To address this, we present a population-stratified phenome-wide GWAS followed by multi-population meta-analysis for 2,068 traits derived from the electronic health records of 635,969 participants in the Million Veteran Program (MVP), a longitudinal cohort study of diverse U.S. Veterans genetically similar to...
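The abstract does not say how the per-population results were combined. Purely as a hypothetical illustration, a common choice for multi-population GWAS meta-analysis is fixed-effect inverse-variance weighting, sketched below for a single variant. The function name `ivw_meta` and the input values are invented for the example and do not come from the study.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one variant.

    betas: per-population effect estimates; ses: their standard errors.
    Returns the pooled effect, its standard error, and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]  # precision of each estimate
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return beta, se, p

# Toy example: two hypothetical populations with the same effect size.
beta, se, p = ivw_meta([0.10, 0.10], [0.05, 0.05])
print(beta, se, p)
```

Each population is weighted by the inverse of its sampling variance, so larger and more precise cohorts dominate the pooled estimate.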
Objective: To identify and measure semantic drift (i.e., the change in meaning over time) in expert-provided anxiety-related (AR) terminology and compare it to other common electronic health record (EHR) vocabulary in longitudinal clinical notes. Methods: Computational methods were used to investigate a pediatric note corpus from 2009 to 2022. First, we measured the semantic drift of each word using the similarity of temporal word embeddings. Second, we analyzed how each word's contextual meaning evolved over successive years by examining its nearest neighbors....
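Below is a rough sketch of the two measurements described above. It assumes that word embeddings have already been trained for each year and aligned into a shared vector space (for example by orthogonal Procrustes; the abstract does not state this). Drift is scored as one minus the cosine similarity of a word's vectors in two years. Contextual change is scored as the Jaccard overlap of the word's nearest neighbors. The function names and three-dimensional toy vectors are hypothetical and are not taken from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def drift(word, emb_a, emb_b):
    """1 - cosine similarity of a word's vectors in two aligned years."""
    return 1.0 - cosine(emb_a[word], emb_b[word])

def nearest_neighbors(word, emb, k):
    """The k words in the same year's vocabulary most similar to `word`."""
    others = [w for w in emb if w != word]
    others.sort(key=lambda w: cosine(emb[word], emb[w]), reverse=True)
    return others[:k]

def neighbor_overlap(word, emb_a, emb_b, k=2):
    """Jaccard overlap of a word's k nearest neighbors in two years."""
    a = set(nearest_neighbors(word, emb_a, k))
    b = set(nearest_neighbors(word, emb_b, k))
    return len(a & b) / len(a | b)

# Hypothetical toy embeddings; only "anxious" moves between the two years.
emb_2009 = {
    "anxious": [1.0, 0.0, 0.0],
    "worried": [0.9, 0.1, 0.0],
    "nervous": [0.8, 0.3, 0.0],
    "fever": [0.0, 0.0, 1.0],
}
emb_2022 = dict(emb_2009, anxious=[0.2, 0.0, 1.0])

print(drift("anxious", emb_2009, emb_2022), drift("fever", emb_2009, emb_2022))
print(neighbor_overlap("anxious", emb_2009, emb_2022, k=2))
```

The two scores capture complementary signals. A word can keep a similar vector while its neighborhood reshuffles, and the reverse can also happen, so the abstract's pairing of a similarity measure with a nearest-neighbor analysis is useful.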
Background: Injection drug use (IDU) can increase mortality and morbidity. Therefore, identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because no structured data, such as International Classification of Disease (ICD) codes, are available, and IDU is most often documented in unstructured free-text clinical notes. Although natural language processing...
Data Lakehouse is a new paradigm in data architectures that embodies and integrates already established concepts for the systematic management of disparate, large-scale data: heterogeneous data management as in a data lake, the use of open standards, high-performance querying, and the maintenance of data "freshness". In addition to being a new concept, the lakehouse is also still a conceptual construct. Many projects still require maturing, empirical studies, and specific implementations. In this paper, we present our implementation of the lakehouse concept for biomedical research...
Hydroxychloroquine (HCQ) was proposed as an early therapy for coronavirus disease 2019 (COVID-19) after in vitro studies indicated possible benefit. Previous in vivo observational studies have presented conflicting results, though recent randomized clinical trials have reported no benefit from HCQ among patients hospitalized with COVID-19. We examined the effects of HCQ alone and in combination with azithromycin in a population of US veterans with COVID-19, using propensity score-adjusted survival analysis with imputation of missing...
Background: Injection drug use (IDU) is a dangerous health behavior that increases mortality and morbidity. Identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because there is no International Classification of Disease (ICD) code for IDU, and the only place the information can be indicated is unstructured free-text clinical notes. Although natural language processing can efficiently extract this...
The predictive modeling literature for biomedical applications is dominated by biostatistical methods for survival analysis and, more recently, by some out-of-the-box machine learning approaches. In this article, we present a method appropriate for time-to-event modeling in the area of long-term prostate cancer disease progression. Using XGBoost adapted to model disease progression, we developed a model from 118,788 patients with localized prostate cancer at diagnosis from the Department of Veterans Affairs (VA). Our model accounted for patient censoring....
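The abstract does not specify how XGBoost was adapted to handle censoring. One possibility is XGBoost's built-in `survival:cox` objective, which treats negative labels as right-censored follow-up times. This is offered only as an assumption, not as the authors' method. The hypothetical helper `cox_labels` below shows that label encoding in plain Python. The toy follow-up times are invented.

```python
def cox_labels(times, events):
    """Encode right-censored survival data as signed labels.

    Assumption: the labels are meant for XGBoost's survival:cox objective,
    which reads a positive label as the time an event was observed and a
    negative label as a right-censored follow-up time.
    """
    labels = []
    for t, observed in zip(times, events):
        if t <= 0:
            raise ValueError("follow-up time must be positive")
        labels.append(t if observed else -t)  # censored -> negative
    return labels

# Hypothetical follow-up in months; the second patient was censored.
print(cox_labels([12.0, 30.5, 60.0], [True, False, True]))
```

The resulting list could be supplied as the `label` of an `xgboost.DMatrix` trained with `{"objective": "survival:cox"}`. XGBoost's other built-in option, the accelerated failure time objective (`survival:aft`), encodes censoring through lower and upper label bounds instead.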
The compilation and analysis of radiological images poses numerous challenges for researchers. The sheer volume of data, as well as the computational needs of algorithms capable of operating on that data, are extensive. Additionally, the assembly of these images alone is difficult, as exams may differ widely in terms of clinical context, structured annotation available for model training, modality, and patient identifiers. In this paper, we describe our experiences establishing a trusted collection of radiology images linked to the United States Department...
Objectives: This study aims to assess the impact of domain shift on chest X-ray classification accuracy and to analyze the influence of ground truth label quality and demographic factors such as age group, sex, and year. Materials and Methods: We used a DenseNet121 model pretrained on the MIMIC-CXR dataset for deep learning-based multilabel classification, using labels extracted from radiology reports with the CheXpert and CheXbert Labeler. We compared performance on 14 labels between MIMIC-CXR and a Veterans Healthcare Administration chest X-ray dataset (VA-CXR). The VA-CXR comprises over 259k images...
Background: Genome-wide association studies (GWAS) aim to uncover the link between genomic variation and phenotype. They have been actively applied in cancer biology to investigate associations between genomic variations and cancer phenotypes, such as susceptibility to certain types of cancer and predisposed responsiveness to specific treatments. Because GWAS primarily focus on finding individual variations, they are limited in explaining the mechanisms by which phenotypes are cooperatively affected by more than one variation. Results: This paper...
Pediatric Electronic Health Records (EHRs) contain drug/medication data. Standardizing drug data matters for identifying drug class information and enabling interoperability between computer systems. Despite this, sometimes no biomedical vocabulary is used, and the data are therefore not standardized. This paper employed UMLS vocabularies to standardize Cincinnati Children's Hospital Medical Center (CCHMC) EHR drug data and used the standardized data to build models of pediatric mental health trajectories. We present an approach that identifies a...