- Topic Modeling
- Artificial Intelligence in Healthcare
- Biomedical Text Mining and Ontologies
- Natural Language Processing Techniques
- Machine Learning in Healthcare
- Bioinformatics and Genomic Networks
- Machine Learning and Data Classification
- Computational Drug Discovery Methods
- COVID-19 Clinical Research Studies
- Big Data and Digital Economy
- Noise Effects and Management
- Air Quality and Health Impacts
- Eosinophilic Disorders and Syndromes
- Chronic Myeloid Leukemia Treatments
- Healthcare Policy and Management
- Domain Adaptation and Few-Shot Learning
- Anomaly Detection Techniques and Applications
- Hormonal Regulation and Hypertension
- Text and Document Classification Technologies
- Meta-analysis and Systematic Reviews
- Tuberculosis Research and Epidemiology
- Climate Change and Health Impacts
- Chronic Lymphocytic Leukemia Research
- Machine Learning in Bioinformatics
- Primary Care and Health Outcomes
Georgia Institute of Technology
2020-2023
Emory University
2021-2023
Enveda Therapeutics (United States)
2023
Brigham Young University
2017-2018
University of Utah
2018
Intermountain Healthcare
2017
Nearly 60% of U.S. children live in counties with concentrations of particulate matter less than or equal to 2.5 μm in aerodynamic diameter (PM2.5) above air quality standards. Understanding the relationship between ambient pollution exposure and health outcomes informs actions to reduce disease risk. This study aimed to evaluate the association between PM2.5 levels and healthcare encounters for acute lower respiratory infection (ALRI). Using an observational case-crossover design, subjects (n = 146,397) were studied if they had ALRI...
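In a case-crossover design, each subject serves as their own control: exposure on the day of the event is compared with exposure on nearby referent days. The abstract does not say how referent days were chosen, so this minimal sketch assumes the common time-stratified scheme (same weekday, same calendar month). The function names and the `pm25_by_day` mapping are hypothetical illustrations, not the study's code.

```python
import calendar
from datetime import date
from typing import Dict, List


def time_stratified_referents(event_day: date) -> List[date]:
    """Referent days: same weekday in the same calendar month, excluding the event day.

    Assumption: time-stratified referent selection; the source does not specify the scheme.
    """
    n_days = calendar.monthrange(event_day.year, event_day.month)[1]
    days = (date(event_day.year, event_day.month, d) for d in range(1, n_days + 1))
    return [d for d in days if d.weekday() == event_day.weekday() and d != event_day]


def exposure_contrast(pm25_by_day: Dict[date, float], event_day: date) -> float:
    """Case-day PM2.5 minus the mean PM2.5 over that subject's referent days."""
    referents = time_stratified_referents(event_day)
    mean_ref = sum(pm25_by_day[d] for d in referents) / len(referents)
    return pm25_by_day[event_day] - mean_ref
```

In a full analysis, the matched case and referent days would typically be fed to conditional logistic regression rather than summarized as a single difference.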
Link prediction in artificial intelligence is used to identify missing links or derive future relationships that can occur in complex networks. A link prediction model was developed using the heterogeneous biomedical knowledge graph SemNet to predict links in the biomedical literature for drug discovery. A web application visualized knowledge graph embeddings and link prediction results from TransE-, ComplEx-, and RotatE-based methods. The model achieved up to 0.44 hits@10 on entity prediction tasks. The recent outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), also...
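To make the hits@10 figure concrete, here is a minimal sketch of TransE scoring and the hits@k metric. TransE treats a relation as a translation, scoring a triple (h, r, t) by the negative distance between h + r and t; hits@k is the fraction of test triples whose true tail ranks in the top k among all candidate entities. ComplEx and RotatE use different scoring functions and are not shown. The embeddings below are toy values rather than trained SemNet embeddings, and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """TransE plausibility of (h, r, t): negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def tail_rank(head: str, r: Vector, true_tail: str, entities: Dict[str, Vector]) -> int:
    """1-based rank of the true tail among all entities (raw setting, no filtering)."""
    h = entities[head]
    true_score = transe_score(h, r, entities[true_tail])
    better = sum(
        1
        for name, t in entities.items()
        if name != true_tail and transe_score(h, r, t) > true_score
    )
    return better + 1


def hits_at_k(
    triples: List[Tuple[str, str, str]],
    entities: Dict[str, Vector],
    relations: Dict[str, Vector],
    k: int = 10,
) -> float:
    """Fraction of test triples whose true tail ranks in the top k."""
    ranks = [tail_rank(h, relations[r], t, entities) for h, r, t in triples]
    return sum(1 for rank in ranks if rank <= k) / len(ranks)
```

Published results often use the "filtered" setting, which also skips other known true tails when ranking; this sketch uses the simpler raw setting.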
Meta-analysis of randomized clinical trials (RCTs) plays a crucial role in evidence-based medicine but can be labor-intensive and error-prone. This study explores the use of large language models to enhance the efficiency of aggregating results from RCTs at scale. We perform a detailed comparison of the performance of these models in zero-shot, prompt-based information extraction from a diverse set of RCTs against traditional manual annotation methods. We analyze the results for two different meta-analyses aimed at drug repurposing in cancer therapy...
The success of deep neural networks has inspired many to wonder whether other learners could benefit from deep, layered architectures. We present a general framework called forward thinking for deep learning that generalizes the architectural flexibility and sophistication of deep neural networks while also allowing for (i) different types of learning functions in the network, other than neurons, and (ii) the ability to adaptively deepen the network as needed to improve results. This is done by training one layer at a time; once a layer is trained, the input data are mapped...
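The abstract describes three mechanics: train one layer at a time, map the data through each trained (frozen) layer, and deepen only as needed. A minimal sketch of that loop follows, assuming a validation score decides when to stop adding layers. `fit_layer` and `evaluate` are hypothetical callables supplied by the user; this is an illustration of the idea, not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

Features = List[float]
Layer = Callable[[Features], Features]


def forward_thinking(
    X_train: Sequence[Features],
    y_train: Sequence[int],
    X_val: Sequence[Features],
    y_val: Sequence[int],
    fit_layer: Callable[[Sequence[Features], Sequence[int]], Layer],
    evaluate: Callable[[Sequence[Features], Sequence[int], Sequence[Features], Sequence[int]], float],
    max_layers: int = 10,
) -> Tuple[List[Layer], float]:
    """Greedy layer-wise training: fit one layer, freeze it, map the data through it, repeat."""
    layers: List[Layer] = []
    best = evaluate(X_train, y_train, X_val, y_val)  # score on the raw inputs
    for _ in range(max_layers):
        layer = fit_layer(X_train, y_train)  # any learner that returns a feature map
        new_train = [layer(x) for x in X_train]
        new_val = [layer(x) for x in X_val]
        score = evaluate(new_train, y_train, new_val, y_val)
        if score <= best:
            break  # adaptive depth: stop once a new layer no longer helps
        layers.append(layer)  # frozen; earlier layers are never retrained
        X_train, X_val, best = new_train, new_val, score
    return layers, best
```

Because `fit_layer` can wrap any learner that returns a feature map, the loop reflects the flexibility the abstract emphasizes: layers need not be made of neurons.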
We study the problem of learning neural text classifiers without using any labeled data, but only easy-to-provide rules as multiple weak supervision sources. This is challenging because rule-induced labels are often noisy and incomplete. To address these two challenges, we design a label denoiser, which estimates source reliability using a conditional soft attention mechanism and then reduces label noise by aggregating rule-annotated weak labels. The denoised pseudo labels then supervise a neural classifier to predict soft labels for unmatched...
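The aggregation step of such a denoiser can be illustrated with a small sketch: softmax attention over the rules that matched an instance turns their votes into a soft pseudo label, and instances no rule matched are left for the classifier. In the paper the attention is conditional on the instance and learned jointly; here the reliability logits are simply passed in, so this shows only the weighting, under that stated simplification. All names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_rule_labels(
    votes: Sequence[Optional[int]],
    reliability_logits: Sequence[float],
    num_classes: int,
) -> Optional[List[float]]:
    """Attention-weighted soft label for one instance.

    votes[j] is the class assigned by rule j, or None if rule j did not match.
    reliability_logits[j] is a (hypothetical, precomputed) reliability score for rule j.
    Returns None for instances that no rule matched.
    """
    matched = [j for j, v in enumerate(votes) if v is not None]
    if not matched:
        return None  # unmatched: left for the downstream classifier
    weights = softmax([reliability_logits[j] for j in matched])  # attend over matched rules only
    soft_label = [0.0] * num_classes
    for weight, j in zip(weights, matched):
        soft_label[votes[j]] += weight
    return soft_label
```

Restricting the softmax to matched rules means a highly reliable rule that abstains has no influence on that instance.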
One of the primary challenges in healthcare delivery is aggregating disparate, asynchronous data sources into meaningful indicators of individual health. We combine natural language word embedding and network modeling techniques to learn meaningful representations of medical concepts by using a weighted adjacency matrix in the GloVe algorithm, which we call Code2Vec. We demonstrate that our learned embeddings improve neural network performance for disease prediction. However, we also demonstrate that popular deep learning models for disease prediction are not...
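GloVe fits vectors to a co-occurrence matrix X by minimizing the weighted least-squares objective sum over (i, j) of f(X_ij) (w_i · w̃_j + b_i + b̃_j − log X_ij)², where f(x) = (x / x_max)^α for x < x_max and 1 otherwise (the GloVe paper uses x_max = 100 and α = 0.75). The abstract says Code2Vec substitutes a weighted adjacency matrix of medical concepts for word co-occurrences but does not say how edges are weighted, so this sketch assumes the weight is the number of encounters in which two codes appear together. Function names and the toy codes are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def build_adjacency(encounters: Iterable[Iterable[str]]) -> Dict[Pair, float]:
    """Symmetric weighted adjacency between diagnosis codes.

    Assumption (hypothetical weighting): X[(a, b)] counts encounters containing both a and b.
    """
    X: Dict[Pair, float] = defaultdict(float)
    for codes in encounters:
        for a, b in combinations(sorted(set(codes)), 2):
            X[(a, b)] += 1.0
            X[(b, a)] += 1.0
    return dict(X)


def glove_weight(x: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    """GloVe weighting f(x): down-weights rare pairs and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(X, w, w_tilde, b, b_tilde) -> float:
    """Weighted least-squares GloVe objective over the nonzero entries of X."""
    loss = 0.0
    for (i, j), x in X.items():
        dot = sum(p * q for p, q in zip(w[i], w_tilde[j]))
        loss += glove_weight(x) * (dot + b[i] + b_tilde[j] - math.log(x)) ** 2
    return loss
```

In practice the vectors and biases would be optimized by gradient descent to minimize this loss; the sketch only evaluates it.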
Literature-based discovery (LBD) summarizes information and generates insight from large text corpora. The SemNet framework utilizes a heterogeneous network, or “knowledge graph,” of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way to perform multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting...
Multiple studies have reported new or exacerbated persistent or resistant hypertension in patients previously infected with COVID-19. We used literature-based discovery to identify and prioritize multi-scalar explanatory biology that relates COVID-19 to resistant hypertension. Cross-domain text mining of 33+ million PubMed articles within a comprehensive knowledge graph was performed using SemNet 2.0. Unsupervised rank aggregation, based on the normalized HeteSim score, determined which concepts were most relevant. A series...
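The abstract does not name the rank aggregation method, so this sketch substitutes Borda count, a simple unsupervised scheme, purely for illustration. It takes per-target relatedness scores (standing in for normalized HeteSim, which is not computed here) and merges the per-target rankings into one consensus list. The function name and toy scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def borda_aggregate(scores_by_target: Dict[str, Dict[str, float]]) -> List[str]:
    """Merge per-target concept rankings into one list via Borda count.

    scores_by_target[target][concept] is a relatedness score (e.g., a normalized
    HeteSim value, assumed precomputed). Within each target, the top concept
    earns n points, the next n - 1, and so on; ties in the total break alphabetically.
    """
    points: Dict[str, float] = defaultdict(float)
    for scores in scores_by_target.values():
        ordered = sorted(scores, key=lambda c: scores[c], reverse=True)
        n = len(ordered)
        for position, concept in enumerate(ordered):
            points[concept] += n - position
    return sorted(points, key=lambda c: (-points[c], c))
```

A concept missing from one target's list simply earns no points for that target, which penalizes concepts related to only some targets.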
Identifying disease comorbidities and grouping medical diagnoses into incidents are two important problems in healthcare delivery and assessment. Using vector space embeddings produced by the Global Vectors (GloVe) algorithm, we are able to find useful representations of diagnosis codes that can identify related diagnoses and thus improve the identification of incidents.
David Kartchner, Jennifer Deng, Shubham Lohiya, Tejasri Kopparthi, Prasanth Bathala, Daniel Domingo-Fernández, Cassie Mitchell. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.
A major bottleneck preventing the extension of deep learning systems to new domains is the prohibitive cost of acquiring sufficient training labels. Alternatives such as weak supervision, active learning, and fine-tuning of pretrained models reduce this burden but require substantial human input to select a highly informative subset of instances or to curate labeling functions. REGAL (Rule-Enhanced Generative Active Learning) is an improved framework for weakly supervised text classification that performs active learning over...
Identifying future high-cost patients allows healthcare organizations to take preventative measures that both reduce patient costs and lessen the burden of illness. This paper expands upon past risk adjustment strategies to predict persistently high-cost patients by combining clinical and claims data and assessing them using machine learning techniques. Our approach not only leads to substantial gains in predictive accuracy but also reduces the amount of data needed to identify high-risk patients, enabling providers confidently long-term health...
This work presents a new, original document classification dataset, BioSift, to expedite the initial selection and labeling of studies for drug repurposing. The dataset consists of 10,000 human-annotated abstracts from scientific articles in PubMed. Each abstract is labeled with up to eight attributes necessary to perform meta-analysis using the popular patient-intervention-comparator-outcome (PICO) method: has human subjects, clinical trial/cohort, population size, target disease, study drug,...