- Biomedical Text Mining and Ontologies
- Multiple Sclerosis Research Studies
- Machine Learning in Healthcare
- Bioinformatics and Genomic Networks
- Topic Modeling
- Genetic Associations and Epidemiology
- Artificial Intelligence in Healthcare and Education
- Health, Environment, Cognitive Aging
- Peripheral Neuropathies and Disorders
- Diabetes Management and Research
- Neuroinflammation and Neurodegeneration Mechanisms
- Semantic Web and Ontologies
- Cancer-related molecular mechanisms research
- Parkinson's Disease Mechanisms and Treatments
- AI in cancer detection
- Diabetes, Cardiovascular Risks, and Lipoproteins
- Bioluminescence and chemiluminescence research
- Chronic Disease Management Strategies
- Sepsis Diagnosis and Treatment
- Systemic Lupus Erythematosus Research
- T-cell and B-cell Immunology
- Radiomics and Machine Learning in Medical Imaging
- Artificial Intelligence in Healthcare
- Polyomavirus and related diseases
- Computational Drug Discovery Methods
University of California, San Francisco
2022-2025
Universidad Católica de Santa Fe
2024
Abstract. Motivation: Knowledge graphs (KGs) are being adopted in industry, commerce and academia. Biomedical KGs present a particular challenge due to the complexity, size and heterogeneity of the underlying information. Results: In this work, we present the Scalable Precision Medicine Open Knowledge Engine (SPOKE), a biomedical KG connecting millions of concepts via semantically meaningful relationships. SPOKE contains 27 million nodes of 21 different types and 53 million edges of 55 types downloaded from 41 databases. The graph is built on the framework of 11...
Identification of Alzheimer's disease (AD) onset risk can facilitate interventions before irreversible progression. We demonstrate that electronic health records from the University of California, San Francisco, followed by knowledge networks (for example, SPOKE), allow for (1) prediction of AD, (2) prioritization of biological hypotheses and (3) contextualization of sex dimorphism. We trained random forest models that predicted AD on a cohort of 749 individuals with AD and 250,545 controls, with a mean area under the receiver operating...
Abstract. Motivation: Large language models (LLMs) are being adopted at an unprecedented rate, yet they still face challenges in knowledge-intensive domains such as biomedicine. Solutions such as pretraining and domain-specific fine-tuning add substantial computational overhead and require further domain expertise. Here, we introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework that leverages a massive biomedical KG (SPOKE) with LLMs such as Llama-2-13b,...
While neurodegeneration underlies the pathological basis for permanent disability in multiple sclerosis (MS), predictive biomarkers of progression are lacking. Using an animal model of chronic MS, we find that synaptic injury precedes neuronal loss and identify thinning of the inner plexiform layer (IPL) as an early feature of inflammatory demyelination, prior to symptom onset. As synaptic domains are anatomically segregated in the retina and can be monitored longitudinally, we hypothesize that IPL thinning could represent a biomarker for MS....
Introduction: Early diagnosis of Parkinson’s disease (PD) is important to identify treatments that slow neurodegeneration. People who develop PD often have symptoms before the disease manifests, and these symptoms may be coded as diagnoses in the electronic health record (EHR). Methods: To predict PD diagnosis, we embedded the EHR data of patients onto a biomedical knowledge graph called the Scalable Precision Medicine Open Knowledge Engine (SPOKE) and created patient embedding vectors. We trained and validated a classifier using these vectors from...
Recent sero-epidemiological studies have strengthened the hypothesis that Epstein-Barr virus (EBV) may be a causal factor in multiple sclerosis (MS). Given the complexity of the EBV-host interaction, various mechanisms may be responsible for disease pathogenesis. Furthermore, it remains unclear whether this is a disease-specific process. Here, we showed that genes encoding EBV interactors are enriched in loci associated with MS but not with other diseases, and we prioritized therapeutic targets. Analyses of blood and brain...
Abstract. Glioblastoma multiforme (GM) is a malignant tumor of the central nervous system that is considered highly aggressive and often carries a very poor survival prognosis. An accurate prognosis is therefore pivotal for deciding on a good treatment plan for patients. In this context, computational intelligence applied to data from the electronic health records (EHRs) of patients diagnosed with this disease can be useful to predict patients' survival time. In this study, we evaluated different machine learning models to predict survival time in patients suffering from...
Diabetes is a metabolic disorder that affects more than 420 million people worldwide, and it is caused by the presence of a high level of sugar in the blood for a long period. Diabetes can have serious long-term health consequences, such as cardiovascular diseases, strokes, chronic kidney disease, foot ulcers and retinopathy, among others. Even though it is common, this disease is hard to spot, because it often comes with no symptoms. Especially for type 2 diabetes, which occurs mainly in adults, knowing how long it has been present in a patient has a strong impact on...
Large Language Models (LLMs) have been driving progress in AI at an unprecedented rate, yet they still face challenges in knowledge-intensive domains like biomedicine. Solutions such as pre-training and domain-specific fine-tuning add substantial computational overhead, and the latter also requires domain expertise. External knowledge infusion is task-specific and requires model training. Here, we introduce a task-agnostic Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework by leveraging...
Background: Neuroblastoma is a rare pediatric cancer that affects thousands of children worldwide. Information stored in electronic health records can be a useful source of data for in silico scientific studies about this disease, carried out both by humans and by computational machines. Several open datasets derived from anonymized patients diagnosed with neuroblastoma are available on the internet, but they were released on different websites or as supplementary information...
Abstract. Early identification of Alzheimer’s Disease (AD) risk can aid in interventions before disease progression. We demonstrate that electronic health records (EHRs) combined with heterogeneous knowledge networks (e.g., SPOKE) allow for (1) prediction of AD onset and (2) generation of biological hypotheses linking phenotypes with AD. We trained random forest models that predict AD onset with a mean AUROC of 0.72 (-7 years) to 0.81 (-1 day). Top identified conditions from the matched cohort include importance across time, early or...
Meaningful representations of clinical data using embedding vectors are a pivotal step before invoking any machine learning (ML) algorithm for inference. In this article, we propose a time-aware approach to embedding electronic health records onto a biomedical knowledge graph, creating readable patient representations. This approach not only captures the temporal dynamics of patient trajectories, but also enriches them with additional biological information from the graph. To gauge the predictivity of this approach, an ML pipeline called TANDEM...
In this work, we integrated summary-level data from GWAS with orthogonal evidence of transcriptional regulation to perform a pathway analysis using sub-significant variants with a plausible biological effect.
The colorectal cancer tumor microenvironment presents significant genetic heterogeneity, with mutations in genes of several signaling pathways. Detecting these driver genes through wet-lab experiments is costly and time-consuming, so computational models and bioinformatic tools have become a vital alternative in this effort. One of these novel computational methods, Centrality Analysis, examines molecular functions, biological processes and biochemical pathways by creating and analyzing protein-protein interaction networks. Analysis...