- Cancer Genomics and Diagnostics
- Tensor decomposition and applications
- Epigenetics and DNA Methylation
- Genomics and Phylogenetic Studies
- Electrostatics and Colloid Interactions
- Evolution and Genetic Dynamics
- Face and Expression Recognition
- Molecular Biology Techniques and Applications
- Machine Learning in Bioinformatics
- Electrochemical Analysis and Applications
- Image Retrieval and Classification Techniques
- Protein Structure and Dynamics
- Spectroscopy and Quantum Chemical Studies
- Topic Modeling
- Genetic factors in colorectal cancer
- Fractional Differential Equations Solutions
- Gene expression and cancer classification
- Parallel Computing and Optimization Techniques
- DNA Repair Mechanisms
- Genomics and Rare Diseases
- Natural Language Processing Techniques
- Bioinformatics and Genomic Networks
- Advanced Image and Video Retrieval Techniques
- Computational Physics and Python Applications
- Face recognition and analysis
University of California, San Diego
2020-2025
La Jolla Bioengineering Institute
2020-2024
Los Alamos National Laboratory
2019-2022
University of New Mexico
2016-2020
Mutational signature analysis is commonly performed in cancer genomic studies. Here, we present SigProfilerExtractor, an automated tool for
Analysis of mutational signatures is a powerful approach for understanding the mutagenic processes that have shaped the evolution of a cancer genome. To evaluate the mutational signatures operative in a cancer genome, one first needs to quantify their activities by estimating the number of mutations imprinted by each signature.
International differences in the incidence of many cancer types indicate the existence of carcinogen exposures that have not yet been identified by conventional epidemiology and make a substantial contribution to cancer burden [1]. In clear cell renal cell carcinoma, obesity, hypertension and tobacco smoking are risk factors, but they do not explain the geographical variation in its incidence [2]. Underlying causes can be inferred by sequencing the genomes of cancers from populations with different incidence rates and detecting differences in patterns of somatic...
Mutational signature analysis is commonly performed in genomic studies surveying cancer and normal somatic tissues. Here we present SigProfilerExtractor, an automated tool for accurate de novo extraction of mutational signatures for all types of somatic mutations. Benchmarking with a total of 34 distinct scenarios encompassing 2,500 simulated signatures operative in more than 60,000 unique synthetic genomes and 20,000 synthetic exomes demonstrates that SigProfilerExtractor outperforms thirteen other tools across all datasets with and without...
Analysis of mutational signatures is a powerful approach for understanding the mutagenic processes that have shaped the evolution of a cancer genome. Here we present SigProfilerAssignment, a desktop and an online computational framework for assigning all types of mutational signatures to individual samples. SigProfilerAssignment is the first tool that allows both analysis of copy-number signatures and probabilistic assignment of signatures to individual somatic mutations. As its engine, the tool uses a custom implementation of the forward stagewise algorithm for sparse regression and nonnegative least squares...
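The assignment step described above can be illustrated with a small sketch. It is not SigProfilerAssignment's implementation: instead of the forward stagewise algorithm, it uses a generic multiplicative-update solver for the nonnegative least squares problem, and the names `nnls_multiplicative`, `signatures`, and `sample`, along with the toy numbers, are hypothetical.

```python
def nnls_multiplicative(S, m, n_iter=500, eps=1e-12):
    """Approximately solve min_x ||S x - m||^2 subject to x >= 0.

    S: non-negative matrix (rows = mutation channels, columns = signatures).
    m: non-negative observed mutation counts per channel.
    This is a generic multiplicative-update scheme, assumed here for
    illustration; it is not the solver used by any specific tool.
    """
    n_rows, n_sigs = len(S), len(S[0])
    # Precompute S^T m and S^T S once; both are non-negative.
    Stm = [sum(S[i][a] * m[i] for i in range(n_rows)) for a in range(n_sigs)]
    StS = [[sum(S[i][a] * S[i][b] for i in range(n_rows)) for b in range(n_sigs)]
           for a in range(n_sigs)]
    x = [1.0] * n_sigs  # strictly positive starting activities
    for _ in range(n_iter):
        StSx = [sum(StS[a][b] * x[b] for b in range(n_sigs)) for a in range(n_sigs)]
        # Rescale each activity; non-negativity is preserved automatically.
        x = [x[a] * Stm[a] / (StSx[a] + eps) for a in range(n_sigs)]
    return x


# Toy example: two hypothetical signatures over four mutation channels.
signatures = [[0.7, 0.1], [0.1, 0.1], [0.1, 0.1], [0.1, 0.7]]
true_activities = [100.0, 50.0]
# Noise-free sample built from the known activities.
sample = [sum(s * a for s, a in zip(row, true_activities)) for row in signatures]
activities = nnls_multiplicative(signatures, sample)
```

On noise-free input like this, the estimated activities should approach the values used to build the sample; real data would need a sparsity-promoting step such as the one the abstract mentions.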
Tobacco smoke, alone or combined with alcohol, is the predominant cause of head and neck cancer (HNC). Here, we further explore how tobacco exposure contributes to cancer development by mutational signature analysis of 265 whole-genome sequenced HNC from eight countries. Six tobacco-associated mutational signatures were detected, including some not previously reported. Differences in HNC incidence between countries corresponded with differences in mutation burdens of tobacco-associated signatures, consistent with the dominant role of tobacco in HNC causation. Differences were found in the burden...
Topic modeling, or identifying the set of topics that occur in a collection of articles, is one of the primary objectives of text mining. Typically, a text corpus is represented as a words-by-documents matrix, X, where $x_{ij}$ encodes the importance score of the i-th word in the j-th document using the Term Frequency-Inverse Document Frequency (TF-IDF) representation. Non-negative Matrix Factorization (NMF) can then be used in order to extract the topics and model the corpus. NMF approximates X as a product of two low-rank non-negative factors: W, which represents...
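A minimal sketch of the factorization described above, assuming the classical multiplicative updates for the Frobenius-norm objective (the abstract does not say which NMF algorithm is used). The helper names and the toy words-by-documents matrix are hypothetical, and the code uses only the standard library.

```python
import random


def matmul(A, B):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(X, W, H):
    """Frobenius norm of the residual X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5


def nmf(X, k, n_iter=500, seed=0, eps=1e-12):
    """Approximate non-negative X (n x m) as W (n x k) times H (k x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive random initialization.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H), element-wise.
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (X H^T) / (W H H^T), element-wise.
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


# Toy matrix: words 0-1 occur only in documents 0-1, words 2-3 only in documents 2-3.
X = [
    [1.0, 2.0, 0.0, 0.0],
    [2.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 3.0, 3.0],
]
W, H = nmf(X, k=2)
error = frobenius_error(X, W, H)
```

Each column of `W` can be read as a topic (a weighting over words) and each column of `H` as the topic mixture of one document. In practice X would hold TF-IDF scores rather than these toy values.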
All cancers harbor somatic mutations in their genomes. In principle, mutations affecting between one and fifty base pairs are generally classified as small mutational events. Conversely, large mutational events affect more than fifty base pairs, and, in most cases, they encompass copy-number and structural variants affecting many thousands of base pairs. Prior studies have demonstrated that examining patterns of somatic mutations can be leveraged to provide both biological and clinical insights, thus, resulting in an extensive repertoire of tools for evaluating small mutational events. Recently,...
Colorectal cancer incidence rates vary geographically and have changed over time. Notably, in the past two decades, the incidence of early-onset colorectal cancer, affecting individuals under the age of 50 years, has doubled in many countries. The reasons for this increase are unknown. Here, we investigate whether mutational processes contribute to geographic and age-related differences by examining 981 colorectal cancer genomes from 11 countries. No major differences were found in microsatellite unstable cancers, but variations in mutation burden...
Tobacco smoke, alone or combined with alcohol, is the predominant cause of head and neck cancer (HNC). We explore how tobacco exposure contributes to cancer development by mutational signature analysis of 265 whole-genome sequenced HNC samples from eight countries. Six tobacco-associated mutational signatures were detected, including some not previously reported. Differences in HNC incidence between countries corresponded with differences in mutation burdens of tobacco-associated signatures, consistent with the dominant role of tobacco in HNC causation. Differences were found in the burden...
Non-negative matrix factorization (NMF) has proven to be a powerful unsupervised learning method for uncovering hidden features in complex and noisy data sets, with applications in data mining, text recognition, dimension reduction, face recognition, anomaly detection, blind source separation, and many other fields. An important input for NMF is the latent dimensionality of the data, that is, the number of hidden features, K, present in the explored data set. Unfortunately, this quantity is rarely known a priori. The existing methods for determining...
APOBEC enzymes are part of the innate immunity and are responsible for restricting viruses and retroelements by deaminating cytosine residues
Non-negative Matrix Factorization (NMF) models the topics of a text corpus by decomposing the matrix of term frequency-inverse document frequency (TF-IDF) representation, X, into two low-rank non-negative matrices: W, representing the topics, and H, mapping the documents onto the space of topics. One challenge, common to all topic models, is the determination of the number of latent topics (aka model determination). Determining the correct number of topics is important: underestimating the number of topics results in poor topic separation, under-fitting, while overestimating leads to noisy...
Electric double layers are complex systems that involve a wide variety of interactions between the different components of the electrolyte solutions and with the charged interface. While the role of all Coulombic types of interactions is clear, that of the non-Coulombic forces is less obvious. The focus in the present study is on the effect of bulk solvation properties on the electric double layer. The analysis is based on classical density functional theory. This approach allows us to account for the correlations between the charged (ionic) and uncharged (solvent) species in the solution. The surface charge at...
Lung cancer in never smokers (LCINS) accounts for up to 25% of all lung cancers and has been associated with exposure to secondhand tobacco smoke and air pollution in observational studies. Here, we evaluate the mutagenic exposures in LCINS by examining deep whole-genome sequencing data from a large international cohort of 871 treatment-naïve LCINS recruited from 28 geographical locations within the Sherlock-Lung study. KRAS mutations were 3.8-fold more common in adenocarcinomas from North America and Europe, while 1.6-fold...
The era of exascale computing opens new venues for innovations and discoveries in many scientific, engineering, and commercial fields. However, with the exaflops also come the extra-large high-dimensional data generated by high-performance computing. High-dimensional data is presented as multidimensional arrays, aka tensors. The presence of latent (not directly observable) structures in the tensor allows a unique representation and compression of the data by classical tensor factorization techniques. However, the classical methods are not always stable or they can be...
The charge and potential distributions in an electric double layer result from various chemical and physical interactions between the interface and the adjacent electrolyte solution. The charge typically originates from a chemical equilibrium between the reactive surface and certain potential determining ions in the solution (i.e., charge regulation). This surface chemistry, however, is strongly dependent on a wide variety of interactions between all species in the solution, as well as with the interface. These interactions could be Coulombic and non-Coulombic, and are system-specific. The focus of this study is on the ion valency and its effect...
The properties of electric double layers are governed by the interface between the substrate and the adjacent electrolyte solution. This interface is involved in chemical, Coulombic, and non-Coulombic (e.g., van der Waals or Lennard-Jones) interactions with all components of the fluid phase. We present a detailed study of these interactions using a classical density functional approach. A particular focus is placed on their effect on the surface chemistry and charge regulation. The solution structure near the charged interface is also analyzed and used to offer a thorough...
Identifying biologically-active protein structure(s) from an ensemble of computed three-dimensional structures is a major challenge. Clustering-based methods are time-consuming and often underperform on structure datasets that are highly imbalanced. Energy landscape-based methods improve performance over imbalanced datasets but incur significant time costs. In this paper we propose a novel method based on non-negative matrix factorization. The method outperforms energy landscape-based and clustering-based methods, addressing both the time costs and the challenges...
Fractional Brownian motion (fBm) is a ubiquitous diffusion process in which the memory effects of the stochastic transport result in the mean squared particle displacement following a power law, $\langle {\Delta r}^2 \rangle \sim t^{\alpha}$, where the diffusion exponent $\alpha$ characterizes whether the transport is subdiffusive ($\alpha<1$), diffusive ($\alpha = 1$), or superdiffusive ($\alpha>1$). Due to the abundance of fBm processes in nature, significant efforts have been devoted to the identification and characterization of fBm sources in various phenomena....
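The power-law relation above suggests one simple way to estimate the exponent from data: fit a straight line to the logarithm of the mean squared displacement against the logarithm of time and read off the slope. The sketch below does this on synthetic, noise-free values. The function names, the prefactor, and the classification tolerance are assumptions made for illustration and are not taken from the paper.

```python
import math


def fit_power_law_exponent(times, msd):
    """Least-squares slope of log(msd) versus log(t), i.e. the exponent alpha."""
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in msd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_transport(alpha, tol=0.05):
    """Label the regime; the tolerance around alpha = 1 is an arbitrary choice."""
    if alpha < 1 - tol:
        return "subdiffusive"
    if alpha > 1 + tol:
        return "superdiffusive"
    return "diffusive"


# Synthetic mean squared displacement with a known exponent of 0.6.
times = [float(t) for t in range(1, 51)]
msd = [2.0 * t ** 0.6 for t in times]
alpha = fit_power_law_exponent(times, msd)
regime = classify_transport(alpha)
```

With measured trajectories the displacement would first be averaged over particles or time windows, and the fit would be restricted to the range of lag times where the power law holds.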
The focus of the present article is on the ionic size variation effects on the properties of charged interfaces involving an electrolyte solution, commonly referred to as electric double layers. The presence of a well defined interface between the solution and the substrate has a profound impact on the local structure of the liquid phase. All species are distributed according to the various fluid and surface interactions. The excluded volume due to the finite dimensions of all ions and solvent molecules is a major contributor to the detailed structure. The structure determines important properties of the electric double layers...