- Gene expression and cancer classification
- Bioinformatics and Genomic Networks
- Single-cell and spatial transcriptomics
- Cancer Genomics and Diagnostics
- Molecular Biology Techniques and Applications
- Genomics and Phylogenetic Studies
- Scientific Computing and Data Management
- Ovarian cancer diagnosis and treatment
- Cancer-related molecular mechanisms research
- Metabolomics and Mass Spectrometry Studies
- Gut microbiota and health
- Genetics, Bioinformatics, and Biomedical Research
- Biomedical Text Mining and Ontologies
- Health and Medical Studies
- Research Data Management Practices
- Cell Image Analysis Techniques
- Chromosomal and Genetic Variations
- Genomic variations and chromosomal abnormalities
- RNA Research and Splicing
- Diet, Metabolism, and Disease
- Genomics and Rare Diseases
- RNA modifications and cancer
- AI in cancer detection
- Optimism, Hope, and Well-being
- Parasite Biology and Host Interactions
City University of New York
2015-2024
Roswell Park Comprehensive Cancer Center
2015-2023
Population Council
2020-2022
University at Buffalo, State University of New York
2017-2019
CUNY School of Law
2017
The Graduate Center, CUNY
2017
Dana-Farber Cancer Institute
2015-2016
Center for Cancer Research
2016
Massachusetts General Hospital
2016
Harvard University
2016
Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected datasets and biological reasoning on the relevance of the resulting enriched gene sets. We develop an extensible framework for reproducible benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated...
Multiomics experiments are increasingly commonplace in biomedical research and add layers of complexity to experimental design, data integration, and analysis. R and Bioconductor provide a generic framework for statistical analysis and visualization, as well as specialized data classes for a variety of high-throughput data types, but methods are lacking for integrative analysis of multiomics experiments. The MultiAssayExperiment software package, implemented in R and leveraging Bioconductor software and design principles, provides for the coordinated representation of,...
Multiple studies have identified transcriptome subtypes of high-grade serous ovarian carcinoma (HGSOC), but their interpretation and translation are complicated by tumor evolution and polyclonality accompanied by extensive accumulation of somatic aberrations, varying cell type admixtures, and different tissues of origin. In this study, we examined the chronology of HGSOC subtype evolution in the context of these factors using a novel integrative analysis of absolute copy-number and gene expression in The Cancer Genome Atlas...
Racial/ethnic minority adults have higher rates of hypertension than non-Hispanic white adults. We examined the prevalence of hypertension among Hispanic and Asian subgroups in New York City. Data from the 2013-2014 New York City Health and Nutrition Examination Survey were used to assess hypertension prevalence among adults (aged ≥20) (n = 1,476). Hypertension was measured (systolic blood pressure ≥140 mm Hg or diastolic blood pressure ≥90 mm Hg or self-reported use of blood pressure medication). Participants self-reported race/ethnicity and country of origin. Multivariable logistic regression models assessed differences...
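The measurement rule quoted in this abstract can be sketched in a few lines of Python. This is only an illustration and not the study's analysis code: it assumes the three criteria are combined with a logical OR, the function and field names are hypothetical, and the prevalence is a simple unweighted proportion that ignores any survey weighting the authors may have applied.

```python
def is_hypertensive(systolic_mm_hg, diastolic_mm_hg, on_bp_medication):
    """Flag hypertension if any one of the three criteria is met (assumed OR rule)."""
    return (
        systolic_mm_hg >= 140
        or diastolic_mm_hg >= 90
        or bool(on_bp_medication)
    )


def crude_prevalence(participants):
    """Unweighted share of participants flagged as hypertensive.

    `participants` is a list of dicts with the hypothetical keys
    'systolic', 'diastolic', and 'on_medication'.
    """
    if not participants:
        raise ValueError("no participants supplied")
    flagged = sum(
        is_hypertensive(p["systolic"], p["diastolic"], p["on_medication"])
        for p in participants
    )
    return flagged / len(participants)


# Made-up example records, not survey data.
example = [
    {"systolic": 150, "diastolic": 85, "on_medication": False},
    {"systolic": 118, "diastolic": 76, "on_medication": False},
    {"systolic": 122, "diastolic": 80, "on_medication": True},
    {"systolic": 130, "diastolic": 92, "on_medication": False},
]
print(crude_prevalence(example))
```

A real analysis of survey data would typically apply sampling weights and the regression adjustment the abstract mentions; neither is attempted here.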
Purpose: Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic data from > 260 public studies, including The Cancer Genome Atlas (TCGA) data sets, but integration of different data types remains challenging and error prone for computational methods and tools using these resources. Recent advances in data infrastructure within...
Phase 1 of the Human Microbiome Project (HMP) investigated 18 body subsites of 242 healthy American adults to produce the first comprehensive reference for the composition and variation of the "healthy" human microbiome. Publicly available data sets from amplicon sequencing of two 16S ribosomal RNA variable regions, with extensive controlled-access participant data, provide a reference for ongoing microbiome studies. However, utilization of these data sets can be hindered by the complex bioinformatic steps required to access, import, decrypt,...
Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as the HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but lack a programmatic interface and correction of symbols converted by spreadsheets. We present HGNChelper, an R package that identifies...
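The spreadsheet problem this abstract describes can be illustrated with a short Python sketch. It is not HGNChelper's implementation or API: the function name and the three-entry lookup table are assumptions made for illustration, covering only the familiar cases where a spreadsheet displays a symbol such as MARCH1 as "1-Mar".

```python
import re

# Illustrative subset only: month abbreviation as displayed by a spreadsheet,
# mapped to the gene-symbol prefix it may have come from. These are historical
# symbols; the currently approved HGNC names may differ.
MONTH_TO_PREFIX = {"Mar": "MARCH", "Sep": "SEPT", "Dec": "DEC"}

# Matches day-month strings such as "1-Mar" or "11-Sep".
DATE_LIKE = re.compile(r"^(\d{1,2})-([A-Za-z]{3})$")


def restore_date_mangled(symbol):
    """Return a candidate original symbol, or None if the input does not
    look like a date-converted gene symbol from the table above."""
    match = DATE_LIKE.match(symbol.strip())
    if match is None:
        return None
    number, month = match.group(1), match.group(2).capitalize()
    prefix = MONTH_TO_PREFIX.get(month)
    if prefix is None:
        return None
    return prefix + number


for raw in ["1-Mar", "2-Sep", "TP53", "5-Jan"]:
    print(raw, "->", restore_date_mangled(raw))
```

Any candidate returned this way would still need to be checked against an authoritative symbol table, which is the kind of lookup the abstract attributes to HGNC and MGI resources.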
Modern biological research is increasingly data-intensive, leading to a growing demand for effective training in biological data science. In this article, we provide an overview of key resources and best practices available within the Bioconductor project, an open-source software community focused on omics data analysis. This guide serves as a valuable reference for both learners and educators in the field.
Previous benchmarking of differential abundance (DA) analysis methods in microbiome studies has employed synthetic data, simulations, and "real data" examples, but to the best of our knowledge, none has yet employed experimental data with known "ground truth" differential abundance. A key debate in the field centers on whether compositional methods are necessary for DA analysis, which is challenging to answer due to the lack of ground truth data. To address this gap, we created the Bioconductor data package MicrobiomeBenchmarkData, featuring three...
We present curatedMetagenomicData, a Bioconductor and command-line interface to thousands of metagenomic profiles from the Human Microbiome Project and other publicly available datasets, and ExperimentHub, a Bioconductor platform for convenient cloud-based distribution of data to the R desktop. The resource provides standardized per-participant metadata linked to bacterial, fungal, archaeal, and viral taxonomic abundances, as well as quantitative metabolic functional profiles. The datasets can be immediately analyzed in R or other software with...
Copy number variation (CNV) is a major type of structural genomic variation that is increasingly studied across different species for association with diseases and production traits. Established protocols for experimental detection and computational inference of CNVs from SNP array and next-generation sequencing data are available. We present the CNVRanger R/Bioconductor package, which implements a comprehensive toolbox for structured downstream analysis of CNVs. This includes functionality for summarizing individual CNV calls...
Whole-genome analysis of cancer specimens is commonplace, and investigators frequently share or re-use specimens in later studies. Duplicate expression profiles in public databases will impact re-analysis if left undetected, a so-called "doppelgänger" effect. We propose a method that should be routine practice to accurately match duplicate cancer transcriptomes when nucleotide-level sequence data are unavailable, even for samples profiled by different microarray technologies or by both microarray and RNA sequencing. We demonstrate the...
Millions of transcriptomic profiles have been deposited in public archives, yet remain underused for the interpretation of new experiments. We present a method for interpreting new transcriptomic datasets through instant comparison to public datasets without high-performance computing requirements. We apply Principal Component Analysis on 536 studies comprising 44,890 human RNA sequencing profiles and aggregate sufficiently similar loading vectors to form Replicable Axes of Variation (RAV). RAVs are annotated with metadata of the originating studies and by...
In neurocysticercosis, the larval form of the pork tapeworm Taenia solium appears to evolve through three phases (active, degenerative, and sometimes calcification) before disappearance. The antihelmintic drug albendazole has been shown to hasten the resolution of active cysts in neurocysticercosis. Little is known about the time cysts take to progress through each phase, with or without treatment.
Background: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of the resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations. Results: We present a general framework for standardized and structured...
The majority of high-throughput single-cell molecular profiling methods quantify RNA expression; however, recent multimodal profiling methods add simultaneous measurement of genomic, proteomic, epigenetic, and/or spatial information on the same cells. The development of new statistical and computational methods in Bioconductor for such data will be facilitated by easy availability of landmark datasets using standard data classes.
Purpose: Given rising rates of deadly melanoma skin cancer in Hispanics, the study objective was to examine skin cancer-related risk reduction behaviors and beliefs to dictate content for culturally targeted skin cancer prevention strategies for Hispanics. Methods/Data Source: An anonymous survey was administered to waiting room volunteers in a primary care facility in Albuquerque, New Mexico to assess skin cancer-related risk reduction behaviors, screening, information seeking and communication, as well as beliefs in Hispanics (n=48) and Non-Hispanic Whites (n=36). Results: We found...