Benjamin Pullman
- Advanced Proteomics Techniques and Applications
- Metabolomics and Mass Spectrometry Studies
- Genomics and Phylogenetic Studies
- Mass Spectrometry Techniques and Applications
- Scientific Computing and Data Management
- Biomedical Text Mining and Ontologies
- Research Data Management Practices
- Gut microbiota and health
- Genetics, Bioinformatics, and Biomedical Research
- Data Management and Algorithms
- Epigenetics and DNA Methylation
- Advanced Database Systems and Queries
- Genomics and Rare Diseases
- Graph Theory and Algorithms
- Genetic Associations and Epidemiology
- Genetic Syndromes and Imprinting
- Telomeres, Telomerase, and Senescence
- Bayesian Modeling and Causal Inference
- Cell Image Analysis Techniques
- Pharmacogenetics and Drug Metabolism
- Advanced Image and Video Retrieval Techniques
- Cancer Treatment and Pharmacology
- Eicosanoids and Hypertension Pharmacology
- Gene expression and cancer classification
- Molecular Biology Techniques and Applications
University of Montana
2020-2024
University of California, San Diego
2016-2024
AstraZeneca (United Kingdom)
2024
AstraZeneca (United States)
2023
UC San Diego Health System
2018
Icahn School of Medicine at Mount Sinai
2015
The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) has standardized data submission and dissemination of mass spectrometry proteomics data worldwide since 2012. In this paper, we describe the main developments since the previous update manuscript was published in Nucleic Acids Research in 2017. Since then, in addition to the four PX members existing at the time (PRIDE, PeptideAtlas including the PASSEL resource, MassIVE and jPOST), two new resources have joined PX: iProX (China) and Panorama Public (USA)...
Mass spectrometry (MS) is by far the most used experimental approach in high-throughput proteomics. The ProteomeXchange (PX) consortium of proteomics resources (http://www.proteomexchange.org) was originally set up to standardize data submission and dissemination of public MS proteomics data. It is now 10 years since the initial data workflow was implemented. In this manuscript, we describe the main developments in PX since the previous update manuscript was published in Nucleic Acids Research in 2020. The six members of the Consortium are PRIDE,...
The increasing throughput and sharing of proteomics mass spectrometry data have now yielded over one-third of a million public runs. However, these discoveries are not continuously aggregated in an open and error-controlled manner, which limits their utility. To facilitate the reusability of these data, we built the MassIVE Knowledge Base (MassIVE-KB), a community-wide, continuously updating knowledge base that aggregates these discoveries into a reusable format with full provenance information for community scrutiny. Reusing >31 TB of human data stored...
Integrating human genomics and proteomics can help elucidate disease mechanisms, identify clinical biomarkers and discover drug targets [1-4]. Because previous proteogenomic studies have focused on common variation via genome-wide association studies, the contribution of rare variants to the plasma proteome remains largely unknown. Here we identify associations between rare protein-coding variants and 2,923 plasma protein abundances measured in 49,736 UK Biobank individuals. Our variant-level exome-wide association study identified 5,433...
microbeMASST, a taxonomically informed mass spectrometry (MS) search tool, tackles the limited annotation of microbial metabolites in untargeted metabolomics experiments. Leveraging a curated database of >60,000 monocultures, users can search known and unknown MS/MS spectra and link them to their respective producers via MS/MS fragmentation patterns. Identification of microbe-derived metabolites and their relative producers without a priori knowledge will vastly enhance the understanding of microorganisms' role in ecology and human health.
Telomeres protect chromosome ends from damage, and their length is linked with human disease and aging. We developed a joint telomere length metric, combining quantitative PCR and whole-genome sequencing measurements from 462,666 UK Biobank participants. This metric increased SNP heritability, suggesting that it better captures the genetic regulation of telomere length. Exome-wide rare-variant and gene-level collapsing association studies identified 64 variants and 30 genes significantly associated with telomere length, including allelic series...
DNA methylation has essential roles in transcriptional regulation, imprinting, X chromosome inactivation and other cellular processes, and aberrant CpG methylation is directly involved in the pathogenesis of human imprinting disorders and many cancers. To address the need for a quantitative and highly multiplexed bisulfite sequencing method with long read lengths for targeted CpG methylation analysis, we developed single-molecule real-time bisulfite sequencing (SMRT-BS). Optimized bisulfite conversion and PCR conditions enabled the amplification of DNA fragments up to ~1.5 kb, subjecting...
It is important for the proteomics community to have a standardized manner to represent all possible variations of a protein or peptide primary sequence, including natural, chemically induced, and artifactual modifications. The Human Proteome Organization Proteomics Standards Initiative, in collaboration with several members of the Consortium for Top-Down Proteomics (CTDP), has developed a standard notation called ProForma 2.0, which is a substantial extension of the original ProForma notation developed by the CTDP. ProForma 2.0 aims to unify the representation of proteoforms...
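To make the idea of a ProForma sequence concrete, here is a minimal Python sketch that handles only the simplest case of the notation: uppercase residues, each optionally followed by one modification in square brackets (e.g. `EM[Oxidation]EVEES[Phospho]PEK`). The function name `parse_simple_proforma` is hypothetical; terminal and labile modifications, ranges, charge states and the many other ProForma 2.0 features are deliberately not covered, so this is an illustration rather than a conforming parser.

```python
import re
from typing import List, Optional, Tuple

# One residue letter, optionally followed by a single [modification] tag.
_RESIDUE = re.compile(r"([A-Z])(?:\[([^\]]+)\])?")


def parse_simple_proforma(seq: str) -> List[Tuple[str, Optional[str]]]:
    """Split a simple ProForma-style string into (residue, modification) pairs.

    Hypothetical helper covering only bracketed per-residue modifications;
    any other ProForma 2.0 syntax raises ValueError.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    pos = 0
    while pos < len(seq):
        m = _RESIDUE.match(seq, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at position {pos}: {seq[pos:]!r}")
        # group(2) is None when the residue carries no modification.
        pairs.append((m.group(1), m.group(2)))
        pos = m.end()
    return pairs


if __name__ == "__main__":
    for residue, mod in parse_simple_proforma("EM[Oxidation]EVEES[Phospho]PEK"):
        print(residue, mod or "-")
```

Keeping the modification as an uninterpreted string mirrors the fact that ProForma allows several ways to name a modification (controlled-vocabulary names, accessions, mass shifts); resolving those would be a separate step.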
Understanding the distribution of hundreds of thousands of plant metabolites across the plant kingdom presents a challenge. To address this, we curated publicly available LC-MS/MS data from 19,075 plant extracts and developed the plantMASST reference database, encompassing 246 botanical families, 1,469 genera, and 2,793 species. This taxonomically focused database facilitates the exploration of plant-derived molecules using tandem mass spectrometry (MS/MS) spectra. This tool will aid in drug discovery, biosynthesis, (chemo)taxonomy,...
High-throughput tandem mass spectrometry has enabled the detection and identification of over 75% of all proteins predicted to result in translated gene products in the human genome. In fact, the galloping rate of data acquisition and sharing has led to the current availability of many tens of terabytes of public data in thousands of data sets. The systematic reanalysis of these data sets has been used to build a community-scale spectral library of 2.1 million precursors for 1 [...] unique sequences from 19,000 proteins (including spectra of synthetic peptides). However, it remained...
Access to web-based platforms has enabled scientists to perform research remotely. A critical aspect of mass spectrometry data analysis is the inspection, analysis, and visualization of raw data to validate data quality and confirm statistical observations. We developed the GNPS Dashboard, a web-based tool, to facilitate synchronous collaborative visualization and analysis of private and public...
MicrobeMASST, a taxonomically informed mass spectrometry (MS) search tool, tackles the limited annotation of microbial metabolites in untargeted metabolomics experiments. Leveraging a curated database of >60,000 monocultures, users can search known and unknown MS/MS spectra and link them to their respective producers via MS/MS fragmentation patterns. Identification of microbially derived metabolites and their relative producers, without a priori knowledge, will vastly enhance the understanding of microorganisms' role in ecology...
Queries of multi-TB mass spectrometry (MS) repositories provide deep insights into biological processes and pose challenging data processing problems. The key bottleneck for running these queries is the number of small random reads. Byte-addressable persistent main memory (PMEM) technologies enable real-time MS search systems by delivering low-latency, high-bandwidth storage. This work presents P-Massive, a multi-terabyte-scale MS search system. P-Massive takes advantage of PMEM and the underlying nature of its access...
Mass spectra provide the ultimate evidence supporting the findings of mass spectrometry (MS) proteomics studies in publications, and it is therefore crucial to be able to trace conclusions back to the spectra. The Universal Spectrum Identifier (USI) provides a standardized mechanism for encoding a virtual path to any spectrum contained in datasets deposited in public repositories. USIs enable greater transparency by providing spectral support for key findings, with more than 1 billion USI identifications from over 3 [...] already...
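The "virtual path" a USI encodes is a colon-delimited string. A minimal Python sketch of splitting one into its parts is shown below, assuming the form `mzspec:<collection>:<MS run>:<index type>:<index>[:<interpretation>]`, with `scan`, `index` or `nativeId` as the index type. The class `USI` and function `parse_usi` are hypothetical names for illustration, and the sketch does no validation beyond that shape (for instance, it does not check that the collection is a real ProteomeXchange accession).

```python
from dataclasses import dataclass
from typing import Optional

INDEX_TYPES = {"scan", "index", "nativeId"}


@dataclass
class USI:
    collection: str                       # dataset identifier, e.g. a PXD accession
    ms_run: str                           # MS run (file) name without extension
    index_type: str                       # one of INDEX_TYPES
    index: str                            # kept as text: nativeId values are not integers
    interpretation: Optional[str] = None  # optional peptide/charge, e.g. "VLHPLEGAVVIIFK/2"


def parse_usi(usi: str) -> USI:
    """Split a USI string into its components (hypothetical helper)."""
    # maxsplit=5 keeps any ':' inside the interpretation (e.g. "[U:Phospho]") intact.
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, ms_run, index_type, index, *rest = parts
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index type: {index_type!r}")
    return USI(collection, ms_run, index_type, index, rest[0] if rest else None)


if __name__ == "__main__":
    print(parse_usi("mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2"))
```

Limiting the split to five separators is the main design choice here: the interpretation may itself be a ProForma string containing colons, so splitting on every colon would break it apart.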
Given a database of vectors, a cosine threshold query returns all vectors in the database having cosine similarity to a query vector above a given threshold θ. These queries arise naturally in many applications, such as document retrieval, image search, and mass spectrometry. The present paper considers the efficient evaluation of such queries, providing novel optimality guarantees and exhibiting good performance on real datasets. We take as a starting point Fagin's well-known Threshold Algorithm (TA), which can be used to answer cosine threshold queries as follows: an...
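As a rough illustration of that starting point, the Python sketch below applies a textbook version of Fagin's Threshold Algorithm to a cosine threshold query. It is not the paper's optimized method. It assumes all vectors are non-zero and non-negative (so the dot product of normalized vectors is monotone in every coordinate, as TA requires) and uses "at or above θ" as the inclusion rule; `ta_cosine_threshold` is a hypothetical name. Each round reads one entry from every per-dimension sorted list, scores newly seen vectors exactly, and stops once the TA bound on any unseen vector falls below θ.

```python
import math
from typing import List, Sequence, Set


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length so dot product equals cosine."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def ta_cosine_threshold(db: List[Sequence[float]], query: Sequence[float], theta: float) -> Set[int]:
    """Ids of db vectors with cosine(query, v) >= theta, via a basic Threshold Algorithm.

    Assumes non-zero, non-negative vectors (hypothetical sketch, not the paper's algorithm).
    """
    vecs = [normalize(v) for v in db]
    q = normalize(query)
    dims = [i for i, w in enumerate(q) if w > 0]  # zero-weight dimensions never contribute
    # One list per query dimension: (coordinate value, vector id), descending by value.
    lists = {i: sorted(((v[i], vid) for vid, v in enumerate(vecs)), reverse=True) for i in dims}
    seen: Set[int] = set()
    result: Set[int] = set()
    for depth in range(len(vecs)):
        for i in dims:  # sorted access: one step down every list
            _, vid = lists[i][depth]
            if vid not in seen:
                seen.add(vid)
                # Random access: exact cosine score of a newly encountered vector.
                score = sum(q[j] * vecs[vid][j] for j in dims)
                if score >= theta:
                    result.add(vid)
        # Any vector not yet seen has v[i] <= lists[i][depth][0] in every dimension,
        # so its score is at most tau; once tau < theta no unseen vector can qualify.
        tau = sum(q[i] * lists[i][depth][0] for i in dims)
        if tau < theta:
            break
    return result


if __name__ == "__main__":
    database = [[1, 0], [0, 1], [1, 1], [0.9, 0.1]]
    print(sorted(ta_cosine_threshold(database, [1, 0], 0.9)))
```

For the small database in the usage block, the scan stops after three rounds, because the bound drops to about 0.71 once the third entry of the sorted list is reached. The paper's contribution starts from the observation that this plain TA bound can be loose for unit vectors, so the sketch should be read only as the baseline the abstract mentions.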
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception twenty years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We...