- Genomics and Phylogenetic Studies
- Gut microbiota and health
- Chromosomal and Genetic Variations
- Gene expression and cancer classification
- Metabolomics and Mass Spectrometry Studies
- Bioinformatics and Genomic Networks
- Microbial Community Ecology and Physiology
- Animal Genetics and Reproduction
- Genetic and phenotypic traits in livestock
- Algorithms and Data Compression
- RNA and protein synthesis mechanisms
- Probiotics and Fermented Foods
- Identification and Quantification in Food
- Genetic Mapping and Diversity in Plants and Animals
- Machine Learning in Bioinformatics
- Cocoa and Sweet Potato Agronomy
- Dermatology and Skin Diseases
- Genetic Associations and Epidemiology
- Genomics and Chromatin Dynamics
- Topological and Geometric Data Analysis
- Biosensors and Analytical Detection
- Dental Research and COVID-19
- COVID-19 diagnosis using AI
- Bioenergy crop production and management
- Diabetes and associated disorders
IBM Research - Thomas J. Watson Research Center
2011-2025
IBM (United States)
2011-2024
ORCID
2020
Helsinki Institute for Information Technology
2006-2011
University of Helsinki
2006-2011
Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4...
Despite advances in sequencing, lack of standardization makes comparisons across studies challenging and hampers insights into the structure and function of microbial communities across multiple habitats on a planetary scale. Here we present a multi-omics analysis of a diverse set of 880 microbial community samples collected for the Earth Microbiome Project. We include amplicon (16S, 18S, ITS) and shotgun metagenomic sequence data, and untargeted metabolomics data (liquid chromatography-tandem mass spectrometry and gas chromatography...
We introduce the operational genomic unit (OGU) method, a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent of taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary...
Considerable evidence suggests that the gut microbiome changes with age or even accelerates aging in adults. Whether the age-related changes are more or less prominent than those for other body sites, and whether predictions can be made about a person’s age from a microbiome sample, remain unknown. We therefore combined several large studies from different countries to determine which body site’s microbiome could most accurately predict age. We found that the skin was the best, on average yielding predictions within 4 years of chronological age. This study sets the stage for future...
Alterations in the human microbiome have been observed in a variety of conditions such as asthma, gingivitis, dermatitis and cancer, and much remains to be learned about the links between the microbiome and health. The fusion of artificial intelligence with rich microbiome datasets can offer an improved understanding of the microbiome’s role in human health. To gain actionable insights it is essential to consider both the predictive power and the transparency of the models by providing explanations for the predictions. We combine the collection of leg skin microbiome samples from two healthy...
SARS-CoV-2 is an RNA virus responsible for the coronavirus disease 2019 (COVID-19) pandemic. Viruses exist in complex microbial environments, and recent studies have revealed both synergistic and antagonistic effects of specific bacterial taxa on viral prevalence and infectivity. We set out to test whether bacterial communities predict SARS-CoV-2 occurrence in a hospital setting. We collected 972 samples from hospitalized patients with COVID-19, their health care providers, and hospital surfaces before, during, and after admission. We screened...
Standard workflows for analyzing microbiomes often include the creation and curation of phylogenetic trees. Here we present EMPress, an interactive web tool for visualizing trees in the context of microbiome, metabolome, and other community data, scalable to trees with well over 500,000 nodes. EMPress provides novel functionality, including ordination integration and animations, alongside many standard tree visualization features and thus simplifies exploratory analyses of many forms of 'omic data. IMPORTANCE: Phylogenetic trees are integral...
In this work, we hypothesized that shifts in the food microbiome can be used as an indicator of unexpected contaminants or environmental changes. To test this hypothesis, we sequenced the total RNA of 31 high protein powder (HPP) samples of poultry meal pet food ingredients. We developed a microbiome analysis pipeline employing a key eukaryotic matrix filtering step that improved microbe detection specificity to >99.96% during in silico validation. The pipeline identified 119 microbial genera per HPP sample on average, with 65 genera present in all...
The gastrointestinal (GI) tract is a site of replication of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and GI symptoms are often reported by patients. SARS-CoV-2 cell entry depends upon heparan sulfate (HS) proteoglycans, which commensal bacteria that bathe the human mucosa are known to modify. To explore human gut HS-modifying bacterial abundances and how their presence may impact SARS-CoV-2 infection, we developed a task-based analysis of proteoglycan degradation on large-scale shotgun...
Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time and memory requirements, as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions...
Here we propose that using shotgun sequencing to examine food leads to accurate authentication of ingredients and detection of contaminants. To demonstrate this, we developed a bioinformatic pipeline, FASER (Food Authentication from SEquencing Reads), designed to resolve the relative composition of mixtures of eukaryotic species using RNA or DNA sequencing. Our comprehensive database includes >6000 plants and animals that may be present in food. FASER accurately identified eukaryotic species with 0.4% median absolute difference between...
The human microbiota has a close relationship with human disease and it remodels components of the glycocalyx including heparan sulfate (HS). Studies of the severe acute respiratory syndrome coronavirus (SARS-CoV-2) spike protein receptor binding domain suggest that infection requires binding to HS and angiotensin converting enzyme 2 (ACE2) in a codependent manner. Here, we show that commensal host bacterial communities can modify HS and thereby modulate SARS-CoV-2 spike protein binding, and that these communities change with host age and sex. Common human-associated bacteria...
The number of publicly available microbiome samples is continually growing. As data set size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a highly utilized phylogenetic alpha diversity metric that has thus far failed to effectively scale to trees with millions of vertices. Stacked Faith's phylogenetic diversity (SFPhD) enables the calculation of this widely adopted diversity metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the amount of computational resources required,...
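For context on the metric named above: Faith's PD of a sample is commonly computed as the total length of the tree branches spanned by the taxa observed in that sample. The following is a minimal Python sketch of that definition only, not of the SFPhD algorithm. It assumes a toy rooted tree stored as a child-to-(parent, branch length) mapping and counts branches up to the root; the names `faith_pd` and `toy_tree` are hypothetical.

```python
def faith_pd(parent, observed_tips):
    """Sum the lengths of all branches on the paths from observed tips to the root.

    `parent` maps each non-root node to a (parent_node, branch_length) pair.
    Each branch is counted once, however many observed tips descend from it.
    """
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk toward the root; stop at the root or at a branch already counted.
        while node in parent and node not in visited:
            visited.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Toy tree (an assumption for illustration): root -> A, B; A -> t1, t2; B -> t3
toy_tree = {
    "A": ("root", 1.0),
    "B": ("root", 2.0),
    "t1": ("A", 0.5),
    "t2": ("A", 0.25),
    "t3": ("B", 1.0),
}

pd_a_clade = faith_pd(toy_tree, ["t1", "t2"])  # branches t1, t2 and A
pd_mixed = faith_pd(toy_tree, ["t1", "t3"])    # branches t1, A, t3 and B
```

This naive walk visits each branch at most once per sample, so its cost grows with the number of samples times the tree size, which hints at why very large trees call for a more efficient formulation.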
In response to the ongoing global pandemic, characterizing the molecular-level host interactions of the new coronavirus SARS-CoV-2 responsible for COVID-19 has been at the center of unprecedented scientific focus. However, when the virus enters the body it also interacts with the micro-organisms already inhabiting the host. Understanding the virus-host-microbiome interactions can yield additional insights into the biological processes perturbed by viral invasion. Alterations in the gut microbiome species and metabolites have been noted during...
Tracking the bacterial communities present in our food has the potential to inform food safety and product origin. To do so, the entire genetic material present in a sample is extracted using chemical methods or commercially available kits and sequenced using next-generation platforms to provide a snapshot of the microbial composition.
Chocolate is a highly valued and palatable confectionery product. It is primarily made from the processed seeds of the tree species Theobroma cacao. Cacao cultivation is relevant for small-holder farmers throughout the tropics, yet its productivity remains limited by low yields and widespread pathogens. A panel of 148 improved cacao clones was assembled based on disease resistance, and phenotypic single-tree replicated clonal evaluation was performed for 8 years. Using high-density markers, diversity is expressed relative to 10...
We introduce Operational Genomic Unit (OGU), a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent from taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable...
Sequence segmentation and dimensionality reduction have been used as methods for studying high-dimensional sequences; they both reduce the complexity of the representation of the original data. In this paper we study the interplay of these two techniques. We formulate the problem of segmenting a sequence while modeling it with a basis of small size, thus essentially reducing the dimension of the input sequence. We give three different algorithms for this problem: all combine existing methods for sequence segmentation and dimensionality reduction. For two of the proposed algorithms we prove guarantees for the quality of the solutions...
Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origin, in conjunction...
Reed canary grass (Phalaris arundinacea) is an economically important forage and bioenergy grass of the temperate regions of the world. Despite its economic importance, it is lacking in public genomic data. We explore comparative exomics of the grass cultivars in the context of response to salt exposure. The limited data set poses challenges to the computational pipeline. As a prerequisite for the comparative study, we generate the Phalaris reference transcriptome sequence, one of the first steps in addressing the issue of paucity of processed genomic data in this species. In addition,...
Randomization is an important technique for assessing the significance of data mining results. Given an input data set, a randomization method samples at random from some class of datasets that share certain characteristics with the original data. The measure of interest on the randomized datasets is then compared to the measure on the original data to assess its significance. For some types of data, e.g., gene expression matrices, it is useful to be able to sample datasets that share row and column means and variances. Testing whether the results of a data mining algorithm on such randomized datasets differ from the results on the true dataset tells us whether the results on the true data were an artifact...
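The general scheme described above (compute a measure on the original data, recompute it on many randomized datasets, and compare) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the matrix randomization method of the paper: it swaps in a much simpler null model that shuffles one variable, uses covariance as the measure of interest, and all function names are hypothetical.

```python
import random


def empirical_p_value(observed, randomized_values):
    """Share of randomized statistics at least as large as the observed one.

    The +1 in numerator and denominator counts the observed value itself,
    so the estimate is never exactly zero.
    """
    at_least = sum(1 for value in randomized_values if value >= observed)
    return (at_least + 1) / (len(randomized_values) + 1)


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def randomization_test(xs, ys, n_random=999, seed=0):
    """Compare the observed covariance with covariances under shuffled ys.

    Shuffling ys keeps its values (hence its mean and variance) but breaks
    any association with xs; this is the simplified null model assumed here.
    """
    rng = random.Random(seed)
    observed = covariance(xs, ys)
    shuffled = list(ys)
    randomized = []
    for _ in range(n_random):
        rng.shuffle(shuffled)
        randomized.append(covariance(xs, shuffled))
    return observed, empirical_p_value(observed, randomized)


xs = list(range(10))
ys = list(range(10))
observed_cov, p_value = randomization_test(xs, ys)
```

A small p-value indicates that the observed measure is rarely matched on randomized data, which is the sense in which a result is judged unlikely to be an artifact of the preserved characteristics alone.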
BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on...