Niina Haiminen

ORCID: 0000-0002-8663-1019
Research Areas
  • Genomics and Phylogenetic Studies
  • Gut microbiota and health
  • Chromosomal and Genetic Variations
  • Gene expression and cancer classification
  • Metabolomics and Mass Spectrometry Studies
  • Bioinformatics and Genomic Networks
  • Microbial Community Ecology and Physiology
  • Animal Genetics and Reproduction
  • Genetic and phenotypic traits in livestock
  • Algorithms and Data Compression
  • RNA and protein synthesis mechanisms
  • Probiotics and Fermented Foods
  • Identification and Quantification in Food
  • Genetic Mapping and Diversity in Plants and Animals
  • Machine Learning in Bioinformatics
  • Cocoa and Sweet Potato Agronomy
  • Dermatology and Skin Diseases
  • Genetic Associations and Epidemiology
  • Genomics and Chromatin Dynamics
  • Topological and Geometric Data Analysis
  • Biosensors and Analytical Detection
  • Dental Research and COVID-19
  • COVID-19 diagnosis using AI
  • Bioenergy crop production and management
  • Diabetes and associated disorders

IBM Research - Thomas J. Watson Research Center
2011-2025

IBM (United States)
2011-2024

ORCID
2020

Helsinki Institute for Information Technology
2006-2011

University of Helsinki
2006-2011

Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important traits will aid researchers and breeders. Results: We describe the sequencing and assembly of the Matina 1-6 genome. The genome is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp and a scaffold N50 of 34.4...

10.1186/gb-2013-14-6-r53 article EN cc-by Genome biology 2013-06-03
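The contig and scaffold N50 values quoted in the abstract above follow the usual definition: the length L such that sequences of length L or longer account for at least half of the total assembly length. A minimal sketch of that calculation is given here; the function name and the example lengths are made up for illustration and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical lengths (arbitrary units), not data from the Matina 1-6 assembly.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
```

Comparing `2 * running` with `total` keeps the arithmetic in integers, which avoids rounding issues when the total length is odd.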
Justin P. Shaffer Louis‐Félix Nothias Luke Thompson Jon G. Sanders Rodolfo A. Salido and 92 more Sneha Couvillion Asker Brejnrod Franck Lejzerowicz Niina Haiminen Shi Huang Holly L. Lutz Qiyun Zhu Cameron Martino James T. Morton Smruthi Karthikeyan Mélissa Nothias-Esposito Kai Dührkop Sebastian Böcker Hyun Woo Kim Alexander A. Aksenov Wout Bittremieux Jeremiah J. Minich Clarisse Marotz MacKenzie Bryant Karenina Sanders Tara Schwartz Greg Humphrey Yoshiki Vásquez-Baeza Anupriya Tripathi Laxmi Parida Anna Paola Carrieri Kristen L. Beck Promi Das Antonio González Daniel McDonald Joshua Ladau Søren Michael Karst Mads Albertsen Gail Ackermann Jeff DeReus Torsten Thomas Daniel Petras Ashley Shade James Stegen Se Jin Song Thomas Metz Austin D. Swafford Pieter C. Dorrestein Janet Jansson Jack A. Gilbert Rob Knight Lars T. Angenant Alison M. Berry Leonora Bittleston Jennifer L. Bowen Max Chavarría Don A. Cowan Daniel L. Distel Peter R. Girguis Jaime Huerta‐Cepas Paul R. Jensen Lingjing Jiang Gary M. King Anton Lavrinienko Aurora MacRae-Crerar Thulani P. Makhalanyane Tapio Mappes Ezequiel M. Marzinelli Gregory D. Mayer Katherine D. McMahon Jessica L. Metcalf Sou Miyake Timothy A. Mousseau Catalina Murillo‐Cruz David D. Myrold Brian Palenik Adrian A. Pinto‐Tomás Dorota L. Porazinska Jean‐Baptiste Ramond Forest Rowher Taniya Roy Chowdhury Stuart A. Sandin Steven K. Schmidt Henning Seedorf Ashley Shade J. Reuben Shipway Jennifer E. Smith James Stegen Frank J. Stewart Karen Tait Torsten Thomas Yael Tarlovsky Tucker Jana M. U′Ren Phillip C. Watts Nicole S. Webster Jesse Zaneveld Shan Zhang

Despite advances in sequencing, lack of standardization makes comparisons across studies challenging and hampers insights into the structure and function of microbial communities across multiple habitats on a planetary scale. Here we present a multi-omics analysis of a diverse set of 880 microbial community samples collected for the Earth Microbiome Project. We include amplicon (16S, 18S, ITS) and shotgun metagenomic sequence data, and untargeted metabolomics data (liquid chromatography-tandem mass spectrometry and gas chromatography...

10.1038/s41564-022-01266-x article EN cc-by Nature Microbiology 2022-11-28

We introduce the operational genomic unit (OGU) method, a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent of taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary...

10.1128/msystems.00167-22 article EN mSystems 2022-04-04

Considerable evidence suggests that the gut microbiome changes with age or even accelerates aging in adults. Whether the age-related changes are more or less prominent than those for other body sites and whether predictions can be made about a person’s age from a microbiome sample remain unknown. We therefore combined several large studies from different countries to determine which body site’s microbiome could most accurately predict age. We found that the skin was the best, on average yielding predictions within 4 years of chronological age. This study sets the stage for future...

10.1128/msystems.00630-19 article EN cc-by mSystems 2020-02-10

Alterations in the human microbiome have been observed in a variety of conditions such as asthma, gingivitis, dermatitis and cancer, and much remains to be learned about the links between the microbiome and health. The fusion of artificial intelligence with rich microbiome datasets can offer an improved understanding of the microbiome’s role in health. To gain actionable insights it is essential to consider both the predictive power and the transparency of the models by providing explanations for the predictions. We combine the collection of leg skin microbiome samples from two healthy...

10.1038/s41598-021-83922-6 article EN cc-by Scientific Reports 2021-02-25

SARS-CoV-2 is an RNA virus responsible for the coronavirus disease 2019 (COVID-19) pandemic. Viruses exist in complex microbial environments, and recent studies have revealed both synergistic and antagonistic effects of specific bacterial taxa on viral prevalence and infectivity. We set out to test whether bacterial communities predict SARS-CoV-2 occurrence in a hospital setting. We collected 972 samples from hospitalized patients with COVID-19, their health care providers, and hospital surfaces before, during, and after admission. We screened...

10.1186/s40168-021-01083-0 article EN cc-by Microbiome 2021-06-08

Standard workflows for analyzing microbiomes often include the creation and curation of phylogenetic trees. Here we present EMPress, an interactive web tool for visualizing trees in the context of microbiome, metabolome, and other community data, scalable to trees with well over 500,000 nodes. EMPress provides novel functionality, including ordination integration and animations, alongside many standard tree visualization features and thus simplifies exploratory analyses of many forms of 'omic data. IMPORTANCE Phylogenetic trees are integral...

10.1128/msystems.01216-20 article EN cc-by mSystems 2021-03-15

In this work, we hypothesized that shifts in the food microbiome can be used as an indicator of unexpected contaminants or environmental changes. To test this hypothesis, we sequenced the total RNA of 31 high protein powder (HPP) samples of poultry meal pet food ingredients. We developed a microbiome analysis pipeline employing a key eukaryotic matrix filtering step that improved microbe detection specificity to >99.96% during in silico validation. The pipeline identified 119 microbial genera per HPP sample on average, with 65 genera present in all...

10.1038/s41538-020-00083-y article EN cc-by npj Science of Food 2021-02-08

The gastrointestinal (GI) tract is a site of replication of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and GI symptoms are often reported by patients. SARS-CoV-2 cell entry depends upon heparan sulfate (HS) proteoglycans, which commensal bacteria that bathe the human mucosa are known to modify. To explore gut HS-modifying bacterial abundances and how their presence may impact SARS-CoV-2 infection, we developed a task-based analysis of proteoglycan degradation on large-scale shotgun...

10.1128/mbio.04015-24 article EN cc-by mBio 2025-02-25

Recent advances in sequencing technology have resulted in a dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time and memory requirements, as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions...

10.1093/bioinformatics/btr646 article EN Bioinformatics 2011-11-22

Here we propose that using shotgun sequencing to examine food leads to accurate authentication of ingredients and detection of contaminants. To demonstrate this, we developed a bioinformatic pipeline, FASER (Food Authentication from SEquencing Reads), designed to resolve the relative composition of mixtures of eukaryotic species using RNA or DNA sequencing. Our comprehensive database includes >6000 plants and animals that may be present in food. FASER accurately identified eukaryotic species with a 0.4% median absolute difference between...

10.1038/s41538-019-0056-6 article EN cc-by npj Science of Food 2019-11-19

The human microbiota has a close relationship with human disease and it remodels components of the glycocalyx including heparan sulfate (HS). Studies of the severe acute respiratory syndrome coronavirus (SARS-CoV-2) spike protein receptor binding domain suggest that infection requires binding to HS and angiotensin converting enzyme 2 (ACE2) in a codependent manner. Here, we show that commensal host bacterial communities can modify HS and thereby modulate SARS-CoV-2 binding, and that these communities change with age and sex. Common human-associated bacteria...

10.1101/2020.08.17.238444 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2020-08-18

The number of publicly available microbiome samples is continually growing. As data set size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a highly utilized phylogenetic alpha diversity metric that has thus far failed to effectively scale to trees with millions of vertices. Stacked Faith's phylogenetic diversity (SFPhD) enables calculation of this widely adopted diversity metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the amount of computational resources required,...

10.1101/gr.275777.121 article EN cc-by-nc Genome Research 2021-09-03
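For orientation, Faith's PD for one sample is the total branch length of the part of the phylogenetic tree that connects the taxa observed in that sample. The sketch below is a naive per-sample calculation written only to illustrate the metric; it is not the SFPhD algorithm from the paper. The tree encoding (a child-to-parent dictionary plus a dictionary of branch lengths), the convention of counting branches all the way up to the root, and the toy tree itself are all assumptions made for this example.

```python
def faith_pd(parent, branch_length, observed_tips):
    """Sum the branch lengths on the paths from each observed tip to the root.

    parent        -- dict mapping every non-root node to its parent node
    branch_length -- dict mapping every non-root node to the length of the
                     branch that joins it to its parent
    observed_tips -- iterable of tip names present in the sample
    """
    counted = set()  # nodes whose branch has already been added
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Climb toward the root, stopping early at branches already counted.
        while node in parent and node not in counted:
            counted.add(node)
            total += branch_length[node]
            node = parent[node]
    return total


# Hypothetical toy tree: root -> A (1.0), root -> D (3.0), A -> B (2.0), A -> C (0.5)
toy_parent = {"A": "root", "D": "root", "B": "A", "C": "A"}
toy_length = {"A": 1.0, "D": 3.0, "B": 2.0, "C": 0.5}
pd_bc = faith_pd(toy_parent, toy_length, {"B", "C"})
```

Each branch is visited at most once per sample, so the cost of this version grows with the size of the subtree touched by the sample; repeating it independently for many samples on a very large tree is the kind of workload the abstract describes as a bottleneck.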

In response to the ongoing global pandemic, characterizing the molecular-level host interactions of the new coronavirus SARS-CoV-2 responsible for COVID-19 has been at the center of unprecedented scientific focus. However, when the virus enters the body it also interacts with the micro-organisms already inhabiting the host. Understanding the virus-host-microbiome interactions can yield additional insights into the biological processes perturbed by viral invasion. Alterations in the gut microbiome species and metabolites have been noted during...

10.1038/s41598-021-85750-0 article EN cc-by Scientific Reports 2021-03-19

Tracking the bacterial communities present in our food has the potential to inform safety and product origin. To do so, the entire genetic material in a sample is extracted using chemical methods or commercially available kits and sequenced using next-generation platforms to provide a snapshot of the microbial composition.

10.1128/msystems.00619-21 article EN mSystems 2021-06-15

Chocolate is a highly valued and palatable confectionery product. Chocolate is primarily made from the processed seeds of the tree species Theobroma cacao. Cacao cultivation is relevant for small-holder farmers throughout the tropics, yet its productivity remains limited by low yields and widespread pathogens. A panel of 148 improved cacao clones was assembled based on disease resistance, and a phenotypic single-tree replicated clonal evaluation was performed for 8 years. Using high-density markers, diversity was expressed relative to 10...

10.3389/fpls.2017.01905 article EN cc-by Frontiers in Plant Science 2017-11-14

We introduce Operational Genomic Unit (OGU), a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent from taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable...

10.1101/2021.04.04.438427 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2021-04-06

Sequence segmentation and dimensionality reduction have been used as methods for studying high-dimensional sequences; they both reduce the complexity of the representation of the original data. In this paper we study the interplay of these two techniques. We formulate the problem of segmenting a sequence while modeling it with a basis of small size, thus essentially reducing the dimension of the input sequence. We give three different algorithms for this problem: all combine existing methods for sequence segmentation and dimensionality reduction. For two of the proposed algorithms we prove guarantees for the quality of the solutions...

10.1137/1.9781611972764.33 article EN 2006-04-20

Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origin, in conjunction...

10.1371/journal.pone.0024182 article EN cc-by PLoS ONE 2011-09-07

Reed canary grass (Phalaris arundinacea) is an economically important forage and bioenergy grass of the temperate regions of the world. Despite its economic importance, it is lacking in public genomic data. We explore comparative exomics of the grass cultivars in the context of response to salt exposure. The limited data set poses challenges to the computational pipeline. As a prerequisite for the comparative study, we generate the Phalaris reference transcriptome sequence, one of the first steps in addressing the issue of paucity of processed genomic data in this species. In addition,...

10.1186/1471-2164-15-s6-s18 article EN cc-by BMC Genomics 2014-10-01

Randomization is an important technique for assessing the significance of data mining results. Given an input data set, a randomization method samples at random from some class of datasets that share certain characteristics with the original data. The measure of interest on the randomized datasets is then compared to the measure on the original data to assess its significance. For some types of data, e.g., gene expression matrices, it is useful to be able to sample datasets that share row and column means and variances. Testing whether the results of a data mining algorithm on such randomized datasets differ from the results on the true dataset tells us whether the results on the true data were an artifact...

10.1137/1.9781611972788.45 article EN 2008-04-24

BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on...

10.1186/1471-2164-12-379 article EN cc-by BMC Genomics 2011-07-27