Pierre Peterlongo

ORCID: 0000-0003-0776-6407
Research Areas
  • Genomics and Phylogenetic Studies
  • Algorithms and Data Compression
  • RNA and protein synthesis mechanisms
  • Gene expression and cancer classification
  • Microbial Community Ecology and Physiology
  • Molecular Biology Techniques and Applications
  • Machine Learning in Bioinformatics
  • Chromosomal and Genetic Variations
  • Evolutionary Algorithms and Applications
  • Natural Language Processing Techniques
  • Bioinformatics and Genomic Networks
  • Computational Physics and Python Applications
  • Educational Technology and Assessment
  • Genetic and Environmental Crop Studies
  • DNA and Biological Computing
  • Cancer-related molecular mechanisms research
  • Advanced Image and Video Retrieval Techniques
  • Genetics, Bioinformatics, and Biomedical Research
  • Caching and Content Delivery
  • Genetic diversity and population structure
  • Gut microbiota and health
  • MicroRNA in disease regulation
  • Environmental DNA in Biodiversity Studies
  • Genetic Mapping and Diversity in Plants and Animals
  • Bacteriophages and microbial interactions

Institut de Recherche en Informatique et Systèmes Aléatoires
2013-2024

Université de Rennes
2017-2024

Centre National de la Recherche Scientifique
2008-2024

Institut national de recherche en informatique et en automatique
2012-2023

Computer Algorithms for Medicine
2012-2022

Genomics (United Kingdom)
2012-2022

Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement
2021

Université Paris-Saclay
2021

Inria Rennes - Bretagne Atlantique Research Centre
2008-2020

Laboratoire de Génie des Procédés – Environnement – Agro-alimentaire
2015

The Critical Assessment of Metagenome Interpretation (CAMI) community initiative presents results from its first challenge, a rigorous benchmarking of software for metagenome assembly, binning and taxonomic profiling. Methods for assembly, binning and taxonomic profiling are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The CAMI challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses...

10.1038/nmeth.4458 article EN cc-by Nature Methods 2017-10-02
Fernando Meyer, Adrian Fritz, Zhi-Luo Deng, David Koslicki, Till Robin Lesker, Alexey Gurevich, Gary Robertson, Mohammed Alser, Dmitry Antipov, Francesco Beghini, Denis Bertrand, Jaqueline Brito, C. Titus Brown, Jan P. Buchmann, Aydın Buluç, Bo Chen, Rayan Chikhi, Philip T. L. C. Clausen, Alexandru Cristian, Piotr Wojciech Dąbrowski, Aaron E. Darling, Rob Egan, Eleazar Eskin, Evangelos Georganas, Eugene Goltsman, Melissa A. Gray, Lars Hestbjerg Hansen, Steven Hofmeyr, Pingqin Huang, Luiz Irber, Huijue Jia, Tue Sparholt Jørgensen, Silas Kieser, Terje Klemetsen, Axel Kola, Mikhail Kolmogorov, Anton Korobeynikov, Jason C. Kwan, Nathan LaPierre, Claire Lemaitre, Chenhao Li, Antoine Limasset, Fábio Malcher Miranda, Serghei Mangul, Vanessa R. Marcelino, Camille Marchet, Pierre Marijon, Dmitry Meleshko, Daniel R. Mende, Alessio Milanese, Niranjan Nagarajan, Jakob Nybo Nissen, Sergey Nurk, Leonid Oliker, Lucas Paoli, Pierre Peterlongo, Vitor C. Piro, Jacob S. Porter, Simon Rasmussen, Evan Rees, Knut Reinert, Bernhard Y. Renard, Espen Mikal Robertsen, Gail Rosen, Hans‐Joachim Ruscheweyh, Varuni Sarwal, Nicola Segata, Enrico Seiler, Lizhen Shi, Fengzhu Sun, Shinichi Sunagawa, Søren J. Sørensen, Ashleigh Thomas, Chengxuan Tong, Mirko Trajkovski, Julien Tremblay, Gherman Uritskiy, Riccardo Vicedomini, Zhengyang Wang, Ziye Wang, Zhong Wang, Andrew Warren, Nils Peder Willassen, Katherine Yelick, Ronghui You, Georg Zeller, Zhengqiao Zhao, Shanfeng Zhu, Jie Zhu, Rubén Garrido‐Oter, Petra Gastmeier, Stéphane Hacquard, Susanne Häußler, Ariane Khaledi, Friederike Maechler, Fantin Mesny, Simona Radutoiu, Paul Schulze‐Lefert, Nathiana Smit, Till Strowig

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due...

10.1038/s41592-022-01431-4 article EN cc-by Nature Methods 2022-04-01

Background: In this paper, we address the problem of identifying and quantifying polymorphisms in RNA-seq data when no reference genome is available, without assembling the full transcripts. Based on the fundamental idea that each polymorphism corresponds to a recognisable pattern in a De Bruijn graph constructed from the RNA-seq reads, we propose a general model for all polymorphisms in such graphs. We then introduce an exact algorithm, called KisSplice, to extract alternative splicing events. Results: We show that KisSplice makes it possible to identify more...

10.1186/1471-2105-13-s6-s5 article EN cc-by BMC Bioinformatics 2012-04-19

Background: Large-scale metagenomic projects aim to extract biodiversity knowledge between different environmental conditions. Current methods for comparing microbial communities face important limitations. Those based on taxonomical or functional assignation rely on a small subset of the sequences that can be associated with known organisms. On the other hand, de novo methods, which compare the whole sets of sequences, either do not scale up to ambitious metagenomic projects or do not provide precise and exhaustive results. Methods: These limitations...

10.7717/peerj-cs.94 article EN cc-by PeerJ Computer Science 2016-11-14

Detecting single nucleotide polymorphisms (SNPs) between genomes is becoming a routine task with next-generation sequencing. Generally, SNP detection methods use a reference genome. As non-model organisms are increasingly investigated, the need for reference-free methods has been amplified. Most of the existing reference-free methods have fundamental limitations: they can only call SNPs between exactly two datasets, and/or they require a prohibitive amount of computational resources. The method we propose, discoSnp, detects both heterozygous and...

10.1093/nar/gku1187 article EN cc-by Nucleic Acids Research 2014-11-17

Background: Literature reports that mature microRNA (miRNA) can be methylated at adenosine, guanosine and cytosine. However, the molecular mechanisms involved in cytosine methylation of miRNAs have not yet been fully elucidated. Here we investigated the biological role and underlying mechanism of cytosine methylation in miRNAs in glioblastoma multiforme (GBM). Methods: RNA immunoprecipitation with the anti-5methylcytosine (5mC) antibody followed by Array, ELISA, dot blot, incorporation of a radio-labelled methyl group in miRNA, miRNA...

10.1186/s12943-020-01155-z article EN cc-by Molecular Cancer 2020-02-25

Biogeographical studies have traditionally focused on readily visible organisms, but recent technological advances are enabling analyses of the large-scale distribution of microscopic organisms, whose biogeographical patterns have long been debated. Here we assessed the global structure of plankton geography and its relation to the biological, chemical, and physical context of the ocean (the ‘seascape’) by analyzing metagenomes of plankton communities sampled across oceans during the Tara Oceans expedition, in light of environmental data and ocean current...

10.7554/elife.78129 article EN cc-by eLife 2022-08-03

Progress in genetics and breeding in pea still suffers from the limited availability of molecular resources. SNP markers that can be identified through affordable sequencing processes, without the need for prior genome reduction or a reference genome to assemble sequencing data, would allow the discovery and genetic mapping of thousands of molecular markers. Such an approach could significantly speed up genetic studies and marker-assisted breeding for non-model species. A total of 419,024 SNPs were discovered using HiSeq whole genome sequencing of four pea lines, followed by direct...

10.1186/s12864-016-2447-2 article EN cc-by BMC Genomics 2016-02-18

Motivation: Efficient and fast next-generation sequencing (NGS) algorithms are essential to analyze the terabytes of data generated by NGS machines. A serious bottleneck can be the design of such algorithms, as they require sophisticated data structures and advanced hardware implementation. Results: We propose an open-source library dedicated to genome assembly and analysis to speed up the process of developing efficient software. The library is based on a recent optimized de-Bruijn graph implementation allowing complex...

10.1093/bioinformatics/btu406 article EN cc-by-nc Bioinformatics 2014-07-01

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic datasets. Instead, novel, qualitatively different computational methods and paradigms are needed. We...

10.1101/043430 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2016-03-12

Background: Next Generation Sequencing (NGS) has dramatically enhanced our ability to sequence genomes, but not to assemble them. In practice, many published genome sequences remain in the state of a large set of contigs. Each contig describes the sequence found along some path of the assembly graph; however, the set of contigs does not record all the sequence information contained in that graph. Although many subsequent analyses can be performed with the set of contigs, one may ask whether mapping reads on the contigs is as informative as mapping them on the paths of the assembly graph. Currently, one lacks...

10.1186/s12859-016-1103-9 article EN cc-by BMC Bioinformatics 2016-06-16

When indexing large collections of short-read sequencing data, a common operation that has now been implemented in several tools (Sequence Bloom Trees and variants, BIGSI) is to construct a collection of Bloom filters, one per sample. Each Bloom filter is used to represent a set...

10.1093/bioadv/vbac029 article EN cc-by Bioinformatics Advances 2022-01-01

The study of meta-transcriptomic datasets involving non-model organisms represents bioinformatic challenges. The production of chimeric sequences and our inability to distinguish the taxonomic origins of the sequences produced are inherent and recurrent difficulties in de novo assembly analyses. As the study of holobiont meta-transcriptomes is affected by the challenges invoked above, we propose an innovative bioinformatic approach to tackle such difficulties and tested it on marine models as a proof of concept. We considered three holobiont models, of which two transcriptomes...

10.1186/s40168-018-0481-9 article EN cc-by Microbiome 2018-06-09

In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were...

10.1101/099127 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2017-01-09

Nowadays, metagenomic sample analyses are mainly achieved by comparing them with a priori knowledge stored in data banks. While powerful, such approaches cannot exploit unknown and/or "unculturable" species, for instance estimated at 99% for Bacteria. This work introduces Compareads, a de novo comparative metagenomic approach that returns the reads that are similar between two possibly metagenomic datasets generated by High Throughput Sequencers. One originality of this work consists in its ability to deal with huge datasets. The second...

10.1186/1471-2105-13-s19-s10 article EN cc-by BMC Bioinformatics 2012-12-01

Biogeographical studies have traditionally focused on readily visible organisms, but recent technological advances are enabling analyses of the large-scale distribution of microscopic organisms, whose biogeographical patterns have long been debated. Here we assessed the global structure of plankton geography and its relation to the biological, chemical and physical context of the ocean (the ‘seascape’) by analyzing metagenomes of plankton communities sampled across oceans during the Tara Oceans expedition, in light of environmental data...

10.1101/867739 preprint EN cc-by-nc bioRxiv (Cold Spring Harbor Laboratory) 2019-12-06

The analysis of next-generation sequencing data from large genomes is a timely research topic. Sequencers are producing billions of short sequence fragments from newly sequenced organisms. Computational methods for reconstructing whole genomes/transcriptomes (de novo assemblers) are typically employed to process such data. However, these methods require large memory resources and computation time. Many basic biological questions could be answered by targeting specific information in the reads, thus avoiding complete...

10.1186/1471-2105-13-48 article EN cc-by BMC Bioinformatics 2012-03-23

Motivation: Next Generation Sequencing (NGS) data provide unprecedented access to life mechanisms. In particular, these data make it possible to detect polymorphisms such as SNPs and indels. As these polymorphisms represent a fundamental source of information in agronomy, environment or medicine, their detection in NGS data is now a routine task. The main methods for their prediction usually need a reference genome. However, non-model organisms and highly divergent genomes such as in cancer studies are extensively investigated. Results: We propose...

10.1101/209965 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2017-10-27

Minimal perfect hash functions provide space-efficient and collision-free hashing on static sets. Existing algorithms and implementations that build such functions have practical limitations on the number of input elements they can process, due to high construction time, RAM or external memory usage. We revisit a simple algorithm and show that it is highly competitive with the state of the art, especially in terms of construction time and memory usage. We provide a parallel C++ implementation called BBhash. It is capable of creating a minimal perfect hash function of $10^{10}$ elements in less than 7 minutes...

10.48550/arxiv.1702.03154 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Metagenomics offers a way to analyze biotopes at the genomic level and to reach functional and taxonomical conclusions. The bio-analyses of large metagenomic projects face critical limitations: complex metagenomes cannot be assembled, or annotations are much smaller than the real biological diversity. This motivated the development of de novo metagenomic read comparison approaches to extract information contained in metagenomic datasets. However, these new approaches do not scale up to large metagenomic projects, or generate a large number of intermediate result files...

10.1109/bibm.2014.6999135 article EN 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2014-11-01

Long-read sequencing currently provides sequences of several thousand base pairs. It is therefore possible to obtain complete transcripts, offering an unprecedented vision of the cellular transcriptome. However, the literature lacks tools for de novo clustering of such data, in particular for Oxford Nanopore Technologies reads, because of the inherent high error rate compared to short reads. Our goal is to process reads from whole transcriptome sequencing data accurately and without a reference genome in order to reliably group reads coming...

10.1093/nar/gky834 article EN cc-by Nucleic Acids Research 2018-09-11

Short-read accuracy is important for downstream analyses such as genome assembly and hybrid long-read correction. Despite much work on short-read correction, present-day correctors either do not scale well to large datasets or consider reads as mere suites of k-mers, without taking into account their full-length sequence information. We propose a new method to correct short reads using de Bruijn graphs and implement it in a tool called Bcool. As a first step, Bcool constructs a compacted de Bruijn graph from the reads. This...

10.1093/bioinformatics/btz102 article EN Bioinformatics 2019-02-18