Camille Marchet

ORCID: 0000-0002-7235-7346
Research Areas
  • Genomics and Phylogenetic Studies
  • Algorithms and Data Compression
  • RNA and protein synthesis mechanisms
  • Molecular Biology Techniques and Applications
  • Gene expression and cancer classification
  • Microbial Community Ecology and Physiology
  • RNA modifications and cancer
  • Machine Learning in Bioinformatics
  • Environmental DNA in Biodiversity Studies
  • Marine and environmental studies
  • RNA Research and Splicing
  • Bioinformatics and Genomic Networks
  • Error Correcting Code Techniques
  • Protist diversity and phylogeny
  • Cancer-related molecular mechanisms research
  • DNA and Biological Computing
  • Advanced Image and Video Retrieval Techniques
  • Bacteriophages and microbial interactions
  • Data Mining Algorithms and Applications
  • Network Packet Processing and Optimization
  • Fractal and DNA sequence analysis
  • Color Science and Applications
  • Genetics, Bioinformatics, and Biomedical Research
  • Web Data Mining and Analysis
  • Natural Language Processing Techniques

Centre de Recherche en Informatique
2019-2025

Université de Lille
2018-2025

Centre National de la Recherche Scientifique
2016-2025

École Centrale de Lille
2024

Institut des Sciences de l'Information et de leurs Interactions
2023-2024

Centre de Recherche en Informatique, Signal et Automatique de Lille
2019-2022

Institut de Recherche en Informatique et Systèmes Aléatoires
2016-2019

Institut national de recherche en informatique et en automatique
2014-2019

Université de Rennes
2017-2019

Genomics (United Kingdom)
2016-2019

Fernando Meyer, Adrian Fritz, Zhi-Luo Deng, David Koslicki, Till Robin Lesker, Alexey Gurevich, Gary Robertson, Mohammed Alser, Dmitry Antipov, Francesco Beghini, Denis Bertrand, Jaqueline Brito, C. Titus Brown, Jan P. Buchmann, Aydın Buluç, Bo Chen, Rayan Chikhi, Philip T. L. C. Clausen, Alexandru Cristian, Piotr Wojciech Dąbrowski, Aaron E. Darling, Rob Egan, Eleazar Eskin, Evangelos Georganas, Eugene Goltsman, Melissa A. Gray, Lars Hestbjerg Hansen, Steven Hofmeyr, Pingqin Huang, Luiz Irber, Huijue Jia, Tue Sparholt Jørgensen, Silas Kieser, Terje Klemetsen, Axel Kola, Mikhail Kolmogorov, Anton Korobeynikov, Jason C. Kwan, Nathan LaPierre, Claire Lemaitre, Chenhao Li, Antoine Limasset, Fábio Malcher Miranda, Serghei Mangul, Vanessa R. Marcelino, Camille Marchet, Pierre Marijon, Dmitry Meleshko, Daniel R. Mende, Alessio Milanese, Niranjan Nagarajan, Jakob Nybo Nissen, Sergey Nurk, Leonid Oliker, Lucas Paoli, Pierre Peterlongo, Vitor C. Piro, Jacob S. Porter, Simon Rasmussen, Evan Rees, Knut Reinert, Bernhard Y. Renard, Espen Mikal Robertsen, Gail Rosen, Hans‐Joachim Ruscheweyh, Varuni Sarwal, Nicola Segata, Enrico Seiler, Lizhen Shi, Fengzhu Sun, Shinichi Sunagawa, Søren J. Sørensen, Ashleigh Thomas, Chengxuan Tong, Mirko Trajkovski, Julien Tremblay, Gherman Uritskiy, Riccardo Vicedomini, Zhengyang Wang, Ziye Wang, Zhong Wang, Andrew Warren, Nils Peder Willassen, Katherine Yelick, Ronghui You, Georg Zeller, Zhengqiao Zhao, Shanfeng Zhu, Jie Zhu, Rubén Garrido‐Oter, Petra Gastmeier, Stéphane Hacquard, Susanne Häußler, Ariane Khaledi, Friederike Maechler, Fantin Mesny, Simona Radutoiu, Paul Schulze‐Lefert, Nathiana Smit, Till Strowig

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due...

10.1038/s41592-022-01431-4 article EN cc-by Nature Methods 2022-04-01

SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but they rely on the availability of a (good) reference genome and therefore cannot be applied to non-model species. They are also mostly tailored to whole genome (re-)sequencing experiments, whereas in many cases transcriptome sequencing can be used as a cheaper alternative which already enables the identification of SNPs located in transcribed...

10.1093/nar/gkw655 article EN cc-by-nc Nucleic Acids Research 2016-07-25

In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other methods have so far been unable to record abundances efficiently across large...

10.1093/bioinformatics/btaa487 article EN cc-by-nc Bioinformatics 2020-05-06

Third-generation sequencing technologies make it possible to sequence long reads of tens of kbp, which are expected to solve various problems. However, they display high error rates, currently capped around 10%. Self-correction is thus regularly used in long-read analysis projects. We introduce CONSENT, a new self-correction method that relies on both multiple sequence alignment and local de Bruijn graphs. To ensure scalability, the multiple sequence alignment computation benefits from an efficient segmentation strategy, allowing a massive speedup. CONSENT...

10.1038/s41598-020-80757-5 article EN cc-by Scientific Reports 2021-01-12

The study of meta-transcriptomic datasets involving non-model organisms poses bioinformatic challenges. The production of chimeric sequences and our inability to distinguish the taxonomic origins of the sequences produced are inherent and recurrent difficulties in de novo assembly analyses. As the study of holobiont meta-transcriptomes is affected by the challenges mentioned above, we propose an innovative approach to tackle such difficulties and tested it on marine models as a proof of concept. We considered three models, of which two transcriptomes...

10.1186/s40168-018-0481-9 article EN cc-by Microbiome 2018-06-09

Analyzing the immense diversity of RNA isoforms in large RNA-seq repositories requires laborious data processing using specialized tools. Indexing techniques based on k-mers have previously been effective at searching for RNA sequences across thousands of RNA-seq libraries, but fall short of enabling direct RNA quantification. We show here that RNAs queried in the form of k-mer sets can be quantified in seconds, with a precision akin to that of conventional RNA quantification methods. We showcase several applications by exploring...

10.1101/2024.02.27.581927 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2024-03-01

The AF9 (protein AF9) transcription factor, encoded by MLLT3 (mixed-lineage leukemia translocated to 3) on chromosome 9, functions as a chromatin reader. Through its N-terminal YEATS (Yaf9, ENL, AF9, Taf14, and Sas5) protein domain, it interacts with acetylated [1] or crotonylated [2] histone H3, as well as with the PAF1 (RNA polymerase II-associated factor 1 homolog) and P-TEFb (positive transcription elongation factor b) components of the super elongation complex (SEC). It also interacts, through its poly-serine domain (Poly-Ser), with TFIID (transcription factor II D)...

10.1002/cac2.12650 article EN cc-by-nc-nd Cancer Communications 2025-01-03

The exponential increase in publicly available sequencing data and genomic resources necessitates the development of highly efficient methods for data processing and analysis. Locality-sensitive hashing techniques have successfully transformed large datasets into smaller, more manageable sketches while maintaining comparability using metrics such as the Jaccard and containment indices. However, fixed-size sketches encounter difficulties when applied to divergent datasets. Scalable sketching methods, ...

10.1186/s13015-024-00268-0 article EN cc-by Algorithms for Molecular Biology 2025-02-08

Genome-wide analyses estimate that more than 90% of multi-exonic human genes produce at least two transcripts through alternative splicing (AS). Various bioinformatics methods are available to analyze AS from RNA-seq data. Most methods start by mapping the reads to an annotated reference genome, but some start with a de novo assembly of the reads. In this paper, we present a systematic comparison of a mapping-first approach (FaRLine) and an assembly-first approach (KisSplice). We applied these methods to independent datasets and found...

10.1038/s41598-018-21770-7 article EN cc-by Scientific Reports 2018-03-05

A plethora of methods and applications share the fundamental need to associate information with words for high-throughput sequence analysis. Doing so for billions of k-mers is commonly a scalability problem, as exact associative indexes can be memory expensive. Recent works take advantage of overlaps between k-mers to address this challenge. Yet, existing data structures are either unable to associate information with k-mers or are not lightweight enough. We present BLight, a static and exact data structure able to associate unique identifiers with k-mers and determine their membership in a set without...

10.1093/bioinformatics/btab217 article EN Bioinformatics 2021-04-02

Motivation: The Sequence Read Archive public database has reached 45 petabytes of raw sequences and doubles its nucleotide content every 2 years. Although BLAST-like methods can routinely search for a sequence in a small collection of genomes, making immense public resources searchable is beyond the reach of alignment-based strategies. In recent years, abundant literature has tackled the task of finding a sequence in extensive sequence collections using k-mer-based strategies. At present, the most scalable methods are approximate membership...

10.1093/bioinformatics/btad225 article EN cc-by Bioinformatics 2023-06-01

Long-read sequencing currently provides sequences of several thousand base pairs. It is therefore possible to obtain complete transcripts, offering an unprecedented view of the cellular transcriptome. However, the literature lacks tools for de novo clustering of such data, in particular for Oxford Nanopore Technologies reads, because of their inherently high error rate compared to short reads. Our goal is to process reads from whole transcriptome sequencing data accurately and without a reference genome, in order to reliably group reads coming...

10.1093/nar/gky834 article EN cc-by Nucleic Acids Research 2018-09-11

In this paper, we introduce the Conway-Bromage-Lyndon (CBL) structure, a compressed, dynamic and exact method for representing k-mer sets. Originating from Conway and Bromage's concept, CBL innovatively employs the smallest cyclic rotations of k-mers, akin to Lyndon words, to leverage lexicographic redundancies. In order to support dynamic operations and set operations, we propose a dynamic bit vector structure that draws a parallel with Elias-Fano's scheme. This structure is encapsulated in a Rust library, demonstrating a balanced blend...

10.1101/2024.01.29.577700 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-01-31

Summary: In this article, we introduce the Conway–Bromage–Lyndon (CBL) structure, a compressed, dynamic and exact method for representing k-mer sets. Originating from Conway and Bromage's concept, CBL innovatively employs the smallest cyclic rotations of k-mers, akin to Lyndon words, to leverage lexicographic redundancies. In order to support dynamic operations and set operations, we propose a dynamic bit vector structure that draws a parallel with Elias-Fano's scheme. This structure is encapsulated in a Rust library, demonstrating...

10.1093/bioinformatics/btae217 article EN cc-by Bioinformatics 2024-04-14

We propose Cdbgtricks, a new method for updating a compacted de Bruijn graph when adding novel sequences, such as full genomes. Our method indexes the graph, making it possible to identify in constant time the location (unitig and offset) of any k-mer. The update operation that we propose also updates the index. Our results show that Cdbgtricks is faster than Bifrost and GGCAT. We benefit from the index to provide new functionalities, such as reporting the subgraph that shares a desired percentage of k-mers with a query sequence, with the ability to query a set of reads. The open-source software...

10.1101/2024.05.24.595676 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-05-28

The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers for short reads are based on de Bruijn graphs (DBG) and have no clear, explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work...

10.1186/s13015-017-0091-2 article EN cc-by Algorithms for Molecular Biology 2017-02-22

Fernando Meyer, Adrian Fritz, Zhi-Luo Deng, David Koslicki, Alexey Gurevich, Gary Robertson, Mohammed Alser, Dmitry Antipov, Francesco Beghini, Denis Bertrand, Jaqueline Brito, Christopher T. Brown, Jan P. Buchmann, Aydın Buluç, B. Chen, Rayan Chikhi, Philip T. L. C. Clausen, A. Cristian, Piotr Wojciech Dąbrowski, Aaron E. Darling, Rob Egan, Eleazar Eskin, Evangelos Georganas, Eugene Goltsman, Melissa A. Gray, Lars Hestbjerg Hansen, Steven Hofmeyr, Pei‐Qiang Huang, Luiz Irber, Huijue Jia, Tue Sparholt Jørgensen, Silas Kieser, Terje Klemetsen, Axel Kola, Mikhail Kolmogorov, Anton Korobeynikov, Jason C. Kwan, Nathan LaPierre, Claire Lemaitre, C. Li, Antoine Limasset, Fábio Malcher Miranda, Serghei Mangul, Vanessa R. Marcelino, Camille Marchet, Pierre Marijon, Dmitry Meleshko, Daniel R. Mende, Alessio Milanese, Niranjan Nagarajan, Jakob Nybo Nissen, Sergey Nurk, Leonid Oliker, Lucas Paoli, Pierre Peterlongo, Vitor C. Piro, Jacob Porter, Simon Rasmussen, Evan Rees, Knut Reinert, Bernhard Y. Renard, Espen Mikal Robertsen, Gail Rosen, Hans‐Joachim Ruscheweyh, Varuni Sarwal, Nicola Segata, Enrico Seiler, Lizhen Shi, Fengzhu Sun, Shinichi Sunagawa, Søren J. Sørensen, Ashleigh Thomas, Catherine Tong, Mirko Trajkovski, Julien Tremblay, Gherman Uritskiy, Riccardo Vicedomini, Zi. Wang, Zhe Wang, Zho. Wang, Andrew Warren, Nils Peder Willassen, Katherine Yelick, Ronghui You, Georg Zeller, Z. Zhao, Shanfeng Zhu, Jie Zhu, Rubén Garrido‐Oter, Petra Gastmeier, Stéphane Hacquard, Susanne Häußler, Ariane Khaledi, Friederike Maechler, Fantin Mesny, Simona Radutoiu, Paul Schulze‐Lefert, Nathiana Smit, Till Strowig, Andreas Bremges

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the community-driven initiative for the Critical Assessment of Metagenome Interpretation (CAMI). In its second challenge, CAMI engaged the community to assess their methods on realistic and complex datasets with long and short reads, created from ∼1,700 novel and known microbial genomes, as well as ∼600 plasmids and viruses. Altogether 5,002 results by 76 program versions were analyzed, representing a 22x increase in...

10.1101/2021.07.12.451567 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2021-07-12

It has been ten years since the first publication of a method dedicated entirely to mapping third-generation sequencing long reads. The unprecedented characteristics of this new type of sequencing data created a shift, and methods moved on from the seed-and-extend framework previously used for short reads to a seed-and-chain framework, due to the abundance of seeds in each read. As a result, the main novelties in proposed long-read mapping algorithms are typically based on alternative seed constructs or chaining formulations. Dozens of tools now exist,...

10.1101/2022.05.21.492932 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2022-05-22

Motivation: Third-generation sequencing technologies from Pacific Biosciences and Oxford Nanopore allow the sequencing of long reads of tens of kbp, which are expected to solve various problems, such as contig and haplotype assembly, scaffolding, and structural variant calling. However, they also display high error rates that can reach 10 to 30% for basic ONT and non-CCS PacBio reads. As a result, error correction is often the first step of projects dealing with long reads. As the first long-read sequencing experiments produced reads displaying error rates higher than 15% on average, most methods relied...

10.1101/546630 preprint EN cc-by-nd bioRxiv (Cold Spring Harbor Laboratory) 2019-02-11