- Genomics and Phylogenetic Studies
- Algorithms and Data Compression
- RNA and protein synthesis mechanisms
- DNA and Biological Computing
- Parallel Computing and Optimization Techniques
- Gene expression and cancer classification
- Embedded Systems Design Techniques
- Chromosomal and Genetic Variations
- Machine Learning in Bioinformatics
- Interconnection Networks and Systems
- Distributed and Parallel Computing Systems
- Advanced Data Storage Technologies
- Advanced biosensing and bioanalysis techniques
- Bacteriophages and microbial interactions
- Network Packet Processing and Optimization
- Microbial Community Ecology and Physiology
- Evolutionary Algorithms and Applications
- Molecular Biology Techniques and Applications
- Cellular Automata and Applications
- Plant Virus Research Studies
- Gut microbiota and health
- Low-power high-performance VLSI design
- Remote-Sensing Image Classification
- Genetics, Bioinformatics, and Biomedical Research
- Genomic variations and chromosomal abnormalities
Institut de Recherche en Informatique et Systèmes Aléatoires
2014-2024
Centre National de la Recherche Scientifique
2011-2024
Université de Rennes
1993-2024
Institut national de recherche en informatique et en automatique
2013-2024
Computer Algorithms for Medicine
2014-2023
Genomics (United Kingdom)
2014-2023
Indian Institute of Technology Delhi
2017
Inria Rennes - Bretagne Atlantique Research Centre
2011-2016
Université Européenne de Bretagne
2015
Pennsylvania State University
1997-2014
The Critical Assessment of Metagenome Interpretation (CAMI) community initiative presents results from its first challenge, a rigorous benchmarking of software for metagenome assembly, binning and taxonomic profiling. Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The CAMI challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses...
The prevailing paradigm of host-parasite evolution is that arms races lead to increasing specialisation via genetic adaptation. Insect herbivores are no exception and the majority have evolved to colonise a small number of closely related host species. Remarkably, the green peach aphid, Myzus persicae, colonises plant species across 40 families and single M. persicae clonal lineages can colonise distantly related plants. This remarkable ability makes M. persicae a highly destructive pest of many important crop species. To investigate...
The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how best to assess the quality of assembled genome sequences. The Assemblathon competitions...
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware...
Summary: Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state-of-the-art k-mer counting methods require that a large data structure resides in memory. Such a structure typically grows with the number of distinct k-mers to count. We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which only requires a fixed user-defined amount of memory and disk space. This approach realizes a memory, time and disk trade-off. The...
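The fixed-memory idea in the abstract above can be illustrated with a minimal sketch. It assumes one common way to obtain such a trade-off: route each k-mer to a hash-selected partition file on disk, then count one partition at a time so that only that partition's distinct k-mers are held in memory. The function name `count_kmers_on_disk` and the parameter `n_partitions` are hypothetical, and this is not presented as DSK's actual implementation.

```python
import os
import tempfile
import zlib
from collections import Counter


def count_kmers_on_disk(reads, k, n_partitions=4):
    """Count k-mers via hash-partitioned temporary files (illustrative sketch).

    Memory use during counting is bounded by the number of distinct k-mers
    in a single partition, not in the whole data set. The returned dict
    still holds all counts; a real tool would stream them to an output file.
    """
    totals = {}
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part_{i}.txt") for i in range(n_partitions)]
        files = [open(p, "w") for p in paths]
        try:
            # Pass 1: stream the reads and route each k-mer to a partition file.
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    # crc32 is deterministic, so equal k-mers share a partition.
                    part = zlib.crc32(kmer.encode()) % n_partitions
                    files[part].write(kmer + "\n")
        finally:
            for f in files:
                f.close()
        # Pass 2: count each partition independently of the others.
        for p in paths:
            with open(p) as f:
                counts = Counter(line.strip() for line in f)
            # Partitions are disjoint, so a plain update never overwrites a key.
            totals.update(counts)
    return totals


if __name__ == "__main__":
    print(count_kmers_on_disk(["ACGTAC", "GTACG"], k=3))
```

Increasing `n_partitions` lowers peak memory per partition at the cost of more open files; that is the kind of memory, time and disk trade-off the abstract refers to.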
Background: Large-scale metagenomic projects aim to extract biodiversity knowledge between different environmental conditions. Current methods for comparing microbial communities face important limitations. Those based on taxonomical or functional assignation rely on a small subset of the sequences that can be associated with known organisms. On the other hand, de novo methods, which compare the whole sets of sequences, either do not scale up on ambitious metagenomic projects or do not provide precise and exhaustive results. Methods: These limitations...
Data volumes generated by next-generation sequencing (NGS) technologies are now a major concern for both data storage and transmission. This triggered the need for more efficient methods than general purpose compression tools, such as the widely used gzip method. We present a novel reference-free method meant to compress data issued from high throughput sequencing technologies. Our approach, implemented in the software LEON, employs techniques derived from existing assembly principles. The method is based on a reference probabilistic de...
Background: Post-transcriptional regulation in eukaryotes can be operated through microRNA (miRNA) mediated gene silencing. MiRNAs are small (18-25 nucleotides) non-coding RNAs that play a crucial role in the regulation of gene expression in eukaryotes. In insects, miRNAs have been shown to be involved in multiple mechanisms such as embryonic development, tissue differentiation, metamorphosis or circadian rhythm. Insect miRNAs have been identified in different species belonging to five orders: Coleoptera, Diptera, Hymenoptera, Lepidoptera...
Motivation: The rapid development of next-generation sequencing technologies able to produce huge amounts of sequence data is leading to a wide range of new applications. This triggers the need for fast and accurate alignment software. Common techniques often restrict indels in the alignment to improve speed, whereas more flexible aligners are too slow for large-scale applications. Moreover, many current aligners are becoming inefficient as generated reads grow ever larger. Our goal with our new aligner GASSST (Global Alignment Short Sequence...
Motivation: Efficient and fast next-generation sequencing (NGS) algorithms are essential to analyze the terabytes of data generated by NGS machines. A serious bottleneck can be the design of such algorithms, as they require sophisticated data structures and advanced hardware implementation. Results: We propose an open-source library dedicated to genome assembly and analysis to speed up the process of developing efficient software. The library is based on a recent optimized de-Bruijn graph implementation allowing complex...
The Streams-C compiler ([5]) synthesizes hardware circuits for reconfigurable FPGA-based computers from parallel C programs. The Streams-C language consists of a small number of libraries and intrinsic functions added to a synthesizable subset of C, and supports a communicating process programming model. The processes may be either software or hardware processes, and the compiler manages communication among the processes transparently to the programmer. For the hardware processes, the compiler generates Register-Transfer-Level (RTL) VHDL, targeting multiple FPGAs with dedicated memories....
Sequence similarity searching is an important and challenging task in molecular biology, and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit the parallel capabilities of modern microprocessors. A parallel algorithm for comparing large genomic...
In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were...
Nowadays, metagenomic sample analyses are mainly achieved by comparing them with a priori knowledge stored in data banks. While powerful, such approaches do not allow the exploitation of unknown and/or "unculturable" species, for instance estimated at 99% for Bacteria. This work introduces Compareads, a de novo comparative metagenomic approach that returns the reads that are similar between two possibly metagenomic datasets generated by High Throughput Sequencers. One originality of this work consists in its ability to deal with huge datasets. The second...
Understanding the molecular evolution of genes involved in parasite adaptation and the role of transposable elements (TEs) in driving their diversification is key to unraveling how populations adapt to their environments. In phytophagous insects like aphids, olfactory (OR) and gustatory receptor (GR) genes are crucial for host recognition, yet their post-duplication evolution remains insufficiently explored. Here, we analyzed 521 OR and 399 GR genes, alongside TEs, across 12 aphid genomes with varying host ranges. Aphid lineages...
Metagenomics offers a way to analyze biotopes at the genomic level and to reach functional and taxonomical conclusions. The bio-analyses of large metagenomic projects face critical limitations: complex metagenomes cannot be assembled and the taxonomical or functional annotations are much smaller than the real biological diversity. This motivated the development of de novo metagenomic read comparison approaches to extract information contained in metagenomic datasets. However, these new approaches do not scale up to large metagenomic projects, or generate an important number of large intermediate and result files....
SAMBA (Systolic Accelerator for Molecular Biological Applications) is a 128 processor hardware accelerator for speeding up the sequence comparison process. The short-term objective is to provide a low-cost board to boost PC or workstation performance on this class of applications. This paper places SAMBA amongst other existing systems and highlights its original features. Real performance obtained from the prototype is demonstrated. For example, a sequence of 300 amino acids is scanned against SWISS-PROT-34 (21 210 389 residues) in 30 s using the Smith...
We describe a new algorithm for solving the all-pairs shortest-path (APSP) problem for planar graphs and graphs with small separators that exploits the massive on-chip parallelism available in today's Graphics Processing Units (GPUs). Our algorithm, based on the Floyd-Warshall algorithm, has near optimal complexity in terms of the total number of operations, while its matrix-based structure is regular enough to allow for efficient parallel implementation on the GPUs. By applying a divide-and-conquer approach, we are able to make use of multi-node...
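For orientation, the classical sequential Floyd-Warshall recurrence that the abstract above builds on can be sketched as follows. This is the textbook algorithm on a dense distance matrix, not the GPU divide-and-conquer variant the paper describes; the function name `floyd_warshall` and the adjacency-matrix input format are assumptions made for illustration.

```python
import math


def floyd_warshall(weights):
    """Return all-pairs shortest-path distances for a dense weight matrix.

    weights[i][j] is the edge weight from i to j, math.inf if there is no
    edge, and 0 on the diagonal. Assumes no negative cycles.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):  # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


if __name__ == "__main__":
    inf = math.inf
    graph = [
        [0, 4, 10],
        [inf, 0, 1],
        [inf, inf, 0],
    ]
    print(floyd_warshall(graph))
```

The triple loop performs the same update on every matrix entry for each `k`, which is the regular, matrix-based structure the abstract credits for making a parallel implementation practical.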
Next generation sequencing technologies produce large amounts of data at very low cost. They produce short reads of DNA fragments. These DNA fragments have many overlaps, lots of repeats and may also include sequencing errors. The assembly process involves merging these sequences to form the original sequences. In recent years many software programs have been developed for this purpose. All of them take a significant amount of time to execute. Velvet is a commonly used de novo assembly program. We propose a method to reduce the overall time by using...
This paper presents the implementation of a mapping algorithm on a new Processing-in-Memory (PIM) architecture developed by the UPMEM Company. UPMEM's solution consists in adding processing units into the DRAM, to minimize the data access time and maximize the bandwidth, in order to drastically accelerate data-consuming algorithms. The technology makes it possible to combine 256 cores with 16 GBytes of DRAM on a standard DIMM module. An experimentation of DNA Mapping on a Human genome dataset shows that a speed-up of 25 can be obtained...
The "pixel purity index" (PPI) algorithm proposed by Boardman et al. [1] identifies potential endmember pixels in multispectral imagery. The algorithm generates a large number of "skewers" (unit vectors in random directions), and then computes the dot product of each skewer with each pixel. The PPI is incremented for those pixels associated with the extreme values of the dot products. A small number of the pixels (a subset of those with the largest PPI values) are selected as "pure" and the rest of the pixels in the image are expressed as linear mixtures of these pure endmembers. This provides a convenient and physically-motivated...
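The skewer procedure described in the abstract above can be sketched in a few lines. This is a minimal, unoptimized illustration assuming pixels are given as lists of band values; the function name `pixel_purity_index`, the Gaussian sampling of skewer directions, and the choice to credit only the single minimum and maximum projection per skewer are assumptions, not details taken from the paper.

```python
import math
import random


def pixel_purity_index(pixels, n_skewers=200, seed=0):
    """Return one PPI count per pixel (illustrative sketch).

    For each random unit vector ("skewer"), every pixel is projected onto
    it and the pixels with the largest and smallest projections get +1.
    """
    rng = random.Random(seed)
    dim = len(pixels[0])
    ppi = [0] * len(pixels)
    for _ in range(n_skewers):
        # Normalizing a Gaussian sample gives a uniformly random direction.
        raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in raw))
        if norm == 0.0:
            continue  # degenerate sample, skip it
        skewer = [x / norm for x in raw]
        # Dot product of the skewer with each pixel.
        proj = [sum(s * p for s, p in zip(skewer, px)) for px in pixels]
        ppi[proj.index(max(proj))] += 1
        ppi[proj.index(min(proj))] += 1
    return ppi


if __name__ == "__main__":
    # Three "pure" corner pixels and one mixture strictly inside the triangle.
    pixels = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.3, 0.3]]
    print(pixel_purity_index(pixels))
```

A pixel that is a strict mixture of the others never yields an extreme projection, so its count stays at zero, while corner pixels accumulate counts; this is why high PPI values flag endmember candidates.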
This paper presents a parallel architecture for computing genomic sequence alignments using seed-based algorithms. Originality comes from the simultaneous use of FPGA components and flash memories. The FPGA technology brings computing power while the flash memory provides high bandwidth able to feed a large array of specific operators. A 64 GBytes flash memory connected to a Xilinx Virtex-2 Pro PCI board has been developed and an array of 160 distance-computation operators has been implemented to perform the first step of the alignment. Compared to the blast reference...