Grégory Kucherov

ORCID: 0000-0001-5899-5424
Research Areas
  • Algorithms and Data Compression
  • Genomics and Phylogenetic Studies
  • Semigroups and automata theory
  • DNA and Biological Computing
  • Natural Language Processing Techniques
  • RNA and protein synthesis mechanisms
  • Machine Learning in Bioinformatics
  • Gene expression and cancer classification
  • Network Packet Processing and Optimization
  • Machine Learning and Algorithms
  • Chromosomal and Genetic Variations
  • Logic, programming, and type systems
  • Microbial Natural Products and Biosynthesis
  • Genetics, Bioinformatics, and Biomedical Research
  • Alzheimer's disease research and treatments
  • Bacterial Identification and Susceptibility Testing
  • Advanced biosensing and bioanalysis techniques
  • Antibiotic Resistance in Bacteria
  • Fractal and DNA sequence analysis
  • Biochemical and Structural Characterization
  • Genome Rearrangement Algorithms
  • Caching and Content Delivery
  • Coding theory and cryptography
  • Computability, Logic, AI Algorithms
  • Genomics and Rare Diseases

Laboratoire d'Informatique Gaspard-Monge
2016-2025

Université Gustave Eiffel
2014-2025

Centre National de la Recherche Scientifique
2015-2025

Skolkovo Institute of Science and Technology
2018-2022

Paris-Est Sup
2015-2019

Université Paris Cité
2011-2019

Ben-Gurion University of the Negev
2011-2016

Institut national de recherche en informatique et en automatique
2000-2011

Centre de recherche Inria Lille - Nord Europe
2008-2011

Laboratoire d'Informatique Fondamentale de Lille
2006-2011

YASS is a DNA local alignment tool based on an efficient and sensitive filtering algorithm. It applies transition-constrained seeds to specify the most probable conserved motifs between homologous sequences, combined with a flexible hit criterion used to identify groups of seeds that are likely to exhibit significant alignments. A web interface (http://www.loria.fr/projects/YASS/) is available to upload input sequences in FASTA format, query the program and visualize the results obtained in several forms (dot-plot, tabular...

10.1093/nar/gki478 article EN cc-by-nc Nucleic Acids Research 2005-06-26

A repetition in a word w is a subword with the period of at most half of the subword length. We study maximal repetitions occurring in w, that is, those for which any extended subword has a bigger period. The set of such repetitions represents in a compact way all repetitions in w. We first prove a combinatorial result asserting that the sum of exponents of all maximal repetitions of a word of length n is bounded by a linear function in n. This implies, in particular, that there is only a linear number of maximal repetitions in a word. This allows us to construct a linear-time algorithm for finding all maximal repetitions. Some consequences and applications of these results are discussed, as well...

10.1109/sffcs.1999.814634 article EN 2003-01-20
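
To make the definition in the abstract above concrete, here is a minimal brute-force sketch that lists the maximal repetitions of a short word. It is not the linear-time algorithm the abstract refers to; the function names `smallest_period` and `maximal_repetitions` and the `(start, end, period)` output format are assumptions made for this illustration.

```python
def smallest_period(s):
    """Return the smallest p >= 1 such that s[i] == s[i + p] for all valid i."""
    n = len(s)
    for p in range(1, n + 1):
        if all(s[i] == s[i + p] for i in range(n - p)):
            return p
    return n


def maximal_repetitions(w):
    """Naive enumeration of maximal repetitions as (start, end, period).

    `end` is exclusive. A segment is reported when its length is at least
    twice its smallest period and it cannot be extended on either side
    without breaking that period. This runs in polynomial, not linear, time.
    """
    n = len(w)
    runs = set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            start = i
            # Extend to the right while the period-p condition holds.
            while i + p < n and w[i] == w[i + p]:
                i += 1
            end = i + p
            # Keep the segment only if p is its true (smallest) period.
            if end - start >= 2 * p and smallest_period(w[start:end]) == p:
                runs.add((start, end, p))
    return sorted(runs)
```

For the word `mississippi` this sketch reports `ississi` with period 3 together with the three squares `ss`, `ss` and `pp` with period 1.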

Norine is the first database entirely dedicated to nonribosomal peptides (NRPs). In bacteria and fungi, in addition to the traditional ribosomal proteic biosynthesis, an alternative ribosome-independent pathway called NRP synthesis allows peptide production. It is performed by huge protein complexes called nonribosomal peptide synthetases (NRPSs). The molecules synthesized by NRPS contain a high proportion of nonproteogenic amino acids. The primary structure of these peptides is not always linear but often more complex and may contain cycles and branchings. In recent...

10.1093/nar/gkm792 article EN cc-by-nc Nucleic Acids Research 2007-10-03

Nonribosomal peptides (NRPs) are molecules produced by microorganisms that have a broad spectrum of biological activities and pharmaceutical applications (e.g., antibiotic, immunomodulating, and antitumor activities). One particularity of the NRPs is the biodiversity of their monomers, extending far beyond the 20 proteogenic amino acid residues. Norine, a comprehensive database of NRPs, allowed us to review for the first time the main characteristics of the NRPs and especially their monomer biodiversity. Our analysis highlighted a significant...

10.1128/jb.00315-10 article EN Journal of Bacteriology 2010-08-07

Metagenomics is a powerful approach to study the genetic content of environmental samples that has been strongly promoted by NGS technologies. To cope with the massive data involved in modern metagenomic projects, recent tools [4, 39] rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes. Within this general framework, we show in this work that spaced seeds provide a significant improvement of classification accuracy as opposed to traditional contiguous k-mers. We support this thesis...

10.1093/bioinformatics/btv419 article EN Bioinformatics 2015-07-25
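
As an illustration of the contrast the abstract above draws between spaced seeds and contiguous k-mers, the following minimal sketch extracts seed keys from two sequences and reports shared positions. The mask notation (`1` for a compared position, `0` for an ignored one) and the function names `seed_keys` and `seed_hits` are assumptions made for this example, not the interface of any tool mentioned on this page.

```python
def seed_keys(seq, mask):
    """Return the key read through `mask` at every position of `seq`.

    Only characters under a '1' in the mask contribute to the key.
    """
    keep = [i for i, c in enumerate(mask) if c == "1"]
    span = len(mask)
    return ["".join(seq[p + i] for i in keep) for p in range(len(seq) - span + 1)]


def seed_hits(a, b, mask):
    """Return pairs (p, q) where the seed key at a[p:] equals the one at b[q:]."""
    index = {}
    for p, key in enumerate(seed_keys(a, mask)):
        index.setdefault(key, []).append(p)
    return [(p, q)
            for q, key in enumerate(seed_keys(b, mask))
            for p in index.get(key, [])]
```

With `a = "ACGTAC"` and `b = "ACGAAC"`, which differ at one position, the spaced mask `111011` yields the hit `(0, 0)`, whereas the contiguous mask `11111` of the same weight yields no hit.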

Surveillance of drug-resistant bacteria is essential for healthcare providers to deliver effective empirical antibiotic therapy. However, traditional molecular epidemiology does not typically occur on a timescale that could affect patient treatment and outcomes. Here, we present a method called ‘genomic neighbour typing’ for inferring the phenotype of a bacterial sample by identifying its closest relatives in a database of genomes with metadata. We show that this technique can infer susceptibility...

10.1038/s41564-019-0656-6 article EN cc-by Nature Microbiology 2020-02-10

De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing data. Due to a very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the data structure of Chikhi and Rizk (WABI'12) that represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory with respect to their method, with insignificant impact on construction time. At the same time,...

10.1186/1748-7188-9-2 article EN cc-by Algorithms for Molecular Biology 2014-01-01

We propose a general approach to compute the seed sensitivity, that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem (a set of target alignments, an associated probability distribution, and a seed model), which are specified by distinct finite automata. The approach is then applied to a new concept of subset seeds for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be efficiently designed using our approach, and can then be used in similarity search producing...

10.1142/s0219720006001977 article EN Journal of Bioinformatics and Computational Biology 2006-04-01

de Bruijn graphs play an essential role in bioinformatics, yet they lack a universal scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable representation, and ProphAsm, a fast algorithm for their computation. For the example of assemblies of model organisms and two bacterial pan-genomes, we compare simplitigs to unitigs, the best existing representation, and demonstrate that simplitigs provide a substantial improvement in the cumulative sequence length and their number. When combined with the commonly used Burrows-Wheeler Transform index,...

10.1186/s13059-021-02297-z article EN cc-by Genome Biology 2021-04-06

10.1016/j.tcs.2009.09.013 article EN publisher-specific-oa Theoretical Computer Science 2009-09-16

Comprehensive collections approaching millions of sequenced genomes have become central information sources in the life sciences. However, the rapid growth of these collections has made it effectively impossible to search these data using tools such as BLAST and its successors. Here, we present a technique called phylogenetic compression, which uses evolutionary history to guide compression and efficiently search large collections of microbial genomes using existing algorithms and data structures. We show that, when applied to modern diverse collections approaching millions of genomes, lossless phylogenetic compression improves...

10.1101/2023.04.15.536996 preprint EN cc-by-nc bioRxiv (Cold Spring Harbor Laboratory) 2023-04-16

10.1016/s0166-218x(98)00070-5 article EN publisher-specific-oa Discrete Applied Mathematics 1998-11-01

The hit criterion is a key component of heuristic local alignment algorithms. It specifies a class of patterns assumed to witness a potential similarity, and this choice is decisive for the selectivity and sensitivity of the whole method. In this paper, we propose two ways to improve the hit criterion. First, we define the group criterion combining the advantages of the single-seed and double-seed approaches used in existing algorithms. Second, we introduce transition-constrained seeds that extend spaced seeds by the possibility of distinguishing transition and transversion mismatches. We...

10.1186/1471-2105-5-149 article EN cc-by BMC Bioinformatics 2004-10-14
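
The idea of distinguishing transitions from transversions, mentioned in the abstract above, can be sketched as a three-symbol seed matcher. The symbols used here (`#` for a required match, `@` for a match or a transition, `-` for any pair) and the function name `seed_matches` are assumptions made for this illustration; the only biological fact relied on is that transitions are the substitutions A↔G and C↔T.

```python
# Transitions are purine<->purine (A, G) or pyrimidine<->pyrimidine (C, T)
# substitutions; every other mismatch is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_matches(x, y, seed):
    """Check whether two equal-length DNA words satisfy a seed pattern.

    Hypothetical notation for this sketch:
      '#'  the two letters must be identical
      '@'  the two letters must be identical or form a transition
      '-'  any pair of letters is accepted
    """
    if not (len(x) == len(y) == len(seed)):
        raise ValueError("words and seed must have the same length")
    for a, b, s in zip(x, y, seed):
        if s == "#" and a != b:
            return False
        if s == "@" and a != b and (a, b) not in TRANSITIONS:
            return False
    return True
```

Under the seed `##@#`, the words `ACGT` and `ACAT` match because G/A is a transition, while `ACGT` and `ACCT` do not because G/C is a transversion.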

10.1016/s0304-3975(02)00448-6 article EN Theoretical Computer Science 2003-04-23

We study a method of seed-based lossless filtration for approximate string matching and related bioinformatics applications. The method is based on a simultaneous use of several spaced seeds rather than a single seed as studied by Burkhardt and Karkkainen. We present algorithms to compute several important parameters of seed families, study their combinatorial properties, and describe several techniques to construct efficient families. We also report a large-scale application of the proposed technique to the problem of oligonucleotide selection for an EST sequence database.

10.1109/tcbb.2005.12 article EN IEEE/ACM Transactions on Computational Biology and Bioinformatics 2005-01-01

Although modern high-throughput biomolecular technologies produce various types of data, biosequence data remain at the core of bioinformatic analyses. However, computational techniques for dealing with this data evolved dramatically. In this bird's-eye review, we overview the evolution of main algorithmic techniques for comparing and searching biological sequences. We highlight key algorithmic ideas emerged in response to several interconnected factors: shifts of analytical paradigm, advent of new sequencing technologies and a substantial increase in size...

10.1093/bioinformatics/btz272 article EN Bioinformatics 2019-04-11

Analysis of genetic sequences is usually based on finding similar parts of sequences, e.g. DNA reads and/or genomes. For big data, this is typically done via 'seeds': simple similarities (e.g. exact matches) that can be found quickly. For huge data, sparse seeding is useful, where we only consider seeds at a subset of positions in a sequence. Here, we study a simple sparse-seeding method: using seeds at positions of certain 'words' (e.g. ac, at, gc or gt). Sensitivity is maximized by using words with minimal overlaps. That is because, in a random sequence, minimally...

10.1093/bioinformatics/btaa1054 article EN cc-by Bioinformatics 2020-12-01

10.1016/j.tcs.2015.10.043 article EN publisher-specific-oa Theoretical Computer Science 2015-11-04