NFDI4DS | UHH-SEMS - Publication Details

Grégory Kucherov

ORCID: 0000-0001-5899-5424

About

Contact & Profiles

A5012626407

Research Areas

Algorithms and Data Compression
Genomics and Phylogenetic Studies
Semigroups and automata theory
DNA and Biological Computing
Natural Language Processing Techniques
RNA and protein synthesis mechanisms
Machine Learning in Bioinformatics
Gene expression and cancer classification
Network Packet Processing and Optimization
Machine Learning and Algorithms
Chromosomal and Genetic Variations
Logic, programming, and type systems
Microbial Natural Products and Biosynthesis
Genetics, Bioinformatics, and Biomedical Research
Alzheimer's disease research and treatments
Bacterial Identification and Susceptibility Testing
Advanced biosensing and bioanalysis techniques
Antibiotic Resistance in Bacteria
Fractal and DNA sequence analysis
Biochemical and Structural Characterization
Genome Rearrangement Algorithms
Caching and Content Delivery
Coding theory and cryptography
Computability, Logic, AI Algorithms
Genomics and Rare Diseases

Laboratoire d'Informatique Gaspard-Monge
2016-2025

Université Gustave Eiffel
2014-2025

Centre National de la Recherche Scientifique
2015-2025

Skolkovo Institute of Science and Technology
2018-2022

Paris-Est Sup
2015-2019

Université Paris Cité
2011-2019

Ben-Gurion University of the Negev
2011-2016

Institut national de recherche en informatique et en automatique
2000-2011

Centre de recherche Inria Lille - Nord Europe
2008-2011

Laboratoire d'Informatique Fondamentale de Lille
2006-2011

YASS: enhancing the sensitivity of DNA similarity search

OPENALEX - Publications

Laurent Noé Grégory Kucherov

YASS is a DNA local alignment tool based on an efficient and sensitive filtering algorithm. It applies transition-constrained seeds to specify the most probable conserved motifs between homologous sequences, combined with a flexible hit criterion used to identify groups of seeds that are likely to exhibit significant alignments. A web interface ( http://www.loria.fr/projects/YASS/ ) is available to upload input sequences in FASTA format, query the program and visualize the results obtained in several forms (dot-plot, tabular...

10.1093/nar/gki478 article EN cc-by-nc Nucleic Acids Research 2005-06-26

Finding maximal repetitions in a word in linear time

OPENALEX - Publications

Roman Kolpakov Grégory Kucherov

A repetition in a word w is a subword with the period of at most half of the subword length. We study maximal repetitions occurring in w, that is, those for which any extended subword of w has a bigger period. The set of such repetitions represents in a compact way all repetitions in w. We first prove a combinatorial result asserting that the sum of exponents of all maximal repetitions of a word of length n is bounded by a linear function in n. This implies, in particular, that there is only a linear number of maximal repetitions in a word. This allows us to construct a linear-time algorithm for finding all maximal repetitions. Some consequences and applications of these results are discussed, as well...

10.1109/sffcs.1999.814634 article EN 2003-01-20
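To make the definition in the abstract above concrete, here is a minimal Python sketch that enumerates maximal repetitions by brute force. It is not the linear-time algorithm of the paper: it runs in quadratic time and is meant only to illustrate what counts as a maximal repetition. The function name `maximal_repetitions` and the `(start, end, period)` output format are assumptions made for this illustration.

```python
def maximal_repetitions(w):
    """Return all maximal repetitions of w as (start, end, period) triples.

    Brute-force O(n^2) illustration, NOT the linear-time algorithm.
    A triple (start, end, p) means w[start:end + 1] has minimal period p,
    has length at least 2 * p, and cannot be extended left or right
    without breaking period p.
    """
    n = len(w)
    seen = set()   # (start, end) pairs already reported with a smaller period
    runs = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if w[k] != w[k + p]:
                k += 1
                continue
            a = k
            # Extend the stretch of positions k with w[k] == w[k + p].
            while k < n - p and w[k] == w[k + p]:
                k += 1
            # Positions a..k-1 match, so w[a : k + p] has period p.
            if k - a >= p:  # the length (k - a) + p is at least 2 * p
                start, end = a, k - 1 + p
                # Periods are tried in increasing order, so the first time an
                # interval is found it is reported with its minimal period.
                if (start, end) not in seen:
                    seen.add((start, end))
                    runs.append((start, end, p))
    return sorted(runs)


if __name__ == "__main__":
    for start, end, p in maximal_repetitions("mississippi"):
        print(start, end, p, "mississippi"[start:end + 1])
```

For the word "mississippi", the sketch reports the repetition "ississi" with period 3 together with the three squares "ss", "ss" and "pp" with period 1.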

NORINE: a database of nonribosomal peptides

OPENALEX - Publications

Ségolène Caboche Maude Pupin Valérie Leclère Arnaud Fontaine P. Jacques and 1 more

Norine is the first database entirely dedicated to nonribosomal peptides (NRPs). In bacteria and fungi, in addition to the traditional ribosomal proteic biosynthesis, an alternative ribosome-independent pathway called NRP synthesis allows peptide production. It is performed by huge protein complexes called nonribosomal peptide synthetases (NRPSs). The molecules synthesized by NRPS contain a high proportion of nonproteogenic amino acids. The primary structure of these peptides is not always linear but often more complex and may contain cycles and branchings. In recent...

10.1093/nar/gkm792 article EN cc-by-nc Nucleic Acids Research 2007-10-03

Diversity of Monomers in Nonribosomal Peptides: towards the Prediction of Origin and Biological Activity

OPENALEX - Publications

Ségolène Caboche Valérie Leclère Maude Pupin Grégory Kucherov Philippe Jacques

Nonribosomal peptides (NRPs) are molecules produced by microorganisms that have a broad spectrum of biological activities and pharmaceutical applications (e.g., antibiotic, immunomodulating, and antitumor activities). One particularity of the NRPs is the biodiversity of their monomers, extending far beyond the 20 proteogenic amino acid residues. Norine, a comprehensive database of NRPs, allowed us to review for the first time the main characteristics of the NRPs and especially their monomer biodiversity. Our analysis highlighted a significant...

10.1128/jb.00315-10 article EN Journal of Bacteriology 2010-08-07

Spaced seeds improve k-mer-based metagenomic classification

OPENALEX - Publications

Karel Břinda Maciej Sykulski Grégory Kucherov

Metagenomics is a powerful approach to study genetic content of environmental samples that has been strongly promoted by NGS technologies. To cope with massive data involved in modern metagenomic projects, recent tools [4, 39] rely on the analysis of k-mers shared between the read to be classified and sampled reference genomes. Within this general framework, we show in this work that spaced seeds provide a significant improvement of classification accuracy as opposed to traditional contiguous k-mers. We support this thesis...

10.1093/bioinformatics/btv419 article EN Bioinformatics 2015-07-25
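The difference between contiguous k-mers and spaced seeds mentioned in the abstract above can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the mask notation (`1` for a position that must match, `0` for a position that is ignored) and the function name `spaced_kmers` are choices made here, not taken from the paper or its software.

```python
def spaced_kmers(seq, mask):
    """Extract the spaced k-mers of seq under a binary mask.

    mask is a string over {"0", "1"}: positions marked "1" are kept,
    positions marked "0" are ignored (assumed notation for this sketch).
    A mask made only of "1" characters yields ordinary contiguous k-mers.
    """
    span = len(mask)
    keep = [i for i, c in enumerate(mask) if c == "1"]
    return ["".join(seq[s + i] for i in keep)
            for s in range(len(seq) - span + 1)]


if __name__ == "__main__":
    read, reference = "ACGT", "ACTT"   # differ at the third position only

    # Contiguous 4-mers: the single mismatch destroys the shared k-mer.
    print(set(spaced_kmers(read, "1111")) & set(spaced_kmers(reference, "1111")))

    # Spaced seed 1101: the mismatch falls on the ignored position,
    # so both sequences still share the spaced k-mer "ACT".
    print(set(spaced_kmers(read, "1101")) & set(spaced_kmers(reference, "1101")))
```

The example shows why a spaced seed can tolerate a mismatch that a contiguous k-mer of the same span cannot: the mismatch is simply not inspected when it falls on an ignored position.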

Rapid inference of antibiotic resistance and susceptibility by genomic neighbour typing

OPENALEX - Publications

Karel Břinda Alanna Callendrello C. Kevin Derek R. MacFadden Themoula Charalampous and 8 more

Surveillance of drug-resistant bacteria is essential for healthcare providers to deliver effective empirical antibiotic therapy. However, traditional molecular epidemiology does not typically occur on a timescale that could affect patient treatment and outcomes. Here, we present a method called ‘genomic neighbour typing’ for inferring the phenotype of a bacterial sample by identifying its closest relatives in a database of genomes with metadata. We show that this technique can infer antibiotic susceptibility...

10.1038/s41564-019-0656-6 article EN cc-by Nature Microbiology 2020-02-10

Using cascading Bloom filters to improve the memory usage for de Brujin graphs

OPENALEX - Publications

Kamil Salikhov Gustavo Sacomoto Grégory Kucherov

De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing data. Due to a very large size of NGS datasets, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently. In this work, we show how to reduce the memory required by the data structure of Chikhi and Rizk (WABI'12) that represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory with respect to their method, with insignificant impact on construction time. At the same time,...

10.1186/1748-7188-9-2 article EN cc-by Algorithms for Molecular Biology 2014-01-01
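As background for the abstract above, the following Python sketch shows a plain Bloom filter used as an approximate membership structure for k-mers. It does not implement the cascading construction of the paper or the data structure of Chikhi and Rizk; the class name `BloomFilter`, the helper `kmers`, the use of SHA-256 as a hash family and all parameter values are assumptions made for this illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index
        # (an illustrative choice, not a tuned hash family).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k):
    """All k-mers of seq, i.e. the node set of its de Bruijn graph."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1024, num_hashes=3)
    for km in kmers("ACGTACGTGA", 4):
        bf.add(km)
    print("ACGT" in bf)   # an inserted k-mer is always reported as present
    print("TTTT" in bf)   # an absent k-mer is usually, but not always, rejected
```

Because a Bloom filter can report absent k-mers as present, a representation built on it has to deal with false positives separately; how that is done is the subject of the work cited in this entry and is not shown here.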

A unifying framework for seed sensitivity and its application to subset seeds

OPENALEX - Publications

Grégory Kucherov Laurent Noé Mikhail Roytberg

We propose a general approach to compute the seed sensitivity, that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem (a set of target alignments, an associated probability distribution, and a seed model) that are specified by distinct finite automata. The approach is then applied to a new concept of subset seeds for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be efficiently designed using our approach, and can then be used in similarity search producing...

10.1142/s0219720006001977 article EN Journal of Bioinformatics and Computational Biology 2006-04-01

Simplitigs as an efficient and scalable representation of de Bruijn graphs

OPENALEX - Publications

Karel Břinda Michael Baym Grégory Kucherov

de Bruijn graphs play an essential role in bioinformatics, yet they lack a universal scalable representation. Here, we introduce simplitigs as a compact, efficient, and scalable representation, and ProphAsm, a fast algorithm for their computation. For the example of assemblies of model organisms and two bacterial pan-genomes, we compare simplitigs to unitigs, the best existing representation, and demonstrate that simplitigs provide a substantial improvement in the cumulative sequence length and their number. When combined with the commonly used Burrows-Wheeler Transform index,...

10.1186/s13059-021-02297-z article EN cc-by Genome Biology 2021-04-06

Optimal Reconstruction of Graphs under the Additive Model

OPENALEX - Publications

Vladimir Grebinski Grégory Kucherov

10.1007/s004530010033 article EN Algorithmica 2000-09-01

Searching for gapped palindromes

OPENALEX - Publications

Roman Kolpakov Grégory Kucherov

10.1016/j.tcs.2009.09.013 article EN publisher-specific-oa Theoretical Computer Science 2009-09-16

Efficient and Robust Search of Microbial Genomes via Phylogenetic Compression

OPENALEX - Publications

Karel Břinda Leandro Lima Simone Pignotti Natalia Quinones‐Olvera Kamil Salikhov and 4 more

Comprehensive collections approaching millions of sequenced genomes have become central information sources in the life sciences. However, the rapid growth of these collections has made it effectively impossible to search these data using tools such as BLAST and its successors. Here, we present a technique called phylogenetic compression, which uses evolutionary history to guide compression and efficiently search large collections of microbial genomes using existing algorithms and data structures. We show that, when applied to modern diverse collections approaching millions of genomes, lossless phylogenetic compression improves...

10.1101/2023.04.15.536996 preprint EN cc-by-nc bioRxiv (Cold Spring Harbor Laboratory) 2023-04-16

Reconstructing a Hamiltonian cycle by querying the graph: Application to DNA physical mapping

OPENALEX - Publications

Vladimir Grebinski Grégory Kucherov

10.1016/s0166-218x(98)00070-5 article EN publisher-specific-oa Discrete Applied Mathematics 1998-11-01

Improved hit criteria for DNA local alignment

OPENALEX - Publications

Laurent Noé Grégory Kucherov

The hit criterion is a key component of heuristic local alignment algorithms. It specifies the class of patterns assumed to witness a potential similarity, and this choice is decisive for the selectivity and sensitivity of the whole method. In this paper, we propose two ways to improve the hit criterion. First, we define the group criterion combining the advantages of the single-seed and double-seed approaches used in existing algorithms. Second, we introduce transition-constrained seeds that extend spaced seeds by the possibility of distinguishing transition and transversion mismatches. We...

10.1186/1471-2105-5-149 article EN cc-by BMC Bioinformatics 2004-10-14
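The idea of a seed that distinguishes transition from transversion mismatches, as described in the abstract above, can be sketched as follows. The three-symbol seed alphabet used here (`#` for a required match, `@` for a match or a transition, `-` for any pair of nucleotides) and the function name `seed_hit` are assumptions made for this illustration and may differ from the notation of the paper or of the YASS software.

```python
# Transitions are the substitutions A<->G (purines) and C<->T (pyrimidines);
# every other substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_hit(seed, s1, s2):
    """Check whether two equal-length DNA fragments satisfy a seed.

    Assumed seed alphabet for this sketch:
      '#'  the two nucleotides must be identical
      '@'  the two nucleotides must be identical or form a transition
      '-'  any pair of nucleotides is accepted
    """
    if not (len(seed) == len(s1) == len(s2)):
        raise ValueError("seed and both fragments must have the same length")
    for sym, a, b in zip(seed, s1, s2):
        if sym == "#":
            if a != b:
                return False
        elif sym == "@":
            if a != b and (a, b) not in TRANSITIONS:
                return False
        elif sym != "-":
            raise ValueError(f"unknown seed symbol: {sym!r}")
    return True


if __name__ == "__main__":
    print(seed_hit("#@-#", "ACGT", "ATAT"))  # C/T under '@' is a transition
    print(seed_hit("#@-#", "ACGT", "AGAT"))  # C/G under '@' is a transversion
```

An ordinary spaced seed corresponds to the special case that uses only `#` and `-`; the `@` symbol is what lets a seed accept the more frequent transition mismatches while still rejecting transversions at that position.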

Finding approximate repetitions under Hamming distance

OPENALEX - Publications

Roman Kolpakov Grégory Kucherov

10.1016/s0304-3975(02)00448-6 article EN Theoretical Computer Science 2003-04-23

Multiseed Lossless Filtration

OPENALEX - Publications

Grégory Kucherov Laurent Noé Mikhail Roytberg

We study a method of seed-based lossless filtration for approximate string matching and related bioinformatics applications. The method is based on a simultaneous use of several spaced seeds rather than a single spaced seed as studied by Burkhardt and Karkkainen. We present algorithms to compute several important parameters of seed families, study their combinatorial properties, and describe several techniques to construct efficient families. We also report a large-scale application of the proposed technique to the problem of oligonucleotide selection for an EST sequence database.

10.1109/tcbb.2005.12 article EN IEEE/ACM Transactions on Computational Biology and Bioinformatics 2005-01-01

Evolution of biosequence search algorithms: a brief survey

OPENALEX - Publications

Grégory Kucherov

Although modern high-throughput biomolecular technologies produce various types of data, biosequence data remain at the core of bioinformatic analyses. However, computational techniques for dealing with this data evolved dramatically. In this bird's-eye review, we overview the evolution of main algorithmic techniques for comparing and searching biological sequences. We highlight key algorithmic ideas emerged in response to several interconnected factors: shifts of analytical paradigm, advent of new sequencing technologies and a substantial increase in size...

10.1093/bioinformatics/btz272 article EN Bioinformatics 2019-04-11

Minimally overlapping words for sequence similarity search

OPENALEX - Publications

Martin C. Frith Laurent Noé Grégory Kucherov

Analysis of genetic sequences is usually based on finding similar parts of sequences, e.g. DNA reads and/or genomes. For big data, this is typically done via 'seeds': simple similarities (e.g. exact matches) that can be found quickly. For huge data, sparse seeding is useful, where we only consider seeds at a subset of positions in a sequence. Here, we study a simple sparse-seeding method: using seeds at positions of certain 'words' (e.g. ac, at, gc or gt). Sensitivity is maximized by using words with minimal overlaps. That is because, in a random sequence, minimally...

10.1093/bioinformatics/btaa1054 article EN cc-by Bioinformatics 2020-12-01
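The sparse-seeding idea in the abstract above, where seeds are only considered at positions of certain words, can be sketched in Python as follows. The word set {ac, at, gc, gt} is the example given in the abstract; the function name `seed_positions` and the lower-case sequence convention are assumptions made for this illustration, and the sketch does not reproduce the sensitivity analysis of the paper.

```python
def seed_positions(seq, words):
    """Return the start positions in seq where one of the given words occurs.

    All words are assumed to have the same length. In sparse seeding,
    seeds are then considered only at these positions instead of at
    every position of the sequence.
    """
    words = set(words)
    lengths = {len(w) for w in words}
    if len(lengths) != 1:
        raise ValueError("all words must have the same length")
    k = lengths.pop()
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in words]


if __name__ == "__main__":
    words = {"ac", "at", "gc", "gt"}
    seq = "gattaca"
    for pos in seed_positions(seq, words):
        print(pos, seq[pos:pos + 2])
```

None of the four example words can overlap itself or another word of the set, since each starts with a or g and ends with c or t; this is why two selected positions can never be adjacent in this example.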

Approximate string matching using a bidirectional index

OPENALEX - Publications

Grégory Kucherov Kamil Salikhov Dekel Tsur

10.1016/j.tcs.2015.10.043 article EN publisher-specific-oa Theoretical Computer Science 2015-11-04
