Sam Kovaka

ORCID: 0000-0002-4835-8023
Research Areas
  • Genomics and Phylogenetic Studies
  • RNA modifications and cancer
  • RNA and protein synthesis mechanisms
  • Algorithms and Data Compression
  • Nanopore and Nanochannel Transport Studies
  • Mycorrhizal Fungi and Plant Interactions
  • Fungal Biology and Applications
  • Semantic Web and Ontologies
  • Molecular Biology Techniques and Applications
  • Data Management and Algorithms
  • Cancer-related molecular mechanisms research
  • Genetic Syndromes and Imprinting
  • Protist diversity and phylogeny
  • Gene expression and cancer classification
  • Fibroblast Growth Factor Research
  • Web Data Mining and Analysis
  • Circular RNAs in diseases
  • Cancer Genomics and Diagnostics
  • Epigenetics and DNA Methylation
  • Lichen and fungal ecology
  • Advanced biosensing and bioanalysis techniques
  • MicroRNA in disease regulation
  • Connective tissue disorders research

Johns Hopkins University
2018-2025

Clark University
2018

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster...

10.1186/s13059-019-1910-1 article EN cc-by Genome biology 2019-12-01
David Pellerin Giulia Gobbo Madeline Couse Egor Dolzhenko Sathiji Nageshwaran Warren Cheung Isaac Xu Marie-Josée Dicaire Guinevere Spurdens Gabriel Matos‐Rodrigues Igor Stevanovski Carolin K. Scriba Adriana Rebelo Virginie Roth Marion Wandzel Céline Bonnet Catherine Ashton Aman Agarwal Cyril Peter Dan Hasson Nadejda M. Tsankova Ken Dewar Phillipa J. Lamont Nigel G. Laing Mathilde Renaud Henry Houlden Matthis Synofzik Karen Usdin André Nussenzweig Марек Напиерала Zhao Chen Hong Jiang Ira W. Deveson Gianina Ravenscroft Schahram Akbarian Michael A. Eberle Kym M. Boycott Tomi Pastinen Emily Bateman Chelsea Berngruber Fabio Cunial Colleen Davis Huyen Dinh HarshaVardhan Doddapaneni Kim K. Doheny Shannon Dugan‐Perez Tara Dutka Evan E. Eichler Philip E. Empey Sarah Fazal Chris Frazar Kiran Garimella Jessica Gearhart Richard C. Gibbs Jane Grimwood Namrata Gupta Salina K. Hall Yi Han William T. Harvey Jess Hosea PingHsun Hsieh Jianhong Hu Yongqing Huang James C. M. Hwang Michal Bogumil Izydorczyk Hyeonsoo Jeong Ziad Khan Sarah Kirkpatrick Michelle Kokosinski Sam Kovaka Nehir Edibe Kurtas Rebecca Lakatos Emily L. LaPlante Samuel K. Lee Niall J. Lennon Shawn Levy Qiuhui Li Lee Lichtenstein Glennis A. Logsdon Chris Lord Ryan Lorig-Roach Medhat Madmoud Anant Maheshwari Beth Marosy Heer H. Mehta Ginger Metcalf David W. Mohr Carolina Montaño Luke B Morina Yulia Mostovoy Anjene Musick Donna M. Muzny Shane Neph Justin Paschall Karynne Patterson A. Pionzio David Porubský Nripesh Prasad Allison N. Rozanski Alba Sanchis-Juan

10.1038/s41588-024-01808-5 article EN Nature Genetics 2024-06-27

Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic/transcriptomic and epigenetic information without additional library preparation. Presently, only a limited set of modifications can be directly basecalled (e.g. 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis, and visualization. Uncalled4...

10.1101/2024.03.05.583511 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-03-11

Circular RNAs (circRNAs) are a new class of RNA involved in multiple human malignancies. However, limited information exists regarding the involvement of circRNAs in gastric carcinoma (GC). Therefore, we sought to identify novel circRNAs, their functions and mechanisms in gastric carcinogenesis. We analyzed next-generation sequencing data from GC tissues and cell lines, identifying 75,201 candidate circRNAs. Among these, we focused on one circRNA, circNF1, which was upregulated in GC tissues and cell lines. Loss- and gain-of-function...

10.1530/erc-18-0478 article EN Endocrine Related Cancer 2018-12-21

ReadUntil sequencing allows nanopore devices to selectively eject individual reads from the pore in real-time. This could enable purely computational targeted sequencing, however most mapping methods require basecalling, which is computationally intensive. Here we present UNCALLED (github.com/skovaka/UNCALLED), an open-source mapper that rapidly matches streaming nanopore current signals to a reference sequence. UNCALLED probabilistically considers k-mers that the signal could represent, and then prunes the candidates...

10.1101/2020.02.03.931923 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2020-02-03

Summary: Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. But past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on the r-index...

10.1093/bioinformatics/btae213 article EN cc-by Bioinformatics 2024-04-11

Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome...

10.1186/s13015-025-00272-y article EN cc-by Algorithms for Molecular Biology 2025-03-01

Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic or transcriptomic and epigenetic information without additional library preparation. At present, only a limited set of modifications can be directly basecalled (for example, 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis and visualization. Uncalled4 features an...

10.1038/s41592-025-02631-4 article EN cc-by Nature Methods 2025-03-28

Lentinus tigrinus is a species of wood-decaying fungi (Polyporales) that has an agaricoid form (a gilled mushroom) and a secotioid form (puffball-like, with enclosed spore-bearing structures). Previous studies suggested that the secotioid form is conferred by a recessive allele of a single locus. We sequenced the genomes of one agaricoid (Aga) strain and one secotioid (Sec) strain (39.53-39.88 Mb, 15,581-15,380 genes, respectively). We mated the Sec and Aga monokaryons, genotyped the progeny, and performed bulked segregant analysis (BSA). We also fruited three Sec/Sec and three Aga/Aga dikaryons,...

10.1093/gbe/evy246 article EN cc-by-nc Genome Biology and Evolution 2018-11-03

Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion; as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject "nontarget" DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing using efficient pan-genome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics in a streaming fashion. When used to target...

10.1016/j.isci.2021.102696 article EN cc-by iScience 2021-06-01

RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new computational methods to handle the high error rate of long-read sequencing technology, which previous assemblers could not tolerate. It also offers the ability to work with full-length super-reads assembled...

10.1101/694554 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2019-07-08

Genome copy number is an important source of genetic variation in health and disease. In cancer, Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in the number of retrievable sequencing reads/molecules compared to short-read sequencing platforms, limiting CNA inference...

10.1093/nar/gkab812 article EN cc-by Nucleic Acids Research 2021-09-09

Improvements in nanopore sequencing necessitate efficient classification methods, including pre-filtering and adaptive sampling algorithms that enrich for reads of interest. Signal-based approaches circumvent the computational bottleneck of basecalling. But past methods for signal-based classification do not scale efficiently to large, repetitive references like pangenomes, limiting their utility to partial references or individual genomes. We introduce Sigmoni: a rapid, multiclass classification method based on

10.1101/2023.08.15.553308 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2023-08-17

Genome copy number is an important source of genetic variation in health and disease. In cancer, clinically actionable Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in terms of the number of retrievable sequencing reads/molecules compared...

10.1101/2020.12.28.424602 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2020-12-29

Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes

10.1101/2024.05.20.595044 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2024-05-22

Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO),...

10.21203/rs.3.rs-5363291/v1 preprint EN cc-by Research Square (Research Square) 2024-11-13

Nanopore sequencing is an increasingly powerful tool for genomics. Recently, computational advances have allowed nanopores to sequence in a targeted fashion; as the sequencer emits data, software can analyze the data in real time and signal the sequencer to eject “non-target” DNA molecules. We present a novel method called SPUMONI, which enables rapid and accurate targeted sequencing with the help of efficient pangenome indexes. SPUMONI uses a compressed index to rapidly generate exact or approximate matching statistics (half-maximal...

10.1101/2021.03.23.436610 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2021-03-23