Samantha Zarate

ORCID: 0000-0001-5570-2059
Research Areas
  • Genomics and Phylogenetic Studies
  • Genomic variations and chromosomal abnormalities
  • Genetics, Bioinformatics, and Biomedical Research
  • Chromosomal and Genetic Variations
  • Genomics and Rare Diseases
  • Evolutionary Algorithms and Applications
  • Cancer Genomics and Diagnostics
  • Distributed and Parallel Computing Systems
  • Advanced Data Storage Technologies
  • Gene expression and cancer classification
  • Molecular Biology Techniques and Applications
  • Scientific Computing and Data Management
  • Particle physics theoretical and experimental studies
  • Genomics and Chromatin Dynamics
  • Genetic Associations and Epidemiology
  • Genetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities
  • Machine Learning in Bioinformatics
  • Evolution and Genetic Dynamics
  • Prenatal Screening and Diagnostics
  • Renal Diseases and Glomerulopathies
  • Blood Coagulation and Thrombosis Mechanisms
  • Hemophilia Treatment and Research
  • Atrial Fibrillation Management and Outcomes
  • Systemic Lupus Erythematosus Research
  • Biomedical Text Mining and Ontologies

Regeneron (United States)
2023-2025

Johns Hopkins University
2020-2023

DNAnexus (United States)
2018-2021

Federico Santa María Technical University
2015-2016

Sergey Nurk Sergey Koren Arang Rhie Mikko Rautiainen Andrey V. Bzikadze and 95 more Alla Mikheenko Mitchell R. Vollger Nicolas Altemose Lev Uralsky Ariel Gershman Sergey Aganezov Savannah J. Hoyt Mark Diekhans Glennis A. Logsdon Michael Alonge Stylianos E. Antonarakis Matthew Borchers Gerard G. Bouffard Shelise Brooks Gina V. Caldas Nae-Chyun Chen Haoyu Cheng Chen-Shan Chin William Chow Leonardo Gomes de Lima Philip C. Dishuck Richard Durbin Tatiana Dvorkina Ian T. Fiddes Giulio Formenti Robert S. Fulton Arkarachai Fungtammasan Erik Garrison Patrick G. S. Grady Tina A. Graves-Lindsay Ira M. Hall Nancy F. Hansen Gabrielle A. Hartley Marina Haukness Kerstin Howe Michael W. Hunkapiller Chirag Jain Miten Jain Erich D. Jarvis Peter Kerpedjiev Melanie Kirsche Mikhail Kolmogorov Jonas Korlach Milinn Kremitzki Heng Li Valerie V. Maduro Tobias Marschall Ann M. Mc Cartney Jennifer McDaniel Danny E. Miller James C. Mullikin Eugene W. Myers Nathan D. Olson Benedict Paten Paul Peluso Pavel A. Pevzner David Porubský Tamara Potapova Е. И. Рогаев Jeffrey Rosenfeld Steven L. Salzberg Valérie Schneider Fritz J. Sedlazeck Kishwar Shafin Colin J. Shew Alaina Shumate Ying Sims Arian F. A. Smit Daniela C. Soto Ivan Sović Jessica M. Storer Aaron Streets Beth A. Sullivan Françoise Thibaud‐Nissen James Torrance Justin Wagner Brian P. Walenz Aaron M. Wenger Jonathan Wood Chunlin Xiao Stephanie M. Yan Alice Young Samantha Zarate Urvashi Surti Rajiv C. McCoy Megan Y. Dennis Ivan A. Alexandrov Jennifer L. Gerton Rachel J. O’Neill Winston Timp Justin M. Zook Michael C. Schatz Evan E. Eichler Karen H. Miga Adam M. Phillippy

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be...

10.1126/science.abj6987 article EN Science 2022-03-31

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary...

10.1126/science.abl3533 article EN Science 2022-03-31

10.1038/s41586-023-06457-y article EN Nature 2023-08-23
Sergey Nurk Sergey Koren Arang Rhie Mikko Rautiainen Andrey V. Bzikadze and 94 more Alla Mikheenko Mitchell R. Vollger Nicolas Altemose Lev Uralsky Ariel Gershman Sergey Aganezov Savannah J. Hoyt Mark Diekhans Glennis A. Logsdon Michael Alonge Stylianos E. Antonarakis Matthew Borchers Gerard G. Bouffard Shelise Brooks Gina V. Caldas Haoyu Cheng Chen-Shan Chin William Chow Leonardo Gomes de Lima Philip C. Dishuck Richard Durbin Tatiana Dvorkina Ian T. Fiddes Giulio Formenti Robert S. Fulton Arkarachai Fungtammasan Erik Garrison Patrick G. S. Grady Tina A. Graves-Lindsay Ira M. Hall Nancy F. Hansen Gabrielle A. Hartley Marina Haukness Kerstin Howe Michael W. Hunkapiller Chirag Jain Miten Jain Erich D. Jarvis Peter Kerpedjiev Melanie Kirsche Mikhail Kolmogorov Jonas Korlach Milinn Kremitzki Heng Li Valerie V. Maduro Tobias Marschall Ann M. Mc Cartney Jennifer McDaniel Danny E. Miller James C. Mullikin Eugene W. Myers Nathan D. Olson Benedict Paten Paul Peluso Pavel A. Pevzner David Porubský Tamara Potapova Е. И. Рогаев Jeffrey Rosenfeld Steven L. Salzberg Valérie Schneider Fritz J. Sedlazeck Kishwar Shafin Colin J. Shew Alaina Shumate Yumi Sims Arian F. A. Smit Daniela C. Soto Ivan Sović Jessica M. Storer Aaron Streets Beth A. Sullivan Françoise Thibaud‐Nissen James Torrance Justin Wagner Brian P. Walenz Aaron M. Wenger Jonathan Wood Chunlin Xiao Stephanie M. Yan Alice Young Samantha Zarate Urvashi Surti Rajiv C. McCoy Megan Y. Dennis Ivan A. Alexandrov Jennifer L. Gerton Rachel J. O’Neill Winston Timp Justin M. Zook Michael C. Schatz Evan E. Eichler Karen H. Miga Adam M. Phillippy

In 2001, Celera Genomics and the International Human Genome Sequencing Consortium published their initial drafts of the human genome, which revolutionized the field of genomics. While these drafts and the updates that followed effectively covered the euchromatic fraction of the genome, the heterochromatin and many other complex regions were left unfinished or erroneous. Addressing this remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium has finished the first truly complete 3.055 billion base pair (bp) sequence of a human genome, representing the largest improvement to...

10.1101/2021.05.26.445798 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2021-05-27

Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, 92% of the autosomal GRCh38...

10.1016/j.xgen.2022.100128 article EN cc-by Cell Genomics 2022-04-28
Michael C. Schatz Anthony Philippakis Enis Afgan Eric Banks Vincent J. Carey and 95 more Robert J. Carroll Alessandro Culotti Kyle Ellrott Jeremy Goecks Robert L. Grossman Ira M. Hall Kasper D. Hansen Jonathan Lawson Jeffrey T. Leek Anne O’Donnell‐Luria Stephen Mosher Martin Morgan Anton Nekrutenko Brian D. O’Connor Kevin Osborn Benedict Paten Candace Patterson Frederick J. Tan Casey Overby Taylor Jennifer Vessio Levi Waldron Ting Wang Kristin Wuichet Alexander Baumann Andrew Rula Anton Kovalsy C. Bernard Derek Caetano-Anollés Géraldine A. Van der Auwera Justin Canas K. Ümit Yüksel Kate Herman Megan Taylor Marianie Simeon Michaël Baumann Qi Wang Robert Title Ruchi Munshi Sushma Chaluvadi Valerie B Reeves William Disman Salin Thomas Allie Hajian Elizabeth Kiernan Namrata Gupta Trish Vosburg Ludwig Geistlinger Marcel Ramos Sehyun Oh Dave Rogers Frances McDade Mim Hastie Nitesh Turaga Alexander Ostrovsky Alexandru Mahmoud Dannon Baker Dave Clements Katherine E.L. Cox Keith Suderman Nataliya Kucher Sergey Golitsynskiy Samantha Zarate Sarah J. Wheelan Kai Kammers Ana Stevens Carolyn M. Hutter Christopher Wellington Elena M. Ghanaim Ken Wiley Shurjo K. Sen Valentina Di Francesco Denis Yuen Brian Walsh Luke Sargent Vahid Jalili John Chilton Lori Shepherd Benjamin J. Stubbs Ash O’Farrell Benton A. Vizzier Charles Overbeck Charles Reid David Steinberg Elizabeth A. Sheets Julian Lucas Lon Blauvelt Louise Cabansay Noah Warren Brian Hannafious Tim Harris Radhika Reddy Eric S. Torstenson M. Katie Banasiewicz Haley Abel Jason Walker

10.1016/j.xgen.2021.100085 article EN Cell Genomics 2022-01-01
Kathie Sun Xiaodong Bai Siying Chen Suying Bao Chuanyi Zhang and 95 more Manav Kapoor Joshua Backman Tyler Joseph Evan K. Maxwell George Mitra Alexander Gorovits Adam J. Mansfield Boris Boutkov Sujit Gokhale Lukas Habegger Anthony Marcketta Adam E. Locke Liron Ganel Alicia Hawes Michael D. Kessler Deepika Sharma Jeffrey Staples Jonas Bovijn Sahar Gelfman Alessandro Di Gioia Veera M. Rajagopal Alexander Lopez Jennifer Rico Varela Jesús Alegre-Díaz Jaime Berúmen Roberto Tapia‐Conyer Pablo Kuri‐Morales Jason Torres Jonathan Emberson Rory Collins Gonçalo R. Abecasis Giovanni Coppola Andrew Deubler Aris Economides Adolfo A. Ferrando Luca A. Lotta Alan R. Shuldiner Katherine Siminovitch Christina Beechert Erin D. Brian Laura M. Cremona Hang Du Caitlin Forsythe Zhenhua Gu Kristy Guevara Michael Lattari Kia Manoochehri Prathyusha Challa Manasi Pradhan Raymond Reynoso Ricardo Schiavo Maria Sotiropoulos Padilla Chenggu Wang Sarah E. Wolf Amelia Averitt Nilanjana Banerjee Dadong Li Sameer Malhotra Justin Mower Mudasar Sarwar Jeffrey C. Staples Sean Yu Aaron Zhang Andrew Bunyea Krishna Pawan Punuru Sanjay Sreeram Gisu Eom Benjamin Sultan Rouel Lanche Vrushali Mahajan Eliot Austin Sean O’Keeffe Razvan Panea Tommy Polanco Ayesha Rasool Lance Zhang Evan Edelstein Ju Guan Olga Krasheninina Samantha Zarate Adam J. Mansfield Evan K. Maxwell Kathie Sun Manuel Allen Revez Ferreira Kathy Burch Adrián I. Campos Lei Chen Sam Choi Amy Damask Sheila M. Gaynor Benjamin Geraghty Arkopravo Ghosh Salvador Romero Martinez Christopher E. Gillies Lauren Gurski

Rare coding variants that substantially affect function provide insights into the biology of a gene [1–3]. However, ascertaining the frequency of such variants requires large sample sizes [4–8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more...

10.1038/s41586-024-07556-0 article EN cc-by Nature 2024-05-20

Background: Structural variants (SVs) are critical contributors to genetic diversity and genomic disease. To predict the phenotypic impact of SVs, there is a need for better estimates of both the occurrence and frequency of SVs, preferably from large, ethnically diverse cohorts. Thus, the current standard approach requires the use of short paired-end reads, which remain challenging to detect, especially at the scale of hundreds to thousands of samples. Findings: We present Parliament2, a consensus SV framework that leverages...

10.1093/gigascience/giaa145 article EN cc-by GigaScience 2020-12-01

Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful: the Major Histocompatibility Complex (MHC). Here, we develop a genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each...

10.1038/s41467-020-18564-9 article EN cc-by Nature Communications 2020-09-22

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 Mbp of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome to clinical and functional study. Here we demonstrate how the new reference universally improves read mapping and variant calling for 3,202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of novel variants per sample, a new frontier for evolutionary and biomedical discovery. Simultaneously,...

10.1101/2021.07.12.452063 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2021-07-13

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure including long palindromes, tandem repeats, and segmental duplications [1–3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4, 5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the complete ampliconic...

10.1101/2022.12.01.518724 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2022-12-01
Oliver Bundgaard Vad Laia Meseguer Monfort Christian Paludan‐Müller Konstantin Kahnert Søren Zöga Diederichsen and 95 more Laura Andreasen Luca A. Lotta Jonas B. Nielsen Alicia Lundby Jesper Hastrup Svendsen Morten Olesen Aris Baras Gonçalo R. Abecasis Adolfo A. Ferrando Michael Cantor Giovanni Coppola Andrew Deubler Aris N. Economides Luca A. Lotta John D. Overton Jeffrey G. Reid Alan R. Shuldiner Katherine Siminovitch Jason Portnoy Marcus B. Jones Lyndon J. Mitnaul Alison Fenney Jonathan Marchini Manuel A. R. Ferreira Maya Ghoussaini Mona Nafde William Salerno Christina Beechert Erin D. Brian Laura M. Cremona Hang Du Caitlin Forsythe Zhenhua Gu Kristy Guevara Michael Lattari Alexander Lopez Kia Manoochehri Prathyusha Challa Manasi Pradhan Raymond Reynoso Ricardo Schiavo Maria Sotiropoulos Padilla Chenggu Wang Sarah E. Wolf Amelia Averitt Nilanjana Banerjee Dadong Li Sameer Malhotra Justin Mower Mudasar Sarwar Deepika Sharma Jeffrey Staples Sean Yu Aaron Zhang Muhammad Aqeel George Mitra Sujit Gokhale Andrew Bunyea Krishna Pawan Punuru Sanjay Sreeram Gisu Eom Benjamin Sultan Rouel Lanche Vrushali Mahajan Eliot Austin Sean O’Keeffe Razvan Panea Tommy Polanco Ayesha Rasool Xiaodong Bai Lance Zhang Boris Boutkov Evan Edelstein Alexander Gorovits Ju Guan Lukas Habegger Alicia Hawes Olga Krasheninina Samantha Zarate Adam J. Mansfield Evan K. Maxwell Suganthi Balasubramanian Suying Bao Kathie Sun Chuanyi Zhang Vikhna Raj Kumar Karuppaiya Joshua Backman Kathy Burch Adrián I. Campos Lei Chen Sam Choi Amy Damask Liron Ganel Sheila M. Gaynor Benjamin Geraghty

Atrial fibrillation (AF) has a substantial genetic component. The importance of polygenic risk is well established, while the contribution of rare variants to disease warrants characterization in large cohorts.

10.1001/jamacardio.2024.1528 article EN cc-by JAMA Cardiology 2024-06-26

Genome in a Bottle (GIAB) benchmarks have been widely used to help validate clinical sequencing pipelines and develop new variant calling methods. Here, we use accurate linked reads and long reads to expand the prior benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are not readily accessible to short reads. Our new benchmark adds more than 300,000 SNVs, 50,000 indels, and 16% new exonic variants, many in challenging, clinically relevant genes not previously covered (e.g., PMS2). For HG002, 92%...

10.1101/2020.07.24.212712 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2020-07-25

Background: Genomic structural variations (SV) are important determinants of genotypic and phenotypic changes in many organisms. However, the detection of SV from next-generation sequencing data remains challenging. Results: In this study, DNA from a Chinese family quartet is sequenced at three different sequencing centers in triplicate. A total of 288 derivative data sets are generated utilizing different analysis pipelines and compared to identify sources of analytical variability. Mapping methods provide the major contribution...

10.1186/s13059-021-02558-x article EN cc-by Genome Biology 2021-12-01

The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on some production sites, and technical checks of the completion and consistency of processing campaigns. The system design is highly modular so that its components (data collection system, storage system based...

10.1088/1742-6596/664/4/042003 article EN Journal of Physics Conference Series 2015-12-23

Here we present Parliament2 – a structural variant caller which combines multiple best-in-class structural variant callers to create a highly accurate callset. This captures more events than the individual callers achieve independently. Parliament2 uses a call-overlap-genotype approach that is highly extensible to new methods and presents users the choice to run some or all of Breakdancer, Breakseq, CNVnator, Delly, Lumpy, and Manta. Parliament2 applies an additional parallelization framework to speed certain callers and executes these in parallel, taking advantage...

10.1101/424267 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2018-09-23

Over the past 30 years, a community of scientists has pieced together every base pair of the human reference genome from telomere to telomere. Interestingly, most human genomics studies omit more than 5% of the genome from their analyses. Under "normal" circumstances, omitting any chromosome(s) from an analysis of the human genome would be cause for concern, with the exception being the sex chromosomes. Sex chromosomes in eutherians share an evolutionary origin as an ancestral pair of autosomes. In humans, they share 3 regions of high-sequence identity (∼98-100%), which, along...

10.1093/g3journal/jkad169 article EN cc-by G3 Genes Genomes Genetics 2023-07-27

Genome sequencing at population scale provides unprecedented access to the genetic foundations of human phenotypic diversity, but genotype-phenotype association analyses limited to small variants have failed to comprehensively characterize the genetic architecture of human health and disease because they ignore structural variants (SVs) known to contribute to phenotypic variation and pathogenic conditions [1–3]. Here we demonstrate the significance of SVs when assessing genotype-phenotype associations and the importance of ethnic diversity in study design by analyzing SVs across...

10.1101/2020.05.02.074096 preprint EN cc-by-nd bioRxiv (Cold Spring Harbor Laboratory) 2020-05-03

The ATLAS EventIndex is a data catalogue system that stores event-related metadata for all (real and simulated) ATLAS events, on all processing stages. As it consists of different components that depend on other applications (such as distributed storage and different sources of information), we need to monitor the conditions of many heterogeneous subsystems to make sure everything is working correctly. This paper describes how we gather information about the EventIndex components and related subsystems: the Producer-Consumer architecture for data collection, health parameters...

10.1088/1742-6596/762/1/012004 article EN Journal of Physics Conference Series 2016-10-01
Eric Manderstedt Christina Lind‐Halldén Christer Halldén Johan Elf Peter J. Svensson and 95 more Gunnar Engström Olle Melander Aris Baras Luca A. Lotta Bengt Zöller Gonçalo R. Abecasis Adolfo A. Ferrando Aris Baras Michael Cantor Giovanni Coppola Andrew Deubler Aris N. Economides Luca A. Lotta John D. Overton Jeffrey G. Reid Alan R. Shuldiner Katherine Siminovitch John D. Overton Christina Beechert Erin D. Brian Laura M. Cremona Hang Du Caitlin Forsythe Zhenhua Gu Kristy Guevara Michael Lattari Alexander Lopez Kia Manoochehri Prathyusha Challa Manasi Pradhan Raymond Reynoso Ricardo Schiavo Maria Sotiropoulos Padilla Chenggu Wang Sarah E. Wolf Michael Cantor Amelia Averitt Nilanjana Banerjee Dadong Li Sameer Malhotra Justin Mower Mudasar Sarwar Deepika Sharma Jeffrey C. Staples Jay Sundaram Sean Yu Aaron Zhang Jeffrey G. Reid Mona Nafde George Mitra Sujit Gokhale Andrew Bunyea Janice Clauer Krishna Pawan Punuru Sanjay Sreeram Gisu Eom Sujit Gokhale Benjamin Sultan Rouel Lanche Vrushali Mahajan Eliot Austin Koteswararao Makkena Sean O’Keeffe Razvan Panea Tommy Polanco Ayesha Rasool William Salerno Xiaodong Bai Lance Zhang Boris Boutkov Evan Edelstein Alexander Gorovits Ju Guan Lukas Habegger Alicia Hawes Olga Krasheninina Samantha Zarate Adam J. Mansfield Evan K. Maxwell Suganthi Balasubramanian Suying Bao Kathie Sun Chuanyi Zhang Gonçalo R. Abecasis Manuel Allen Revez Ferreira Joshua Backman Kathy Burch Adrián I. Campos Lei Chen Sam Choi Amy Damask Liron Ganel Sheila M. Gaynor Benjamin Geraghty Akropravo Ghosh

10.1016/j.tru.2024.100190 article EN cc-by Thrombosis Update 2024-09-01

Over the past 30 years, a community of scientists has pieced together every base pair of the human reference genome from telomere to telomere. Interestingly, most human genomics studies omit more than 5% of the genome from their analyses. Under 'normal' circumstances, omitting any chromosome(s) from analysis of the human genome would be reason for concern, the exception being the sex chromosomes. Sex chromosomes in eutherians share an evolutionary origin as an ancestral pair of autosomes. In humans, they share three regions of high sequence identity (~98-100%),...

10.1101/2023.02.22.529542 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-02-22

The ATLAS EventIndex System, developed for use in LHC Run 2, is designed to index every processed event in ATLAS, replacing the TAG System used in Run 1. Its storage infrastructure, based on the Hadoop open-source software framework, necessitates revamping how information in this system relates to other ATLAS systems. It will store more indexes since the fundamental mechanisms for retrieving these indexes will be better integrated into all stages of data processing, allowing more events from later stages of processing to be indexed than was possible with...

10.1088/1742-6596/664/4/042045 article EN Journal of Physics Conference Series 2015-12-23

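The ATLAS EventIndex is the catalogue of the event-related metadata for the information collected from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing this event, as well as trigger decision information. The main use case for the EventIndex is event picking, as well as data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, as well as a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the Grid, when various types of derived data are produced. The EventIndex uses various types of auxiliary information from other ATLAS sources...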

10.1088/1742-6596/762/1/012028 article EN Journal of Physics Conference Series 2016-10-01

Researchers rely on the human reference genome as a baseline to identify genetic differences between individuals, which are crucial for understanding human physiology, disease, and evolution. In this study, we focused on the implications of the first-ever complete human genome, which improves the identification of genetic variation and ushers in the beginning of a new era in human genetics.

10.25250/thescbr.brk721 article EN cc-by-sa TheScienceBreaker 2023-07-17