Jesse Farek

ORCID: 0000-0003-4939-8083
Research Areas
  • Genomics and Rare Diseases
  • Genetic Associations and Epidemiology
  • Genomics and Phylogenetic Studies
  • Cancer Genomics and Diagnostics
  • Myeloproliferative Neoplasms: Diagnosis and Treatment
  • Molecular Biology Techniques and Applications
  • RNA modifications and cancer
  • T-cell and B-cell Immunology
  • Lipoproteins and Cardiovascular Health
  • Genomic variations and chromosomal abnormalities
  • HIV Research and Treatment
  • Gene expression and cancer classification
  • Bioinformatics and Genomic Networks
  • RNA and protein synthesis mechanisms
  • Acute Myeloid Leukemia Research
  • vaccines and immunoinformatics approaches
  • Folate and B Vitamins Research
  • Genetics, Bioinformatics, and Biomedical Research
  • Caveolin-1 and cellular processes
  • Platelet Disorders and Treatments
  • Genetic Syndromes and Imprinting
  • Advanced Proteomics Techniques and Applications
  • Hematological disorders and diagnostics
  • Atrial Fibrillation Management and Outcomes
  • CRISPR and Genetic Engineering

Baylor College of Medicine
2016-2025

Baylor Genetics
2021-2025

Beth Israel Deaconess Medical Center
2021

St. Edward's University
2015

Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling methods. Here we use accurate linked and long reads to expand the benchmarks for 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, 92% of the autosomal GRCh38...

10.1016/j.xgen.2022.100128 article EN cc-by Cell Genomics 2022-04-28

Abstract: Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful - the Major Histocompatibility Complex (MHC). Here, we develop a genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each...

10.1038/s41467-020-18564-9 article EN cc-by Nature Communications 2020-09-22

Abstract. Background: The growing volume and heterogeneity of next-generation sequencing (NGS) data complicate the further optimization of methods for identifying DNA variation, especially considering that curated high-confidence variant call sets frequently used to validate these methods are generally developed from the analysis of comparatively small and homogeneous sample sets. Findings: We have developed xAtlas, a single-sample variant caller for single-nucleotide variants (SNVs) and insertions and deletions (indels) in NGS data. xAtlas...

10.1093/gigascience/giac125 article EN cc-by GigaScience 2022-12-28

Summary: Genome in a Bottle (GIAB) benchmarks have been widely used to help validate clinical sequencing pipelines and develop new variant calling methods. Here, we use accurate linked reads and long reads to expand the prior benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are not readily accessible to short reads. Our benchmark adds more than 300,000 SNVs, 50,000 indels, and 16% more exonic variants, many in challenging, clinically relevant genes not previously covered (e.g., PMS2). For HG002, 92%...

10.1101/2020.07.24.212712 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2020-07-25

Abstract: The current version of the human reference genome, GRCh38, contains a number of errors, including 1.2 Mbp of falsely duplicated and 8.04 Mbp of collapsed regions. These errors impact the variant calling of 33 protein-coding genes, including 12 with medical relevance. Here, we present FixItFelix, an efficient remapping approach, together with a modified GRCh38 reference genome that improves the subsequent analysis across these genes within minutes for an existing alignment file while maintaining the same coordinates. We showcase these improvements over...

10.1186/s13059-023-02863-7 article EN cc-by Genome Biology 2023-02-21

Abstract: The repetitive nature and complexity of multiple medically important genes make them intractable to accurate analysis, despite the maturity of short-read sequencing, resulting in a gap in clinical applications of genome sequencing. The Genome in a Bottle Consortium has provided benchmark variant sets, but these excluded some medically relevant genes due to their repetitiveness or polymorphic complexity. In this study, we characterize 273 of these 395 challenging autosomal genes that have implications for medical sequencing. This extended,...

10.1101/2021.06.07.444885 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2021-06-07

Abstract. Background: The All of Us Research Program is one of the world’s largest sequencing efforts that will generate genetic data for over one million individuals from diverse backgrounds. This historic megaproject will create novel research platforms that integrate an unprecedented amount of genetic data with longitudinal health information. Here, we describe the design of Celeste, a resilient, open-source cloud architecture for implementing genomics workflows that has successfully analyzed petabytes of participant genomic information...

10.1101/2025.04.29.25326690 preprint EN cc-by-nd medRxiv (Cold Spring Harbor Laboratory) 2025-05-01

Abstract. Motivation: The rapid development of next-generation sequencing (NGS) technologies has lowered the barriers to genomic data generation, resulting in millions of samples sequenced across diverse experimental designs. The growing volume and heterogeneity of these sequencing data complicate the further optimization of methods for identifying DNA variation, especially considering that curated high-confidence variant call sets commonly used to evaluate these methods are generally developed by reference to results from the analysis of comparatively...

10.1101/295071 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2018-04-05

Abstract: The GRCh38 reference is the current standard in human genomics research and clinical applications, but includes errors across 33 protein-coding genes, including 12 with medical relevance. Current studies rely on the correctness of this reference genome and require an accurate and cost-effective way to improve variant calling and expression analysis across these erroneous loci. We identified likely artifacts in GTEx, gnomAD, the 1000 Genomes Project, and other important genomic resources leading to wrong interpretations for...

10.1101/2022.07.18.500506 preprint EN cc-by-nd bioRxiv (Cold Spring Harbor Laboratory) 2022-07-20