Santiago Marco‐Sola

ORCID: 0000-0001-7951-3914
Research Areas
  • Genomics and Phylogenetic Studies
  • Algorithms and Data Compression
  • Chromosomal and Genetic Variations
  • RNA and protein synthesis mechanisms
  • RNA modifications and cancer
  • Advanced Data Storage Technologies
  • Medical Imaging and Pathology Studies
  • Parallel Computing and Optimization Techniques
  • Protist diversity and phylogeny
  • Genomics and Rare Diseases
  • Genomic variations and chromosomal abnormalities
  • Plant Virus Research Studies
  • RNA Research and Splicing
  • Evolutionary Algorithms and Applications
  • Machine Learning in Bioinformatics
  • Enzyme Structure and Function
  • Plant Disease Resistance and Genetics
  • Epigenetics and DNA Methylation
  • Genetic Syndromes and Imprinting
  • Protein Structure and Dynamics
  • Biomedical Text Mining and Ontologies
  • Genetic Mapping and Diversity in Plants and Animals
  • Network Packet Processing and Optimization
  • COVID-19 diagnosis using AI
  • Molecular Biology Techniques and Applications

Universitat Politècnica de Catalunya
2017-2025

Barcelona Supercomputing Center
2020-2025

Universitat Autònoma de Barcelona
2018-2023

Centro Nacional de Análisis Genómico
2012-2016

Centre for Genomic Regulation
2016

Universitat Pompeu Fabra
2016

Wen‐Wei Liao Mobin Asri Jana Ebler Daniel Doerr Marina Haukness and 95 more Glenn Hickey Shuangjia Lu Julian Lucas Jean Monlong Haley Abel Silvia Buonaiuto Xian Chang Haoyu Cheng Justin Chu Vincenza Colonna Jordan M. Eizenga Xiaowen Feng Christian Fischer Robert S. Fulton Shilpa Garg Cristian Groza Andrea Guarracino William T. Harvey Simon Heumos Kerstin Howe Miten Jain Tsung-Yu Lu Charles Markello Fergal J. Martin Matthew W. Mitchell Katherine M. Munson Moses Njagi Mwaniki Adam M. Novak Hugh E. Olsen Trevor Pesout David Porubský Pjotr Prins Jonas A. Sibbesen Jouni Sirén Chad Tomlinson Flavia Villani Mitchell R. Vollger Lucinda Antonacci-Fulton Gunjan Baid Carl Baker Anastasiya Belyaeva Konstantinos Billis Andrew Carroll Pi-Chuan Chang Sarah Cody Daniel E. Cook Robert Cook‐Deegan Omar E. Cornejo Mark Diekhans Peter Ebert Susan Fairley Olivier Fédrigo Adam L. Felsenfeld Giulio Formenti Adam Frankish Yan Gao Nanibaa’ A. Garrison Carlos García Girón Richard E. Green Leanne Haggerty Kendra Hoekzema Thibaut Hourlier Hanlee P. Ji Eimear E. Kenny Barbara A. Koenig Alexey Kolesnikov Jan O. Korbel Jennifer Kordosky Sergey Koren HoJoon Lee Alexandra P. Lewis Hugo Magalhães Santiago Marco‐Sola Pierre Marijon Ann M. Mc Cartney Jennifer McDaniel Jacquelyn Mountcastle Maria Nattestad Sergey Nurk Nathan D. Olson Alice B. Popejoy Daniela Puiu Mikko Rautiainen Allison Regier Arang Rhie Samuel Sacco Ashley D. Sanders Valérie Schneider Baergen I. Schultz Kishwar Shafin Michael W. Smith Heidi J. Sofia Ahmad Abou Tayoun Françoise Thibaud‐Nissen Francesca Floriana Tricomi

Abstract. Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals 1. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic...

10.1038/s41586-023-05896-x article EN cc-by Nature 2023-05-10
Glenn Hickey Jean Monlong Jana Ebler Adam M. Novak Jordan M. Eizenga and 95 more Yan Gao Haley Abel Lucinda Antonacci-Fulton Mobin Asri Gunjan Baid Carl Baker Anastasiya Belyaeva Konstantinos Billis Guillaume Bourque Silvia Buonaiuto Andrew Carroll Mark Chaisson Pi-Chuan Chang Xian Chang Haoyu Cheng Justin Chu Sarah Cody Vincenza Colonna Daniel E. Cook Robert Cook‐Deegan Omar E. Cornejo Mark Diekhans Daniel Doerr Peter Ebert Jana Ebler Evan E. Eichler Susan Fairley Olivier Fédrigo Adam L. Felsenfeld Xiaowen Feng Christian Fischer Paul Flicek Giulio Formenti Adam Frankish Robert S. Fulton Shilpa Garg Erik Garrison Nanibaa’ A. Garrison Carlos García Girón Richard E. Green Cristian Groza Andrea Guarracino Leanne Haggerty Ira M. Hall William T. Harvey Marina Haukness David Haussler Simon Heumos Kendra Hoekzema Thibaut Hourlier Kerstin Howe Miten Jain Erich D. Jarvis Hanlee P. Ji Eimear E. Kenny Barbara A. Koenig Alexey Kolesnikov Jan O. Korbel Jennifer Kordosky Sergey Koren HoJoon Lee Alexandra P. Lewis Wen‐Wei Liao Shuangjia Lu Tsung-Yu Lu Julian Lucas Hugo Magalhães Santiago Marco‐Sola Pierre Marijon Charles Markello Tobias Marschall Fergal J. Martin Ann M. Mc Cartney Jennifer McDaniel Karen H. Miga Matthew W. Mitchell Jacquelyn Mountcastle Katherine M. Munson Moses Njagi Mwaniki Maria Nattestad Sergey Nurk Hugh E. Olsen Nathan D. Olson Trevor Pesout Adam M. Phillippy Alice B. Popejoy David Porubský Pjotr Prins Daniela Puiu Mikko Rautiainen Allison Regier Arang Rhie Samuel Sacco Ashley D. Sanders Valérie Schneider

10.1038/s41587-023-01793-w article EN Nature Biotechnology 2023-05-10
Andrea Guarracino Silvia Buonaiuto Leonardo Gomes de Lima Tamara Potapova Arang Rhie and 95 more Sergey Koren Boris Rubinstein Christian Fischer Haley Abel Lucinda Antonacci-Fulton Mobin Asri Gunjan Baid Carl Baker Anastasiya Belyaeva Konstantinos Billis Guillaume Bourque Andrew Carroll Mark Chaisson Pi-Chuan Chang Xian Chang Haoyu Cheng Justin Chu Sarah Cody Daniel E. Cook Robert Cook‐Deegan Omar E. Cornejo Mark Diekhans Daniel Doerr Peter Ebert Jana Ebler Evan E. Eichler Jordan M. Eizenga Susan Fairley Olivier Fédrigo Adam L. Felsenfeld Xiaowen Feng Paul Flicek Giulio Formenti Adam Frankish Robert S. Fulton Yan Gao Shilpa Garg Nanibaa’ A. Garrison Carlos García Girón Richard E. Green Cristian Groza Leanne Haggerty Ira M. Hall William T. Harvey Marina Haukness David Haussler Simon Heumos Glenn Hickey Kendra Hoekzema Thibaut Hourlier Kerstin Howe Miten Jain Erich D. Jarvis Hanlee P. Ji Eimear E. Kenny Barbara A. Koenig Alexey Kolesnikov Jan O. Korbel Jennifer Kordosky HoJoon Lee Alexandra P. Lewis Heng Li Wen‐Wei Liao Shuangjia Lu Tsung-Yu Lu Julian Lucas Hugo Magalhães Santiago Marco‐Sola Pierre Marijon Charles Markello Tobias Marschall Fergal J. Martin Ann M. Mc Cartney Jennifer McDaniel Karen H. Miga Matthew W. Mitchell Jean Monlong Jacquelyn Mountcastle Katherine M. Munson Moses Njagi Mwaniki Maria Nattestad Adam M. Novak Sergey Nurk Hugh E. Olsen Nathan D. Olson Benedict Paten Trevor Pesout Alice B. Popejoy David Porubský Pjotr Prins Daniela Puiu Mikko Rautiainen Allison Regier Samuel Sacco Ashley D. Sanders

Abstract. The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications 1,2. Although the resolution of these regions in the first complete assembly of a human genome, the Telomere-to-Telomere Consortium’s CHM13 assembly (T2T-CHM13), provided a model of their homology 3, it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain...

10.1038/s41586-023-05976-y article EN cc-by Nature 2023-05-10

Abstract. Pangenome graphs can represent all variation between multiple reference genomes, but current approaches to build them exclude complex sequences or are based upon a single reference. In response, we developed the PanGenome Graph Builder (PGGB), a pipeline for constructing pangenome graphs without bias or exclusion. PGGB uses all-to-all alignments to build a variation graph in which we can identify variation, measure conservation, detect recombination events, and infer phylogenetic relationships.

10.1101/2023.04.05.535718 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2023-04-06
Mitchell R. Vollger Philip C. Dishuck William T. Harvey William S. DeWitt Xavi Guitart and 95 more Michael E. Goldberg Allison N. Rozanski Julian Lucas Mobin Asri Haley Abel Lucinda Antonacci-Fulton Gunjan Baid Carl Baker Anastasiya Belyaeva Konstantinos Billis Guillaume Bourque Silvia Buonaiuto Andrew Carroll Mark Chaisson Pi-Chuan Chang Xian Chang Haoyu Cheng Justin Chu Sarah Cody Vincenza Colonna Daniel E. Cook Robert Cook‐Deegan Omar E. Cornejo Mark Diekhans Daniel Doerr Peter Ebert Jana Ebler Jordan M. Eizenga Susan Fairley Olivier Fédrigo Adam L. Felsenfeld Xiaowen Feng Christian Fischer Paul Flicek Giulio Formenti Adam Frankish Robert S. Fulton Yan Gao Shilpa Garg Erik Garrison Nanibaa’ A. Garrison Carlos García Girón Richard E. Green Cristian Groza Andrea Guarracino Leanne Haggerty Ira M. Hall Marina Haukness David Haussler Simon Heumos Glenn Hickey Thibaut Hourlier Kerstin Howe Miten Jain Erich D. Jarvis Hanlee P. Ji Eimear E. Kenny Barbara A. Koenig Alexey Kolesnikov Jan O. Korbel Jennifer Kordosky Sergey Koren HoJoon Lee Heng Li Wen‐Wei Liao Shuangjia Lu Tsung-Yu Lu Julian Lucas Hugo Magalhães Santiago Marco‐Sola Pierre Marijon Charles Markello Tobias Marschall Fergal J. Martin Ann M. Mc Cartney Jennifer McDaniel Karen H. Miga Matthew W. Mitchell Jean Monlong Jacquelyn Mountcastle Moses Njagi Mwaniki Maria Nattestad Adam M. Novak Sergey Nurk Hugh E. Olsen Nathan D. Olson Benedict Paten Trevor Pesout Adam M. Phillippy Alice B. Popejoy Pjotr Prins Daniela Puiu Mikko Rautiainen Allison Regier Arang Rhie

Abstract. Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data 1,2. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions 3,4. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence...

10.1038/s41586-023-05895-y article EN cc-by Nature 2023-05-10

Abstract. Motivation: Pairwise alignment of sequences is a fundamental method in modern molecular biology, implemented within multiple bioinformatics tools and libraries. Current advances in sequencing technologies press for the development of faster pairwise alignment algorithms that can scale with increasing read lengths and production yields. Results: In this article, we present the wavefront alignment algorithm (WFA), an exact gap-affine algorithm that takes advantage of homologous regions between the sequences to accelerate the alignment process. As opposed...

10.1093/bioinformatics/btaa777 article EN cc-by-nc Bioinformatics 2020-09-01

Significance. One fundamental analysis needed to interpret genome assemblies is genome alignment. Yet, accurately aligning regulatory and transposon regions outside of genes remains challenging. We introduce Anchored Wavefront alignment (AnchorWave), which implements a genome duplication informed longest path algorithm to identify collinear regions and performs base pair–resolved, end-to-end alignment for collinear blocks using an efficient two-piece affine gap cost strategy. AnchorWave improves the alignment under a number of scenarios: genomes with high...

10.1073/pnas.2113075119 article EN cc-by-nc-nd Proceedings of the National Academy of Sciences 2021-12-21

Abstract. Motivation: Pairwise sequence alignment remains a fundamental problem in computational biology and bioinformatics. Recent advances in genomics and sequencing technologies demand faster and scalable algorithms that can cope with the ever-increasing sequence lengths. Classical pairwise alignment algorithms based on dynamic programming are strongly limited by quadratic requirements in time and memory. The recently proposed wavefront alignment algorithm (WFA) introduced an efficient algorithm to perform exact gap-affine alignment in O(ns) time, where s is the optimal...

10.1093/bioinformatics/btad074 article EN cc-by Bioinformatics 2023-02-01

DNA methylation is essential for normal embryogenesis and development in mammals and can be captured at single base pair resolution by whole genome bisulfite sequencing (WGBS). Current available analysis tools are becoming rapidly outdated as they lack sensible functionality and efficiency to handle large amounts of data now commonly created. We developed gemBS, a fast high-throughput bioinformatics pipeline specifically designed for large scale BS-Seq analysis that combines a high performance BS-mapper (GEM3) and a variant...

10.1093/bioinformatics/bty690 article EN Bioinformatics 2018-08-20

As whole genome sequencing becomes cheaper and faster, it will progressively substitute targeted next-generation sequencing as standard practice in research and diagnostics. However, the computing cost-performance ratio is not advancing at an equivalent rate. Therefore, it is essential to evaluate the robustness of the variant detection process taking into account the computing resources required. We have benchmarked six combinations of state-of-the-art read aligners (BWA-MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, SAMtools) on...

10.1002/humu.23114 article EN Human Mutation 2016-09-08

Advances in genomics and sequencing technologies demand faster and more scalable analysis methods that can process longer sequences with higher accuracy. However, classical pairwise alignment methods, based on dynamic programming (DP), impose impractical computational requirements to align long and noisy sequences like those produced by PacBio and Nanopore technologies. The recently proposed wavefront alignment (WFA) algorithm paves the way for more efficient alignment tools, improving time and memory complexity over previous methods...

10.1093/bioinformatics/btad701 article EN cc-by Bioinformatics 2023-11-16

Approximate string matching is a very important problem in computational biology; it requires the fast computation of string distance as one of its essential components. Myers' bit-parallel algorithm improves the classical dynamic programming approach to Levenshtein distance computation, and offers competitive performance on CPUs. The main challenge when designing an efficient GPU implementation is to expose enough SIMD parallelism while at the same time keeping a relatively small working set for each thread.

10.1145/2597652.2597677 article EN 2014-06-10
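
The bit-parallel idea named in the abstract above can be illustrated with a minimal CPU-side sketch. It is not the GPU implementation the paper describes: it assumes the global-distance variant of the bit-vector recurrence (the formulation usually attributed to Hyyrö), and the function name `bitparallel_edit_distance` is hypothetical. One column of the dynamic programming matrix is packed into the bits of an integer, so each text character costs a constant number of word operations.

```python
def bitparallel_edit_distance(pattern: str, text: str) -> int:
    """Levenshtein distance using a bit-vector encoding of one DP column."""
    m = len(pattern)
    if m == 0:
        return len(text)

    mask = (1 << m) - 1      # keeps every vector m bits wide
    high = 1 << (m - 1)      # bit of the last pattern row

    # peq[c] has bit i set when pattern[i] == c
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    vp, vn = mask, 0         # vertical +1 / -1 deltas of the current column
    score = m                # distance between pattern and the empty text

    for ch in text:
        eq = peq.get(ch, 0)
        d0 = ((((eq & vp) + vp) ^ vp) | eq | vn) & mask
        hp = (vn | ~(d0 | vp)) & mask   # horizontal +1 deltas
        hn = d0 & vp                    # horizontal -1 deltas
        if hp & high:
            score += 1
        if hn & high:
            score -= 1
        # shift in a 1 because the top DP row grows by one per column
        hp = ((hp << 1) | 1) & mask
        hn = (hn << 1) & mask
        vp = (hn | ~(d0 | hp)) & mask
        vn = hp & d0
    return score


print(bitparallel_edit_distance("kitten", "sitting"))  # 3
```

Python integers are unbounded, so the explicit `mask` stands in for the fixed machine word that a C or CUDA version would use.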

Abstract. Motivation: Pairwise sequence alignment is a core component of multiple sequencing-data analysis tools. Recent advancements in sequencing technologies have enabled the generation of longer sequences at a much lower price. Thus, long-read sequencing technologies have become increasingly popular in sequencing-based studies. However, classical sequence alignment algorithms face significant scalability challenges when aligning long sequences. As a result, several heuristic methods have been developed to improve performance at the expense of accuracy, as they...

10.1093/bioinformatics/btaf112 article EN cc-by Bioinformatics 2025-03-10

Chimeric transcripts are commonly defined as transcripts linking two or more different genes in the genome, and can be explained by various biological mechanisms such as genomic rearrangement, read-through or trans-splicing, but also by technical artefacts. Several studies have shown their importance in cancer, cell pluripotency and motility. Many programs have recently been developed to identify chimeras from Illumina RNA-seq data (mostly fusion genes in cancer). However, the outputs of different programs on the same dataset can be widely inconsistent, and tend to include...

10.1186/s12864-016-3404-9 article EN cc-by BMC Genomics 2017-01-03

The recent advent of high-throughput sequencing machines producing big amounts of short reads has boosted the interest in efficient string searching techniques. As of today, many mainstream sequence alignment software tools rely on a special data structure, called the FM-index, which allows for fast exact searches on large genomic references. However, such searches translate into a pseudo-random memory access pattern, thus making memory access the limiting factor of all computation-efficient implementations, both on CPUs and GPUs. Here,...

10.1109/tcbb.2014.2377716 article EN IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014-12-04
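
For readers unfamiliar with the FM-index mentioned in the abstract above, the following is a minimal, unoptimized sketch of exact pattern counting by backward search. It is an illustration under stated assumptions, not the paper's implementation: the index is built naively by sorting rotations, the occurrence table is stored uncompressed, the text is assumed not to contain the sentinel `$`, and the names `build_fm_index` and `count_occurrences` are hypothetical. Each search step looks up the occurrence table at two positions that depend on the previous step, which is the scattered memory access pattern the abstract refers to.

```python
def build_fm_index(text: str):
    """Build the C array and full occurrence table of the BWT of text + '$'."""
    text += "$"  # sentinel, assumed smaller than every other symbol
    n = len(text)
    # Naive construction: sort all rotations (fine for a sketch only).
    order = sorted(range(n), key=lambda i: text[i:] + text[:i])
    bwt = "".join(text[i - 1] for i in order)

    alphabet = sorted(set(text))
    # c[ch] = number of symbols in the text strictly smaller than ch
    c, total = {}, 0
    for ch in alphabet:
        c[ch] = total
        total += text.count(ch)

    # occ[ch][i] = number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (n + 1) for ch in alphabet}
    for i, sym in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == sym else 0)
    return c, occ, n


def count_occurrences(pattern: str, index) -> int:
    """Count exact matches of pattern via backward search."""
    c, occ, n = index
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c:
            return 0
        lo = c[ch] + occ[ch][lo]
        hi = c[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("banana")
print(count_occurrences("ana", index))  # 2
```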

Given the overwhelming impact of machine learning on the last decade, several libraries and frameworks have been developed in recent years to simplify the design and training of neural networks, providing array-based programming, automatic differentiation and user-friendly access to hardware accelerators. None of those tools, however, was designed with native and transparent support for Cloud Computing or heterogeneous High-Performance Computing (HPC). The DeepHealth Toolkit is an open source Deep Learning toolkit aimed at...

10.1109/icpr48806.2021.9411954 article EN 2022 26th International Conference on Pattern Recognition (ICPR) 2021-01-10

Sequence alignment remains a fundamental problem with practical applications ranging from pattern recognition to computational biology. Traditional algorithms based on dynamic programming are hard to parallelize, require significant amounts of memory, and fail to scale for large inputs. This work presents eWFA-GPU, a GPU (graphics processing unit)-accelerated tool to compute the exact edit-distance sequence alignment based on the wavefront algorithm (WFA). This approach exploits the similarities between the input sequences to accelerate...

10.1109/access.2022.3182714 article EN cc-by IEEE Access 2022-01-01
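
The wavefront idea behind the edit-distance computation described above can be sketched on the CPU in a few lines. This is a simplified illustration, not the eWFA-GPU code: it returns only the distance (no alignment traceback), stores each wavefront in a dictionary, and the name `wavefront_edit_distance` is hypothetical. For each score it keeps, per diagonal, the furthest text offset reachable, then extends along matching characters for free, which is how similar sequences are aligned quickly.

```python
def wavefront_edit_distance(a: str, b: str) -> int:
    """Edit distance between a and b via furthest-reaching wavefronts."""
    n, m = len(a), len(b)

    def extend(k: int, h: int) -> int:
        # Slide along diagonal k (k = h - v) while characters match.
        v = h - k
        while v < n and h < m and a[v] == b[h]:
            v += 1
            h += 1
        return h

    target = m - n               # diagonal that contains the end cell
    wf = {0: extend(0, 0)}       # wavefront for score 0
    score = 0
    while wf.get(target, -1) < m:
        score += 1
        nxt = {}
        for k in range(-score, score + 1):
            candidates = []
            # (source diagonal, offset increment):
            # insertion from k-1, mismatch on k, deletion from k+1
            for src, dh in ((k - 1, 1), (k, 1), (k + 1, 0)):
                if src in wf:
                    h = wf[src] + dh
                    if h <= m and h - k <= n:   # stay inside the matrix
                        candidates.append(h)
            if candidates:
                nxt[k] = extend(k, max(candidates))
        wf = nxt
    return score


print(wavefront_edit_distance("kitten", "sitting"))  # 3
```

The loop runs once per unit of score, so the work grows with the distance between the inputs rather than with the full size of the dynamic programming matrix.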

In the last years, advances in next-generation sequencing technologies have enabled the proliferation of genomic applications that guide personalized medicine. These applications have an enormous computational cost due to the large amount of genomic data they process. The first step in many of these applications consists of aligning reads against a reference genome. Very recently, the wavefront alignment algorithm has been introduced, significantly reducing the execution time of the read alignment process. This paper presents an FPGA-based hardware/software co-designed accelerator of such...

10.1109/fpl53798.2021.00033 article EN 2021-08-01
HoJoon Lee Stephanie Greer Dmitri S. Pavlichin Bo Zhou Alexander E. Urban and 95 more Tsachy Weissman Hanlee P. Ji Wen‐Wei Liao Mobin Asri Jana Ebler Daniel Doerr Marina Haukness Glenn Hickey Shuangjia Lu Julian Lucas Jean Monlong Haley Abel Silvia Buonaiuto Xian Chang Haoyu Cheng Justin Chu Vincenza Colonna Jordan M. Eizenga Xiaowen Feng Christian Fischer Robert S. Fulton Shilpa Garg Cristian Groza Andrea Guarracino William T. Harvey Simon Heumos Kerstin Howe Miten Jain Tsung-Yu Lu Charles Markello Fergal J. Martin Matthew W. Mitchell Katherine M. Munson Moses Njagi Mwaniki Adam M. Novak Hugh E. Olsen Trevor Pesout David Porubský Pjotr Prins Jonas A. Sibbesen Chad Tomlinson Flavia Villani Mitchell R. Vollger Lucinda Antonacci-Fulton Gunjan Baid Carl Baker Anastasiya Belyaeva Konstantinos Billis Andrew Carroll Pi-Chuan Chang Sarah Cody Daniel E. Cook Omar E. Cornejo Mark Diekhans Peter Ebert Susan Fairley Olivier Fédrigo Adam L. Felsenfeld Giulio Formenti Adam Frankish Yan Gao Carlos García Girón Richard E. Green Leanne Haggerty Kendra Hoekzema Thibaut Hourlier Hanlee P. Ji Alexey Kolesnikov Jan O. Korbel Jennifer Kordosky HoJoon Lee Alexandra P. Lewis Hugo Magalhães Santiago Marco‐Sola Pierre Marijon Jennifer McDaniel Jacquelyn Mountcastle Maria Nattestad Nathan D. Olson Daniela Puiu Allison Regier Arang Rhie Samuel Sacco Ashley D. Sanders Valérie Schneider Baergen I. Schultz Kishwar Shafin Jouni Sirén Michael W. Smith Heidi J. Sofia Ahmad Abou Tayoun Françoise Thibaud‐Nissen Francesca Floriana Tricomi Justin Wagner Jonathan Wood

The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere assembly. Our approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as...

10.1016/j.crmeth.2023.100543 article EN cc-by-nc-nd Cell Reports Methods 2023-08-01

Arm usage has substantially grown in the High-Performance Computing (HPC) community. The Japanese supercomputer Fugaku, powered by Arm-based A64FX processors, held the top position on the Top500 list between June 2020 and June 2022, currently sitting in the fourth position. The recently released 7th generation of Amazon EC2 instances for compute-intensive workloads (C7g) is also powered by Arm-based Graviton3 processors. Projects like the European Mont-Blanc and the U.S. DOE/NNSA Astra are further examples of the irruption of Arm in HPC. In parallel, over the last...

10.1016/j.future.2024.03.050 article EN cc-by-nc Future Generation Computer Systems 2024-04-02

Abstract. Motivation: Advances in genomics and sequencing technologies demand faster and more scalable analysis methods that can process longer sequences with higher accuracy. However, classical pairwise alignment methods, based on dynamic programming (DP), impose impractical computational requirements to align long and noisy sequences like those produced by PacBio and Nanopore technologies. The recently proposed WFA algorithm paves the way for more efficient alignment tools, improving time and memory complexity over previous...

10.1101/2022.04.18.488374 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2022-04-18

Modern Illumina-like high-throughput sequencing machines allow the cheap decoding of great amounts of DNA. The GEnomic Multi-tool (GEM) mapper is one of the fastest and most sensitive methods known to date to align such data to a genomic reference. This unit explains how to use it effectively.

10.1002/0471250953.bi1113s50 article EN Current Protocols in Bioinformatics 2015-06-01