- Genomics and Phylogenetic Studies
- Scientific Computing and Data Management
- Gene expression and cancer classification
- Research Data Management Practices
- RNA and protein synthesis mechanisms
- Algorithms and Data Compression
- Machine Learning in Bioinformatics
- Distributed and Parallel Computing Systems
- Genetic Mapping and Diversity in Plants and Animals
- Genetics and Plant Breeding
- Bioinformatics and Genomic Networks
- Advanced Software Engineering Methodologies
- Genetics, Bioinformatics, and Biomedical Research
- Advanced Data Storage Technologies
- Glycosylation and Glycoproteins Research
- Genetic diversity and population structure
- Parallel Computing and Optimization Techniques
- Model-Driven Software Engineering Techniques
- Advanced Proteomics Techniques and Applications
- Data Visualization and Analytics
- Analog and Mixed-Signal Circuit Design
- Usability and User Interface Design
- Neuroscience and Neural Engineering
- Artificial Intelligence in Healthcare
- Mycorrhizal Fungi and Plant Interactions
Centre for Genomic Regulation
2011-2020
Barcelona Institute for Science and Technology
2017-2020
Universitat Pompeu Fabra
2010-2018
Institut thématique Génétique, génomique et bioinformatique
2015
Barcelona Biomedical Research Park
2011
Universitat Autònoma de Barcelona
2011
Universitat de Lleida
2010
Sapienza University of Rome
2004
This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides easy and intuitive access to the most popular functionality of the package. These include the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes that deliver high accuracy alignments while using structural or homology derived templates. The three available template modes are Expresso, for the alignment of proteins with known 3D-Structure,...
Multiple sequence alignment (MSA) is a key modeling procedure when analyzing biological sequences. Homology and evolutionary modeling are the most common applications of MSAs. Both are known to be sensitive to the underlying MSA accuracy. In this work, we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from BAliBASE and PREFAB structure-based reference...
Background: Transmembrane proteins (TMPs) constitute about 20-30% of all protein coding genes. The relative lack of experimental structure has so far made it hard to develop specific alignment methods, and the current state of the art (PRALINE™) only manages to recapitulate 50% of the positions in the reference alignments available from BAliBASE2-ref7. Methods: We show how homology extension can be adapted and combined with a consistency based approach in order to significantly improve the multiple sequence alignment of alpha-helical...
Summary: AMPA is a web application for assessing the antimicrobial domains of proteins, with a focus on the design of new drugs. The application provides fast discovery of patterns in proteins that can be used to develop peptide-based drugs against pathogens. Results are shown in a user-friendly graphical interface and can be downloaded as raw data for later examination. Availability: AMPA is freely available at http://tcoffee.crg.cat/apps/ampa. The source code is also available on the web. Contact: marc.torrent@upf.edu; david.andreu@upf.edu...
Genomic pipelines consist of several pieces of third party software and, because of their experimental nature, frequent changes and updates are commonly necessary, thus raising serious deployment and reproducibility issues. Docker containers are emerging as a possible solution for many of these problems, as they allow the packaging of pipelines in an isolated and self-contained manner. This makes it easy to distribute and execute pipelines in a portable manner across a wide range of computing platforms. Thus, the question that arises is to what extent the use...
Standardised analysis pipelines are an important part of FAIR bioinformatics research. Over the last decade, there has been a notable shift from point-and-click pipeline solutions such as Galaxy towards command-line solutions such as Nextflow and Snakemake. We report on recent developments in the nf-core and Nextflow frameworks that have led to widespread adoption across many scientific communities. We describe how adopting nf-core standards enables faster development, improved interoperability, and collaboration with >8,000...
The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane protein (TMP) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs...
This article introduces the Transitive Consistency Score (TCS) web server, a service making it possible to estimate the local reliability of protein multiple sequence alignments (MSAs) using the TCS index. The evaluation can be used to identify the aligned positions most likely to contain structurally analogous residues and also most likely to support an accurate phylogenetic reconstruction. The TCS scoring scheme has been shown to be an accurate predictor of structural alignment correctness among commonly used methods. It has also been shown to outperform common filtering...
The standardization, portability, and reproducibility of analysis pipelines is a renowned problem within the bioinformatics community. Most pipelines are designed for execution on-premise, and the associated software dependencies are tightly coupled with the local compute environment. This leads to poor pipeline portability and poor reproducibility of the ensuing results, both of which are fundamental requirements for the validation of scientific findings. Here, we introduce nf-core: a framework that provides a community-driven, peer-reviewed platform...
Biological, clinical, and pharmacological research now often involves analyses of genomes, transcriptomes, proteomes, and interactomes, within and between individuals and across species. Due to large volumes, the analysis and integration of data generated by such high-throughput technologies have become computationally intensive, and analysis can no longer happen on a typical desktop computer. In this chapter we show how to describe and execute the same analysis using a number of workflow systems, and how these follow different approaches to tackle execution...
Scientific workflows have been used almost universally across scientific domains, and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs)...
The computational complexity of many key bioinformatics problems has resulted in numerous alternative heuristic solutions, where no single approach consistently outperforms all others. This creates difficulties for users trying to identify the most suitable tool for their dataset and for developers managing and evaluating alternative methods. As data volumes grow, deploying these methods becomes increasingly difficult, highlighting the need for standardized frameworks for seamless deployment and comparison in HPC environments....
Summary: We present the first parallel implementation of the T-Coffee consistency-based multiple aligner. We benchmark it on the Amazon Elastic Cloud (EC2) and show that the parallelization procedure is reasonably effective. We also conclude that for a web server with moderate usage (10K hits/month) the cloud provides a cost-effective alternative to in-house deployment. Availability: T-Coffee is a freeware open source package available from http://www.tcoffee.org/homepage.html Contact: cedric.notredame@crg.es
Phylogenetic reconstructions are essential in genomics data analyses and depend on accurate multiple sequence alignment (MSA) models. We show that all currently available large-scale progressive multiple alignment methods are numerically unstable when dealing with amino-acid sequences. They produce significantly different output when the sequence input order is changed. We used the HOMFAM protein sequences dataset to show that, on datasets larger than 100 sequences, this instability affects on average 21.5% of the aligned residues. The resulting Maximum...
Nextflow is a data-driven framework for computational pipelines that simplifies writing parallel and scalable pipelines in a portable manner.
Most evolutionary analyses are based on pre-estimated multiple sequence alignments. Wong et al. established the existence of an uncertainty induced by multiple sequence alignment when reconstructing phylogenies. They were able to show that in many cases different aligners produce different phylogenies, with no simple objective criterion sufficient to distinguish among these alternatives. We demonstrate that incorporating MSA-induced uncertainty into bootstrap sampling can significantly increase correlation between clade correctness and its...
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness...