NFDI4DS | UHH-SEMS - Publication Details

Diogo Pratas

ORCID: 0000-0003-1176-552X

About

Contact & Profiles

OpenAlex ID: A5051693451

Research Areas

Algorithms and Data Compression
Genomics and Phylogenetic Studies
Fractal and DNA sequence analysis
Computability, Logic, AI Algorithms
Machine Learning in Bioinformatics
Bacteriophages and microbial interactions
RNA and protein synthesis mechanisms
Gene expression and cancer classification
DNA and Biological Computing
Chromosomal and Genetic Variations
Molecular Biology Techniques and Applications
Parvovirus B19 Infection Studies
Advanced Data Storage Technologies
Viral-associated cancers and disorders
Plant and Fungal Interactions Research
Plant Virus Research Studies
Polyomavirus and related diseases
Scientific Computing and Data Management
Natural Language Processing Techniques
Viral Infections and Outbreaks Research
Benford’s Law and Fraud Detection
Forensic and Genetic Research
Cancer Genomics and Diagnostics
Genomics and Chromatin Dynamics
Computational Drug Discovery Methods

University of Aveiro
2015-2024

University of Helsinki
2019-2024

Helsinki University Hospital
2021-2024

Institute of Electronics
2023-2024

OPENALEX - Publications

A Survey on Data Compression Methods for Biological Sequences

Morteza Hosseini, Diogo Pratas, Armando J. Pinho

The ever-increasing growth in the production of high-throughput sequencing data poses a serious challenge to the storage, processing and transmission of these data. As frequently stated, it is a data deluge. Compression is essential to address this challenge: it reduces storage space and costs, along with speeding up data transmission. In this paper, we provide a comprehensive survey of existing compression approaches that are specialized for biological data, including protein and DNA sequences. Also, we devote an important part of the paper...

10.3390/info7040056 article EN cc-by Information 2016-10-14

The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features

Weihong Qi, Yi‐Wen Lim, Andrea Patrignani, Pascal Schläpfer, Anna Bratus-Neuenschwander and 9 more

Background: Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and subtropical regions worldwide. Genetic gain by molecular breeding has been limited, partially because cassava is a highly heterozygous crop with a repetitive and difficult-to-assemble genome. Findings: Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler hifiasm, produced genome assemblies at near complete haplotype resolution with higher...

10.1093/gigascience/giac028 article EN cc-by GigaScience 2022-01-01

Unmasking the tissue-resident eukaryotic DNA virome in humans

Lari Pyöriä, Diogo Pratas, Mari Toppinen, Klaus Hedman, Antti Sajantila and 1 more

Little is known about the landscape of viruses that reside within our cells, or about the interplay with the host that is imperative for their persistence. Yet, a lifetime of interactions conceivably leaves an imprint on our physiology and immune phenotype. In this work, we revealed the genetic make-up and unique composition of the eukaryotic human DNA virome in nine organs (colon, liver, lung, heart, brain, kidney, skin, blood, hair) of 31 Finnish individuals. By integration of quantitative (qPCR) and qualitative (hybrid-capture sequencing)...

10.1093/nar/gkad199 article EN cc-by Nucleic Acids Research 2023-03-23

MFCompress: a compression tool for FASTA and multi-FASTA data

Armando J. Pinho, Diogo Pratas

Motivation: The data deluge phenomenon is becoming a serious problem in most genomic centers. To alleviate it, general purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce the data as much as possible, for example, for medium- and long-term storage. A number of algorithms have been proposed for the compression of genomics data, but unfortunately only a few of them have been made available as usable and reliable compression tools. Results: In this article, we...

10.1093/bioinformatics/btt594 article EN cc-by-nc Bioinformatics 2013-10-16

GReEn: a tool for efficient compression of genome resequencing data

Armando J. Pinho, Diogo Pratas, Sara P. Garcia

Research in the genomic sciences is confronted with the volume of sequencing and resequencing data increasing at a higher pace than that of data storage and communication resources, shifting a significant part of research budgets from the sequencing component of a project to the computational one. Hence, being able to efficiently store sequencing and resequencing data is a problem of paramount importance. In this article, we describe GReEn (Genome Resequencing Encoding), a tool for compressing genome resequencing data using a reference sequence. It overcomes some drawbacks of the recently proposed tool GRS,...

10.1093/nar/gkr1124 article EN cc-by-nc Nucleic Acids Research 2011-12-01

Efficient Compression of Genomic Sequences

Diogo Pratas, Armando J. Pinho, Paulo J. S. G. Ferreira

The number of genomic sequences is growing substantially. Besides discarding part of the data, the only efficient possibility for coping with this trend is data compression. We present an efficient compressor for genomic sequences, allowing both reference-free and referential compression. This compressor uses a mixture of context models of several orders, according to two model classes: reference and target. A new type of context model, which is capable of tolerating substitution errors, is introduced. For ensuring flexibility regarding hardware specifications, cache-hashes...

10.1109/dcc.2016.60 article EN 2016-03-01

Three minimal sequences found in Ebola virus genomes and absent from human DNA

Raquel M. Silva, Diogo Pratas, Luísa Castro, Armando J. Pinho, Paulo J. S. G. Ferreira

Ebola virus causes high-mortality hemorrhagic fevers, with more than 25 000 cases and 10 000 deaths in the current outbreak. Only experimental therapies are available; thus, novel diagnosis tools and druggable targets are needed. Analysis of Ebola virus genomes from the current outbreak reveals the presence of short DNA sequences that appear nowhere in the human genome. We identify the shortest such sequences, with lengths between 12 and 14. Only three absent sequences of length 12 exist, and they consistently appear at the same location on two of the Ebola virus proteins, in all Ebola virus genomes, but nowhere in the human genome. The alignment-free method used...

10.1093/bioinformatics/btv189 article EN cc-by-nc Bioinformatics 2015-04-02
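
The core question in the entry above, which are the shortest DNA words that occur in one genome but nowhere in another, can be illustrated with a brute-force k-mer set comparison. This is a minimal Python sketch, not the alignment-free method of the paper: the function names are hypothetical, treating a word as present if it occurs on the host's reverse-complement strand is an assumption you can switch off, and genome-scale inputs such as the human genome would need a far more memory-efficient index than Python sets.

```python
# Hypothetical sketch: shortest k-mers present in `target` but absent from `host`.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k in seq (empty if k > len(seq))."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shortest_absent(target: str, host: str, k_max: int = 20, include_revcomp: bool = True):
    """Return (k, words) for the smallest k at which some k-mer of target is absent from host.

    Assumption: with include_revcomp=True a word also counts as present in the
    host if it occurs on the host's reverse-complement strand.
    Returns (None, []) if no such k exists up to k_max.
    """
    host_strands = [host, reverse_complement(host)] if include_revcomp else [host]
    for k in range(1, k_max + 1):
        host_words = set().union(*(kmers(s, k) for s in host_strands))
        missing = kmers(target, k) - host_words
        if missing:
            return k, sorted(missing)
    return None, []
```

For example, `shortest_absent("ACGTACGTTT", "ACGTACGA")` returns `(2, ['TT'])`: every single base of the first sequence occurs in the second, but the pair "TT" does not. Searching k in increasing order guarantees that the first non-empty result holds the shortest absent words.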

Efficient DNA sequence compression with neural networks

Milton Silva, Diogo Pratas, Armando J. Pinho

Background: The increasing production of genomic data has led to an intensified need for models that can cope efficiently with the lossless compression of DNA sequences. Important applications include long-term storage and compression-based data analysis. In the literature, only a few recent articles propose the use of neural networks for DNA sequence compression. However, they fall short when compared with specific DNA compression tools, such as GeCo2. This limitation is due to the absence of models specifically designed for DNA sequences. In this work, we combine...

10.1093/gigascience/giaa119 article EN cc-by GigaScience 2020-11-01

Comparative evaluation of computational methods for reconstruction of human viral genomes

Maria J P Sousa, Mari Toppinen, Lari Pyöriä, Klaus Hedman, Antti Sajantila and 2 more

The increasing availability of viral sequences has led to the emergence of many optimized viral genome reconstruction tools. Given that the number of new tools is steadily increasing, it is difficult to identify functional and optimized tools that offer an equilibrium between accuracy and computational resources, as well as the features each tool provides. In this paper, we surveyed open-source computational tools (including pipelines) used for human viral genome reconstruction, identifying specific characteristics, features, similarities, and dissimilarities of these tools. For quantitative...

10.1101/2025.01.17.633368 preprint EN cc-by-nc bioRxiv (Cold Spring Harbor Laboratory) 2025-01-22

The landscape of persistent human DNA viruses in femoral bone

Mari Toppinen, Diogo Pratas, Elina Väisänen, Maria Söderlund‐Venermo, Klaus Hedman and 2 more

10.1016/j.fsigen.2020.102353 article EN Forensic Science International Genetics 2020-07-08

A hybrid pipeline for reconstruction and analysis of viral genomes at multi-organ level

Diogo Pratas, Mari Toppinen, Lari Pyöriä, Klaus Hedman, Antti Sajantila and 1 more

Advances in sequencing technologies have enabled the characterization of multiple microbial and host genomes, opening new frontiers of knowledge while kindling novel applications and research perspectives. Among these is the investigation of the viral communities residing in the human body and their impact on health and disease. To this end, the study of samples from multiple tissues is critical; yet, the complexity of such analysis calls for a dedicated pipeline. We provide an automatic and efficient pipeline for identification, assembly, and analysis of viral genomes that...

10.1093/gigascience/giaa086 article EN cc-by GigaScience 2020-08-01

Bacteria DNA sequence compression using a mixture of finite-context models

Armando J. Pinho, Diogo Pratas, Paulo J. S. G. Ferreira

The ability of finite-context models to compress DNA sequences has been demonstrated in some recent works. In this paper, we further explore this line, proposing a compression method based on eight finite-context models, with orders from two to sixteen, whose probabilities are averaged using weights calculated through a recursive procedure. The method was tested on a total of 2,338 DNA sequences belonging to bacterial genomes, with sizes ranging from 1,286 to 13,033,779 bases, showing better results than the state-of-the-art XM coding algorithm and also being faster...

10.1109/ssp.2011.5967637 article EN 2011-06-01
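
The mixing idea in the entry above, several finite-context models of different orders whose predictions are averaged with recursively updated weights, can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the eight orders used below are a hypothetical pick within the two-to-sixteen range mentioned in the abstract, each model uses the additive estimator P(s|c) = (n_s + alpha) / (sum of counts + 4 alpha), and the update w_k <- w_k^gamma * P_k(s), renormalised after every symbol, is one common form of recursive weighting assumed here. The function returns the ideal code length in bits, which an arithmetic coder would approach; no bitstream is produced.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class FiniteContextModel:
    """Order-k model: counts which symbol followed each k-symbol context."""

    def __init__(self, order: int, alpha: float = 1.0):
        self.order = order
        self.alpha = alpha  # additive smoothing parameter (assumed value)
        self.counts = defaultdict(lambda: [0, 0, 0, 0])

    def _context(self, seq: str, i: int) -> str:
        # The k symbols preceding position i (shorter at the start of seq).
        return seq[max(0, i - self.order):i]

    def probs(self, seq: str, i: int) -> list:
        c = self.counts[self._context(seq, i)]
        total = sum(c) + len(ALPHABET) * self.alpha
        return [(n + self.alpha) / total for n in c]

    def update(self, seq: str, i: int) -> None:
        self.counts[self._context(seq, i)][ALPHABET.index(seq[i])] += 1


def mixture_code_length(seq: str, orders=(2, 4, 6, 8, 10, 12, 14, 16), gamma: float = 0.9) -> float:
    """Ideal code length (bits) of seq under a weighted mixture of finite-context models."""
    models = [FiniteContextModel(k) for k in orders]
    weights = [1.0 / len(models)] * len(models)
    bits = 0.0
    for i, sym in enumerate(seq):
        s = ALPHABET.index(sym)  # raises ValueError for symbols outside ACGT
        preds = [m.probs(seq, i) for m in models]
        p_mix = sum(w * p[s] for w, p in zip(weights, preds))
        bits -= math.log2(p_mix)
        # Recursive update: models that predicted sym well gain weight;
        # gamma < 1 makes the weights gradually forget older performance.
        new_w = [w ** gamma * p[s] for w, p in zip(weights, preds)]
        z = sum(new_w)
        weights = [w / z for w in new_w]
        for m in models:
            m.update(seq, i)
    return bits
```

On a strongly periodic input such as "ACGT" repeated 500 times, the cost per base should fall far below the 2 bits of a plain 2-bit encoding, while on bases with no exploitable structure it should stay close to 2 bits per base. Larger gamma values give the weights a longer memory, so they react more slowly when a different model starts predicting best.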

An alignment-free method to find and visualise rearrangements between pairs of DNA sequences

Diogo Pratas, Raquel M. Silva, Armando J. Pinho, Paulo J. S. G. Ferreira

Species evolution is indirectly registered in their genomic structure. The emergence and advances of sequencing technology provided a way to access genome information, namely to identify and study evolutionary macro-events, as well as chromosome alterations for clinical purposes. This paper describes a completely alignment-free computational method, based on a blind unsupervised approach, to detect large-scale and small-scale rearrangements between pairs of DNA sequences. To illustrate the power and usefulness...

10.1038/srep10203 article EN cc-by Scientific Reports 2015-05-18

Cryfa: a secure encryption tool for genomic data

Morteza Hosseini, Diogo Pratas, Armando J. Pinho

Summary: The ever-increasing growth of high-throughput sequencing technologies has led to a great acceleration of medical and biological research and discovery. As these platforms advance, the amount of information for diverse genomes increases at unprecedented rates. Confidentiality, integrity and authenticity of such genomic information should be ensured due to its extremely sensitive nature. In this paper, we propose Cryfa, a fast, secure encryption tool for genomic data, namely in Fasta, Fastq, VCF, SAM and BAM formats, which is...

10.1093/bioinformatics/bty645 article EN cc-by-nc Bioinformatics 2018-07-18

XS: a FASTQ read simulator

Diogo Pratas, Armando J. Pinho, João M. O. S. Rodrigues

The emerging next-generation sequencing (NGS) is bringing, besides the natural huge amounts of data, an avalanche of new specialized tools (for analysis, compression, alignment, among others) and large public and private network infrastructures. Therefore, there is a rising need for specific simulation tools for testing and benchmarking, such as a flexible and portable FASTQ read simulator that does not need a reference sequence, yet is correctly prepared to produce approximately the same characteristics as real data. We present XS,...

10.1186/1756-0500-7-40 article EN cc-by BMC Research Notes 2014-01-01
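
For readers unfamiliar with what a read simulator such as XS emits, the sketch below shows reference-free FASTQ generation at its simplest: each record has four lines, an "@" header, the bases, a "+" separator, and one Phred+33 quality character per base. It is a hypothetical illustration, not XS itself: bases are drawn uniformly at random and the declining quality profile is an invented toy model, whereas a useful simulator must mimic the statistics of real data, which is the point the abstract stresses.

```python
import random
from typing import Iterator


def simulate_fastq(n_reads: int, read_len: int, seed: int = 0) -> Iterator[str]:
    """Yield toy FASTQ records (4 lines each) without using a reference sequence."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    for r in range(n_reads):
        bases = "".join(rng.choice("ACGT") for _ in range(read_len))
        # Hypothetical quality model: Phred scores drift down along the read,
        # with small random jitter, clamped to the range 2..40.
        scores = [max(2, min(40, 38 - j // 10 + rng.randint(-3, 3))) for j in range(read_len)]
        quality = "".join(chr(q + 33) for q in scores)  # Phred+33 encoding
        yield f"@read_{r + 1}\n{bases}\n+\n{quality}\n"


if __name__ == "__main__":
    for record in simulate_fastq(2, 30):
        print(record, end="")
```

Writing records through a generator keeps memory use constant, so the same function can stream millions of reads straight to a file.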

Reactivation of a Transplant Recipient's Inherited Human Herpesvirus 6 and Implications to the Graft

Leo Hannolainen, Lari Pyöriä, Diogo Pratas, Jouko Lohi, Sandra Skuja and 5 more

The implications of inherited chromosomally integrated human herpesvirus 6 (iciHHV-6) in solid organ transplantation remain uncertain. Although this trait has been linked to unfavorable clinical outcomes, an association between viral reactivation and complications has only been conclusively established in a few cases. We used hybrid capture sequencing for in-depth analysis of the sequences reconstructed from sequential liver biopsies. Moreover, we investigated replication through in situ hybridization...

10.1093/infdis/jiae268 article EN cc-by The Journal of Infectious Diseases 2024-05-17

Authorship Attribution Using Relative Compression

Armando J. Pinho, Diogo Pratas, Paulo J. S. G. Ferreira

Authorship attribution is a classical classification problem. We use it here to illustrate the performance of a compression-based measure that relies on the notion of relative compression. Besides comparing with recent approaches that use multiple discriminant analysis and support vector machines, we compare it with the Normalized Conditional Compression Distance (a direct approximation of the Normalized Information Distance) and the popular Normalized Compression Distance. The Normalized Relative Compression (NRC) attained 100% correct classification in the data set used, showing consistency between...

10.1109/dcc.2016.53 article EN 2016-03-01
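
The abstract above compares the Normalized Relative Compression with, among others, the popular Normalized Compression Distance, NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, where C is a compressed size. Because NCD is the simpler of the two, this hedged sketch uses it to show how compression-based attribution works in principle: assign a text to the candidate whose reference corpus gives the smallest distance. It is not the NRC method of the paper, zlib is only a convenient stand-in compressor, and the helper names and toy corpora are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of the zlib-compressed data (stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def attribute(text: bytes, corpora: dict) -> str:
    """Pick the candidate author whose corpus is closest to text under NCD.

    zlib only looks back 32 KiB, so inputs much larger than that are not
    compared properly; real corpora would need a stronger compressor.
    """
    return min(corpora, key=lambda author: ncd(text, corpora[author]))


if __name__ == "__main__":
    corpora = {
        "A": b"alpha beta gamma delta " * 30,
        "B": b"one two three four five " * 30,
    }
    print(attribute(b"gamma delta alpha beta " * 5, corpora))
```

With these toy corpora, the query built from candidate A's words should be attributed to "A", since appending it to A's corpus adds almost nothing to the compressed size, whereas B's corpus shares no useful repeats with it.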

A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models

Diogo Pratas, Morteza Hosseini, Jorge Miguel Silva, Armando J. Pinho

The development of efficient data compressors for DNA sequences is crucial not only for reducing the storage and the bandwidth for transmission, but also for analysis purposes. In particular, the development of improved compression models directly influences the outcome of anthropological and biomedical compression-based methods. In this paper, we describe a new lossless compressor with improved compression capabilities for DNA sequences representing different domains and kingdoms. The reference-free method uses a competitive prediction model to estimate, for each symbol, the best class of models to be used...

10.3390/e21111074 article EN cc-by Entropy 2019-11-02

Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements

Morteza Hosseini, Diogo Pratas, Burkhard Morgenstern, Armando J. Pinho

Background: The development of high-throughput sequencing technologies and, as a result, the production of huge volumes of genomic data have accelerated biological and medical research and discovery. The study of genomic rearrangements is crucial owing to their role in chromosomal evolution, genetic disorders, and cancer. Results: We present Smash++, an alignment-free and memory-efficient tool to find and visualize small- and large-scale genomic rearrangements between 2 DNA sequences. This computational solution extracts information contents...

10.1093/gigascience/giaa048 article EN cc-by GigaScience 2020-05-01

The Human Bone Marrow Is Host to the DNAs of Several Viruses

Mari Toppinen, Antti Sajantila, Diogo Pratas, Klaus Hedman, Maria F. Perdomo

The long-term impact of viruses residing in the human bone marrow (BM) remains unexplored. However, chronic inflammatory processes driven by single or multiple viruses could significantly alter hematopoiesis and immune function. We performed a systematic analysis of the DNAs of 38 viruses in the BM. We detected, by quantitative PCRs and next-generation sequencing, viral DNA in 88.9% of the samples, with up to five viruses in one individual. Included were, among others, several herpesviruses, hepatitis B virus, Merkel cell polyomavirus and, unprecedentedly,...

10.3389/fcimb.2021.657245 article EN cc-by Frontiers in Cellular and Infection Microbiology 2021-04-22

The complexity landscape of viral genomes

Jorge Miguel Silva, Diogo Pratas, Tânia Caetano, Sérgio Matos

Background: Viruses are among the shortest yet highly abundant species that harbor minimal instructions to infect cells, adapt, multiply, and exist. However, with the current substantial availability of viral genome sequences, the scientific repertory lacks a complexity landscape that automatically sheds light on viral genomes’ organization, relation, and fundamental characteristics. Results: This work provides a comprehensive landscape of the viral genome’s complexity (or quantity of information), identifying the most redundant and complex groups regarding...

10.1093/gigascience/giac079 article EN cc-by GigaScience 2022-01-01

TargIDe: a machine-learning workflow for target identification of molecules with antibiofilm activity against Pseudomonas aeruginosa

João Carneiro, Rita P. Magalhães, Victor M de la Oliva Roque, Manuel Simões, Diogo Pratas and 1 more

Bacterial biofilms are a source of infectious human diseases and are heavily linked to antibiotic resistance. Pseudomonas aeruginosa is a multidrug-resistant bacterium widely present and implicated in several hospital-acquired infections. Over the last years, the development of new drugs able to inhibit Pseudomonas aeruginosa by interfering with its ability to form biofilms has become a promising strategy in drug discovery. Identifying molecules that interfere with biofilm formation is difficult, but further developing these molecules by rationally improving their...

10.1007/s10822-023-00505-5 article EN cc-by Journal of Computer-Aided Molecular Design 2023-04-22

Metagenomic Composition Analysis of an Ancient Sequenced Polar Bear Jawbone from Svalbard

Diogo Pratas, Morteza Hosseini, Gonçalo Grilo, Armando J. Pinho, Raquel M. Silva and 3 more

The sequencing of ancient DNA samples provides a novel way to find, characterize, and distinguish exogenous genomes from endogenous targets. After sequencing, computational composition analysis enables the filtering of undesired sources in the focal organism, with the purpose of improving the quality of assemblies and subsequent data analysis. More importantly, such analysis allows extinct and extant species to be identified without requiring a specific or new sequencing run. However, the identification of organisms is a complex task, given the nature...

10.3390/genes9090445 article EN Genes 2018-09-06
