NFDI4DS | UHH-SEMS - Publication Details

Veli Mäkinen

ORCID: 0000-0003-4454-1493

Publications

Citations

Views

---

Saved

---

About

Contact & Profiles

A5043558409

Research Areas

Algorithms and Data Compression
Genomics and Phylogenetic Studies
Network Packet Processing and Optimization
DNA and Biological Computing
RNA and protein synthesis mechanisms
Genome Rearrangement Algorithms
semigroups and automata theory
Natural Language Processing Techniques
Advanced Image and Video Retrieval Techniques
Chromosomal and Genetic Variations
Cellular Automata and Applications
Machine Learning in Bioinformatics
Gene expression and cancer classification
Graph Theory and Algorithms
Molecular Biology Techniques and Applications
Music and Audio Processing
Machine Learning and Algorithms
Genomics and Rare Diseases
Cancer Genomics and Diagnostics
Genetics, Bioinformatics, and Biomedical Research
Genomics and Chromatin Dynamics
Genomic variations and chromosomal abnormalities
Advanced Data Compression Techniques
Image Retrieval and Classification Techniques
RNA modifications and cancer

University of Helsinki
2015-2024

Helsinki Institute for Information Technology
2012-2021

Helsinki Institute of Physics
2009-2016

Aalto University
2016

Max Planck Institute for Informatics
2015

Saarland University
2015

Centrum Wiskunde & Informatica
2015

Bielefeld University
2005-2015

HAN University of Applied Sciences
2014

Institute of Bioinformatics
2003-2008

Compressed representations of sequences and full-text indexes

OPENALEX - Publications

Paolo Ferragina Giovanni Manzini Veli Mäkinen Gonzalo Navarro

Given a sequence S = s 1 2 … n of integers smaller than r O (polylog( )), we show how can be represented using nH 0 ( ) + o bits, so that know any q , as well answer rank and select queries on in constant time. H is the zero-order empirical entropy provides an information-theoretic lower bound to bit storage via fixed encoding its symbols. This extends previous results binary sequences, improves general sequences where those are answered (log For larger still represent log bits /log Another...

10.1145/1240233.1240243 article EN ACM Transactions on Algorithms 2007-05-01

Indexing Graphs for Path Queries with Applications in Genome Research

OPENALEX - Publications

Jouni Sirén Niko Välimäki Veli Mäkinen

We propose a generic approach to replace the canonical sequence representation of genomes with graph representations, and study several applications such extensions. extend Burrows-Wheeler transform (BWT) strings acyclic directed labeled graphs, support path queries as an extension substring searching. develop, apply, tailor this technique a) read alignment on extended BWT index representing pan-genome, i.e., reference genome known variants it; b) split-read splicing graph. Other possible...

10.1109/tcbb.2013.2297101 article EN IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014-01-31

The Glanville fritillary genome retains an ancient karyotype and reveals selective chromosomal fusions in Lepidoptera

OPENALEX - Publications

Virpi Ahola Rainer Lehtonen Panu Somervuo Leena Salmela Patrik Koskinen and 42 more

Previous studies have reported that chromosome synteny in Lepidoptera has been well conserved, yet the number of haploid chromosomes varies widely from 5 to 223. Here we report genome (393 Mb) Glanville fritillary butterfly (Melitaea cinxia; Nymphalidae), a recognized model species metapopulation biology and eco-evolutionary research, which putative ancestral karyotype n=31. Using phylogenetic analyses Nymphalidae other Lepidoptera, combined with orthologue-level comparisons chromosomes,...

10.1038/ncomms5737 article EN cc-by-nc-sa Nature Communications 2014-09-05

Storage and Retrieval of Highly Repetitive Sequence Collections

OPENALEX - Publications

Veli Mäkinen Gonzalo Navarro Jouni Sirén Niko Välimäki

A repetitive sequence collection is a set of sequences which are small variations each other. prominent example genome individuals the same or close species, where differences can be expressed by short lists basic edit operations. Flexible and efficient data analysis on such typically huge plausible using suffix trees. However, tree occupies much space, very soon inhibits in-memory analyses. Recent advances in full-text indexing reduce space to, essentially, that compressed sequences, while...

10.1089/cmb.2009.0169 article EN Journal of Computational Biology 2010-03-01

Rank and select revisited and extended

OPENALEX - Publications

Veli Mäkinen Gonzalo Navarro

10.1016/j.tcs.2007.07.013 article EN publisher-specific-oa Theoretical Computer Science 2007-07-31

Dynamic entropy-compressed sequences and full-text indexes

OPENALEX - Publications

Veli Mäkinen Gonzalo Navarro

We give new solutions to the Searchable Partial Sums with Indels problem. Given a sequence of n k -bit numbers, we present structure taking kn + o ( ) bits space, able performing operations sum , search insert and delete all in O (log worst-case time, for any = ). This extends previous results by Hon et al. [2003c] achieving same space /log log time complexities queries, yet offering that are amortized worse than ours, supported only (1). Our result matches an existing lower bound large...

10.1145/1367064.1367072 article EN ACM Transactions on Algorithms 2008-06-01

A novel min-cost flow method for estimating transcript expression with RNA-Seq

OPENALEX - Publications

Alexandru I. Tomescu Anna Kuosmanen Roméo Rizzi Veli Mäkinen

Through transcription and alternative splicing, a gene can be transcribed into different RNA sequences (isoforms), depending on the individual, tissue cell is in, or in response to some stimuli. Recent RNA-Seq technology allows for new high-throughput ways isoform identification quantification based short reads, various methods have been put forward this non-trivial problem.In paper we propose novel radically method minimum-cost network flows. This has two-fold advantage: one hand, it...

10.1186/1471-2105-14-s5-s15 article EN cc-by BMC Bioinformatics 2013-04-01

Fast scaffolding with small independent mixed integer programs

OPENALEX - Publications

Leena Salmela Veli Mäkinen Niko Välimäki Johannes Ylinen Esko Ukkonen

Abstract Motivation: Assembling genomes from short read data has become increasingly popular, but the problem remains computationally challenging especially for larger genomes. We study scaffolding phase of sequence assembly where preassembled contigs are ordered based on mate pair data. Results: present MIP Scaffolder that divides into smaller subproblems and solves these with mixed integer programming. The can be represented as a graph biconnected components this solved independently....

10.1093/bioinformatics/btr562 article EN cc-by-nc Bioinformatics 2011-10-13

Computational Pan-Genomics: Status, Promises and Challenges

OPENALEX - Publications

Tobias Marschall Manja Marz Thomas Abeel Louis J. Dijkstra Bas E. Dutilh and 54 more

Abstract Many disciplines, from human genetics and oncology to plant breeding, microbiology virology, commonly face the challenge of analyzing rapidly increasing numbers genomes. In case Homo sapiens , number sequenced genomes will approach hundreds thousands in next few years. Simply scaling up established bioinformatics pipelines not be sufficient for leveraging full potential such rich genomic datasets. Instead, novel, qualitatively different computational methods paradigms are needed. We...

10.1101/043430 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2016-03-12

Bit-parallel sequence-to-graph alignment

OPENALEX - Publications

Mikko Rautiainen Veli Mäkinen Tobias Marschall

Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs genome assemblies, such as string and de Bruijn graphs, pan-genome hence genetic variation present population. Being able align sequencing reads is key step for many analyses its applications assembly, read error correction variant calling with respect graph.We generalize two linear...

10.1093/bioinformatics/btz162 article EN cc-by Bioinformatics 2019-03-07

Faster entropy-bounded compressed suffix trees

OPENALEX - Publications

Johannes Fischer Veli Mäkinen Gonzalo Navarro

10.1016/j.tcs.2009.09.012 article EN publisher-specific-oa Theoretical Computer Science 2009-09-16

Detection of Viruses in Sweetpotato from Honduras and Guatemala Augmented by Deep-Sequencing of Small-RNAs

OPENALEX - Publications

Muhammad Kashif Sirpa Pietilä K. Artola Richard A. Jones Arthur K. Tugume and 2 more

Sweetpotato (Ipomoea batatas) plants become infected with over 30 RNA or DNA viruses in different parts of the world but little is known about infecting sweetpotato crops Central America, center domestication. Small-RNA deep-sequencing (SRDS) analysis was used to detect Honduras and Guatemala, which detected Sweet potato feathery mottle virus strain RC C (Potyvirus spp.), chlorotic stunt WA (SPCSV-WA; Crinivirus sp.), leaf curl Georgia (Begomovirus pakakuy B (synonym: badnavirus B). Results...

10.1094/pdis-03-12-0268-re article EN other-oa Plant Disease 2012-05-09

Accurate spliced alignment of long RNA sequencing reads

OPENALEX - Publications

Kristoffer Sahlin Veli Mäkinen

Long-read RNA sequencing technologies are establishing themselves as the primary techniques to detect novel isoforms, and many such analyses dependent on read alignments. However, error rate length of reads create new challenges for accurately aligning them, particularly around small exons.We present an alignment method uLTRA long based a two-pass collinear chaining algorithm. We show that produces higher accuracy over state-of-the-art aligners with substantially exons simulated synthetic...

10.1093/bioinformatics/btab540 article EN cc-by Bioinformatics 2021-07-20

On the Complexity of String Matching for Graphs

OPENALEX - Publications

Massimo Equi Veli Mäkinen Alexandru I. Tomescu Roberto Grossi

Exact string matching in labeled graphs is the problem of searching paths a graph G=(V, E) such that concatenation their node labels equal to given pattern P [1. m ]. This basic can be found at heart more complex operations on variation computational biology, query databases, and analysis heterogeneous networks. We prove conditional lower bound stating that, for any constant ε > 0, an O (| E | 1 - ) time, or )time algorithm exact graphs, with drawn from binary alphabet, cannot achieved...

10.1145/3588334 article EN cc-by-nc-nd ACM Transactions on Algorithms 2023-03-16

Coming Soon ...