Veli Mäkinen

ORCID: 0000-0003-4454-1493
Publications
Citations
Views
---
Saved
---
About
Contact & Profiles
Research Areas
  • Algorithms and Data Compression
  • Genomics and Phylogenetic Studies
  • Network Packet Processing and Optimization
  • DNA and Biological Computing
  • RNA and protein synthesis mechanisms
  • Genome Rearrangement Algorithms
  • semigroups and automata theory
  • Natural Language Processing Techniques
  • Advanced Image and Video Retrieval Techniques
  • Chromosomal and Genetic Variations
  • Cellular Automata and Applications
  • Machine Learning in Bioinformatics
  • Gene expression and cancer classification
  • Graph Theory and Algorithms
  • Molecular Biology Techniques and Applications
  • Music and Audio Processing
  • Machine Learning and Algorithms
  • Genomics and Rare Diseases
  • Cancer Genomics and Diagnostics
  • Genetics, Bioinformatics, and Biomedical Research
  • Genomics and Chromatin Dynamics
  • Genomic variations and chromosomal abnormalities
  • Advanced Data Compression Techniques
  • Image Retrieval and Classification Techniques
  • RNA modifications and cancer

University of Helsinki
2015-2024

Helsinki Institute for Information Technology
2012-2021

Helsinki Institute of Physics
2009-2016

Aalto University
2016

Max Planck Institute for Informatics
2015

Saarland University
2015

Centrum Wiskunde & Informatica
2015

Bielefeld University
2005-2015

HAN University of Applied Sciences
2014

Institute of Bioinformatics
2003-2008

Given a sequence S = s 1 2 … n of integers smaller than r O (polylog( )), we show how can be represented using nH 0 ( ) + o bits, so that know any q , as well answer rank and select queries on in constant time. H is the zero-order empirical entropy provides an information-theoretic lower bound to bit storage via fixed encoding its symbols. This extends previous results binary sequences, improves general sequences where those are answered (log For larger still represent log bits /log Another...

10.1145/1240233.1240243 article EN ACM Transactions on Algorithms 2007-05-01

We propose a generic approach to replace the canonical sequence representation of genomes with graph representations, and study several applications such extensions. extend Burrows-Wheeler transform (BWT) strings acyclic directed labeled graphs, support path queries as an extension substring searching. develop, apply, tailor this technique a) read alignment on extended BWT index representing pan-genome, i.e., reference genome known variants it; b) split-read splicing graph. Other possible...

10.1109/tcbb.2013.2297101 article EN IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014-01-31

Previous studies have reported that chromosome synteny in Lepidoptera has been well conserved, yet the number of haploid chromosomes varies widely from 5 to 223. Here we report genome (393 Mb) Glanville fritillary butterfly (Melitaea cinxia; Nymphalidae), a recognized model species metapopulation biology and eco-evolutionary research, which putative ancestral karyotype n=31. Using phylogenetic analyses Nymphalidae other Lepidoptera, combined with orthologue-level comparisons chromosomes,...

10.1038/ncomms5737 article EN cc-by-nc-sa Nature Communications 2014-09-05

A repetitive sequence collection is a set of sequences which are small variations each other. prominent example genome individuals the same or close species, where differences can be expressed by short lists basic edit operations. Flexible and efficient data analysis on such typically huge plausible using suffix trees. However, tree occupies much space, very soon inhibits in-memory analyses. Recent advances in full-text indexing reduce space to, essentially, that compressed sequences, while...

10.1089/cmb.2009.0169 article EN Journal of Computational Biology 2010-03-01

10.1016/j.tcs.2007.07.013 article EN publisher-specific-oa Theoretical Computer Science 2007-07-31

We give new solutions to the Searchable Partial Sums with Indels problem. Given a sequence of n k -bit numbers, we present structure taking kn + o ( ) bits space, able performing operations sum , search insert and delete all in O (log worst-case time, for any = ). This extends previous results by Hon et al. [2003c] achieving same space /log log time complexities queries, yet offering that are amortized worse than ours, supported only (1). Our result matches an existing lower bound large...

10.1145/1367064.1367072 article EN ACM Transactions on Algorithms 2008-06-01

Through transcription and alternative splicing, a gene can be transcribed into different RNA sequences (isoforms), depending on the individual, tissue cell is in, or in response to some stimuli. Recent RNA-Seq technology allows for new high-throughput ways isoform identification quantification based short reads, various methods have been put forward this non-trivial problem.In paper we propose novel radically method minimum-cost network flows. This has two-fold advantage: one hand, it...

10.1186/1471-2105-14-s5-s15 article EN cc-by BMC Bioinformatics 2013-04-01

Abstract Motivation: Assembling genomes from short read data has become increasingly popular, but the problem remains computationally challenging especially for larger genomes. We study scaffolding phase of sequence assembly where preassembled contigs are ordered based on mate pair data. Results: present MIP Scaffolder that divides into smaller subproblems and solves these with mixed integer programming. The can be represented as a graph biconnected components this solved independently....

10.1093/bioinformatics/btr562 article EN cc-by-nc Bioinformatics 2011-10-13

Abstract Many disciplines, from human genetics and oncology to plant breeding, microbiology virology, commonly face the challenge of analyzing rapidly increasing numbers genomes. In case Homo sapiens , number sequenced genomes will approach hundreds thousands in next few years. Simply scaling up established bioinformatics pipelines not be sufficient for leveraging full potential such rich genomic datasets. Instead, novel, qualitatively different computational methods paradigms are needed. We...

10.1101/043430 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2016-03-12

Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs genome assemblies, such as string and de Bruijn graphs, pan-genome hence genetic variation present population. Being able align sequencing reads is key step for many analyses its applications assembly, read error correction variant calling with respect graph.We generalize two linear...

10.1093/bioinformatics/btz162 article EN cc-by Bioinformatics 2019-03-07

10.1016/j.tcs.2009.09.012 article EN publisher-specific-oa Theoretical Computer Science 2009-09-16

Sweetpotato (Ipomoea batatas) plants become infected with over 30 RNA or DNA viruses in different parts of the world but little is known about infecting sweetpotato crops Central America, center domestication. Small-RNA deep-sequencing (SRDS) analysis was used to detect Honduras and Guatemala, which detected Sweet potato feathery mottle virus strain RC C (Potyvirus spp.), chlorotic stunt WA (SPCSV-WA; Crinivirus sp.), leaf curl Georgia (Begomovirus pakakuy B (synonym: badnavirus B). Results...

10.1094/pdis-03-12-0268-re article EN other-oa Plant Disease 2012-05-09

Long-read RNA sequencing technologies are establishing themselves as the primary techniques to detect novel isoforms, and many such analyses dependent on read alignments. However, error rate length of reads create new challenges for accurately aligning them, particularly around small exons.We present an alignment method uLTRA long based a two-pass collinear chaining algorithm. We show that produces higher accuracy over state-of-the-art aligners with substantially exons simulated synthetic...

10.1093/bioinformatics/btab540 article EN cc-by Bioinformatics 2021-07-20

Exact string matching in labeled graphs is the problem of searching paths a graph G=(V, E) such that concatenation their node labels equal to given pattern P [1. m ]. This basic can be found at heart more complex operations on variation computational biology, query databases, and analysis heterogeneous networks. We prove conditional lower bound stating that, for any constant ε > 0, an O (| E | 1 - ) time, or )time algorithm exact graphs, with drawn from binary alphabet, cannot achieved...

10.1145/3588334 article EN cc-by-nc-nd ACM Transactions on Algorithms 2023-03-16
Coming Soon ...