- Algorithms and Data Compression
- semigroups and automata theory
- DNA and Biological Computing
- Network Packet Processing and Optimization
- Natural Language Processing Techniques
- Cellular Automata and Applications
- Genomics and Phylogenetic Studies
- User Authentication and Security Systems
- Cryptography and Data Security
- Data Mining Algorithms and Applications
- Advanced Authentication Protocols Security
- Web Data Mining and Analysis
- Machine Learning and Algorithms
- Advanced Data Storage Technologies
- RNA and protein synthesis mechanisms
- Time Series Analysis and Forecasting
- Handwritten Text Recognition Techniques
- Advanced Data Compression Techniques
- Advanced Text Analysis Techniques
- Genome Rearrangement Algorithms
- Data Management and Algorithms
- Music and Audio Processing
- Computability, Logic, AI Algorithms
- Gene expression and cancer classification
- Advanced Image and Video Retrieval Techniques
Kyushu University
2015-2024
Japan Science and Technology Agency
2003-2022
Japan Society for the Promotion of Science
2021
University of Helsinki
2004
The University of Tokyo
2002
We give a new characterization of maximal repetitions (or runs) in strings based on Lyndon words. The leads to proof what was known as the "runs" conjecture [R. M. Kolpakov and G. Kucherov, Proceedings IEEE Symposium Foundations Computer Science (FOCS), Society, Los Alamitos, CA, 1999, pp. 596--604]), which states that maximum number runs $\rho(n)$ string length $n$ is less than $n$. remarkably simple, considering numerous endeavors tackle this problem last 15 years, significantly improves...
Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a fundamental data structure on strings with applications in text pattern searching, compression, and discovery. Intuitively, the CDAWG of string $T$ is obtained by merging isomorphic subtrees suffix tree [Weiner 1973] same $T$, thus CDAWGs compact indexing structure. In this paper, we investigate sensitivity when single character edit operation performed at an arbitrary position $T$. We show that size after...
We give a new characterization of maximal repetitions (or runs) in strings, using tree defined on recursive standard factorizations Lyndon words, called the tree. The leads to remarkably simple novel proof linearity maximum number runs ρ(n) string length n. Furthermore, we show an upper bound
The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA which recognizes all suffixes and has only O(n) nodes edges. We present first O(n)-time algorithm for computing DAWG given length n over an integer alphabet polynomial size in n. also show that straightforward modification to our construction leads constructing affix tree alphabet. Affix trees are text indexing structure supporting bidirectional pattern searches. As application algorithm, we set MAW(y) minimal...
We consider the problem of answering shortest unique substring (SUS) queries on run-length encoded strings. For a string S, u = S[i..j] is said to be S containing an interval [s, t] (i j'-i', S[i'..j'] occurs at least twice in S. Given encoding size m length N, we show that can construct data structure O(m+pi_s(N, m)) O(m log + pi_c(N, time such answered in O(pi_q(N, m) k) time, where k output (the number SUSs), and pi_s(N,m), pi_c(N,m), pi_q(N,m) are, respectively, size, construction...
Previous chapter Next Full AccessProceedings Proceedings of the 2015 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)A new characterization maximal repetitions by Lyndon treesHideo Bannai, Tomohiro I, Shunsuke Inenaga, Yuto Nakashima, Masayuki Takeda, and Kazuya TsurutaHideo Tsurutapp.562 - 571Chapter DOI:https://doi.org/10.1137/1.9781611973730.38PDFBibTexSections ToolsAdd to favoritesExport CitationTrack CitationsEmail SectionsAboutAbstract We give a (or runs) in strings, using tree...
A Longest Common Extension (LCE) query on a text $T$ of length $N$ asks for the longest common prefix suffixes starting at given two positions. We show that signature encoding $\mathcal{G}$ size $w = O(\min(z \log N \log^* M, N))$ [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] $T$, which can be seen as compressed representation has capability to support LCE queries in $O(\log + \ell M)$ time, where $\ell$ is answer query, $z$ Lempel-Ziv77 (LZ77) factorization and $M \geq 4N$ an integer...