Compressed representations of sequences and full-text indexes
Substring
Rank (graph theory)
Sequence (biology)
DOI:
10.1145/1240233.1240243
Publication Date:
2007-06-06T14:37:11Z
AUTHORS (4)
ABSTRACT
Given a sequence S = s 1 2 … n of integers smaller than r O (polylog( )), we show how can be represented using nH 0 ( ) + o bits, so that know any q , as well answer rank and select queries on in constant time. H is the zero-order empirical entropy provides an information-theoretic lower bound to bit storage via fixed encoding its symbols. This extends previous results binary sequences, improves general sequences where those are answered (log For larger still represent log bits /log Another contribution this article combine our compressed representation integer with compression boosting technique design full-text indexes scale size input alphabet Σ. Specifically, variant FM-index string T [1, ] within k storage, th-order . space holds simultaneously for all ≤ α |Σ| < 1, )). index counts occurrences arbitrary pattern P p substring time; it locates each occurrence 1+ε time ε 1; reports text length ℓ (ℓ Compared works, first removes alphabet-size dependance from query times, particular, counting linear length. Still, uses essentially same which best obtained work. We also handle alphabets β ), by paying log|Σ|) extra multiplying times |Σ|/log ).
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES (38)
CITATIONS (256)
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....