VECTOR SPACE INDEXING FOR BIOSEQUENCE SIMILARITY SEARCHES
Pruning
Similarity (geometry)
DOI: 10.1142/s0218213005002405
Publication Date: 2005-09-20T00:33:18Z
AUTHORS (2)
ABSTRACT
We present a multi-dimensional indexing approach for fast sequence similarity search in DNA and protein databases. In particular, we propose effective transformations of subsequences into numerical vector domains and build efficient index structures on the transformed vectors. We then define distance functions in the transformed domain and examine the properties of these functions. We experimentally compared the functions on (a) approximation quality for k-Nearest Neighbor (k-NN) queries, (b) pruning ability, and (c) ε-range queries. Results for k-NN queries, which we present here, show that our proposed distances FD2 and WD2 (i.e. Frequency and Wavelet Distance for 2-grams) perform significantly better than the others. We then develop index structures, based on R-trees and scalar quantization, on top of the transformed vectors. Promising results from experiments on real biosequence data sets are presented.
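The transformation described in the abstract can be illustrated with a minimal sketch. It assumes a DNA alphabet, overlapping 2-grams, and a plain L1 distance between count vectors. The abstract does not define FD2 or WD2, so `two_gram_vector`, `l1_distance`, and `knn` are hypothetical names for illustration, not the authors' functions, and the brute-force scan stands in for the R-tree and scalar-quantization indexes.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA alphabet; proteins would need 20 letters


def two_gram_vector(seq, alphabet=ALPHABET):
    """Map a sequence to a vector of overlapping 2-gram counts."""
    # Fixed lexicographic ordering of all |alphabet|^2 possible 2-grams.
    grams = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [counts.get(g, 0) for g in grams]


def l1_distance(u, v):
    """L1 distance between two count vectors (an assumed stand-in metric)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def knn(query, subsequences, k):
    """Brute-force k-NN in the transformed vector domain (no index)."""
    q = two_gram_vector(query)
    scored = [(s, l1_distance(q, two_gram_vector(s))) for s in subsequences]
    # sorted() is stable, so ties keep their input order.
    return sorted(scored, key=lambda pair: pair[1])[:k]


if __name__ == "__main__":
    print(knn("ACGT", ["ACGA", "TTTT", "ACGT"], k=2))
```

An index structure would replace the linear scan in `knn`; the vector mapping and the distance function are the parts it would reuse.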