Faster subsequence recognition in compressed strings

A. Tiskin

A. Tiskin

Zapiski Nauchnykh Seminarov POMI, Studies in constructive mathematics and mathematical logic. Part XI, Tome 358 (2008), pp. 282-300 Cet article a éte moissonné depuis la source Math-Net.Ru

Voir la notice du chapitre de livre

Résumé

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel–Ziv compression. For an SLP-compressed text of length $\overline m$, and an uncompressed pattern of length $n$, Cégielski et al. gave an algorithm for local subsequence recognition running in time $O(\overline mn^2\log n)$. We improve the running time to $O(\overline mn^{1.5})$. Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time $O(\overline mn^{1.5})$; the same problem with a compressed pattern is known to be NP-hard. Bibl. – 22 titles.

Export
Comment citer

@article{ZNSL_2008_358_a14,
     author = {A. Tiskin},
     title = {Faster subsequence recognition in compressed strings},
     journal = {Zapiski Nauchnykh Seminarov POMI},
     pages = {282--300},
     year = {2008},
     volume = {358},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ZNSL_2008_358_a14/}
}

TY  - JOUR
AU  - A. Tiskin
TI  - Faster subsequence recognition in compressed strings
JO  - Zapiski Nauchnykh Seminarov POMI
PY  - 2008
SP  - 282
EP  - 300
VL  - 358
UR  - http://geodesic.mathdoc.fr/item/ZNSL_2008_358_a14/
LA  - en
ID  - ZNSL_2008_358_a14
ER  -

%0 Journal Article
%A A. Tiskin
%T Faster subsequence recognition in compressed strings
%J Zapiski Nauchnykh Seminarov POMI
%D 2008
%P 282-300
%V 358
%U http://geodesic.mathdoc.fr/item/ZNSL_2008_358_a14/
%G en
%F ZNSL_2008_358_a14

A. Tiskin. Faster subsequence recognition in compressed strings. Zapiski Nauchnykh Seminarov POMI, Studies in constructive mathematics and mathematical logic. Part XI, Tome 358 (2008), pp. 282-300. http://geodesic.mathdoc.fr/item/ZNSL_2008_358_a14/

Bibliographie
Cité par

[1] A. V. Aho, J. E. Hopcroft, J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, 1976 | MR

[2] C. E. R. Alves, E. N. Cáceres, S. W. Song, “An all-substrings common subsequence algorithm”, Electr. Notes Discr. Math., 19 (2005), 133–139 | DOI | MR | Zbl

[3] J. L. Bentley, “Multidimensional divide-and-conquer”, Commu. ACM, 23:4 (1980), 214–229 | DOI | MR | Zbl

[4] P. Cégielski, I. Guessarian, Y. Lifshits, Y. Matiyasevich, “Window subsequence problems for compressed texts”, Proceedings of CSR, Lect. Notes Comp. Sci., 3967, 2006, 127–136 | MR | Zbl

[5] P. Cégielski, I. Guessarian, Y. Matiyasevich, “Multiple serial episodes matching”, Inform. Process. Lett., 98:6 (2006), 211–218 | DOI | MR | Zbl

[6] M. Crochemore, G. M. Landau, M. Ziv-Ukelson, “A subquadratic sequence alignment algorithm for unrestricted score matrices”, SIAM J. Comp., 32:6 (2003), 1654–1673 | DOI | MR | Zbl

[7] M. Crochemore, W. Rytter, Text Algorithms, Oxford University Press, 1994 | MR | Zbl

[8] G. Das, R. Fleischer, L. Gasieniec, D. Gunopulos, J. Kärkkäinen, “Episode matching”, Proceedings of CPM, Lect. Notes Comp. Sci., 1264, 1997, 12–27 | MR

[9] D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997 | MR | Zbl

[10] J. Kärkkäinen, G. Navarro, E. Ukkonen, “Approximate string matching on Ziv–Lempel compressed text”, J. Discr. Alg., 1 (2003), 313–338 | DOI | MR | Zbl

[11] Y. Lifshits, M. Lohrey, “Querying and embedding compressed texts”, Proceedings of MFCS, Lect. Notes Comp. Sci., 4162, 2006, 681–692 | MR | Zbl

[12] W. J. Masek, M. S. Paterson, “A faster algorithm computing string edit distances”, J. Comp. System Sci., 20 (1980), 18–31 | DOI | MR | Zbl

[13] G. Myers, “Approximately matching context-free languages”, Inform. Process. Lett., 54 (1995), 85–92 | DOI | MR | Zbl

[14] G. Navarro, “A guided tour to approximate string matching”, ACM Comp. Surv., 33:1 (2001), 31–88 | DOI

[15] F. P. Preparata, M. I. Shamos, Computational Geometry: An Introduction, Texts and Monographs in Computer Science, Springer, 1985 | MR | Zbl

[16] W. Rytter, “Application of Lempel–Ziv factorization to the approximation of grammar-based compression”, Theor. Comp. Sci., 302:1–3 (2003), 211–222 | DOI | MR | Zbl

[17] A. Tiskin, “Semi-local longest common subsequences in subquadratic time”, J. Discr. Alg., 6:4 (2008), 570–581 | DOI | MR | Zbl

[18] A. Tiskin, “Semi-local string comparison: Algorithmic techniques and applications”, Math. Comput. Sci., 1:4 (2008), 571–603 | DOI | MR | Zbl

[19] B. W. Watson, G. Zwaan, “A taxonomy of sublinear multiple keyword pattern matching algorithms”, Sci. Comput. Programm., 27:2 (1996), 85–118 | DOI | MR | Zbl

[20] T. A. Welch, “A technique for high-performance data compression”, Computer, 17:6 (1984), 8–19 | DOI

[21] G. Ziv, A. Lempel, “A universal algorithm for sequential data compression”, IEEE Transact. Inform. Theory, 23 (1977), 337–343 | DOI | MR | Zbl

[22] G. Ziv, A. Lempel, “Compression of individual sequences via variable-rate coding”, IEEE Transact. Inform. Theory, 24 (1978), 530–536 | DOI | MR | Zbl

Parcourir par

Geodesic

Parcourir par