Faster subsequence recognition in compressed strings

A. Tiskin

Zapiski Nauchnykh Seminarov POMI, Studies in constructive mathematics and mathematical logic. Part XI, Tome 358 (2008), pp. 282-300 Citer cet article

A. Tiskin. Faster subsequence recognition in compressed strings. Zapiski Nauchnykh Seminarov POMI, Studies in constructive mathematics and mathematical logic. Part XI, Tome 358 (2008), pp. 282-300. http://geodesic.mathdoc.fr/item/ZNSL_2008_358_a14/

@article{ZNSL_2008_358_a14,
     author = {A. Tiskin},
     title = {Faster subsequence recognition in compressed strings},
     journal = {Zapiski Nauchnykh Seminarov POMI},
     pages = {282--300},
     year = {2008},
     volume = {358},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/ZNSL_2008_358_a14/}
}

TY  - JOUR
AU  - A. Tiskin
TI  - Faster subsequence recognition in compressed strings
JO  - Zapiski Nauchnykh Seminarov POMI
PY  - 2008
SP  - 282
EP  - 300
VL  - 358
UR  - http://geodesic.mathdoc.fr/item/ZNSL_2008_358_a14/
LA  - en
ID  - ZNSL_2008_358_a14
ER  -

%0 Journal Article
%A A. Tiskin
%T Faster subsequence recognition in compressed strings
%J Zapiski Nauchnykh Seminarov POMI
%D 2008
%P 282-300
%V 358
%U http://geodesic.mathdoc.fr/item/ZNSL_2008_358_a14/
%G en
%F ZNSL_2008_358_a14

Voir la notice du chapitre de livre provenant de la source Math-Net.Ru

Résumé

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel–Ziv compression. For an SLP-compressed text of length $\overline m$, and an uncompressed pattern of length $n$, Cégielski et al. gave an algorithm for local subsequence recognition running in time $O(\overline mn^2\log n)$. We improve the running time to $O(\overline mn^{1.5})$. Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time $O(\overline mn^{1.5})$; the same problem with a compressed pattern is known to be NP-hard. Bibl. – 22 titles.

Bibliographie
Cité par

[1] A. V. Aho, J. E. Hopcroft, J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, 1976 | MR

[2] C. E. R. Alves, E. N. Cáceres, S. W. Song, “An all-substrings common subsequence algorithm”, Electr. Notes Discr. Math., 19 (2005), 133–139 | DOI | MR | Zbl

[3] J. L. Bentley, “Multidimensional divide-and-conquer”, Commu. ACM, 23:4 (1980), 214–229 | DOI | MR | Zbl

[4] P. Cégielski, I. Guessarian, Y. Lifshits, Y. Matiyasevich, “Window subsequence problems for compressed texts”, Proceedings of CSR, Lect. Notes Comp. Sci., 3967, 2006, 127–136 | MR | Zbl

[5] P. Cégielski, I. Guessarian, Y. Matiyasevich, “Multiple serial episodes matching”, Inform. Process. Lett., 98:6 (2006), 211–218 | DOI | MR | Zbl

[6] M. Crochemore, G. M. Landau, M. Ziv-Ukelson, “A subquadratic sequence alignment algorithm for unrestricted score matrices”, SIAM J. Comp., 32:6 (2003), 1654–1673 | DOI | MR | Zbl

[7] M. Crochemore, W. Rytter, Text Algorithms, Oxford University Press, 1994 | MR | Zbl

[8] G. Das, R. Fleischer, L. Gasieniec, D. Gunopulos, J. Kärkkäinen, “Episode matching”, Proceedings of CPM, Lect. Notes Comp. Sci., 1264, 1997, 12–27 | MR

[9] D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997 | MR | Zbl

[10] J. Kärkkäinen, G. Navarro, E. Ukkonen, “Approximate string matching on Ziv–Lempel compressed text”, J. Discr. Alg., 1 (2003), 313–338 | DOI | MR | Zbl

[11] Y. Lifshits, M. Lohrey, “Querying and embedding compressed texts”, Proceedings of MFCS, Lect. Notes Comp. Sci., 4162, 2006, 681–692 | MR | Zbl

[12] W. J. Masek, M. S. Paterson, “A faster algorithm computing string edit distances”, J. Comp. System Sci., 20 (1980), 18–31 | DOI | MR | Zbl

[13] G. Myers, “Approximately matching context-free languages”, Inform. Process. Lett., 54 (1995), 85–92 | DOI | MR | Zbl

[14] G. Navarro, “A guided tour to approximate string matching”, ACM Comp. Surv., 33:1 (2001), 31–88 | DOI

[15] F. P. Preparata, M. I. Shamos, Computational Geometry: An Introduction, Texts and Monographs in Computer Science, Springer, 1985 | MR | Zbl

[16] W. Rytter, “Application of Lempel–Ziv factorization to the approximation of grammar-based compression”, Theor. Comp. Sci., 302:1–3 (2003), 211–222 | DOI | MR | Zbl

[17] A. Tiskin, “Semi-local longest common subsequences in subquadratic time”, J. Discr. Alg., 6:4 (2008), 570–581 | DOI | MR | Zbl

[18] A. Tiskin, “Semi-local string comparison: Algorithmic techniques and applications”, Math. Comput. Sci., 1:4 (2008), 571–603 | DOI | MR | Zbl

[19] B. W. Watson, G. Zwaan, “A taxonomy of sublinear multiple keyword pattern matching algorithms”, Sci. Comput. Programm., 27:2 (1996), 85–118 | DOI | MR | Zbl

[20] T. A. Welch, “A technique for high-performance data compression”, Computer, 17:6 (1984), 8–19 | DOI

[21] G. Ziv, A. Lempel, “A universal algorithm for sequential data compression”, IEEE Transact. Inform. Theory, 23 (1977), 337–343 | DOI | MR | Zbl

[22] G. Ziv, A. Lempel, “Compression of individual sequences via variable-rate coding”, IEEE Transact. Inform. Theory, 24 (1978), 530–536 | DOI | MR | Zbl

Parcourir par

Geodesic

Parcourir par