Tree compression pushdown automaton
Kybernetika, Tome 48 (2012) no. 3, pp. 429-452 Cet article a éte moissonné depuis la source Czech Digital Mathematics Library

Voir la notice de l'article

A new kind of a deterministic pushdown automaton, called a Tree Compression Automaton, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with $n$ nodes, the automaton has at most $n+1$ states, its transition function cardinality is at most $4n$ and there are $2n+1$ pushdown store symbols. If hashing is used for storing automaton's transitions, thus removing a factor of $log n$, the construction of the automaton takes linear time and space with respect to the length $n$ of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity.
A new kind of a deterministic pushdown automaton, called a Tree Compression Automaton, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with $n$ nodes, the automaton has at most $n+1$ states, its transition function cardinality is at most $4n$ and there are $2n+1$ pushdown store symbols. If hashing is used for storing automaton's transitions, thus removing a factor of $log n$, the construction of the automaton takes linear time and space with respect to the length $n$ of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity.
Classification : 05C05, 68Q68
Keywords: trees; pushdown automata; compression; indexing trees; arbology
@article{KYB_2012_48_3_a5,
     author = {Janou\v{s}ek, Jan and Melichar, Bo\v{r}ivoj and Poliak, Martin},
     title = {Tree compression pushdown automaton},
     journal = {Kybernetika},
     pages = {429--452},
     year = {2012},
     volume = {48},
     number = {3},
     mrnumber = {2975798},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2012_48_3_a5/}
}
TY  - JOUR
AU  - Janoušek, Jan
AU  - Melichar, Bořivoj
AU  - Poliak, Martin
TI  - Tree compression pushdown automaton
JO  - Kybernetika
PY  - 2012
SP  - 429
EP  - 452
VL  - 48
IS  - 3
UR  - http://geodesic.mathdoc.fr/item/KYB_2012_48_3_a5/
LA  - en
ID  - KYB_2012_48_3_a5
ER  - 
%0 Journal Article
%A Janoušek, Jan
%A Melichar, Bořivoj
%A Poliak, Martin
%T Tree compression pushdown automaton
%J Kybernetika
%D 2012
%P 429-452
%V 48
%N 3
%U http://geodesic.mathdoc.fr/item/KYB_2012_48_3_a5/
%G en
%F KYB_2012_48_3_a5
Janoušek, Jan; Melichar, Bořivoj; Poliak, Martin. Tree compression pushdown automaton. Kybernetika, Tome 48 (2012) no. 3, pp. 429-452. http://geodesic.mathdoc.fr/item/KYB_2012_48_3_a5/

[1] Arbology www pages:. Available on http://www.arbology.org/, March 2012.

[2] Aho, A. V., Ullman, J. D.: The Theory of Parsing, Translation, and Compiling. Prentice-Hall Englewood Cliffs, N.J. 1972. | MR

[3] Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., Bourne, P. E.: The protein data bank. Nucleic Acids Res. 28 (2000), 235–242. | DOI

[4] Bille, P.: Pattern Matching in Trees and Strings. Ph.D. Thesis, FIT University of Copenhagen 2008.

[5] Busatto, G., Lohrey, M., Maneth, S.: Grammar-based Tree Compression. Technical Report 2004.

[6] Cleophas, L.: Tree Algorithms. Two Taxonomies and a Toolkit. Ph.D. Thesis, Technische Universiteit Eindhoven 2008.

[7] Cole, R., Hariharan, R., Indyk, P.: Tree pattern matching and subset matching in deterministic $log^3$-time. SODA (1999), 245–254.

[8] Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M.: Tree automata techniques and applications. Available on: http://www.grappa.univ-lille3.fr/tata (2007).

[9] Gecseg, F., Steinby, M.: Tree languages. In: Handbook of Formal Languages (G. Rozenberg and A. Salomaa, eds.), Vol. 3 Beyond Words. Springer-Verlag, Berlin 1997, pp. 1–68. | MR

[10] Flouri, T., Janoušek, J., Melichar, B.: Subtree and tree pattern pushdown automata for trees in prefix notation (2009). Comput. Sci. Inform. Systems 7 (2010), 2, 331–357.

[11] Hoffmann, Ch., O'Donnell, M.: Pattern matching in trees. J. Assoc. Comput. Mach. 29 (1982), 1, 68–95. | DOI | MR | Zbl

[12] Hopcroft, J. E., Motwani, R., Ullman, J. D.: Introduction to Automata Theory, languages, and Computation. Second edition. Addison-Wesley, Boston 2001. | MR

[13] Wu, C. H., Apweiler, R., Bairoch, A., Natale, D. A., Barker, W. C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez et al, R.: The Universal Protein Resource (UniProt) – An Expanding Universe of Protein Information. Database issue, Nucleic Acids Research, Oxford University Press, D187–D191 2006.

[14] Schmidt, A. R., Waas, F., Kersten, M. L., Carey, M. J., Manolescu, I., Busse, R.: XMark – A benchmark for XML data management. In: Proc. International Conference on Very Large Data Bases (VLDB), Hong Kong 2002, pp. 974–985.

[15] Van Der Beek, L., Bouma, G., Malouf, R., Van Noord, G., Rijksuniversiteit Groningen: The Alpino dependency treebank. In: Computational Linguistics in the Netherlands (CLIN) 2002, pp. 1686–1691.

[16] Welch, T.: A technique for high-performance data compression. Computer 17 (1984), 6, 8–19. | DOI

[17] Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory 23 (1977), 3, 337–343. | DOI | MR | Zbl