Implementation of directed acyclic word graph
Kybernetika, Volume 38 (2002) no. 1, pp. 91-103. This article was harvested from the Czech Digital Mathematics Library.

An effective implementation of a Directed Acyclic Word Graph (DAWG) automaton is presented. A DAWG for a text $T$ is a minimal automaton that accepts all substrings of $T$, so it represents a complete index of the text. Whereas the usual implementations of a DAWG need about 30 times as much storage as the text itself, the implementation shown here reduces this requirement to four times the size of the text. The method compresses the DAWG elements, i.e. vertices, edges and labels. The construction time is linear in the size of the text, and a search for a specific pattern takes time linear in the size of the pattern. The implementation therefore preserves both good properties of the DAWG automaton.
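To make the two properties named in the abstract concrete, the following is a minimal sketch of a plain, uncompressed DAWG in Python. It is not the compressed implementation of the article: it uses the standard online suffix-automaton construction and treats every state as accepting, so that exactly the substrings of the text are accepted. The class name `Dawg`, its attributes and the method `contains` are hypothetical names chosen for this illustration.

```python
class Dawg:
    """Uncompressed DAWG (suffix automaton) of a text, built online.

    Illustrative sketch only; names and layout are assumptions,
    not the compressed representation described in the article.
    """

    def __init__(self, text):
        # Parallel lists, one entry per state; state 0 is the initial state.
        self.edges = [{}]    # outgoing transitions: symbol -> target state
        self.link = [-1]     # suffix link of each state (-1 for the initial state)
        self.length = [0]    # length of the longest string that reaches the state
        self.last = 0        # state reached by the whole text read so far
        for symbol in text:
            self._extend(symbol)

    def _new_state(self, length, edges, link):
        self.edges.append(edges)
        self.length.append(length)
        self.link.append(link)
        return len(self.edges) - 1

    def _extend(self, symbol):
        # Append one symbol to the indexed text (amortized constant time
        # for a fixed alphabet, hence linear construction overall).
        cur = self._new_state(self.length[self.last] + 1, {}, -1)
        p = self.last
        # Add the missing transition along the suffix-link chain.
        while p != -1 and symbol not in self.edges[p]:
            self.edges[p][symbol] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.edges[p][symbol]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = self._new_state(
                    self.length[p] + 1, dict(self.edges[q]), self.link[q]
                )
                while p != -1 and self.edges[p].get(symbol) == q:
                    self.edges[p][symbol] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def contains(self, pattern):
        # Follow one transition per pattern symbol: time linear in len(pattern).
        state = 0
        for symbol in pattern:
            state = self.edges[state].get(symbol)
            if state is None:
                return False
        return True


dawg = Dawg("abcbc")
print(dawg.contains("cbc"))  # "cbc" is a substring of the text
print(dawg.contains("cab"))  # "cab" is not a substring of the text
```

Storing one dictionary and two integers per state, as this sketch does, is the kind of layout that leads to a storage requirement many times the size of the text, which is the overhead the compression of vertices, edges and labels is meant to reduce.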
Classification: 05C85, 68P05, 68Q45, 68R10, 68W05
Keywords: directed acyclic word graph automaton; string matching; data structures
@article{KYB_2002_38_1_a5,
     author = {Bal{\'\i}k, Miroslav},
     title = {Implementation of directed acyclic word graph},
     journal = {Kybernetika},
     pages = {91--103},
     year = {2002},
     volume = {38},
     number = {1},
     mrnumber = {1899849},
     zbl = {1265.68116},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2002_38_1_a5/}
}
Balík, Miroslav. Implementation of directed acyclic word graph. Kybernetika, Volume 38 (2002) no. 1, pp. 91-103. http://geodesic.mathdoc.fr/item/KYB_2002_38_1_a5/
