Multidimensional term indexing for efficient processing of complex queries
Kybernetika, Tome 40 (2004) no. 3, pp. 381-396 Cet article a éte moissonné depuis la source Czech Digital Mathematics Library

Voir la notice de l'article

The area of Information Retrieval deals with problems of storage and retrieval within a huge collection of text documents. In IR models, the semantics of a document is usually characterized using a set of terms. A common need to various IR models is an efficient term retrieval provided via a term index. Existing approaches of term indexing, e. g. the inverted list, support efficiently only simple queries asking for a term occurrence. In practice, we would like to exploit some more sophisticated querying mechanisms, in particular queries based on regular expressions. In this article we propose a multidimensional approach of term indexing providing efficient term retrieval and supporting regular expression queries. Since the term lengths are usually different, we also introduce an improvement based on a new data structure, called BUB-forest, providing even more efficient term retrieval.
The area of Information Retrieval deals with problems of storage and retrieval within a huge collection of text documents. In IR models, the semantics of a document is usually characterized using a set of terms. A common need to various IR models is an efficient term retrieval provided via a term index. Existing approaches of term indexing, e. g. the inverted list, support efficiently only simple queries asking for a term occurrence. In practice, we would like to exploit some more sophisticated querying mechanisms, in particular queries based on regular expressions. In this article we propose a multidimensional approach of term indexing providing efficient term retrieval and supporting regular expression queries. Since the term lengths are usually different, we also introduce an improvement based on a new data structure, called BUB-forest, providing even more efficient term retrieval.
Classification : 14Q15, 68P05, 68P10, 68P20
Keywords: term indexing; complex queries; multidimensional data structures; BUB-forest
@article{KYB_2004_40_3_a8,
     author = {Kr\'atk\'y, Michal and Skopal, Tom\'a\v{s} and Sn\'a\v{s}el, V\'aclav},
     title = {Multidimensional term indexing for efficient processing of complex queries},
     journal = {Kybernetika},
     pages = {381--396},
     year = {2004},
     volume = {40},
     number = {3},
     zbl = {1249.68042},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2004_40_3_a8/}
}
TY  - JOUR
AU  - Krátký, Michal
AU  - Skopal, Tomáš
AU  - Snášel, Václav
TI  - Multidimensional term indexing for efficient processing of complex queries
JO  - Kybernetika
PY  - 2004
SP  - 381
EP  - 396
VL  - 40
IS  - 3
UR  - http://geodesic.mathdoc.fr/item/KYB_2004_40_3_a8/
LA  - en
ID  - KYB_2004_40_3_a8
ER  - 
%0 Journal Article
%A Krátký, Michal
%A Skopal, Tomáš
%A Snášel, Václav
%T Multidimensional term indexing for efficient processing of complex queries
%J Kybernetika
%D 2004
%P 381-396
%V 40
%N 3
%U http://geodesic.mathdoc.fr/item/KYB_2004_40_3_a8/
%G en
%F KYB_2004_40_3_a8
Krátký, Michal; Skopal, Tomáš; Snášel, Václav. Multidimensional term indexing for efficient processing of complex queries. Kybernetika, Tome 40 (2004) no. 3, pp. 381-396. http://geodesic.mathdoc.fr/item/KYB_2004_40_3_a8/

[1] Baeza-Yates R., Ribeiro-Neto B.: Modern Information Retrieval. Addison Wesley, New York 1999

[2] Bayer R.: The iniversal B-tree for multidimensional indexing: General concepts. In: Proc. World-Wide Computing and its Applications’97, WWCA’97, Tsukuba 1997

[3] Beckmann N., Kriegel H., Schneider, R., Seeger B.: The R*-tree: An effcient and robust access method for points and rectangles. In: Proc. 1990 ACM SIGMOD Internat. Conference on Management of Data, Atlantic City, NJ 1990, pp. 322–331

[4] Böhm C., Berchtold, S., Keim D. A.: Searching in high-dimensional spaces – Index structures for improving the performance of multimedia databases. ACM, 2002

[5] Crochemore M., Rytter W.: Text Algorithms. Oxford University Press, Oxford 1994 | MR | Zbl

[6] Deppisch U.: S-tree: a dynamic balanced signature index for office retrieval. In: Proc. 9th ACM SIGIR Conference, Pisa 1986, pp. 77–87

[7] Dvorský J., Krátký M., Skopal, T., Snášel V.: Term indexing in information retrieval systems. In: Proc. CIC’03, CSREA Press, Las Vegas 2003

[8] Faloutsos C.: Gray codes for partial match and range queries. IEEE Trans. Software Engrg. 14 (1988), 10, xxx–xxx | DOI | MR | Zbl

[9] Fenk R.: The BUB-Tree. In: Proc. 28rd VLDB Internat. Conference on VLDB. Hongkong 2002

[10] Fenk R., Markl, V., Bayer R.: Improving multidimensional range queries of non rectangular volumes specified by a query box set. In: Proc. Internat. Symposium on Database, Web and Cooperative Systems (DWACOS), Baden-Baden 1999

[11] Ferragina P., Grossi R.: A fully-dynamic data structure for external substring search. In: Proc. ACM Symposium on Theory of Computing, 1995 | Zbl

[12] Guttman A.: R-Trees: A dynamic index structure for spatial searching. In: Proc. ACM SIGMOD 1984, ACM Press, Boston 1984, pp. 47–57

[13] Manolopoulos Y., Theodoridis, Y., Tsotras V.: Advanced Database Indexing. Kluwer, Dordrecht 2001 | Zbl

[14] Markl V.: Mistral: Processing Relational Queries using a Multidimensional Access Technique. Ph.D. Thesis. Technical University München 1999, http://mistral.in.tum.de/results/publications/Mar99.pdf

[15] NIST: Text REtrieval Conference (TREC). 2003, http://trec.nist.gov/

[16] Ramsak F., Markl V., Fenk R., Zirkel M., Elhardt, K., Bayer R.: Integrating the UB-tree into a database system kernel. In: Proc. 26th VLDB Internat. Conference, Morgan Kaufmann, Cairo 2000

[17] Roman S.: Advanced Linear Algebra. Springer Verlag, Berlin 1995 | Zbl

[18] Salton G., McGill M. J.: Introduction to Modern Information Retrieval. First edition. McGraw Hill, New York 1983 | Zbl

[19] Skopal T., Krátký M., Snášel, V., Pokorný J.: On Range Queries in Universal B-trees. Technical Report No. ARG-TR-01-2003, Department of Computer Science, VŠB-Technical University of Ostrava 2003, http://www.cs.vsb.cz/arg/techreports/range.pdf

[20] Stephen G. A.: String Searching Algorithms. Lecture Notes Series on Computing, World Scientific, 1998 | MR | Zbl

[21] Wirth N.: Algorithms and Data Structures. Prentice Hall, Englewood Cliffs, N. J. 1984 | MR | Zbl

[22] Witten I. H., Moffat, A., Bell T. C.: Managing Gigabytes, Compressing and Indexing Documents and Images. Van Nostrand Reinhold, New York 1994 | Zbl

[23] Consortium, W3: Extensible Markup Language (XML) 1. 0. 1998, http://www.w3.org/ TR/REC-xml