Flexible representation and querying of heterogeneous structured documents
Kybernetika, Volume 36 (2000) no. 6, pp. 617-633. This article was harvested from the Czech Digital Mathematics Library.


In this paper we present a fuzzy model for representing documents that have a hierarchical structure and may contain multimedia information. We consider an archive containing documents with distinct (heterogeneous) logical structures. We also propose a flexible query language for expressing soft selection conditions on the structured documents. The documents’ content is organized into thematic (topical) sections in which the index terms play distinct roles. The proposed document representation adapts to the user, who can indicate the preferred sections of documents, i.e. those they judge to carry the most interesting information, and can linguistically quantify the number of sections that determine the overall potential interest of a document. Linguistic quantifiers in the query specify the approximate number of sections in which the query terms should appear.
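The abstract does not say how a linguistic quantifier is evaluated. One common way to do it, offered here only as an illustrative sketch and not as the authors' method, is a quantifier-guided ordered weighted average (OWA) over per-section scores. The function names, the shape of the `most` quantifier and its breakpoints (0.3 and 0.8) are all assumptions made for this example.

```python
from typing import Callable, Sequence

Quantifier = Callable[[float], float]


def most(r: float) -> float:
    """Hypothetical membership function for 'most' (breakpoints are assumed)."""
    if r <= 0.3:
        return 0.0
    if r >= 0.8:
        return 1.0
    return (r - 0.3) / 0.5


def owa_weights(quantifier: Quantifier, n: int) -> list[float]:
    """Weights w_i = Q(i/n) - Q((i-1)/n) for a non-decreasing quantifier Q."""
    return [quantifier(i / n) - quantifier((i - 1) / n) for i in range(1, n + 1)]


def quantified_score(section_scores: Sequence[float], quantifier: Quantifier) -> float:
    """Aggregate per-section matching degrees in [0, 1] into one document score."""
    n = len(section_scores)
    if n == 0:
        return 0.0
    # OWA applies the weights to the scores sorted in descending order,
    # so the weights attach to rank positions, not to particular sections.
    ordered = sorted(section_scores, reverse=True)
    return sum(w * s for w, s in zip(owa_weights(quantifier, n), ordered))


if __name__ == "__main__":
    # Degrees to which a query term matches each of four sections (made-up values).
    print(quantified_score([0.9, 0.1, 0.7, 0.4], most))
```

With a step quantifier that jumps to 1 at any positive proportion, the aggregate reduces to the maximum section score ("at least one section"); with one that reaches 1 only at proportion 1, it reduces to the minimum ("all sections"). Quantifiers such as "most" fall between these extremes.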
Classification: 68P20, 68T30
Keywords: query language; heterogeneously structured document
@article{KYB_2000_36_6_a1,
     author = {Bordogna, Gloria and Pasi, Gabriella},
     title = {Flexible representation and querying of heterogeneous structured documents},
     journal = {Kybernetika},
     pages = {617--633},
     year = {2000},
     volume = {36},
     number = {6},
     zbl = {1249.68228},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/KYB_2000_36_6_a1/}
}
Bordogna, Gloria; Pasi, Gabriella. Flexible representation and querying of heterogeneous structured documents. Kybernetika, Volume 36 (2000) no. 6, pp. 617-633. http://geodesic.mathdoc.fr/item/KYB_2000_36_6_a1/
