Combining Offline and On-the-fly Disambiguation to perform Semantic-aware XML Querying
Computer Science and Information Systems, Volume 20 (2023) no. 1.

See the article record from the source Computer Science and Information Systems website

The IR community has devoted considerable effort to extending free-text query processing toward semi-structured XML search. Most methods rely on the concept of the Lowest Common Ancestor (LCA) of two or more structural nodes to identify the most specific XML elements containing the query keywords submitted by the user. Yet few existing approaches consider XML semantics, and those that do generally either rely on computationally expensive word sense disambiguation (WSD) techniques or, to reduce processing time, apply semantic analysis in one stage only, namely query relaxation/refinement over the bag-of-words retrieval model. In this paper, we describe a new approach to XML keyword search that addresses these limitations. Our solution first transforms the XML document collection (offline) and the keyword query (on-the-fly) into meaningful semantic representations using context-based and global disambiguation methods, specially designed to run in almost linear time. We use a semantic-aware inverted index to support semantic-aware search, result selection, and result ranking. The semantically augmented XML data tree is then processed for structural node clustering, based on semantic query concepts (i.e., key-concepts), in order to identify and rank candidate answer sub-trees containing related occurrences of the query key-concepts. Dedicated weighting functions and several search algorithms have been developed for that purpose and are presented here. Experimental results highlight the quality and potential of our approach.
Keywords: Semi-structured data, XML, Semantic Disambiguation, Keyword Search, Query Processing
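To make the LCA notion mentioned in the abstract concrete, the following is a minimal sketch of plain keyword-based LCA search over a small XML tree. It is not the authors' semantic-aware method: it uses naive, case-insensitive substring matching with no disambiguation, and the function name `lowest_common_ancestor` and the sample document are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def lowest_common_ancestor(root, keywords):
    """Return the most specific element whose subtree text contains every
    keyword, or None if the document does not contain all of them.

    Illustrative only: matching is a naive case-insensitive substring test.
    """
    keywords = [k.lower() for k in keywords]

    def contains_all(elem):
        # itertext() yields the text of elem and of all its descendants
        text = " ".join(elem.itertext()).lower()
        return all(k in text for k in keywords)

    if not contains_all(root):
        return None

    node = root
    while True:
        # Descend as long as one child subtree still covers every keyword
        child = next((c for c in node if contains_all(c)), None)
        if child is None:
            return node
        node = child


# Hypothetical sample data, not taken from the paper
DOC = """
<library>
  <film>
    <title>Rear Window</title>
    <director>Hitchcock</director>
  </film>
  <film>
    <title>Vertigo</title>
    <director>Hitchcock</director>
  </film>
</library>
"""

root = ET.fromstring(DOC)
answer = lowest_common_ancestor(root, ["vertigo", "hitchcock"])
print(answer.tag, "/", answer.find("title").text)
```

For the query above, the search returns the second `film` element rather than the whole `library`, since it is the deepest element covering both keywords. When several subtrees qualify, this sketch simply follows the first one in document order; ranking among candidates, which the abstract attributes to dedicated weighting functions, is out of scope here.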
@article{CSIS_2023_20_1_a23,
     author = {Joe Tekli and Gilbert Tekli and Richard Chbeir},
     title = {Combining {Offline} and {On-the-fly} {Disambiguation} to perform {Semantic-aware} {XML} {Querying}},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {20},
     number = {1},
     year = {2023},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2023_20_1_a23/}
}
TY  - JOUR
AU  - Joe Tekli
AU  - Gilbert Tekli
AU  - Richard Chbeir
TI  - Combining Offline and On-the-fly Disambiguation to perform Semantic-aware XML Querying
JO  - Computer Science and Information Systems
PY  - 2023
VL  - 20
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/CSIS_2023_20_1_a23/
ID  - CSIS_2023_20_1_a23
ER  - 
%0 Journal Article
%A Joe Tekli
%A Gilbert Tekli
%A Richard Chbeir
%T Combining Offline and On-the-fly Disambiguation to perform Semantic-aware XML Querying
%J Computer Science and Information Systems
%D 2023
%V 20
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/CSIS_2023_20_1_a23/
%F CSIS_2023_20_1_a23
Joe Tekli; Gilbert Tekli; Richard Chbeir. Combining Offline and On-the-fly Disambiguation to perform Semantic-aware XML Querying. Computer Science and Information Systems, Volume 20 (2023) no. 1. http://geodesic.mathdoc.fr/item/CSIS_2023_20_1_a23/