Indexing Temporal Information for Web Pages
Computer Science and Information Systems, Vol. 8 (2011), no. 3.

View the article record from the source: Computer Science and Information Systems website.

Temporal information plays an important role in Web search: every Web page is associated with the time at which it was crawled, and most Web pages also contain time keywords in their content. How to integrate temporal information into Web search engines has become a research focus in recent years, and key issues such as temporal-textual indexing and temporal information extraction must be studied first. In this paper, we first present a framework for a temporal-textual Web search engine. We then concentrate on designing a new hybrid index structure for the temporal and textual information of Web pages. In particular, we propose to integrate the B+-tree, the inverted file and a typical temporal index called the MAP21-tree to handle temporal-textual queries. We study five mechanisms for implementing a hybrid index structure for temporal-textual queries, which differ in how they organize the inverted file, the B+-tree and the MAP21-tree. After a theoretical analysis of the performance of these five index structures, we conduct experiments on both simulated and real data sets to compare their performance. The experimental results show that, among all the index schemes, the first-inverted-file-then-MAP21-tree index structure has the best query performance and is thus an acceptable choice for the temporal-textual index of future time-aware search engines.
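To make the "first inverted file, then MAP21-tree" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each page carries a single time interval, that time points are integers such as YYYYMMDD dates (the `DIGITS` width is a hypothetical choice), and that a query asks for pages containing a keyword whose interval overlaps a query interval. Following the MAP21 idea of mapping an interval to one key by concatenating its endpoints, each posting is keyed so that an ordinary ordered structure can hold it; a sorted Python list stands in for the per-keyword B+-tree. All names (`map21_key`, `map21_unkey`, `KeywordThenTimeIndex`) are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left, insort
from collections import defaultdict

# Hypothetical encoding: time points are integers with at most DIGITS digits,
# e.g. dates written as YYYYMMDD. This width is an assumption, not from the paper.
DIGITS = 8
SHIFT = 10 ** DIGITS


def map21_key(start: int, end: int) -> int:
    """MAP21-style mapping of the interval [start, end] to one integer key.

    Concatenating start and end orders keys by start first, then by end,
    so a one-dimensional ordered index can store intervals.
    """
    if not (0 <= start <= end < SHIFT):
        raise ValueError("interval must satisfy 0 <= start <= end < 10**DIGITS")
    return start * SHIFT + end


def map21_unkey(key: int) -> tuple[int, int]:
    """Recover (start, end) from a key produced by map21_key."""
    return divmod(key, SHIFT)


class KeywordThenTimeIndex:
    """Inverted file whose posting list per keyword is sorted by MAP21 key.

    The sorted list is a stand-in for a per-keyword B+-tree (assumption).
    """

    def __init__(self) -> None:
        # keyword -> sorted list of (map21_key, doc_id)
        self._postings: dict[str, list[tuple[int, str]]] = defaultdict(list)

    def add(self, doc_id: str, keywords: list[str], start: int, end: int) -> None:
        key = map21_key(start, end)
        for kw in set(keywords):
            insort(self._postings[kw], (key, doc_id))

    def query(self, keyword: str, q_start: int, q_end: int) -> list[str]:
        """Doc ids containing `keyword` whose interval overlaps [q_start, q_end]."""
        # Step 1 (inverted file): only this keyword's postings are examined.
        postings = self._postings.get(keyword, [])
        # Step 2 (ordered lookup): intervals with start <= q_end are exactly the
        # keys below (q_end + 1) * SHIFT, i.e. a prefix of the sorted list.
        hi = bisect_left(postings, ((q_end + 1) * SHIFT,))
        # Step 3: keep only intervals that also end at or after q_start.
        return [doc for key, doc in postings[:hi] if map21_unkey(key)[1] >= q_start]
```

For example, after adding a page "p1" with keywords `["olympics", "beijing"]` and interval 20080808 to 20080824, `query("olympics", 20080101, 20081231)` returns `["p1"]`. Looking up the keyword first confines the temporal search to one posting list, which is the intuition behind that ordering; note that this sketch scans the whole prefix of intervals starting before `q_end`, so it does not reproduce the paper's cost analysis.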
Keywords: Web search, temporal-textual query, temporal information, index structure
@article{CSIS_2011_8_3_a10,
     author = {Peiquan Jin and Hong Chen and Xujian Zhao and Xiaowen Li and Lihua Yue},
     title = {Indexing {Temporal} {Information} for {Web} {Pages}},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {8},
     number = {3},
     year = {2011},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2011_8_3_a10/}
}
TY  - JOUR
AU  - Peiquan Jin
AU  - Hong Chen
AU  - Xujian Zhao
AU  - Xiaowen Li
AU  - Lihua Yue
TI  - Indexing Temporal Information for Web Pages
JO  - Computer Science and Information Systems
PY  - 2011
VL  - 8
IS  - 3
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/CSIS_2011_8_3_a10/
ID  - CSIS_2011_8_3_a10
ER  - 
%0 Journal Article
%A Peiquan Jin
%A Hong Chen
%A Xujian Zhao
%A Xiaowen Li
%A Lihua Yue
%T Indexing Temporal Information for Web Pages
%J Computer Science and Information Systems
%D 2011
%V 8
%N 3
%I mathdoc
%U http://geodesic.mathdoc.fr/item/CSIS_2011_8_3_a10/
%F CSIS_2011_8_3_a10
Peiquan Jin; Hong Chen; Xujian Zhao; Xiaowen Li; Lihua Yue. Indexing Temporal Information for Web Pages. Computer Science and Information Systems, Vol. 8 (2011), no. 3. http://geodesic.mathdoc.fr/item/CSIS_2011_8_3_a10/