Setting lower bounds on Jensen--Shannon divergence and its application to nearest neighbor document search
Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Tome 14 (2018) no. 4, pp. 334-345
Voir la notice de l'article provenant de la source Math-Net.Ru
The Jensen–Shannon divergence provides a mechanism to determine nearest neighbours in a document collection to a specific query document. This is an effective mechanism however for exhaustive search this can be a time-consuming process. In this paper, we show by setting lower bounds on the Jensen–Shannon divergence search we can reduce by up to a factor of 60% the level of calculation for exhaustive search and 98% for approximate search, based on the nearest neighbour search in a real-world document collection. In these experiments a document corpus that contains 1 854 654 articles published in New York Times from 1987-01-01 till 2007-06-19 (The New York Times Annotated Corpus) was used. As queries, 100 documents from same document corpus were selected randomly. We assess the effect on performance based on the reduction in the number of log function calculations. Approximate nearest neighbour search is based on clustering of documents using Contextual Document Clustering algorithm. We perform an approximated nearest neighbour search by finding the best matching set of cluster attractors to a query and limiting the search for documents to the attractors' corresponding clusters.
Keywords:
nearest neighbors search, dimensionality reduction.
Mots-clés : Jensen–Shannon divergence
Mots-clés : Jensen–Shannon divergence
@article{VSPUI_2018_14_4_a5,
author = {V. Yu. Dobrynin and N. Rooney and J. A. Serdyuk},
title = {Setting lower bounds on {Jensen--Shannon} divergence and its application to nearest neighbor document search},
journal = {Vestnik Sankt-Peterburgskogo universiteta. Prikladna\^a matematika, informatika, processy upravleni\^a},
pages = {334--345},
publisher = {mathdoc},
volume = {14},
number = {4},
year = {2018},
language = {en},
url = {http://geodesic.mathdoc.fr/item/VSPUI_2018_14_4_a5/}
}
TY - JOUR AU - V. Yu. Dobrynin AU - N. Rooney AU - J. A. Serdyuk TI - Setting lower bounds on Jensen--Shannon divergence and its application to nearest neighbor document search JO - Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ PY - 2018 SP - 334 EP - 345 VL - 14 IS - 4 PB - mathdoc UR - http://geodesic.mathdoc.fr/item/VSPUI_2018_14_4_a5/ LA - en ID - VSPUI_2018_14_4_a5 ER -
%0 Journal Article %A V. Yu. Dobrynin %A N. Rooney %A J. A. Serdyuk %T Setting lower bounds on Jensen--Shannon divergence and its application to nearest neighbor document search %J Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ %D 2018 %P 334-345 %V 14 %N 4 %I mathdoc %U http://geodesic.mathdoc.fr/item/VSPUI_2018_14_4_a5/ %G en %F VSPUI_2018_14_4_a5
V. Yu. Dobrynin; N. Rooney; J. A. Serdyuk. Setting lower bounds on Jensen--Shannon divergence and its application to nearest neighbor document search. Vestnik Sankt-Peterburgskogo universiteta. Prikladnaâ matematika, informatika, processy upravleniâ, Tome 14 (2018) no. 4, pp. 334-345. http://geodesic.mathdoc.fr/item/VSPUI_2018_14_4_a5/