Correcting the Hub Occurrence Prediction Bias in Many Dimensions
Computer Science and Information Systems, Volume 13 (2016) no. 1.

See the article record from the source: Computer Science and Information Systems website

Data reduction is a common pre-processing step for k-nearest neighbor classification (kNN). The existing prototype selection methods implement different criteria for selecting relevant points to use in classification, which constitutes a selection bias. This study examines the nature of the instance selection bias in intrinsically high-dimensional data. In high-dimensional feature spaces, hubs are known to emerge as centers of influence in kNN classification. These points dominate most kNN sets and are often detrimental to classification performance. Our experiments reveal that different instance selection strategies bias the predictions of the behavior of hub-points in high-dimensional data in different ways. We propose to introduce an intermediate un-biasing step when training the neighbor occurrence models and we demonstrate promising improvements in various hubness-aware classification methods, on a wide selection of high-dimensional synthetic and real-world datasets.
Keywords: instance selection, data reduction, classification, bias, k-nearest neighbor, hubness, curse of dimensionality
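The abstract describes hubs as points that dominate most kNN sets. As a rough illustration only (not the authors' method), the following sketch counts how often each point occurs in the k-nearest-neighbor lists of the other points. The function name `k_occurrences`, the Euclidean metric, and the synthetic Gaussian data are all assumptions made for this example.

```python
import numpy as np


def k_occurrences(X, k):
    """Count how often each point appears in the k-NN lists of the other points.

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not counted as its own neighbor
    # Indices of the k nearest neighbors of every point (order within the k is arbitrary).
    knn = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # Occurrence count: number of k-NN lists in which each point appears.
    return np.bincount(knn.ravel(), minlength=n)


# Synthetic high-dimensional data (an assumption, not one of the paper's datasets).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 100))
counts = k_occurrences(X, k=5)

print(counts.sum())   # total always equals n * k = 2500
print(counts.max())   # the largest count belongs to the strongest hub
```

Points whose count is far above the average of k are the hub candidates; a strongly right-skewed distribution of `counts` is the usual sign of hubness in the data.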
@article{CSIS_2016_13_1_a1,
     author = {Nenad Toma\v{s}ev and Krisztian Buza and Dunja Mladeni\'c},
     title = {Correcting the {Hub} {Occurrence} {Prediction} {Bias} in {Many} {Dimensions}},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {13},
     number = {1},
     year = {2016},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2016_13_1_a1/}
}
TY  - JOUR
AU  - Nenad Tomašev
AU  - Krisztian Buza
AU  - Dunja Mladenić
TI  - Correcting the Hub Occurrence Prediction Bias in Many Dimensions
JO  - Computer Science and Information Systems
PY  - 2016
VL  - 13
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/CSIS_2016_13_1_a1/
ID  - CSIS_2016_13_1_a1
ER  - 
%0 Journal Article
%A Nenad Tomašev
%A Krisztian Buza
%A Dunja Mladenić
%T Correcting the Hub Occurrence Prediction Bias in Many Dimensions
%J Computer Science and Information Systems
%D 2016
%V 13
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/CSIS_2016_13_1_a1/
%F CSIS_2016_13_1_a1
Nenad Tomašev; Krisztian Buza; Dunja Mladenić. Correcting the Hub Occurrence Prediction Bias in Many Dimensions. Computer Science and Information Systems, Volume 13 (2016) no. 1. http://geodesic.mathdoc.fr/item/CSIS_2016_13_1_a1/