Modeling of nearest neighbor graphs to estimate the probability of independence of data
Matematičeskoe modelirovanie, Tome 35 (2023) no. 7, pp. 63-82.

Voir la notice de l'article provenant de la source Math-Net.Ru

The proposed method is based on calculations of the statistics of the nearest neighbor graphs structures, which are presented as a benchmark of the probabilities of the distribution of graphs by the number of disconnected fragments. The deviation of the actually observed occurrence of connectivity from the calculated one will allow us to determine with what probability this sample can be considered a set of statistically independent variables. The statements about the independence of the nearest neighbor graph statistics from the distribution of distances and from the triangle inequality are proved, which allows numerical modeling of such structures. Estimates of the accuracy of the calculated statistics for graphs and their comparison with estimates obtained by modeling random coordinates of points in $d$-dimensional space are carried out. It is shown that the model of nearest neighbor graphs without taking into account the dimension of the space leads to fairly accurate estimates of the statistics of graph structures in spaces of dimension higher than five. For spaces of smaller dimension, the benchmark can be obtained by directly calculating the distances between points with random coordinates in a unit cube. The proposed method is applied to the problem of analyzing the level of unsteadiness of the earthquake catalog in the Kuril–Kamchatka region. The lengths of samples of time intervals between neighboring events are analyzed. It is shown that the analyzed system as a whole is interconnected with a probability of 0.91, and this dependence is fundamentally different from the lag correlation between the sample elements.
Keywords: nearest neighbor graph, distribution by number of fragments, connected graph.
@article{MM_2023_35_7_a4,
     author = {A. A. Kislitsyn},
     title = {Modeling of nearest neighbor graphs to estimate the probability of independence of data},
     journal = {Matemati\v{c}eskoe modelirovanie},
     pages = {63--82},
     publisher = {mathdoc},
     volume = {35},
     number = {7},
     year = {2023},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/MM_2023_35_7_a4/}
}
TY  - JOUR
AU  - A. A. Kislitsyn
TI  - Modeling of nearest neighbor graphs to estimate the probability of independence of data
JO  - Matematičeskoe modelirovanie
PY  - 2023
SP  - 63
EP  - 82
VL  - 35
IS  - 7
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/MM_2023_35_7_a4/
LA  - ru
ID  - MM_2023_35_7_a4
ER  - 
%0 Journal Article
%A A. A. Kislitsyn
%T Modeling of nearest neighbor graphs to estimate the probability of independence of data
%J Matematičeskoe modelirovanie
%D 2023
%P 63-82
%V 35
%N 7
%I mathdoc
%U http://geodesic.mathdoc.fr/item/MM_2023_35_7_a4/
%G ru
%F MM_2023_35_7_a4
A. A. Kislitsyn. Modeling of nearest neighbor graphs to estimate the probability of independence of data. Matematičeskoe modelirovanie, Tome 35 (2023) no. 7, pp. 63-82. http://geodesic.mathdoc.fr/item/MM_2023_35_7_a4/

[1] Kislitsyn A.A., “Investigation of Statistics of Nearest Neighbor Graphs”, Mathematical Models and Computer Simulations, 15:2 (2023), 235–244 | DOI | DOI | MR

[2] A. A. Kislitsyn, Yu. N. Orlov, “Discussion about Properties of First Neighbor Graphs”, Lobachevskii Journal of Mathematics, 43:12 (2022), 109–118 | DOI | MR

[3] E. Fix, J. Hodges, Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties, USAF School of Aviation Medicine, Texas, 1951

[4] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer, 2001, 764 pp. | MR | Zbl

[5] G. Guo, H. Wang, D. Bell, “KNN Model based Approach in Classification”, Lecture Notes in Computer Science, 2888, 2003, 986–996 | DOI

[6] V. Vaidehi, S. Vasuhi, “Person Authentication using Face Recognition”, Proceedings of the world congress on eng. and computer science, 2008

[7] Z. Yong, “An Improved kNN Text Classification Algorithm based on Clustering”, Journal of Computers, 4:3 (2009)

[8] F. Bajramovie et. al., “A Comparison of Nearest Neighbor Search Algorithms for Generic Object Recognition”, ACIVS 2006, LNCS, 4179, 1186–1197

[9] T. Bailey, A. K. Jain, “A note on Distance weighted k-nearest neighbor rules”, IEEE Trans. Systems, Man Cybernatics, 8 (1978), 311–313 | DOI | MR | Zbl

[10] V. F. Kolchin, Sluchainye grafy, Fizmatlit, M., 2004, 256 pp.

[11] M. G. Kenui, Bystrye statisticheskie vychisleniia. Uproshchennye metody otsenivaniia i proverki, Statistika, M., 1979, 69 pp.

[12] H. B. Mann, D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than other”, Annals of Mathematical Statistics, 18 (1947), 50–60 | DOI | MR | Zbl

[13] R. A. Fisher, F. Yates, Statistical tables for biological, agricultural and medical research, Oliver and Boyd, Edinburg–London, 1946 | MR

[14] L. N. Bolshev, N. V. Smirnov, Tablitsy matematicheskoi statistiki, Nauka, M., 1965 | MR

[15] F. G. Foster, A. Stuart, “Distribution-free tests in timeseries dated on the breaking of records”, JRSS, B16:1 (1954), 1–22 | MR | Zbl

[16] A. A. Kislitsyn, A. B. Kozlova, M. B. Korsakova, Y. N. Orlov, “Disorder Indicator for Nonstationary Stochastic Processes”, Doklady Mathematics, 99:1 (2019), 57–59 | DOI | DOI | MR | Zbl

[17] Global Centroid Moment Tensor Catalogm GCMT catalog, https://www.globalcmt.org/CMTsearch.html

[18] Musin O.R., “The problem of the twenty-five spheres”, Russian Mathematical Surveys, 58:4 (2003), 794–795 | DOI | DOI | MR | Zbl

[19] V. S. Koroliuk, N. I. Portenko, A. V. Skorokhod, A. F. Turbin, Spravochnik po teorii veroiatnostei i matematicheskoi statistike, Nauka, M., 1985, 640 pp. | MR