A quality index for detection of atypical elements (outliers)

Kulczycki, Piotr; Franus, Krystian; Charytanowicz, Małgorzata

International Journal of Applied Mathematics and Computer Science, Volume 34 (2024) no. 3, pp. 439-451

This article was harvested from the source Library of Science

Abstract

Besides clustering and classification, detection of atypical elements (outliers, rare elements) is one of the most fundamental problems in contemporary data analysis. However, contrary to clustering and classification, an atypical element detection task does not possess any natural quality (performance) index. The subject of the research presented here is the creation of one. It will enable not only evaluation of the results of a procedure for atypical element detection, but also optimization of its parameters or other quantities. The investigated quality index works particularly well with frequency types of such procedures, especially in the presence of substantial noise. Using a nonparametric approach in the design of this index practically frees the proposed method from the distribution in the dataset under examination. It may also be successfully applied to multimodal and multidimensional cases.
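
To make the detection task concrete, the following is a minimal sketch of one frequency-style procedure of the kind the abstract mentions: estimate the density nonparametrically and flag the elements with the lowest estimated density. It is not the quality index proposed in the paper. The function names, the parameter `r` (assumed share of atypical elements), the Gaussian kernel, and the rule-of-thumb bandwidth are all assumptions made for illustration.

```python
import math
import statistics


def kde_density(x, sample, bandwidth):
    """Gaussian kernel density estimate at point x (one-dimensional)."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample) / norm


def detect_atypical(sample, r=0.1, bandwidth=None):
    """Return the indices of the elements with the lowest estimated density.

    r is the assumed share of atypical elements (hypothetical parameter).
    """
    n = len(sample)
    if bandwidth is None:
        # Rule-of-thumb bandwidth for a Gaussian kernel (an assumption of this sketch).
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    densities = [kde_density(x, sample, bandwidth) for x in sample]
    k = max(1, round(r * n))  # number of elements to flag as atypical
    order = sorted(range(n), key=lambda i: densities[i])
    return sorted(order[:k])


# Illustrative data: nine values near 5 and one value far from the rest.
data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 9.5]
atypical = detect_atypical(data, r=0.1)
print(atypical)
```

The choice of `r` and of the bandwidth is exactly the kind of parameter that a quality index for atypical element detection would be used to evaluate or tune.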

Keywords: data analysis, atypical element, rare elements, quality index
Keywords (translated from Polish): data analysis, rare element, quality index

@article{IJAMCS_2024_34_3_a7,
     author = {Kulczycki, Piotr and Franus, Krystian and Charytanowicz, Ma{\l}gorzata},
     title = {A quality index for detection of atypical elements (outliers)},
     journal = {International Journal of Applied Mathematics and Computer Science},
     pages = {439--451},
     year = {2024},
     volume = {34},
     number = {3},
     language = {en},
     url = {http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_3_a7/}
}

TY  - JOUR
AU  - Kulczycki, Piotr
AU  - Franus, Krystian
AU  - Charytanowicz, Małgorzata
TI  - A quality index for detection of atypical elements (outliers)
JO  - International Journal of Applied Mathematics and Computer Science
PY  - 2024
SP  - 439
EP  - 451
VL  - 34
IS  - 3
UR  - http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_3_a7/
LA  - en
ID  - IJAMCS_2024_34_3_a7
ER  -

%0 Journal Article
%A Kulczycki, Piotr
%A Franus, Krystian
%A Charytanowicz, Małgorzata
%T A quality index for detection of atypical elements (outliers)
%J International Journal of Applied Mathematics and Computer Science
%D 2024
%P 439-451
%V 34
%N 3
%U http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_3_a7/
%G en
%F IJAMCS_2024_34_3_a7

Kulczycki, Piotr; Franus, Krystian; Charytanowicz, Małgorzata. A quality index for detection of atypical elements (outliers). International Journal of Applied Mathematics and Computer Science, Volume 34 (2024) no. 3, pp. 439-451. http://geodesic.mathdoc.fr/item/IJAMCS_2024_34_3_a7/

References

[1] Aggarwal, C.C. (2013). Outlier Analysis, Springer, Cham.

[2] Agresti, A. (2002). Categorical Data Analysis, Wiley, Hoboken.

[3] Baszczyńska, A. (2016). Smoothing Parameter of the Density Functions for Random Variables in Economic Research, Lodz University Press, Łódź (in Polish).

[4] Batool, F. and Hennig, C. (2021). Clustering with the average silhouette width, Computational Statistics and Data Analysis 158(6): 107190.

[5] Cateni, S., Colla, V. and Vannucci, M. (2008). Outlier detection methods for industrial applications, in J. Aramburo and A.R. Trevino (Eds), Advances in Robotics, Automation and Control, I-Tech, Vienna, pp. 265-282.

[6] Caltech (2024). NASA Exoplanet Archive, https://exoplanetarchive.ipac.caltech.edu/.

[7] Chacon, J.E. and Duong, T. (2020). Multivariate Kernel Smoothing and Its Applications, Chapman and Hall/CRC, Boca Raton.

[8] Charytanowicz, M., Kulczycki, P., Kowalski, P.A., Lukasik, S. and Czabak-Garbacz, R. (2018). An evaluation of utilizing geometric features for wheat grain classification using X-ray images, Computers and Electronics in Agriculture 144(1): 260-268.

[9] Charytanowicz, M., Perzanowski, K., Januszczak, M., Wołoszyn-Gałęza, A. and Kulczycki, P. (2020). Application of complete gradient clustering algorithm for analysis of wildlife spatial distribution, Ecological Indicators 113(6): 106216.

[10] Czmil, S., Kluska, J. and Czmil, A. (2024). An empirical study of a simple incremental classifier based on vector quantization and adaptive resonance theory, International Journal of Applied Mathematics and Computer Science 34(1): 149-165, DOI: 10.61822/amcs-2024-0011.

[11] Dalianis, H. (2018). Clinical Text Mining, Springer, Cham.

[12] Hodge, V. (2011). Outlier and Anomaly Detection: A Survey of Outlier and Anomaly Detection Methods, Lambert Academic Publishing, Saarbrücken.

[13] James, G., Witten, D., Hastie, T., Tibshirani, R. and Taylor, J. (2023). An Introduction to Statistical Learning, Springer, Cham.

[14] Kacprzyk, J. and Pedrycz, W. (2015). Springer Handbook of Computational Intelligence, Springer, Berlin.

[15] Kaggle (2024). Suicide rates overview 1985 to 2016, Dataset, http://www.kaggle.com/datasets/russellyates88/suicide-rates-overview-1985-to-2016.

[16] Kłopotek, R., Kłopotek, M. and Wierzchoń, S. (2020). A feasible k-means kernel trick under non-Euclidean feature space, International Journal of Applied Mathematics and Computer Science 30(4): 703-715, DOI: 10.34768/amcs-2020-0052.

[17] Knuth, D.E. (1988). Art of Computer Programming. Vol. 3: Sorting and Searching, Addison-Wesley, Upper Saddle River.

[18] Kulczycki, P. (2005). Kernel Estimators in Systems Analysis, Scientific and Engineering Publishers, Warsaw (in Polish).

[19] Kulczycki, P. (2020). Methodically unified procedures for outlier detection, clustering and classification, in K. Arai (Ed.), Proceedings of the Future Technologies Conference (FTC), Springer, Cham, pp. 460-474.

[20] Kulczycki, P. and Franus, K. (2021). Methodically unified procedures for a conditional approach to outlier detection, clustering, and classification, Information Sciences 560: 504-527.

[21] Kulczycki, P. and Kruszewski, D. (2017). Identification of atypical elements by transforming task to supervised form with fuzzy and intuitionistic fuzzy evaluations, Applied Soft Computing 60(11): 623-633.

[22] Kulczycki, P. and Kruszewski, D. (2019). Detection of rare elements in investigation of medical problems, in N.T. Nguyen et al. (Eds), Intelligent Information and Database Systems, Springer, Singapore, pp. 257-268.

[23] Lehmann, E.L. and Casella, G. (2011). Theory of Point Estimation, Springer, New York.

[24] Nisbet, R., Miner, G. and Yale, K. (2009). Handbook of Statistical Analysis and Data Mining Applications, Elsevier, London.

[25] Ott, R.L. and Longnecker, M.T. (2015). An Introduction to Statistical Methods and Data Analysis, Cengage, Boston.

[26] Pedrycz, W. and Chen, S.-M. (2017). Data Science and Big Data: An Environment of Computational Intelligence, Springer, Cham.

[27] Rajagopalan, B. and Lall, U. (1995). A kernel estimator for discrete distributions, Journal of Nonparametric Statistics 4(1): 409-426.

[28] Ranga Suri, N.N.R., Narasimha-Murty, M. and Athithan, G. (2019). Outlier Detection: Techniques and Applications, Springer, Cham.

[29] scikit-learn (2004). make_circles, Dataset, https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html.

[30] Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis, Chapman and Hall, London.

[31] Sorzano, C., Vargas, J. and Pascual-Montano, A. (2014). A survey of dimensionality reduction techniques, arXiv: 1403.2877v1.

[32] Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing, Chapman and Hall, New York.

[33] Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference, Springer, New York.

[34] Yang, J., Tan, X. and Rahardja, S. (2023). Outlier detection: How to select k for k-nearest-neighbors-based outlier detectors, Pattern Recognition Letters 174: 112-117.
