Comparative analysis of class imbalance reduction methods
News of the Kabardin-Balkar scientific center of RAS, Volume 27 (2025), no. 1, pp. 143-151.


The article discusses methods for improving the quality metrics of machine learning models used in the financial sector. Because the data sets on which such models are trained exhibit class imbalance, the use of methods designed to reduce this imbalance is proposed. The study ran experiments with nine class-imbalance handling methods on three retail-lending data sets. The gradient-boosting model CatBoostClassifier, which does not account for class imbalance, served as the baseline. The experiments showed that the RandomOverSampler method yields a significant improvement in classification quality metrics over the baseline. The results point to the promise of further research into class-imbalance handling methods for financial data, as well as the practicality of applying the considered methods.
Keywords: financial risks, machine learning, class imbalance, classification
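The core resampling step the abstract describes can be illustrated with a short, self-contained sketch. The study itself uses the RandomOverSampler implementation from the imbalanced-learn library together with CatBoostClassifier; the standalone function below (names and toy data are illustrative, not from the paper) only shows the idea: minority-class rows are duplicated with replacement until every class matches the majority count.

```python
# Minimal sketch of random over-sampling (the idea behind RandomOverSampler):
# duplicate minority-class samples with replacement until classes are balanced.
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Return (X_res, y_res) with every class upsampled to the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())          # size of the majority class
    X_res, y_res = list(X), list(y)
    for cls, n in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            j = rng.choice(idx)            # pick an existing row, with replacement
            X_res.append(X[j])
            y_res.append(cls)
    return X_res, y_res

# Imbalanced toy set: 6 "good" loans vs 2 defaults
X = [[i] for i in range(8)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have 6 samples
```

The resampled set is then fed to the classifier in place of the original training data; only the training split is resampled, never the evaluation split, so the quality metrics remain comparable to the baseline.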
@article{IZKAB_2025_27_1_a4,
     author = {L. P. Dyakonova},
     title = {Comparative analysis of class imbalance reduction methods},
     journal = {News of the Kabardin-Balkar scientific center of RAS},
     pages = {143--151},
     publisher = {mathdoc},
     volume = {27},
     number = {1},
     year = {2025},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/IZKAB_2025_27_1_a4/}
}
TY  - JOUR
AU  - L. P. Dyakonova
TI  - Comparative analysis of class imbalance reduction methods
JO  - News of the Kabardin-Balkar scientific center of RAS
PY  - 2025
SP  - 143
EP  - 151
VL  - 27
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/IZKAB_2025_27_1_a4/
LA  - ru
ID  - IZKAB_2025_27_1_a4
ER  - 
%0 Journal Article
%A L. P. Dyakonova
%T Comparative analysis of class imbalance reduction methods
%J News of the Kabardin-Balkar scientific center of RAS
%D 2025
%P 143-151
%V 27
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/IZKAB_2025_27_1_a4/
%G ru
%F IZKAB_2025_27_1_a4
L. P. Dyakonova. Comparative analysis of class imbalance reduction methods. News of the Kabardin-Balkar scientific center of RAS, Volume 27 (2025), no. 1, pp. 143-151. http://geodesic.mathdoc.fr/item/IZKAB_2025_27_1_a4/

[1] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique”, Journal of Artificial Intelligence Research, 16 (2002), 321–357 | DOI

[2] H. He, Y. Bai, E. A. Garcia, S. Li, “ADASYN: adaptive synthetic sampling approach for imbalanced learning”, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), 2008, 1322–1328 | DOI | MR

[3] H. Han, W. Y. Wang, B. H. Mao, “Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning”, International Conference on Intelligent Computing, Springer, 2005, 878–887 | DOI

[4] I. Tomek, “Two modifications of CNN”, IEEE Transactions on Systems, Man, and Cybernetics, SMC-6 (1976), 769–772 | DOI | MR

[5] J. Laurikkala, “Improving identification of difficult small classes by balancing class distribution”, Conference on Artificial Intelligence in Medicine in Europe, Springer, 2001, 63–66 | DOI

[6] G. Batista, R. C. Prati, M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data”, ACM SIGKDD Explorations Newsletter, 6:1 (2004), 20–29 | DOI

[7] G. Batista, B. Bazzan, M. Monard, “Balancing Training Data for Automated Annotation of Keywords: a Case Study”, WOB, 2003, 10–18