An insight into the effects of class imbalance and sampling on classification accuracy in credit risk assessment
Computer Science and Information Systems, Volume 16 (2019) no. 1.

See the article record from the source: Computer Science and Information Systems website

In this paper we investigate the role of sample size and class distribution in credit risk assessment, focusing on real-life imbalanced data sets. Choosing the optimal sample is of utmost importance for the quality of predictive models, and it has become an increasingly important topic with recent advances in automating lending decisions and the ever-growing richness of the data collected by financial institutions. To address the observed research gap, we performed a large-scale experimental evaluation on real-life data sets with different characteristics, using several classification algorithms and performance measures. The results indicate that the optimal class distribution depends on several factors, namely the performance measure, the classification algorithm and the data set characteristics. The study also provides insight into how to design the training sample to maximize prediction performance, and into the suitability of different classification algorithms, assessed by their sensitivity to class imbalance and sample size.
Keywords: credit risk assessment, imbalanced data sets, class distribution, classification algorithms, sample size, undersampling
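As an illustration of the undersampling idea named in the keywords, the following minimal sketch draws a training sample with a chosen minority-class share by keeping every minority case and randomly dropping majority cases. It is not the authors' procedure: the function name `undersample`, its parameters, and the toy label vector are assumptions made for this example only.

```python
import random
from collections import Counter


def undersample(labels, minority_share, minority_label=1, seed=0):
    """Return sorted row indices whose minority share is about `minority_share`.

    Hypothetical helper: all minority rows are kept, majority rows are
    dropped at random until the requested class distribution is reached.
    """
    if not 0 < minority_share <= 1:
        raise ValueError("minority_share must be in (0, 1]")
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Majority rows needed so that minority / total == minority_share.
    wanted = round(len(minority) * (1 - minority_share) / minority_share)
    wanted = min(wanted, len(majority))  # cannot keep more rows than exist
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    kept_majority = rng.sample(majority, wanted)
    return sorted(minority + kept_majority)


# Toy data (assumed, not from the paper): 50 defaults among 1000 loans.
labels = [1] * 50 + [0] * 950
for share in (0.05, 0.10, 0.25, 0.50):
    idx = undersample(labels, share)
    counts = Counter(labels[i] for i in idx)
    print(f"target share {share:.2f}: n={len(idx)}, "
          f"minority={counts[1]}, majority={counts[0]}")
```

With plain undersampling, a more balanced class distribution comes at the cost of a smaller training sample, which is why the two factors are usually examined together.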
@article{CSIS_2019_16_1_a8,
     author = {Kristina Andri\'c and Damir Kalpi\'c and Zoran Boha\v{c}ek},
     title = {An insight into the effects of class imbalance and sampling on classification accuracy in credit risk assessment},
     journal = {Computer Science and Information Systems},
     publisher = {mathdoc},
     volume = {16},
     number = {1},
     year = {2019},
     url = {http://geodesic.mathdoc.fr/item/CSIS_2019_16_1_a8/}
}
TY  - JOUR
AU  - Kristina Andrić
AU  - Damir Kalpić
AU  - Zoran Bohaček
TI  - An insight into the effects of class imbalance and sampling on classification accuracy in credit risk assessment
JO  - Computer Science and Information Systems
PY  - 2019
VL  - 16
IS  - 1
PB  - mathdoc
UR  - http://geodesic.mathdoc.fr/item/CSIS_2019_16_1_a8/
ID  - CSIS_2019_16_1_a8
ER  - 
%0 Journal Article
%A Kristina Andrić
%A Damir Kalpić
%A Zoran Bohaček
%T An insight into the effects of class imbalance and sampling on classification accuracy in credit risk assessment
%J Computer Science and Information Systems
%D 2019
%V 16
%N 1
%I mathdoc
%U http://geodesic.mathdoc.fr/item/CSIS_2019_16_1_a8/
%F CSIS_2019_16_1_a8
Kristina Andrić; Damir Kalpić; Zoran Bohaček. An insight into the effects of class imbalance and sampling on classification accuracy in credit risk assessment. Computer Science and Information Systems, Volume 16 (2019) no. 1. http://geodesic.mathdoc.fr/item/CSIS_2019_16_1_a8/