Building a machine learning model
News of the Kabardin-Balkar scientific center of RAS, Vol. 27 (2025), no. 2, pp. 11-22.

See the article record at Math-Net.Ru.

The article presents the development of a machine learning model that predicts fraudulent transactions from a bank's transactional data. It discusses specific aspects of encoding categorical variables that arise because transactional data are ordered in time, and how to encode them without information leakage. In addition, experiments were conducted with bagging and with creating additional variables, assessed by their contribution to the final prediction as measured with Shapley values. Finally, the quality metrics of the resulting model are examined and analyzed.
Keywords: fraudulent transactions, CatBoost, encoding categorical variables, catboost_encoder, target_encoder, bagging, variable creation, Shapley values
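The abstract mentions encoding categorical variables in a way that respects the time order of transactions so that no information leaks from the future. One common way to do this is an ordered, CatBoost-style target statistic in which the "permutation" is simply chronological order: each transaction's category is encoded using only the fraud labels of strictly earlier transactions. The sketch below is an illustration under that assumption, not the authors' implementation; the function name `time_ordered_target_encode` and its `prior` and `smoothing` parameters are hypothetical.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def time_ordered_target_encode(
    timestamps: Sequence[float],
    categories: Sequence[Hashable],
    targets: Sequence[int],
    prior: float,
    smoothing: float = 1.0,
) -> list[float]:
    """Hypothetical helper: encode each row's category with a smoothed mean of
    the targets of strictly earlier rows in the same category."""
    n = len(timestamps)
    if len(categories) != n or len(targets) != n:
        raise ValueError("timestamps, categories and targets must have equal length")
    if smoothing <= 0:
        raise ValueError("smoothing must be positive")

    # Visit rows in chronological order (the "permutation" is time itself).
    order = sorted(range(n), key=lambda i: timestamps[i])
    sums = defaultdict(float)   # sum of past labels per category
    counts = defaultdict(int)   # number of past rows per category
    encoded = [0.0] * n

    start = 0
    while start < n:
        # Find the block of rows that share the current timestamp.
        end = start
        while end < n and timestamps[order[end]] == timestamps[order[start]]:
            end += 1
        block = order[start:end]

        # 1) Encode the whole block using only statistics from earlier timestamps.
        for i in block:
            c = categories[i]
            encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)

        # 2) Only then add the block's labels to the running statistics.
        for i in block:
            c = categories[i]
            sums[c] += targets[i]
            counts[c] += 1

        start = end
    return encoded


# Usage: prior would normally be a fraud rate estimated on training data only.
ts = [1, 2, 3, 3, 4]
cat = ["shop_a", "shop_a", "shop_b", "shop_a", "shop_a"]
y = [0, 1, 1, 0, 1]
print(time_ordered_target_encode(ts, cat, y, prior=0.5))
# [0.5, 0.25, 0.5, 0.5, 0.375]
```

Rows that share a timestamp are encoded before any of their labels enter the running statistics, so they cannot see each other's targets, and a row's own label never affects its own encoding. In CatBoost itself, the `has_time=True` training parameter makes the library use the input order of objects instead of random permutations; the abstract does not state which encoder settings the authors actually used.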
@article{IZKAB_2025_27_2_a0,
     author = {A. F. Konstantinov and L. P. Dyakonova},
     title = {Building a machine learning model},
     journal = {News of the Kabardin-Balkar scientific center of RAS},
     pages = {11--22},
     publisher = {mathdoc},
     volume = {27},
     number = {2},
     year = {2025},
     language = {ru},
     url = {http://geodesic.mathdoc.fr/item/IZKAB_2025_27_2_a0/}
}
A. F. Konstantinov, L. P. Dyakonova. Building a machine learning model. News of the Kabardin-Balkar scientific center of RAS, Vol. 27 (2025), no. 2, pp. 11-22. http://geodesic.mathdoc.fr/item/IZKAB_2025_27_2_a0/