Machine Learning Approach for Credit Score Predictions

  • Tsholofelo Mokheleli University of Johannesburg, South Africa
  • Tinofirei Museba University of Johannesburg, South Africa
Keywords: Credit Score, Machine Learning, Class Imbalance, SMOTE, Ensemble, XGBoost


This paper addresses the sharp rise in requests for credit products facing banking and financial institutions. It proposes an adaptive, dynamic heterogeneous ensemble credit model that integrates XGBoost and Support Vector Machine models to improve the accuracy and reliability of the credit scoring models used for risk assessment. The method uses machine learning techniques to recognise patterns and trends in past data and anticipate future occurrences. To validate its efficacy, the proposed approach is compared with existing credit scoring models on five widely used evaluation metrics: Accuracy, ROC AUC, Precision, Recall and F1 score. The paper also highlights challenges that credit scoring models face, such as class imbalance, verification latency and concept drift. The results show that the proposed approach outperforms the existing models on these metrics while balancing predictive accuracy against computational cost. The paper concludes that the approach can help the banking and financial sector develop robust and reliable credit scoring models for evaluating the creditworthiness of clients.
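The five evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 decision threshold, the convention that label 1 marks the positive class, and the toy data are all assumptions made for illustration. ROC AUC is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels; 1 is assumed to be the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute Accuracy, ROC AUC, Precision, Recall and F1 from predicted scores."""
    # The threshold is an illustrative assumption; it only affects the label-based metrics.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "roc_auc": roc_auc(y_true, scores),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (hypothetical): three positive cases among eight applicants.
example_labels = [1, 0, 0, 0, 1, 0, 0, 1]
example_scores = [0.9, 0.2, 0.4, 0.6, 0.3, 0.1, 0.2, 0.8]
print(evaluate(example_labels, example_scores))
```

Under class imbalance, accuracy alone can look strong for a model that rarely predicts the minority class, which is one reason to report Precision, Recall, F1 and ROC AUC alongside it.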


How to Cite
Mokheleli, T., & Museba, T. (2023). Machine Learning Approach for Credit Score Predictions. Journal of Information Systems and Informatics, 5(2), 497-517.