Predicting Corporate Loan Defaults Using Deep Learning Algorithms and a Comparative Analysis with Linear Models: A Case Study of a Major Commercial Bank

Document Type: Original Article

Authors

1 Ph.D. Candidate, Department of Financial Management, Faculty of Financial Engineering, Kish Campus, University of Tehran, Kish, Iran.

2 Prof., Department of Financial and Insurance Management, Faculty of Management and Accounting, College of Management, University of Tehran, Tehran, Iran.

3 Assistant Prof., Department of Finance, Faculty of Management, Shahrood University, Shahrood, Iran.

DOI: 10.30699/ijf.2025.444059.1460

Abstract
In today's complex economic landscape, accurately predicting events such as customer loan defaults remains a significant challenge for financial institutions. Traditional methods have shown limited accuracy, prompting the adoption of data-driven machine learning techniques. This study compares the efficacy of novel machine learning algorithms with that of linear models for predicting loan defaults at a major commercial bank. Data from over six thousand customer loan files spanning 2019 to 2022 were collected, cleaned, and clustered based on key loan indicators. Default prediction accuracy was first evaluated using popular machine learning classification models, including LightGBM, XGBoost, Multilayer Perceptron, and Logistic Regression; of these, XGBoost performed best. Prediction accuracy was then evaluated using various time-series machine learning algorithms, with a particular focus on a combined Gradient Boosting and Long Short-Term Memory (LSTM) approach. The results indicate that the combined algorithm outperforms traditional linear models, achieving a 40% improvement over the ARIMA model in predicting loan default behavior. The study underscores the potential of advanced machine learning techniques to enhance predictive accuracy in the banking sector, offering valuable insights for risk assessment and financial decision-making.
