Comparative Analysis of Machine Learning Algorithms in Predicting Jumps in Stock Closing Price: Case Study of Iran Khodro Using NearMiss and SMOTE Approaches

Document Type : Original Article

Authors

1 Prof., Department of Industrial Management, Faculty of Industrial Management and Technology, College of Management, University of Tehran, Tehran, Iran.

2 Ph.D. Candidate, Department of Industrial Management, Faculty of Industrial Management and Technology, College of Management, University of Tehran, Tehran, Iran.

10.30699/ijf.2025.491324.1496
Abstract
Predicting stock price fluctuations has long been one of the most important challenges in finance because financial data are complex and market behavior is nonlinear. This research analyzed and compared the performance of machine learning algorithms in predicting jumps in the closing price of Iran Khodro Company shares. Two methods for handling class imbalance were applied: NearMiss (an undersampling method) and SMOTE (an oversampling method). The results showed that NearMiss outperformed SMOTE, yielding a better balance between precision and recall across the machine learning models. CatBoost was identified as the best model in this study because it performed stably under both NearMiss and SMOTE. Under NearMiss, CatBoost showed well-balanced evaluation metrics, with an accuracy of 91.46% and an F1 score of 91.29%. It also achieved high precision (93.18%) and acceptable recall (89.52%), indicating that it can detect jumps while avoiding false predictions. Under SMOTE, by contrast, the Random Forest model performed best, with an accuracy of 85.08%. These results suggest that combining imbalance-handling methods with advanced machine learning algorithms can substantially improve the accuracy of price volatility prediction. The findings can help investors and financial analysts make better decisions in risk management and in optimizing investment strategies.
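
To make the contrast between the two resampling approaches concrete, the following is a minimal sketch and not the authors' implementation. It assumes a NearMiss-1 style rule (the abstract does not state which NearMiss variant was used), a simplified SMOTE-style interpolation, made-up two-dimensional toy data in which "jump" is the minority class, and hypothetical function names (`smote_like`, `nearmiss1_like`).

```python
import math
import random


def nearest(point, pool, k):
    """Return the k points in pool closest to point (Euclidean distance)."""
    return sorted(pool, key=lambda q: math.dist(point, q))[:k]


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by interpolating toward neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbours come from the minority class, excluding the base point itself.
        others = [p for p in minority if p is not base]
        neighbour = rng.choice(nearest(base, others, k))
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def nearmiss1_like(majority, minority, n_keep, k=3):
    """Keep the n_keep majority points whose mean distance to their
    k nearest minority points is smallest."""
    def mean_dist(p):
        ds = sorted(math.dist(p, m) for m in minority)[:k]
        return sum(ds) / len(ds)
    return sorted(majority, key=mean_dist)[:n_keep]


# Toy data (made up for illustration): "no jump" is assumed to be the majority class.
rng = random.Random(42)
no_jump = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
jump = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(8)]

# Oversampling grows the minority class up to the majority size.
oversampled_jump = jump + smote_like(jump, n_new=len(no_jump) - len(jump))
# Undersampling shrinks the majority class down to the minority size.
undersampled_no_jump = nearmiss1_like(no_jump, jump, n_keep=len(jump))

print(len(no_jump), len(oversampled_jump))    # class sizes after oversampling
print(len(undersampled_no_jump), len(jump))   # class sizes after undersampling
```

Both routes end with equal class sizes, but they get there differently: the SMOTE-style step adds interpolated minority points, while the NearMiss-style step discards majority points and keeps only those closest to the minority class. In practice a library such as imbalanced-learn would normally be used instead of hand-written resampling.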
