Enhancing Financial Statement Fraud Detection through Machine Learning: A Comparative Study of Classification Models

Authors

  • Mohammad Musa Mia, Master of Business Administration, International American University, Los Angeles, California
  • Abdullah Al Mamun, Department of Computer & Info Science, Gannon University, Erie, Pennsylvania, USA
  • Md Parvez Ahmed, Master of Science in Information Technology, Washington University of Science and Technology, USA
  • Sanjida Akter Tisha, Master of Science in Information Technology, Washington University of Science and Technology, USA
  • S M Ahsan Habib, Department of Electrical Engineering and Computer Science, South Dakota School of Mines & Technology, USA
  • Fariha Noor Nitu, MS in Management Science & Supply Chain Management, Wichita State University, USA

Keywords

Machine Learning, Fraud Detection, Financial Statements, Gradient Boosting, Random Forest, Ensemble Models, Predictive Analytics

Abstract

Financial statement fraud is a persistent challenge that undermines investor trust, corporate governance, and financial market stability. Traditional auditing approaches often fail to capture subtle manipulations within complex financial data, highlighting the need for advanced computational methods. In this study, we investigate the effectiveness of machine learning models in detecting fraudulent financial reporting. Using a publicly available dataset, we applied rigorous preprocessing, feature selection, and feature extraction techniques before evaluating five models: Logistic Regression, Support Vector Machines, Random Forest, Gradient Boosting Machines, and Deep Neural Networks. The results indicate that Gradient Boosting Machines achieved the best overall performance, with an accuracy of 94%, precision of 91%, recall of 88%, and an AUC-ROC score of 0.96. Random Forest also demonstrated strong performance, particularly in balancing recall and F1-score. These findings suggest that ensemble-based models are highly effective for identifying complex fraud patterns in financial statements. The study provides empirical evidence supporting the integration of machine learning into auditing and financial risk management systems, offering a scalable and reliable approach to strengthen fraud detection practices.
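
The evaluation metrics named in the abstract (accuracy, precision, recall, F1-score, and AUC-ROC) can be illustrated with a minimal sketch. The labels and scores below are invented toy values, not the study's dataset, and the function names `classification_metrics` and `auc_roc` are hypothetical helpers written for this illustration; the abstract does not describe the authors' evaluation code.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Threshold the scores, then derive metrics from the confusion matrix.

    Label 1 marks a fraudulent statement and 0 a legitimate one (assumed encoding).
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC-ROC as the share of (fraud, non-fraud) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one example of each class")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for illustration only: 3 fraudulent and 5 legitimate statements.
    toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
    toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
    print(classification_metrics(toy_labels, toy_scores))
    print("auc_roc:", auc_roc(toy_labels, toy_scores))
```

Accuracy, precision, recall, and F1 depend on the chosen decision threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric are usually reported together for imbalanced problems such as fraud detection.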

Published

2025-09-17