Hybrid Unsupervised Machine Learning for Insurance Fraud Detection: PCA-XGBoost-LOF and Isolation Forest

  • Natsai Chapwanya North-West University, South Africa
  • Karikoga Norman Gorejena North-West University, South Africa
Keywords: Insurance Fraud Detection; Hybrid Machine Learning; Unsupervised Learning; Anomaly Detection; Principal Component Analysis (PCA)

Abstract

Insurance fraud threatens the financial stability of insurance companies and causes substantial economic losses. To combat this issue, this study proposes a novel hybrid unsupervised machine learning algorithm that integrates Principal Component Analysis (PCA), Extreme Gradient Boosting (XGBoost), Local Outlier Factor (LOF), and Isolation Forest. The hybrid approach aims to improve the accuracy of insurance fraud detection by combining the strengths of the individual algorithms. Experimental results on a real-world insurance dataset show a detection accuracy of 92%, a precision of 92%, and a recall of 96%. According to these results, the proposed hybrid algorithm outperforms existing state-of-the-art methods, achieving a higher detection rate with fewer false positives. This research contributes to the development of effective insurance fraud detection systems, helping insurance companies minimize financial losses and improve their overall profitability.
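
For readers less familiar with the three metrics quoted in the abstract, the following is a minimal sketch of how accuracy, precision, and recall are conventionally computed from confusion-matrix counts, with fraud treated as the positive class. The class name, function names, and the counts in the usage example are hypothetical and chosen only for illustration; they are not taken from the study's dataset or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Claim-level outcome counts, with fraud as the positive class."""
    tp: int  # fraudulent claims correctly flagged
    fp: int  # legitimate claims wrongly flagged (false positives)
    fn: int  # fraudulent claims that were missed
    tn: int  # legitimate claims correctly passed


def accuracy(c: ConfusionCounts) -> float:
    """Share of all claims that were classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of flagged claims that were actually fraudulent."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of fraudulent claims that were flagged (detection rate)."""
    fraudulent = c.tp + c.fn
    return c.tp / fraudulent if fraudulent else 0.0


# Hypothetical counts for illustration only; not the paper's data.
example = ConfusionCounts(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy={accuracy(example):.3f}")
print(f"precision={precision(example):.3f}")
print(f"recall={recall(example):.3f}")
```

Read this way, a higher recall means fewer fraudulent claims are missed, while a higher precision means fewer legitimate claims are flagged, which is the sense in which the abstract speaks of reducing false positives.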

Published
2025-03-31
How to Cite
Chapwanya, N., & Gorejena, K. (2025). Hybrid Unsupervised Machine Learning for Insurance Fraud Detection: PCA-XGBoost-LOF and Isolation Forest. Journal of Information Systems and Informatics, 7(1), 941-959. https://doi.org/10.51519/journalisi.v7i1.958