Hybrid Unsupervised Machine Learning for Insurance Fraud Detection: PCA-XGBoost-LOF and Isolation Forest
Abstract
Insurance fraud poses a significant threat to the financial stability of insurance companies, resulting in substantial economic losses. To combat this issue, this study proposes a novel hybrid unsupervised machine learning algorithm that integrates Principal Component Analysis (PCA), Extreme Gradient Boosting (XGBoost), Local Outlier Factor (LOF), and Isolation Forest. The hybrid approach aims to improve the detection accuracy of insurance fraud by combining the strengths of the individual algorithms. Experimental results on a real-world insurance dataset demonstrate a detection accuracy of 92%, a precision of 92%, and a recall of 96%. The results also indicate that the proposed hybrid algorithm outperforms existing state-of-the-art methods, achieving a higher detection rate and fewer false positives. This research contributes to the development of effective insurance fraud detection systems, helping insurance companies minimize financial losses and improve their overall profitability.
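For readers less familiar with the three reported metrics, the following is a minimal sketch of how accuracy, precision, and recall are conventionally computed from confusion-matrix counts. The `ConfusionCounts` class and the example counts are hypothetical illustrations, not values or code from the study, and the abstract does not state how the authors computed their figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing predicted fraud flags with known labels."""
    tp: int  # fraudulent claims flagged as fraud
    fp: int  # legitimate claims flagged as fraud
    tn: int  # legitimate claims passed as legitimate
    fn: int  # fraudulent claims that were missed


def accuracy(c: ConfusionCounts) -> float:
    """Share of all claims that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of flagged claims that were actually fraudulent."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of fraudulent claims that were flagged."""
    actual_fraud = c.tp + c.fn
    return c.tp / actual_fraud if actual_fraud else 0.0


# Hypothetical counts chosen only for illustration (not from the paper).
counts = ConfusionCounts(tp=48, fp=4, tn=46, fn=2)

if __name__ == "__main__":
    print(f"accuracy:  {accuracy(counts):.3f}")
    print(f"precision: {precision(counts):.3f}")
    print(f"recall:    {recall(counts):.3f}")
```

In a fraud setting, a high recall means few fraudulent claims slip through, while precision reflects how many flagged claims turn out to be false positives, which is why the two are usually reported together.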


Copyright (c) 2025 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.