PCOS Classification Using Random Forest, Recursive Feature Elimination, and Explainable AI
DOI:
https://doi.org/10.63158/journalisi.v8i3.1603
Keywords:
Clinical Classification, Feature Selection, PCOS Classification, Recursive Feature Elimination, Explainable AI
Abstract
Polycystic Ovary Syndrome (PCOS) is an endocrine condition that predominantly affects women of childbearing age. Diagnosis is often delayed because conventional methods rely on laboratory tests and imaging procedures that are relatively costly and time-consuming. This study develops a PCOS classification model from a clinical dataset of 541 patients with 42 clinical attributes, using the Random Forest (RF) algorithm with Recursive Feature Elimination (RFE) for feature selection and an Explainable AI (XAI) approach. The research pipeline comprised several sequential stages: problem identification, data collection, preprocessing, data splitting, feature selection, model training and testing, evaluation, and SHAP-based explainability analysis. Two models, RF+CF and RF+RFE, were compared using accuracy, precision, recall, and F1-score. SHAP (SHapley Additive exPlanations) was applied to identify and explain the contribution of clinical variables to the classification results. RF+RFE was the best-performing model, achieving an accuracy of 92.66%, precision of 93.75%, recall of 83.33%, and F1-score of 88.24%, and outperforming RF+CF. Because this study relies on a single dataset, broader validation across multiple centers is recommended before clinical deployment. The model is intended as a screening-support approach and has not been validated as a clinical diagnostic tool. The findings are anticipated to serve as a foundation for building data-driven early screening tools and clinical decision-support systems.
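The four evaluation metrics named in the abstract are all derived from the same confusion-matrix counts, so the reported figures can be checked against one another. The sketch below is illustrative only: the abstract does not report a confusion matrix, and the counts used here (30 true positives, 2 false positives, 6 false negatives, 71 true negatives) are a hypothetical set chosen because it reproduces the reported percentages. The function name `classification_metrics` is likewise an assumption made for this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, NOT taken from the source: they are one set of
# values that happens to match the percentages stated in the abstract.
metrics = classification_metrics(tp=30, fp=2, fn=6, tn=71)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these assumed counts, recall is the lowest of the four values because the 6 missed positive cases weigh on it more heavily than the 2 false alarms weigh on precision, which matters for a screening-support setting.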
License
Copyright (c) 2026 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not received prior publication, is not under consideration for publication elsewhere, and has not been previously published.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied from or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until a decision is made by the journal editors.
- If the manuscript is finally accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal’s withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights to the publisher, including commercial rights.