Hypertension Classification Using Correlation-Based Feature Selection (CFS) with Random Forest, XGBoost, and Support Vector Machine: A Comparative Study on Indonesian Hospital Data
DOI: https://doi.org/10.63158/journalisi.v8i3.1646
Keywords: Hypertension Classification, Correlation-Based Feature Selection, Label Leakage, Clinical Data Mining, Machine Learning
Abstract
Hypertension is a major global health problem that significantly contributes to cardiovascular disease and mortality. This study evaluates the performance of Random Forest, XGBoost, and Support Vector Machine (SVM) algorithms integrated with Correlation-Based Feature Selection (CFS) for hypertension classification using hospital clinical data. The dataset comprises 500 clinical records containing demographic and physiological variables. CFS was applied to remove irrelevant and redundant attributes before model training. Model performance was assessed using accuracy, precision, recall, F1-score, and AUC-ROC through 10-fold cross-validation. Statistical significance was examined using the Friedman test followed by the Wilcoxon signed-rank test with Bonferroni correction. The results show that CFS improved classification performance across all models by approximately 5–6%. XGBoost achieved the best performance, with 93.5% accuracy and an AUC of 0.96, followed by Random Forest and SVM. However, systolic and diastolic blood pressure, which define the hypertension label, were retained as predictors, indicating a diagnostic classification design rather than independent risk prediction. Therefore, the findings should be interpreted as dataset-based hypertension classification, not future hypertension risk prediction.
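The abstract does not state which correlation measure or search strategy the CFS step used. The following is a minimal, standard-library Python sketch, assuming absolute Pearson correlation as the correlation measure and a greedy forward search. Hall's original formulation more commonly uses symmetrical uncertainty with a best-first search. A subset S of k features is scored with the CFS merit, Merit_S = k·r̄_cf / sqrt(k + k(k−1)·r̄_ff), where r̄_cf is the mean feature-class correlation and r̄_ff is the mean feature-feature correlation. The toy records, the `age` column, and the 140 mmHg systolic threshold used to build the label are hypothetical and are not the study's data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, features, target):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation (relevance).
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    # Mean absolute feature-feature correlation (redundancy); zero for a single feature.
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(features, target):
    """Greedy forward search (an assumption): add the feature that most raises the merit, stop when none does."""
    selected, best_merit = [], 0.0
    remaining = list(features)
    while remaining:
        candidate, merit = max(
            ((f, cfs_merit(selected + [f], features, target)) for f in remaining),
            key=lambda item: item[1],
        )
        if merit <= best_merit:
            break  # no remaining feature improves the subset
        selected.append(candidate)
        remaining.remove(candidate)
        best_merit = merit
    return selected, best_merit


# Hypothetical toy data (not from the study): the label is defined from systolic pressure.
features = {
    "systolic": [118, 126, 132, 138, 142, 150, 158, 165],
    "age": [45, 60, 38, 52, 41, 66, 49, 57],
}
label = [1 if s >= 140 else 0 for s in features["systolic"]]

selected, merit = cfs_forward_select(features, label)
print(selected)  # → ['systolic']
```

Because the toy label is derived directly from systolic pressure, CFS picks that column first and then stops, since adding `age` lowers the merit. This mirrors the caveat in the abstract: a correlation-driven selector will favour any variable that defines the label, so retaining blood pressure readings makes the task diagnostic classification rather than risk prediction.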
License
Copyright (c) 2026 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not been previously published, and is not under consideration for publication elsewhere.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until a decision is made by the journal editors.
- If the manuscript is finally accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal’s withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights to the publisher, including commercial rights.