Hypertension Classification Using Correlation-Based Feature Selection (CFS) with Random Forest, XGBoost, and Support Vector Machine: A Comparative Study on Indonesian Hospital Data

Authors

  • Faradillah, Universitas Indo Global Mandiri, Indonesia
  • Herri Setiawan, Universitas Indo Global Mandiri, Indonesia
  • M Fadhiel Alie, Universitas Indo Global Mandiri, Indonesia
  • Atthiyah Gisca Ahsya, Andalas University, Indonesia

DOI:

https://doi.org/10.63158/journalisi.v8i3.1646

Keywords:

Hypertension Classification, Correlation-Based Feature Selection, Label Leakage, Clinical Data Mining, Machine Learning

Abstract

Hypertension is a major global health problem that significantly contributes to cardiovascular disease and mortality. This study evaluates the performance of Random Forest, XGBoost, and Support Vector Machine (SVM) algorithms integrated with Correlation-Based Feature Selection (CFS) for hypertension classification using hospital clinical data. The dataset comprises 500 clinical records containing demographic and physiological variables. CFS was applied to reduce irrelevant and redundant attributes before model training. Model performance was assessed using accuracy, precision, recall, F1-score, and AUC-ROC through 10-fold cross-validation. Statistical significance was examined using the Friedman test followed by the Wilcoxon signed-rank test with Bonferroni correction. The results show that CFS improved classification performance across all models by approximately 5–6%. XGBoost achieved the best performance with 93.5% accuracy and 0.96 AUC, followed by Random Forest and SVM. However, systolic and diastolic blood pressure, which define the hypertension label, were retained as predictors, indicating a diagnostic classification design rather than independent risk prediction. Therefore, the findings should be interpreted as dataset-based hypertension classification, not future hypertension risk prediction.
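To make the feature-selection step concrete, the following is a minimal sketch of the standard CFS merit score, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The abstract does not state which correlation measure or search strategy the authors used, so this sketch assumes absolute Pearson correlation, greedy forward search, and non-constant numeric columns. The function names `cfs_merit` and `cfs_forward_select` are hypothetical and are not taken from the paper.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """CFS merit of the feature columns in `subset` (assumes Pearson correlation)."""
    k = len(subset)
    # Mean absolute feature-class correlation (relevance).
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    # Mean absolute feature-feature correlation over distinct pairs (redundancy).
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(X, y):
    """Greedy forward search: add the best feature until merit stops improving."""
    remaining = list(range(X.shape[1]))
    selected, best = [], 0.0
    while remaining:
        # Score every candidate extension of the current subset.
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best


# Toy data (illustrative only): column 0 tracks the label, column 1 does not.
y = np.array([0.0, 0.0, 1.0, 1.0])
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
selected, merit = cfs_forward_select(X, y)
print(selected, merit)
```

On this toy data the uninformative column lowers the merit of the pair, so the search keeps only the column that tracks the label. The same mechanism explains why predictors that define the label, such as the blood pressure readings noted above, would score very highly under CFS.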




Published

2026-06-22

Section

Articles
