Optimizing Stroke Prediction Using Backward Elimination and SMOTE with C4.5 and K-Nearest Neighbors
DOI: https://doi.org/10.63158/journalisi.v8i2.1521
Keywords: stroke prediction, SMOTE, class imbalance, C4.5, K-nearest neighbors, backward elimination, feature selection
Abstract
Early prediction of stroke risk is crucial for reducing mortality and the burden on healthcare systems, but class imbalance and irrelevant features often compromise model reliability. This study analyzes the effect of Backward Elimination and SMOTE on the performance of the C4.5 and K-Nearest Neighbors (K-NN) algorithms in stroke prediction. It uses a fixed working subset of 1,239 records and evaluates four modeling scenarios with Stratified 10-Fold Cross-Validation, measuring performance by accuracy, precision, recall, F1-score, and AUC. Backward Elimination improved model performance on the analyzed subset. For C4.5, accuracy increased from 70.94% to 73.05%, stroke recall from 83.94% to 85.14%, and AUC from 0.776 to 0.806. For K-NN, accuracy increased from 72.31% to 74.82% and precision from 39.91% to 42.73%, while stroke recall remained relatively stable at 74.30%. Although these improvements are numerically small, they are practically relevant because they improve the balance between sensitivity and class discrimination. In stroke screening, reducing false negatives takes priority because it minimizes undetected high-risk cases, although false positives still matter because they lead to further testing. Overall, C4.5 with Backward Elimination shows the more balanced performance, although the results remain limited to the analyzed subset.
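To make the oversampling step more concrete, the core idea of SMOTE can be sketched in a few lines: each synthetic minority sample is placed at a random position on the segment between an existing minority sample and one of its nearest minority-class neighbours. This is a minimal illustration for numeric features only, not the implementation used in the study. The function name `smote`, its parameters, and the sample points are hypothetical, and the abstract does not state how oversampling was configured or at which stage of cross-validation it was applied.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority points.

    Illustrative sketch only: assumes numeric features and at least two
    minority samples. All names and defaults here are assumptions.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # relative position on the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (for example, scaled age and glucose).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies inside the region already spanned by the minority class. When such a step is combined with cross-validation, it is commonly applied to the training folds only, so that the held-out fold contains no synthetic records.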
Copyright (c) 2026 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.