Comparative Analysis of KNN and Decision Tree Classification Algorithms for Early Stroke Prediction: A Machine Learning Approach
Abstract
Stroke is the second leading cause of death worldwide and the third leading cause of disability. However, most stroke deaths can be prevented by recognizing symptoms early and taking preventive measures, and information technology can support both. This research therefore applies machine learning to predict stroke in a person, using the K-Nearest Neighbor (KNN) and Decision Tree classification methods, and compares the two algorithms to determine which is more effective. The analysis followed the CRISP-DM approach and used a dataset of 5110 observations with 12 attributes. Exploratory Data Analysis (EDA) was carried out as part of preprocessing, and oversampling was applied to address class imbalance. The results show that the K-Nearest Neighbor algorithm produced the most accurate predictive model, with an accuracy of 97.1845%. This research contributes to stroke prevention efforts by showing how machine learning algorithms can support early identification of stroke risk.
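The two steps the abstract names, oversampling the minority class and classifying with K-Nearest Neighbor, can be illustrated with a minimal sketch. This is not the authors' implementation: the feature pair (age, average glucose level), the toy values, the choice of random duplication as the oversampling technique, and k = 3 are all assumptions made for illustration, and the code uses only the Python standard library.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class is the same size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (age, average glucose level); label 1 = stroke, 0 = no stroke.
X = [(25, 85), (30, 90), (35, 80), (40, 95), (45, 100), (50, 92), (72, 210), (80, 230)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

# Balance the classes (6 vs 2 becomes 6 vs 6), then classify two new records.
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
print(knn_predict(X_bal, y_bal, (75, 220)))
print(knn_predict(X_bal, y_bal, (33, 88)))
```

In a real evaluation, oversampling would normally be applied to the training split only, so that duplicated rows cannot leak into the test set and inflate accuracy. Features would also usually be scaled before computing distances, since KNN is sensitive to differences in feature ranges.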
Copyright (c) 2024 Journal of Information Systems and Informatics
This work is licensed under a Creative Commons Attribution 4.0 International License.