An Empirical Comparison of C4.5, Naive Bayes, and KNN for Scholarship Selection

Authors

  • Burham Isnanto, Institut Sains dan Bisnis Atma Luhur, Indonesia
  • Rahmat Sulaiman, Institut Sains dan Bisnis Atma Luhur, Indonesia
DOI:

https://doi.org/10.63158/journalisi.v8i3.1617

Keywords:

Scholarship Classification, Machine Learning, Comparative Benchmarking, Cross-Validation, Student Data Mining

Abstract

Scholarship selection is a critical process in higher education that requires objective, fair, and efficient evaluation of applicants based on academic and socio-economic criteria. However, manual assessment methods are often vulnerable to bias, inconsistency, and administrative inefficiencies, which may affect the transparency and quality of decision-making. This study compares the performance of three supervised machine learning algorithms—C4.5 Decision Tree, Naive Bayes, and K-Nearest Neighbor (KNN)—for scholarship recipient classification. The dataset consisted of 1,500 student records obtained from the KelasAI repository and included ten predictor attributes, namely Grade Point Average, Parental Income, Academic Semester, Family Dependents, Organizational Involvement, Academic Achievement, Regional Origin, Scholarship Type, National Examination Score, and Economic Status. The target variable was categorized into Accepted and Rejected classes. Experiments were conducted using RapidMiner Studio with 10-fold stratified cross-validation to ensure reliable model evaluation. The results showed that Naive Bayes achieved the best performance, with 81.6% accuracy, 81.8% precision, and 81.3% recall, outperforming C4.5 and KNN. These findings demonstrate the potential of machine learning to support more transparent and data-driven scholarship selection processes.
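
The evaluation protocol described in the abstract (10-fold stratified cross-validation scored by accuracy, precision, and recall) can be illustrated with a minimal pure-Python sketch. This is not the authors' RapidMiner workflow: the function names, the plain Euclidean KNN with five neighbours, the fixed seed, and the choice to pool predictions across folds before computing the metrics are all assumptions made for illustration. Only the class labels "Accepted" and "Rejected" come from the study.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split record indices into k folds with roughly equal class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's records round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def knn_predict(train_X, train_y, x, n_neighbors=5):
    """Majority vote among the nearest training rows (squared Euclidean distance)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, k=10, positive="Accepted"):
    """Pool held-out predictions over all folds, then compute the metrics."""
    tp = fp = fn = tn = 0
    for fold in stratified_folds(y, k):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        for i in fold:
            predicted = knn_predict(train_X, train_y, X[i])
            if predicted == positive and y[i] == positive:
                tp += 1
            elif predicted == positive:
                fp += 1
            elif y[i] == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Calling `cross_validate(X, y)` with numeric feature rows in `X` and "Accepted"/"Rejected" labels in `y` returns the three scores as a dictionary. Swapping `knn_predict` for another classifier while keeping the same folds is what makes a comparison across algorithms like the one reported here fair.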

Published

2026-06-22

Section

Articles
