Ensemble Learning for Pediatric Stunting Detection: A Comparative Study of XGBoost, Random Forest, and LightGBM with Oversampling Techniques

Authors

  • Tri Sugihartono, Institut Sains dan Bisnis Atma Luhur, Indonesia
  • Djoko Soetarno, Binus University, Indonesia
  • Rahmat Sulaiman, Institut Sains dan Bisnis Atma Luhur, Indonesia
  • Sarwindah, Institut Sains dan Bisnis Atma Luhur, Indonesia
  • Marini, Institut Sains dan Bisnis Atma Luhur, Indonesia
  • Fitriyani, Institut Sains dan Bisnis Atma Luhur, Indonesia

DOI:

https://doi.org/10.63158/journalisi.v8i2.1568

Keywords:

Stunting Detection, Ensemble Learning, Imbalanced Classification, Oversampling, SMOTE

Abstract

Stunting, driven by chronic childhood malnutrition, remains a critical global public health concern. Early detection is persistently challenged by class imbalance in pediatric health datasets and by the absence of systematic comparisons between oversampling strategies and ensemble classifiers. This study develops and evaluates an ensemble learning pipeline for stunting detection, benchmarking XGBoost, Random Forest, and LightGBM across five oversampling configurations — Original, SMOTE, ADASYN, Borderline-SMOTE, and SMOTE-ENN — using 10,000 pediatric health records from posyandu (community health post) activities in Bangka Belitung Province, Indonesia. Seven anthropometric and demographic features were used, with stratified 80:20 train-test splitting and five-fold cross-validation. XGBoost trained on the original imbalanced data achieved the highest Recall (0.9573) and a competitive F1-Score (0.9158), while LightGBM with SMOTE delivered the strongest balanced performance (F1-Score: 0.9160, ROC-AUC: 0.8431). SMOTE-ENN consistently underperformed across all classifiers. To our knowledge, this is the first study to simultaneously compare five oversampling strategies across three ensemble models within a unified framework, offering a foundation for high-sensitivity stunting surveillance in resource-constrained healthcare settings.
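The pipeline described above — oversample the minority (stunted) class in the training split, fit an ensemble classifier, and score Recall/F1 on a held-out stratified test set — can be sketched as follows. This is a minimal illustration, not the authors' code: the data are synthetic stand-ins for the posyandu records (same seven-feature, imbalanced shape), and the hand-rolled SMOTE-style interpolation is a simplified stand-in for imbalanced-learn's SMOTE, kept self-contained here using only NumPy and scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

def smote_like_oversample(X, y, minority=1, k=5, seed=0):
    """SMOTE-style oversampling: synthesize minority points by linear
    interpolation between a minority sample and one of its k nearest
    minority neighbors, until the classes are balanced."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))  # samples needed to balance
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)                    # idx[:, 0] is the point itself
    base = rng.integers(0, len(X_min), n_new)        # random minority anchors
    neigh = idx[base, rng.integers(1, k + 1, n_new)] # random neighbor per anchor
    gap = rng.random((n_new, 1))                     # interpolation factor in [0, 1)
    X_syn = X_min[base] + gap * (X_min[neigh] - X_min[base])
    return (np.vstack([X, X_syn]),
            np.concatenate([y, np.full(n_new, minority)]))

# Synthetic stand-in: 7 features, ~85/15 class imbalance
X, y = make_classification(n_samples=2000, n_features=7,
                           weights=[0.85], random_state=42)

# Stratified 80:20 split as in the study; oversample the training set only
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)
X_bal, y_bal = smote_like_oversample(X_tr, y_tr)

clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_bal, y_bal)
pred = clf.predict(X_te)
print(f"Recall: {recall_score(y_te, pred):.4f}  F1: {f1_score(y_te, pred):.4f}")
```

Note that oversampling is applied only after the train-test split; applying it before the split would leak synthetic copies of test-set neighbors into training and inflate the reported metrics. Swapping in XGBoost or LightGBM, or imbalanced-learn's ADASYN/Borderline-SMOTE/SMOTE-ENN, changes only the classifier and resampler lines.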

References

[1] UNICEF, WHO, and World Bank, Levels and Trends in Child Malnutrition: UNICEF/WHO/World Bank Group Joint Child Malnutrition Estimates, Key Findings of the 2023 Edition. Geneva, Switzerland: World Health Organization, 2023.

[2] C. G. Victora et al., “Maternal and child undernutrition: Consequences for adult health and human capital,” Lancet, vol. 371, no. 9609, pp. 340–357, Jan. 2008, doi: 10.1016/S0140-6736(07)61692-4.

[3] Kementerian Kesehatan Republik Indonesia, Hasil Survei Status Gizi Indonesia (SSGI) Tahun 2022. Jakarta, Indonesia: Kemenkes RI, 2023.

[4] T. Vaivada, N. Akseer, S. Akseer, A. Somaskandan, M. Stefopulos, and Z. A. Bhutta, “Stunting in childhood: An overview of global burden, trends, determinants, and drivers of decline,” Am. J. Clin. Nutr., vol. 112, suppl. 2, pp. 777S–791S, Aug. 2020, doi: 10.1093/ajcn/nqaa159.

[5] A. T. Mulyani, M. A. Khairinisa, and A. Khatib, “Understanding stunting: Impact, causes, and strategy to accelerate stunting reduction—a narrative review,” Nutrients, vol. 17, no. 5, p. 879, Feb. 2025, doi: 10.3390/nu17050879.

[6] L. Swastina, B. Rahmatullah, A. Saad, and H. Khan, “A systematic review on research trends, datasets, algorithms, and frameworks of children’s nutritional status prediction,” IAES Int. J. Artif. Intell. (IJ-AI), vol. 13, no. 2, pp. 1868–1877, Jun. 2024, doi: 10.11591/ijai.v13.i2.pp1868-1877.

[7] N. Novalina, I. A. A. Tarigan, F. K. Kameela, and M. Rizkinia, “Benchmarking machine learning algorithm for stunting risk prediction in Indonesia,” Bull. Electr. Eng. Inform., vol. 14, no. 3, pp. 2252–2263, Jun. 2025, doi: 10.11591/eei.v14i3.8997.

[8] S. Ndagijimana, I. H. Kabano, E. Masabo, and J. M. Ntaganda, “Prediction of stunting among under-5 children in Rwanda using machine learning techniques,” J. Prev. Med. Public Health, vol. 56, no. 1, pp. 41–49, Jan. 2023, doi: 10.3961/jpmph.22.367.

[9] Y. S. Dewi, S. Hastuti, and M. Fatekurohman, “Analysis of stunting in East Java, Indonesia using random forest and geographically weighted random forest regression,” Braz. J. Biometr., vol. 42, no. 3, pp. 213–224, 2024, doi: 10.28951/bjb.v42i3.679.

[10] A. A. G. Y. Pramana, M. F. Maulana, M. C. Tirtayasa, and D. A. Tyas, “Enhancing early stunting detection: A novel approach using artificial intelligence with an integrated SMOTE algorithm and ensemble learning model,” in Proc. IEEE Conf. Artif. Intell. (CAI), Singapore, Jun. 2024, pp. 486–493, doi: 10.1109/CAI59869.2024.00098.

[11] T. Sugihartono, B. Wijaya, Marini, A. F. Alkayes, and H. A. Anugrah, “Optimizing stunting detection through SMOTE and machine learning: A comparative study of XGBoost, Random Forest, SVM, and k-NN,” J. Appl. Data Sci., vol. 6, no. 1, pp. 667–682, Jan. 2025, doi: 10.47738/jads.v6i1.494.

[12] M. A. Hamid and E. R. Subhiyakto, “Performance comparison of Random Forest, SVM, and XGBoost algorithms with SMOTE for stunting prediction,” J. Appl. Informat. Comput. (JAIC), vol. 9, no. 4, pp. 1163–1169, Aug. 2025, doi: 10.30871/jaic.v9i4.9701.

[13] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002, doi: 10.1613/jair.953.

[14] H. Han, W.-Y. Wang, and B.-H. Mao, “Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning,” in Proc. Int. Conf. Intell. Comput. (ICIC), Hefei, China, Aug. 2005, pp. 878–887, doi: 10.1007/11538059_91.

[15] H. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in Proc. IEEE Int. Joint Conf. Neural Netw. (IJCNN), Hong Kong, China, Jun. 2008, pp. 1322–1328, doi: 10.1109/IJCNN.2008.4633969.

[16] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD Explor. Newsl., vol. 6, no. 1, pp. 20–29, Jun. 2004, doi: 10.1145/1007730.1007735.

[17] B. Krawczyk, “Learning from imbalanced data: Open challenges and future directions,” Prog. Artif. Intell., vol. 5, no. 4, pp. 221–232, Nov. 2016, doi: 10.1007/s13748-016-0094-0.

[18] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.

[19] R. Kohavi, “A study of cross-validation and bootstrap for accuracy estimation and model selection,” in Proc. Int. Joint Conf. Artif. Intell. (IJCAI), Montreal, Canada, Aug. 1995, pp. 1137–1143.

[20] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, San Francisco, CA, USA, Aug. 2016, pp. 785–794, doi: 10.1145/2939672.2939785.

[21] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001, doi: 10.1023/A:1010933404324.

[22] G. Ke et al., “LightGBM: A highly efficient gradient boosting decision tree,” in Proc. Conf. Neural Inf. Process. Syst. (NeurIPS), Long Beach, CA, USA, Dec. 2017, pp. 3146–3154.

[23] I. Tsamardinos, E. Greasidou, and G. Borboudakis, “Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation,” Mach. Learn., vol. 107, no. 12, pp. 1895–1922, 2018.

[24] G. Lemaître, F. Nogueira, and C. K. Aridas, “Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning,” J. Mach. Learn. Res., vol. 18, no. 17, pp. 1–5, 2017.

Published

2026-04-12

Section

Articles

How to Cite

[1] T. Sugihartono, D. Soetarno, R. Sulaiman, Sarwindah, Marini, and Fitriyani, “Ensemble Learning for Pediatric Stunting Detection: A Comparative Study of XGBoost, Random Forest, and LightGBM with Oversampling Techniques,” journalisi, vol. 8, no. 2, pp. 1672–1692, Apr. 2026, doi: 10.63158/journalisi.v8i2.1568.