Predicting Respiratory Conditions Using Random Forest and XGBoost

Dhiyaussalam Dhiyaussalam; Ahmad Yusuf; Isna Wardiah; Nitami Lestari Putri

doi:10.51519/journalisi.v7i2.1124

Authors

Dhiyaussalam Dhiyaussalam, Politeknik Negeri Banjarmasin, Indonesia
Ahmad Yusuf, Politeknik Negeri Banjarmasin, Indonesia
Isna Wardiah, Politeknik Negeri Banjarmasin, Indonesia
Nitami Lestari Putri, Politeknik Negeri Banjarmasin, Indonesia


Keywords:

Machine Learning, Random Forest, Respiratory Disease, Severity Classification, XGBoost

Abstract

This study compares Random Forest and XGBoost for predicting the diagnosis and severity of respiratory diseases using a simulated dataset of 2,000 patient records. The models were evaluated on two classification tasks: identifying the disease type (e.g., pneumonia, influenza) and classifying the severity level (mild, moderate, severe). Both models achieved perfect accuracy in severity classification, with cross-validation scores of 1.0000 ± 0.0000, indicating stable performance under balanced class distributions. In the diagnosis task, however, Random Forest underperformed on minority classes, particularly pneumonia, for which it reached a recall of 0.18 and an F1-score of 0.31. XGBoost achieved better results across all classes, including the minority ones, with a cross-validation accuracy of 0.9825 ± 0.0170 and perfect test-set performance. These findings suggest that XGBoost is robust on imbalanced, multiclass medical data, making it a promising candidate for clinical decision support. Future work should address class imbalance and explore explainability techniques to improve trust and transparency in real-world applications.
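The gap between overall accuracy and minority-class recall described in the abstract can be illustrated with a small sketch. The labels and counts below are hypothetical and are not taken from the study's dataset, and the function name per_class_metrics is an illustrative assumption rather than anything the authors used. The sketch computes precision, recall, and F1 per class in plain Python and shows how a classifier that misses most cases of a rare class can still report a high overall accuracy.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for each label in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed a true instance of this label
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical, imbalanced toy labels (NOT the study's data):
# 90 influenza cases and 10 pneumonia cases.
y_true = ["influenza"] * 90 + ["pneumonia"] * 10
# Assumed classifier behaviour: every influenza case is correct,
# but only 2 of the 10 pneumonia cases are detected.
y_pred = ["influenza"] * 90 + ["pneumonia"] * 2 + ["influenza"] * 8

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
metrics = per_class_metrics(y_true, y_pred)

print(f"accuracy: {accuracy:.2f}")
for label, (precision, recall, f1) in metrics.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the overall accuracy stays high while pneumonia recall is low. Because F1 is the harmonic mean of precision and recall, a low recall keeps F1 low even when every pneumonia prediction that is made is correct.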


