Predicting Respiratory Conditions Using Random Forest and XGBoost
Abstract
This study compares Random Forest and XGBoost for predicting the diagnosis and severity of respiratory diseases on a simulated dataset of 2,000 patient records. The models were evaluated on two classification tasks: identifying the disease type (e.g., pneumonia, influenza) and classifying the severity level (mild, moderate, severe). Both models achieved perfect accuracy in severity classification, with cross-validation scores of 1.0000 ± 0.0000, indicating stable performance under balanced class distributions. In the diagnosis task, however, Random Forest underperformed on minority classes, particularly pneumonia, where it reached a recall of 0.18 and an F1-score of 0.31. XGBoost performed better across all classes, including the minority ones, with a cross-validation accuracy of 0.9825 ± 0.0170 and perfect test-set performance. These findings suggest that XGBoost is robust on imbalanced, multiclass medical data, making it a promising candidate for clinical decision support. Future work should address class imbalance and explore explainability techniques to improve trust and transparency in real-world applications.
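As a reading aid for the pneumonia figures reported above, the following minimal sketch shows how recall and F1-score relate. The abstract does not report precision, so the value of 1.0 used here is an assumption, and the helper name `f1_from_pr` is hypothetical. For a fixed recall R, F1 cannot exceed 2R / (1 + R), so a recall of 0.18 paired with an F1-score of 0.31 is consistent only with a precision close to 1.0: the few pneumonia predictions made were mostly correct, but most pneumonia cases were missed.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both metrics are zero
    return 2 * precision * recall / (precision + recall)


recall = 0.18            # reported in the abstract for pneumonia (Random Forest)
assumed_precision = 1.0  # ASSUMPTION: precision is not reported in the abstract

f1 = f1_from_pr(assumed_precision, recall)
upper_bound = 2 * recall / (1 + recall)  # largest F1 possible at this recall

print(round(f1, 2))           # 0.31
print(round(upper_bound, 2))  # 0.31
```

Lowering the assumed precision to 0.9 gives an F1-score of 0.30, which is why the reported pair points to near-perfect precision alongside low recall.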


Copyright (c) 2025 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.