Predicting Respiratory Conditions Using Random Forest and XGBoost
DOI: https://doi.org/10.51519/journalisi.v7i2.1124
Keywords: Machine Learning, Random Forest, Respiratory Disease, Severity Classification, XGBoost
Abstract
This study examines the performance of Random Forest and XGBoost in predicting the diagnosis and severity of respiratory diseases using a simulated dataset of 2,000 patient records. The models were tested on two classification tasks: identifying the disease type (e.g., pneumonia, influenza) and classifying the severity level (mild, moderate, severe). Both models achieved perfect accuracy in severity classification, with a cross-validation accuracy of 1.0000 ± 0.0000, demonstrating strong stability under balanced class distributions. In the diagnosis task, however, Random Forest underperformed on minority classes, particularly pneumonia, with a recall of 0.18 and an F1-score of 0.31. XGBoost, by contrast, achieved superior results across all classes, including the minority classes, with a cross-validation accuracy of 0.9825 ± 0.0170 and perfect test-set performance. These findings highlight XGBoost's robustness in handling imbalanced, multiclass medical data and make it a promising candidate for clinical decision support. Future work should address class imbalance and explore explainability techniques to improve trust and transparency in real-world applications.
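The abstract reports per-class recall and F1-scores and summarizes cross-validation as mean ± standard deviation. The following is a minimal sketch of how such figures are commonly computed, using only the Python standard library. It is not the authors' code. The labels and fold scores are hypothetical toy values, and the use of the population standard deviation (ddof=0, NumPy's default for `std`) is an assumption about how the ± values were obtained.

```python
"""Per-class precision/recall/F1 and a 'mean ± std' summary of CV scores.

Illustrative sketch only: the labels and fold scores used below are
hypothetical toy values, not data from the study.
"""
from statistics import mean, pstdev


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from one-vs-rest counts."""
    labels = sorted(set(y_true) | set(y_pred))
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[label] = (precision, recall, f1_score(precision, recall))
    return metrics


def summarize_cv(fold_scores):
    """Format fold scores as 'mean ± std' (population std, ddof=0, assumed)."""
    return f"{mean(fold_scores):.4f} ± {pstdev(fold_scores):.4f}"


if __name__ == "__main__":
    # Hypothetical imbalanced diagnosis labels: the model seldom predicts the
    # minority class, so that class keeps perfect precision but loses recall.
    y_true = ["influenza"] * 8 + ["pneumonia"] * 4
    y_pred = ["influenza"] * 8 + ["pneumonia"] + ["influenza"] * 3
    for label, (p, r, f) in per_class_metrics(y_true, y_pred).items():
        print(f"{label:10s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")

    # The reported recall of 0.18 combined with a hypothetical precision of 1.0.
    print(f"{f1_score(1.0, 0.18):.2f}")  # 0.31

    # Hypothetical five-fold accuracies summarized in the abstract's format.
    print(summarize_cv([0.96, 0.97, 0.99, 1.0, 1.0]))
```

With these toy labels, the minority class reaches a precision of 1.00 but a recall of only 0.25, so its F1 falls to 0.40. This shows how low recall alone can hold F1 down even when precision is perfect. Likewise, plugging the abstract's recall of 0.18 into the formula with a precision of 1.0 yields an F1 of about 0.31.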
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not been previously published, and is not under consideration for publication elsewhere.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until the journal editors have made a decision.
- If the manuscript is ultimately accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal's withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights, including commercial rights, to the publisher.