Indonesian Health Question Multi-Class Classification Based on Deep Learning
Abstract
Online health forums are commonly used by Indonesians to ask questions related to diseases. A well-known example, Alodokter, hosts hundreds of thousands of health questions, each assigned to a topic. A model that classifies questions by topic is important for better organization and faster responses from relevant health professionals. This research experimented with 20 deep learning models based on RNN, CNN, and IndoBERT in different configurations to compare their performance when classifying questions into six classes, each corresponding to one of the most common diseases that cause death in Indonesia. The results show that most of the models outperform the SVM baseline. Bidirectional RNNs such as BiLSTM and BiGRU combined with CNN achieve good metric scores, although a certain version of the IndoBERT model generally outperforms all the other models.
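The abstract compares models by their metric scores on a six-class task but does not name the metric. As a minimal sketch, assuming a macro-averaged F1 score is one reasonable choice for such a comparison, the computation could look like the following. The function name `macro_f1`, the placeholder labels `topic_0` to `topic_5`, and the toy predictions are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Return (per-class F1 dict, macro-averaged F1) for a multi-class task."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    per_class = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears
        per_class[label] = 2 * tp[label] / denom if denom else 0.0

    # Macro average: every class counts equally, regardless of its size
    return per_class, sum(per_class.values()) / len(labels)


# Hypothetical placeholder labels standing in for the six disease topics
LABELS = [f"topic_{i}" for i in range(6)]

# Toy data for illustration only, not results from the paper
y_true = ["topic_0", "topic_1", "topic_2", "topic_3",
          "topic_4", "topic_5", "topic_0", "topic_1"]
y_pred = ["topic_0", "topic_1", "topic_2", "topic_3",
          "topic_4", "topic_0", "topic_0", "topic_2"]

per_class, score = macro_f1(y_true, y_pred, LABELS)
for label in LABELS:
    print(f"{label}: F1 = {per_class[label]:.3f}")
print(f"macro F1 = {score:.3f}")
```

Macro averaging weights each of the six classes equally, so a model that performs poorly on a rare topic is penalized even if its overall accuracy is high. Whether the paper reports this metric, accuracy, or something else is not stated in the abstract.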


Copyright (c) 2024 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.