Indonesian Health Question Multi-Class Classification Based on Deep Learning

  • Wayan Oger Vihikan, Udayana University, Indonesia
  • I Nyoman Prayana Trisna, Udayana University, Indonesia
Keywords: Health Question, Text Classification, Deep Learning, IndoBERT

Abstract

Online health forums are commonly used by Indonesians to ask questions about diseases. One well-known example, Alodokter, hosts hundreds of thousands of health questions, each assigned to a topic. A model that classifies questions by topic is important for better organization and for faster responses from the relevant health professionals. This research evaluated 20 deep learning models, built from RNN, CNN, and IndoBERT architectures with different configurations, to measure how well each one classifies questions into six diseases that are among the most common causes of death in Indonesia. The results show that most of the models outperform the SVM baseline. Bidirectional RNNs such as BiLSTM and BiGRU, when combined with CNN, achieve good metric scores, although one version of the IndoBERT model generally outperforms all the other models.
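Comparing many models against an SVM baseline on a six-class task requires a single score per model. The abstract does not name the metric used, so the following is only a minimal sketch, assuming a macro-averaged F1 score: it computes F1 separately for each class and averages them with equal weight, so a model cannot score well by favoring the most frequent topics. The function name `macro_f1` and the toy labels are hypothetical illustrations, not data or code from the study.

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores.

    Hypothetical helper for illustration; the study's actual metric is not
    stated in the abstract.
    """
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        # True positives, false positives, and false negatives for this class.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Every class contributes equally, regardless of how many questions it has.
    return sum(scores) / len(scores)


# Toy example with three hypothetical topic labels (not data from the study).
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]
print(round(macro_f1(y_true, y_pred, ["A", "B", "C"]), 4))  # 0.6556
```

Here the per-class F1 scores are 0.5, 0.8, and about 0.667, so the macro average is about 0.656. The same function applies unchanged to six topic labels; comparing a deep learning model with the baseline would mean calling it once on each model's predictions over the same test set.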

Published
2024-09-24
How to Cite
Vihikan, W., & Trisna, I. N. (2024). Indonesian Health Question Multi-Class Classification Based on Deep Learning. Journal of Information Systems and Informatics, 6(3), 1931-1944. https://doi.org/10.51519/journalisi.v6i3.838