Enhancing Hate Speech Detection: Leveraging Emoji Preprocessing with BI-LSTM Model
DOI: https://doi.org/10.51519/journalisi.v7i2.1147
Keywords: Twitter, Emoji Description, Hate Speech, Emoji Preprocessing, BI-LSTM
Abstract
Microblogging platforms like Twitter enable users to rapidly share opinions, information, and viewpoints. However, the vast volume of daily user-generated content makes it difficult to keep the platform safe and inclusive. One key concern is the prevalence of hate speech, which must be addressed to foster a respectful and open environment. This study explores the effectiveness of the Emoji Description Method (EMJ DESC), which enhances tweet classification by converting emojis into descriptive text or sentences. These descriptions are then encoded into numerical vector matrices that capture the meaning and emotional tone of each emoji. Integrated into a basic text classification model, these vectors help improve detection performance. The research examines how different emoji preprocessing strategies affect the performance of a BI-LSTM model for hate speech classification. Results show that removing emojis significantly lowers accuracy, to 68%, and weakens the model’s ability to distinguish between hate and non-hate speech, because valuable semantic context is lost. In contrast, retaining emoji semantics, either through textual descriptions or through embeddings, raises classification accuracy to 93% and 94%, respectively. Emoji embedding achieves the highest performance, highlighting its ability to capture subtle non-verbal cues that are critical for accurate hate speech detection. Overall, the findings emphasize the importance of emoji-aware preprocessing for effective social media content classification.
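The three preprocessing strategies compared in the abstract (removing emojis, replacing them with textual descriptions, and keeping them as tokens for embedding) can be illustrated with a minimal sketch. The abstract does not specify the authors' implementation, so everything below is an assumption: the function names are hypothetical, the emoji test is a rough heuristic based on the Unicode "Symbol, other" category, and the descriptions come from the official Unicode character names in Python's standard `unicodedata` module rather than from the EMJ DESC resources used in the study.

```python
import unicodedata


def is_emoji(ch: str) -> bool:
    # Rough heuristic (assumption): treat "Symbol, other" characters as emoji.
    # Skin-tone modifiers and zero-width joiners are not covered.
    return unicodedata.category(ch) == "So"


def remove_emojis(text: str) -> str:
    # Strategy 1: drop every emoji, then normalize whitespace.
    kept = "".join(ch for ch in text if not is_emoji(ch))
    return " ".join(kept.split())


def describe_emojis(text: str) -> str:
    # Strategy 2: replace each emoji with its lowercase Unicode name,
    # a stand-in for the descriptive text used by an emoji description method.
    parts = []
    for ch in text:
        if is_emoji(ch):
            name = unicodedata.name(ch, "")
            parts.append(f" {name.lower()} " if name else " ")
        else:
            parts.append(ch)
    return " ".join("".join(parts).split())


def tokenize_keep_emojis(text: str) -> list[str]:
    # Strategy 3: keep each emoji as its own token so that an embedding
    # layer can learn (or look up) a vector for it.
    tokens = []
    for word in text.split():
        buffer = ""
        for ch in word:
            if is_emoji(ch):
                if buffer:
                    tokens.append(buffer)
                    buffer = ""
                tokens.append(ch)
            else:
                buffer += ch
        if buffer:
            tokens.append(buffer)
    return tokens


if __name__ == "__main__":
    tweet = "what a day\U0001F600 honestly"
    print(remove_emojis(tweet))
    print(describe_emojis(tweet))
    print(tokenize_keep_emojis(tweet))
```

In a pipeline of the kind the abstract describes, the output of one of these functions would be converted to integer token IDs and passed to the embedding layer that feeds the BI-LSTM classifier; that stage is omitted here because the abstract gives no details about it.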
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not received prior publication, is not under consideration for publication elsewhere, and has not been previously published.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied from or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until a decision is made by the journal editors.
- If the manuscript is finally accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal’s withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights to the publisher, including commercial rights.














