Enhancing Hate Speech Detection: Leveraging Emoji Preprocessing with BI-LSTM Model

Junita Amalia; Sarah Rosdiana Tambunan; Susi Eva Maria Purba; Walker Valentinus Simanjuntak

doi:10.51519/journalisi.v7i2.1147

Authors

Junita Amalia, Institut Teknologi Del, Indonesia
Sarah Rosdiana Tambunan, Institut Teknologi Del, Indonesia
Susi Eva Maria Purba, Institut Teknologi Del, Indonesia
Walker Valentinus Simanjuntak, Institut Teknologi Del, Indonesia

Keywords:

Twitter, Emoji Description, Hate Speech, Emoji Preprocessing, BI-LSTM

Abstract

Microblogging platforms like Twitter enable users to rapidly share opinions, information, and viewpoints. However, the vast volume of daily user-generated content makes it difficult to keep the platform safe and inclusive. One key concern is the prevalence of hate speech, which must be addressed to foster a respectful and open environment. This study explores the effectiveness of the Emoji Description Method (EMJ DESC), which enhances tweet classification by converting emojis into descriptive text or sentences. These descriptions are then encoded into numerical vector matrices that capture the meaning and emotional tone of each emoji. Integrated into a basic text classification model, these vectors help improve detection performance. The research examines how different emoji preprocessing strategies affect the performance of a BI-LSTM model for hate speech classification. Results show that removing emojis significantly reduces accuracy (68%) and weakens the model's ability to distinguish between hate and non-hate speech, because valuable semantic context is lost. In contrast, retaining emoji semantics, either through textual descriptions or through embeddings, boosts classification accuracy to 93% and 94%, respectively. The highest performance is achieved with emoji embedding, highlighting its ability to capture subtle non-verbal cues that are critical for accurate hate speech detection. Overall, the findings emphasize the importance of emoji-aware preprocessing techniques for effective social media content classification.
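
To make two of the preprocessing strategies named in the abstract concrete (removing emojis versus replacing them with textual descriptions), here is a minimal Python sketch. It is not the authors' EMJ DESC implementation. As assumptions for illustration, it takes each description from the official Unicode character name via the standard-library unicodedata module, and it detects emojis with a rough heuristic (Unicode category "So"). The function names is_emoji, remove_emojis, and describe_emojis are hypothetical. The embedding strategy and the BI-LSTM classifier are not shown.

```python
import unicodedata


def is_emoji(ch: str) -> bool:
    # Rough heuristic (assumption): treat "Symbol, other" code points as emojis.
    # Multi-code-point sequences (ZWJ sequences, skin tones) are not handled.
    return unicodedata.category(ch) == "So"


def remove_emojis(text: str) -> str:
    # Strategy 1: drop every emoji, then collapse leftover whitespace.
    kept = "".join(" " if is_emoji(ch) else ch for ch in text)
    return " ".join(kept.split())


def describe_emojis(text: str) -> str:
    # Strategy 2: replace each emoji with a lowercase textual description.
    # Assumption: the Unicode character name stands in for the description.
    parts = []
    for ch in text:
        if is_emoji(ch):
            name = unicodedata.name(ch, "")
            parts.append(f" {name.lower()} " if name else " ")
        else:
            parts.append(ch)
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    tweet = "you people are the worst \U0001F621"
    print(remove_emojis(tweet))    # emoji signal is lost
    print(describe_emojis(tweet))  # emoji becomes ordinary tokens
```

With the second strategy the emoji turns into ordinary words, so a standard tokenizer and word-embedding layer can pass its meaning on to a downstream classifier instead of discarding it.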
