Detection of Hate Speech Code Mix Involving English and Other Nigerian Languages

  • Joseph Nda Ndabula Federal University Dutsinma, Nigeria
  • Oyenike Mary Olanrewaju Federal University Dutsinma, Nigeria
  • Faith O Echobu Federal University Dutsinma, Nigeria
Keywords: Hate speech, Code-mix, Social Media, Support Vector Machine, Random Forest

Abstract

Hate speech is a recurring problem and has become a global concern. Its spread on social media creates room for violence and discrimination against specific individuals or groups. In Nigeria, message masking through language mixing (code-mixing) has become the new normal, especially when disseminating hateful and inciting comments, so there is a need to curb its spread on social media. This research therefore focuses on detecting hate speech in social media posts that code-mix English, Pidgin and any of the three major Nigerian languages (Hausa, Igbo and Yoruba). Two machine learning algorithms were used: Support Vector Machine (SVM) and Random Forest (RF). Data were collected from tweets about the EndSARS protest and the 2023 Nigerian elections. After the main features were extracted, the text was converted into vectors using Term Frequency-Inverse Document Frequency (TF-IDF) and Bag-of-Words (BoW), and these vectors were used to train and test the models. SVM classified hate speech better than RF with both TF-IDF and BoW features, achieving average scores of 93.43% accuracy, 93.70% precision, 93.43% recall and 93.57% F1-score.
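The abstract names the pipeline (BoW or TF-IDF vectors feeding an SVM, compared against RF, scored by accuracy, precision, recall and F1) but not its implementation details. The Python sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' code. It assumes whitespace tokenisation and scikit-learn-style smoothed IDF, and it swaps in a linear SVM trained with the Pegasos sub-gradient method for whatever solver, kernel and hyperparameters the study used (the abstract does not say). Random Forest is not reproduced; in practice scikit-learn's `CountVectorizer`, `TfidfVectorizer`, `LinearSVC` and `RandomForestClassifier` cover all of these pieces. Metrics are support-weighted averages over the two classes, an assumption suggested by the reported recall equalling the reported accuracy (weighted recall always equals accuracy). All function names are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Assumption: lower-cased whitespace tokens; real code-mixed tweets need richer tokenisation.
    return text.lower().split()


def build_vocab(docs):
    # Sorted vocabulary so that the column order is deterministic.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bag_of_words(docs, vocab):
    # BoW: one column per vocabulary term holding its raw count in the document.
    index = {tok: j for j, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok, count in Counter(tokenize(doc)).items():
            if tok in index:  # out-of-vocabulary tokens are ignored
                row[index[tok]] = float(count)
        rows.append(row)
    return rows


def fit_idf(train_docs, vocab):
    # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, the default smoothing of
    # scikit-learn's TfidfVectorizer; fitted on training documents only.
    n = len(train_docs)
    doc_sets = [set(tokenize(doc)) for doc in train_docs]
    return [math.log((1 + n) / (1 + sum(tok in s for s in doc_sets))) + 1 for tok in vocab]


def tf_idf(docs, vocab, idf):
    # TF-IDF: BoW counts re-weighted by IDF, then L2-normalised per document.
    rows = []
    for row in bag_of_words(docs, vocab):
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0  # empty doc stays all-zero
        rows.append([v / norm for v in weighted])
    return rows


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    # Linear SVM trained with Pegasos (stochastic sub-gradient descent on the
    # L2-regularised hinge loss). Labels are +1 (hate) / -1 (non-hate); a constant
    # 1.0 feature is appended so the last weight acts as a (regularised) bias.
    rng = random.Random(seed)
    data = [(x + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: regulariser step
            if margin < 1:  # hinge loss active: step toward the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, X):
    return [1 if sum(wi * xi for wi, xi in zip(w, x + [1.0])) >= 0 else -1 for x in X]


def weighted_metrics(y_true, y_pred):
    # Accuracy plus support-weighted per-class precision, recall and F1
    # (weighted recall is always equal to accuracy).
    n = len(y_true)
    scores = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        support = sum(t == c for t in y_true)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        for name, value in (("precision", prec), ("recall", rec), ("f1", f1)):
            scores[name] += support / n * value
    scores["accuracy"] = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return scores


def evaluate(train_docs, y_train, test_docs, y_test, features="tfidf"):
    # Vocabulary and IDF come from the training split only, then the test split is scored.
    vocab = build_vocab(train_docs)
    if features == "bow":
        X_train, X_test = bag_of_words(train_docs, vocab), bag_of_words(test_docs, vocab)
    else:
        idf = fit_idf(train_docs, vocab)
        X_train, X_test = tf_idf(train_docs, vocab, idf), tf_idf(test_docs, vocab, idf)
    w = train_linear_svm(X_train, y_train)
    return weighted_metrics(y_test, predict(w, X_test))
```

Calling `evaluate(train_texts, train_labels, test_texts, test_labels, features="bow")` and again with `features="tfidf"` mirrors the BoW versus TF-IDF comparison described in the abstract; `train_texts`, `test_texts` and the ±1 labels are placeholders for an annotated code-mixed tweet corpus, which this sketch does not include.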

Published
2023-12-02
How to Cite
Ndabula, J., Olanrewaju, O., & Echobu, F. (2023). Detection of Hate Speech Code Mix Involving English and Other Nigerian Languages. Journal of Information Systems and Informatics, 5(4), 1416-1431. https://doi.org/10.51519/journalisi.v5i4.595