Bibliometric Analysis of Deep Learning for Social Media Hate Speech Detection

Raymond Tapiwa Mutanga; Oludayo Olugbara; Nalindren Naicker

doi:10.51519/journalisi.v5i3.549

Raymond Tapiwa Mutanga Durban University of Technology, South Africa http://orcid.org/0000-0002-8152-5946
Oludayo Olugbara Durban University of Technology, South Africa
Nalindren Naicker Durban University of Technology, South Africa

Keywords: Bibliometric, Deep Learning, Hate Speech

Abstract

Social media has become an important web technology for creating and sharing information and for enhancing business reputations worldwide. However, the anonymity afforded by social media platforms has been exploited to spread harmful content such as hate speech. Researchers have increasingly turned to deep learning techniques to address the problem of social media hate speech detection. This study provides a bibliometric analysis and mapping of the existing literature on hate speech detection using deep learning algorithms. The study used articles published between 2016 and 2022 from the Scopus database, and the VOSviewer, Biblioshiny, and pandas software tools were employed for the bibliometric analysis. The research explored the yearly trajectory of recent publications, dominant countries, collaborative institutions, sources of primary studies that have employed deep learning for hate speech detection, and the intellectual and social structures of the research constituents. The literature on hate speech detection is growing rapidly, but research output and collaborations from developing countries are still limited. The findings of this study provide insights into the intellectual structure and advancements in deep learning applications for hate speech detection while identifying research gaps for future work.
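The two simplest measures named in the abstract, the yearly publication trajectory and the dominant countries, can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses only the Python standard library rather than the tools the study names, and the sample records, the column names (`Year`, `Affiliations`), and the rule of taking the last comma-separated token of an affiliation as the country are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample shaped like a bibliographic CSV export.
# The column names and rows are illustrative assumptions, not study data.
SAMPLE_CSV = """Title,Year,Affiliations
Paper A,2019,"Univ X, India; Univ Y, Spain"
Paper B,2020,"Univ Z, India"
Paper C,2020,"Univ W, South Africa"
"""


def publications_per_year(rows, start=2016, end=2022):
    """Count records per publication year within the inclusive window."""
    counts = Counter()
    for row in rows:
        year = int(row["Year"])
        if start <= year <= end:
            counts[year] += 1
    return dict(sorted(counts.items()))


def publications_per_country(rows):
    """Count records per country, counting each country once per record.

    Assumes the country is the last comma-separated token of each
    semicolon-separated affiliation.
    """
    counts = Counter()
    for row in rows:
        countries = {
            aff.rsplit(",", 1)[-1].strip()
            for aff in row["Affiliations"].split(";")
            if aff.strip()
        }
        counts.update(countries)
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
yearly = publications_per_year(rows)
by_country = publications_per_country(rows)
print(yearly)
print(by_country.most_common())
```

Counting each country once per record means a paper with several affiliations in one country does not inflate that country's total. A different counting rule, such as fractional counting, would give different rankings.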

