The Problem of Data Extraction in Social Media: A Theoretical Framework

  • Tarirai Chani Durban University of Technology, South Africa http://orcid.org/0000-0003-3813-7101
  • Oludayo O Olugbara Durban University of Technology, South Africa
  • Bethel Mutanga Mangosuthu University of Technology, South Africa
Keywords: Social media data, data extraction, social media data quality, theoretical framework, social network analysis

Abstract

The rapid growth of social media platforms has produced data at an unprecedented scale. These platforms generate vast volumes of data daily, forming intricate patterns and connections that hold insights valuable for informed decision-making. Recognizing this, researchers have increasingly turned to social media data to address a wide range of social research questions. Unlike conventional data collection methods such as questionnaires, interviews, or focus groups, social media data presents unique challenges and opportunities and demands specialized techniques for extraction and analysis. However, the field still lacks a standardized, systematic approach to collecting and preprocessing social media data. This gap compromises the quality and credibility of subsequent analysis and prevents researchers from realizing the full potential of social media data. This paper aims to bridge the gap by presenting a comprehensive framework that offers a clear, step-by-step methodology for extracting and processing social media data for analysis. Because social media data is a pivotal resource for understanding human behavior, sentiment, and societal dynamics, the framework provides a foundation for researchers and practitioners seeking to draw insights from it.

Published
2023-12-02
How to Cite
Chani, T., Olugbara, O., & Mutanga, B. (2023). The Problem of Data Extraction in Social Media: A Theoretical Framework. Journal of Information Systems and Informatics, 5(4), 1363-1384. https://doi.org/10.51519/journalisi.v5i4.585