Comparison of Naïve Bayes and Logistic Regression in Sentiment Analysis on Marketplace Reviews Using Rating-Based Labeling

  • Satya Abdul Halim Bahtiar, Universitas Islam Indonesia, Indonesia
  • Chandra Kusuma Dewa, Universitas Islam Indonesia, Indonesia
  • Ahmad Luthfi, Universitas Islam Indonesia, Indonesia
Keywords: Naïve Bayes, Logistic Regression, Marketplace, Google Play Store, Rating-based Labeling

Abstract

This research examines sentiment analysis of marketplace application reviews on the Google Play Store, a platform for downloading Android applications and posting reviews. Sentiment analysis is essential for understanding how users respond to applications, particularly marketplace apps. In this study, two machine learning algorithms, Naïve Bayes and Logistic Regression, are used to classify user reviews. The application rating is used as the reference for determining the sentiment of each review. Each dataset is prepared under two conditions: 2 labels (positive and negative) and 3 labels (positive, neutral, and negative). The test results show that the highest performance is achieved by Logistic Regression on the Shopee dataset with 2 labels, reaching 84.58% accuracy, 84.66% precision, and 84.63% recall. The fastest processing time occurs when testing the Lazada 2-label dataset with Naïve Bayes, taking only 0.038 seconds. Overall, the results suggest that datasets with 2 labels tend to yield higher accuracy than datasets with 3 labels.
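The rating-based labeling described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which ratings map to which label, so the thresholds here (1 to 2 stars negative, 3 stars neutral, 4 to 5 stars positive, with 3-star reviews dropped in the 2-label condition) are assumptions, as are the function names and the sample reviews.

```python
from collections import Counter

def label_from_rating(rating, n_labels=3):
    """Map a 1-5 star rating to a sentiment label.

    Assumed thresholds (not taken from the paper):
    1-2 -> negative, 3 -> neutral, 4-5 -> positive.
    In the 2-label condition a 3-star review gets no label (None).
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    if n_labels not in (2, 3):
        raise ValueError("n_labels must be 2 or 3")
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral" if n_labels == 3 else None

def build_dataset(reviews, n_labels):
    """Return (text, label) pairs, skipping reviews that receive no label."""
    labeled = []
    for text, rating in reviews:
        label = label_from_rating(rating, n_labels)
        if label is not None:
            labeled.append((text, label))
    return labeled

# Hypothetical sample reviews, for illustration only.
reviews = [
    ("Fast delivery, great app", 5),
    ("Okay but slow at times", 3),
    ("Keeps crashing after update", 1),
    ("Good deals every week", 4),
    ("Cannot log in anymore", 2),
]

two_label = build_dataset(reviews, n_labels=2)
three_label = build_dataset(reviews, n_labels=3)

# Class counts for each condition.
two_counts = Counter(label for _, label in two_label)
three_counts = Counter(label for _, label in three_label)
print(two_counts)
print(three_counts)
```

Under these assumptions the 2-label condition simply omits the middle of the rating scale, which is one plausible reason a 2-label dataset would be easier to classify than a 3-label one. The labeled pairs could then be vectorized and passed to any Naïve Bayes or Logistic Regression implementation.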

Published
2023-08-29
How to Cite
Bahtiar, S. A., Dewa, C., & Luthfi, A. (2023). Comparison of Naïve Bayes and Logistic Regression in Sentiment Analysis on Marketplace Reviews Using Rating-Based Labeling. Journal of Information Systems and Informatics, 5(3), 915-927. https://doi.org/10.51519/journalisi.v5i3.539