Evaluation of Machine Learning Models for Sentiment Analysis in the South Sumatra Governor Election Using Data Balancing Techniques

  • Febriyanti Panjaitan, Satu University, Indonesia
  • Win Ce, Bina Nusantara University, Indonesia
  • Hery Oktafiandi, Satu University, Indonesia
  • Ghanim Kanugrahan, Satu University, Indonesia
  • Yudi Ramdhani, Satu University, Indonesia
  • Vito Hafizh Cahaya Putra, Satu University, Indonesia
Keywords: Sentiment Analysis, Machine Learning, Governor Election, Twitter, YouTube, CountVectorizer, Balancing Data.

Abstract

Sentiment analysis is crucial for understanding public opinion, especially in political contexts such as the 2024 South Sumatra gubernatorial election. Social media platforms such as Twitter and YouTube are key sources of public sentiment, which machine learning can classify as positive, neutral, or negative. Two challenges remain significant: class imbalance in the data and the choice of a model that classifies accurately. This study compares five machine learning algorithms (SVM, Naïve Bayes, KNN, Decision Tree, and Random Forest) and examines how data balancing affects their performance. Data were collected by crawling Twitter (140 entries) and scraping YouTube (384 entries), and text features were extracted with CountVectorizer. The models were evaluated on both the imbalanced and the balanced dataset using accuracy, precision, recall, and F1-score. On imbalanced data, Decision Tree and Random Forest achieved the highest accuracies, 79.22% and 75.32% respectively, but their near-perfect training performance indicates overfitting. Naïve Bayes had the lowest accuracy, 54.55%, despite high precision, which suggests frequent misclassification, particularly of the minority class. SVM and KNN also struggled with imbalanced data, with accuracies of 58.44% and 63.64%, respectively. After data balancing, the accuracy of SVM rose to 71.43% and that of KNN to 66.23%, indicating that these models are more stable and effective when class distributions are even. These findings highlight the substantial impact of data balancing on model performance, particularly for methods sensitive to class distribution. Although the tree-based models achieved high accuracy on imbalanced data, their tendency to overfit underscores the importance of balancing techniques for model generalization.
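
The abstract does not state which balancing technique was applied, so the following is only a minimal sketch under one assumption: random oversampling, where minority-class samples are duplicated until every class matches the largest one. It also computes the four reported metrics by hand (macro-averaged) to show why accuracy alone can mislead on imbalanced labels. The function names `random_oversample` and `macro_report` and the toy comments are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are the same size.

    Assumption: this is a stand-in for whichever balancing method the study used.
    """
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def macro_report(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy data: 6 positive, 2 neutral, 2 negative comments.
texts = [f"comment {i}" for i in range(10)]
labels = ["positive"] * 6 + ["neutral"] * 2 + ["negative"] * 2

# A classifier that always predicts the majority class looks acceptable on
# accuracy but fails completely on the two minority classes.
majority_pred = ["positive"] * len(labels)
report = macro_report(labels, majority_pred)

# After oversampling, every class has as many samples as the largest class.
balanced_texts, balanced_labels = random_oversample(texts, labels)
balanced_counts = Counter(balanced_labels)
```

On this toy data the majority-class predictor reaches an accuracy of 0.60 while its macro F1 is only 0.25, which is the kind of gap that motivates reporting precision, recall, and F1 alongside accuracy. When balancing is used in practice, it is generally applied to the training split only, so that duplicated samples do not leak into the test set.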

Published
2025-03-21
How to Cite
Panjaitan, F., Ce, W., Oktafiandi, H., Kanugrahan, G., Ramdhani, Y., & Putra, V. H. (2025). Evaluation of Machine Learning Models for Sentiment Analysis in the South Sumatra Governor Election Using Data Balancing Techniques. Journal of Information Systems and Informatics, 7(1), 461-478. https://doi.org/10.51519/journalisi.v7i1.1019