Evaluation of Machine Learning Models for Sentiment Analysis in the South Sumatra Governor Election Using Data Balancing Techniques
Abstract
Sentiment analysis is crucial for understanding public opinion, especially in political contexts like the 2024 South Sumatra gubernatorial election. Social media platforms such as Twitter and YouTube provide key sources of public sentiment, which can be analyzed using machine learning to classify opinions as positive, neutral, or negative. However, challenges such as data imbalance and selecting the right model to improve classification accuracy remain significant. This study compares five machine learning algorithms (SVM, Naïve Bayes, KNN, Decision Tree, and Random Forest) and examines the impact of data balancing on their performance. Data was collected via Twitter crawling (140 entries) and YouTube scraping (384 entries), and text features were extracted using CountVectorizer. The models were then evaluated on imbalanced and balanced datasets using accuracy, precision, recall, and F1-score. The Decision Tree and Random Forest models achieved the highest accuracies of 79.22% and 75.32% on imbalanced data, respectively. However, they also exhibited overfitting, as indicated by their near-perfect training performance. Naïve Bayes, on the other hand, demonstrated the lowest accuracy at 54.55% despite achieving high precision, suggesting frequent misclassification, particularly for the minority class. SVM and KNN also struggled with imbalanced data, recording accuracies of 58.44% and 63.64%, respectively. Significant improvements were observed after applying data balancing techniques. The accuracy of SVM increased to 71.43%, and KNN improved to 66.23%, indicating that these models are more stable and effective when class distributions are even. These findings highlight the substantial impact of data balancing on model performance, particularly for methods sensitive to class distribution. While tree-based models achieved high accuracy on imbalanced data, their tendency to overfit underscores the importance of balancing techniques to enhance model generalization.
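The workflow summarized in the abstract (count-based text features, class balancing, and evaluation with accuracy, precision, recall, and F1-score) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's implementation: the abstract does not name the balancing technique, so random oversampling is assumed here; `bag_of_words` is a simplified stand-in for CountVectorizer; macro averaging of the metrics is also an assumption; and the function names and toy comments are hypothetical.

```python
import random
from collections import Counter

def bag_of_words(texts):
    """Build a vocabulary and a token-count matrix (simplified stand-in for CountVectorizer)."""
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for t in texts:
        row = [0] * len(vocab)
        for tok in t.lower().split():
            row[index[tok]] += 1
        rows.append(row)
    return vocab, rows

def random_oversample(samples, labels, seed=0):
    """Assumed balancing step: duplicate minority-class samples at random
    until every class is as large as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (averaging scheme assumed)."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Toy comments invented for illustration; they are not taken from the study's data.
texts = ["good leader", "good program", "bad promise", "good good debate", "neutral news"]
labels = ["positive", "positive", "negative", "positive", "neutral"]

vocab, features = bag_of_words(texts)
balanced_x, balanced_y = random_oversample(features, labels)
print(Counter(balanced_y))
```

In a real experiment, balancing would normally be applied to the training split only, so that duplicated samples do not leak into the test set and inflate the reported scores.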


Copyright (c) 2025 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.