Sentiment Analysis on Coretax Data Using SVM and Random Forest with SMOTE and Tomek-Link

  • Hery Oktafiandi Universitas Satu, Indonesia
  • Winarnie Winarnie Universitas Satu, Indonesia
  • M. Fajar Ramadhan Universitas Satu, Indonesia
  • Febriyanti Panjaitan Universitas Satu, Indonesia
Keywords: Coretax, Sentiment Analysis, SVM, Random Forest, Resampling Techniques

Abstract

This study is motivated by the increasing adoption of digital tax platforms in Indonesia, particularly Coretax, which enables online tax reporting and payment. Understanding user sentiment is crucial for evaluating system effectiveness and identifying areas for improvement. However, sentiment data is often imbalanced, which makes minority-class sentiment difficult to detect. This research evaluates the performance of Support Vector Machine (SVM) and Random Forest (RF) in classifying sentiment in Coretax-related reviews collected between March and September 2025 from Twitter, YouTube, and the DJP application. Lexicon-based labeling and preprocessing were applied, followed by class balancing using the Tomek-Link, SMOTE, and SMOTE-Tomek techniques. On the original data, SVM achieved an accuracy of 98.56% and Random Forest 98.43%, with both performing strongly on the majority class. SMOTE and SMOTE-Tomek improved minority-class detection, albeit with a slight decrease in overall accuracy, which is attributed to the risk of overfitting. The novelty of this study lies in its focus on Coretax 2025 data and its comparative analysis of multiple resampling techniques, providing practical insights into improving sentiment analysis performance on imbalanced digital tax data.
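The three balancing techniques named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the toy 2-D points, the labels "pos" and "neg", the function names, and the choice to drop only the majority-class member of each Tomek link are all assumptions made for illustration. In practice, the points would be text feature vectors, and a library such as imbalanced-learn would normally be used.

```python
import math
import random

def nearest(i, X):
    """Index of the point closest to X[i] (Euclidean), excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))

def tomek_links(X, y):
    """Pairs (i, j) of opposite-class points that are mutual nearest neighbors."""
    nn = [nearest(i, X) for i in range(len(X))]
    return [(i, nn[i]) for i in range(len(X))
            if i < nn[i] and nn[nn[i]] == i and y[i] != y[nn[i]]]

def remove_tomek(X, y, majority):
    """Undersample: drop the majority-class member of every Tomek link (assumed policy)."""
    drop = {k for pair in tomek_links(X, y) for k in pair if y[k] == majority}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

def smote(X, y, minority, n_new, k=2, seed=0):
    """Oversample: add n_new points interpolated between minority neighbors."""
    rng = random.Random(seed)
    mino = [i for i, label in enumerate(y) if label == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.choice(mino)
        # k nearest minority-class neighbors of the chosen point
        neighbors = sorted((j for j in mino if j != i),
                           key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(neighbors)
        t = rng.random()  # position along the segment from X[i] to X[j]
        X_out.append(tuple(a + t * (b - a) for a, b in zip(X[i], X[j])))
        y_out.append(minority)
    return X_out, y_out

def smote_tomek(X, y, minority, majority, n_new, k=2, seed=0):
    """Combined scheme: SMOTE first, then Tomek-link cleaning."""
    X_s, y_s = smote(X, y, minority, n_new, k, seed)
    return remove_tomek(X_s, y_s, majority)

# Hypothetical toy data: 6 majority ("pos") and 3 minority ("neg") points.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.4, 0.0), (0.0, 0.4),
     (1.0, 1.0), (1.2, 1.1), (0.35, 0.25)]
y = ["pos"] * 6 + ["neg"] * 3

links = tomek_links(X, y)                         # borderline cross-class pairs
X_t, y_t = remove_tomek(X, y, "pos")              # Tomek-Link undersampling
X_s, y_s = smote(X, y, "neg", n_new=3)            # SMOTE oversampling
X_c, y_c = smote_tomek(X, y, "neg", "pos", n_new=3)  # SMOTE-Tomek
```

Because the synthetic points are interpolations of existing minority samples, a classifier can fit them too closely, which is consistent with the overfitting risk the abstract mentions. Per-class recall, rather than overall accuracy, is the more informative metric when comparing the resampled variants.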

Published
2025-09-30
How to Cite
Oktafiandi, H., Winarnie, W., Ramadhan, M., & Panjaitan, F. (2025). Sentiment Analysis on Coretax Data Using SVM and Random Forest with SMOTE and Tomek-Link. Journal of Information Systems and Informatics, 7(3), 2803-2818. https://doi.org/10.51519/journalisi.v7i3.1279