Detecting Deceptive Online Reviews Using a Semantic Reliability Index and Hybrid Text Representation

Hartatik; Andri Syafrianto

doi:10.63158/journalisi.v8i2.1576

Authors

Hartatik Universitas Amikom Yogyakarta, Indonesia
Andri Syafrianto STMIK El Rahma, Indonesia


Keywords:

opinion spam detection, semantic similarity, stylometric features, XGBoost, hybrid feature representation

Abstract

Online review platforms such as Yelp play an important role in consumer decision-making, but the growing prevalence of fake reviews undermines their reliability. This study proposes a hybrid approach to fake review detection that integrates stylometric features, language model signals, and semantic embeddings within a unified classification framework. The method combines linguistic indicators (GPT-2 perplexity, lexical diversity, sentence burstiness, punctuation ratio, and sentiment intensity) with TF-IDF representations and Sentence-BERT embeddings. A composite feature, the Semantic Reliability Index (SRI), is introduced to capture interactions between semantic similarity and linguistic characteristics; it serves as an auxiliary feature within the hybrid model rather than as a standalone classifier. Experiments on a Yelp hotel review dataset show that the hybrid model outperforms baseline methods in F1-score and AUC, indicating improved discriminative capability. The classification setting, however, relies on a binary transformation of ordinal labels, which may simplify the underlying label structure and affect how performance should be interpreted. Overall, the main contribution of this work is a systematic feature-integration strategy that improves fake review detection on the evaluated dataset.
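The stylometric signals named in the abstract can be made concrete with a small, dependency-free sketch. The abstract does not define these features precisely, so the definitions below are assumptions: lexical diversity as a type-token ratio, burstiness as the coefficient of variation of sentence lengths, and punctuation ratio as the share of non-whitespace characters that are ASCII punctuation. Perplexity is computed from per-token natural-log probabilities that a language model such as GPT-2 would supply, and the `embedding` argument stands in for a Sentence-BERT vector. All function names are hypothetical and are not the authors' implementation.

```python
import math
import re
import statistics
import string

# Simple regex tokenizer and sentence splitter (assumptions, not the paper's).
_TOKEN_RE = re.compile(r"[A-Za-z0-9']+")
_SENT_SPLIT_RE = re.compile(r"[.!?]+")


def tokenize(text: str) -> list[str]:
    """Return lowercased word tokens."""
    return [t.lower() for t in _TOKEN_RE.findall(text)]


def lexical_diversity(text: str) -> float:
    """Type-token ratio: unique tokens / total tokens (assumed definition)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def sentence_burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths in words (assumed definition)."""
    lengths = [len(tokenize(s)) for s in _SENT_SPLIT_RE.split(text)]
    lengths = [n for n in lengths if n > 0]  # drop empty fragments
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


def punctuation_ratio(text: str) -> float:
    """Share of non-whitespace characters that are ASCII punctuation."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in string.punctuation for c in chars) / len(chars)


def perplexity(token_logprobs: list[float]) -> float:
    """exp(-mean log p) from natural-log token probabilities (e.g. from GPT-2)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors (e.g. Sentence-BERT)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_features(text: str, token_logprobs: list[float],
                    embedding: list[float]) -> list[float]:
    """Concatenate hand-crafted stylometric signals with a dense embedding."""
    stylometric = [
        perplexity(token_logprobs),
        lexical_diversity(text),
        sentence_burstiness(text),
        punctuation_ratio(text),
    ]
    return stylometric + list(embedding)


if __name__ == "__main__":
    review = "Great hotel. Great staff! Would stay again."
    fake_logprobs = [math.log(0.5)] * 4  # placeholder LM output
    print(hybrid_features(review, fake_logprobs, [0.1, 0.2, 0.3]))
```

In a full pipeline, sentiment intensity and TF-IDF weights would be appended to the same vector, and the result passed to a gradient-boosted classifier such as XGBoost. The abstract does not state the SRI formula, so it is not reproduced here; `cosine_similarity` only illustrates the semantic-similarity ingredient the SRI is described as building on.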


