Detecting Deceptive Online Reviews Using a Semantic Reliability Index and Hybrid Text Representation
DOI: https://doi.org/10.63158/journalisi.v8i2.1576
Keywords: opinion spam detection, semantic similarity, stylometric features, XGBoost, hybrid feature representation
Abstract
Online review platforms such as Yelp play an important role in consumer decision-making, but the growing prevalence of fake reviews undermines their reliability. This study proposes a hybrid approach for fake review detection by integrating stylometric features, language model signals, and semantic embeddings within a unified classification framework. The proposed method combines linguistic indicators, including GPT-2 perplexity, lexical diversity, sentence burstiness, punctuation ratio, and sentiment intensity, with TF-IDF representations and Sentence-BERT embeddings. A composite feature, namely the Semantic Reliability Index (SRI), is introduced to capture interactions between semantic similarity and linguistic characteristics, serving as an auxiliary feature within the hybrid model rather than a standalone classifier. Experiments on a Yelp hotel review dataset demonstrate that the hybrid model outperforms baseline methods in terms of F1-score and AUC, indicating improved discriminative capability. It should be noted that the classification setting is based on a binary transformation of ordinal labels, which may simplify the underlying label structure and influence performance interpretation. Overall, this work's contribution lies in a systematic feature-integration strategy that enhances fake review detection in the evaluated dataset.
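As a rough illustration of the stylometric side of this feature set, the sketch below computes three of the indicators named in the abstract (lexical diversity, sentence burstiness, and punctuation ratio) using only the Python standard library. The abstract does not give the exact formulas, so the definitions here are assumptions: lexical diversity is taken as the type-token ratio, burstiness as the coefficient of variation of sentence lengths, and punctuation ratio as punctuation characters over total characters. The function name `stylometric_features` is hypothetical. GPT-2 perplexity, sentiment intensity, TF-IDF, Sentence-BERT embeddings, and the SRI itself are left out because they depend on external models or on a definition not stated here.

```python
import re
import statistics
import string

WORD_RE = re.compile(r"[A-Za-z0-9']+")


def stylometric_features(text: str) -> dict:
    """Compute a few simple stylometric indicators for one review.

    All three definitions are illustrative assumptions, not the
    paper's confirmed formulas.
    """
    tokens = WORD_RE.findall(text.lower())

    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    sent_lengths = [len(WORD_RE.findall(s)) for s in sentences]

    # Assumed definition: type-token ratio (unique tokens / all tokens).
    lexical_diversity = len(set(tokens)) / len(tokens) if tokens else 0.0

    # Assumed definition: coefficient of variation of sentence length.
    mean_len = statistics.mean(sent_lengths) if sent_lengths else 0.0
    burstiness = (
        statistics.pstdev(sent_lengths) / mean_len if mean_len > 0 else 0.0
    )

    # Assumed definition: ASCII punctuation characters / all characters.
    n_punct = sum(1 for ch in text if ch in string.punctuation)
    punctuation_ratio = n_punct / len(text) if text else 0.0

    return {
        "lexical_diversity": lexical_diversity,
        "burstiness": burstiness,
        "punctuation_ratio": punctuation_ratio,
    }


if __name__ == "__main__":
    review = "Great hotel! Great staff. We will come back again and again."
    for name, value in stylometric_features(review).items():
        print(f"{name}: {value:.3f}")
```

In a hybrid setup of the kind the abstract describes, a dictionary like this would be turned into a dense numeric vector and concatenated with the sparse TF-IDF and embedding features before being passed to the classifier. How the authors scale or combine these columns is not specified in the abstract.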
Copyright (c) 2026 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.