An Empirical Evaluation of Confidence Miscalibration in Vanilla BERT-Based Stress Detection on Social Media
DOI: https://doi.org/10.63158/journalisi.v8i3.1634
Keywords: Stress detection, Vanilla BERT, expected calibration error, reliability diagram, uncertainty estimation
Abstract
This study evaluates the reliability of confidence estimates produced by a Vanilla BERT classifier for stress detection on the Dreaddit benchmark. The benchmark contains 3,553 labeled text segments; following its standard split, BERT-base-uncased was fine-tuned on the 2,838 training samples and evaluated on the 715 test samples. The model was assessed as a single diagnostic baseline, without additional linguistic features, label smoothing, post-hoc calibration, or other calibration interventions. Evaluation covered discriminative performance metrics (accuracy, precision, recall, and F1-score) and probabilistic reliability metrics (Brier Score, Expected Calibration Error, and Adaptive Calibration Error), together with a reliability diagram. The Vanilla BERT model achieved 79.02% accuracy, 78.00% precision, 82.65% recall, and 80.26% F1-score, indicating competitive classification performance for stress detection. However, the calibration results revealed noticeable miscalibration, with a Brier Score of 0.1565, an Expected Calibration Error of 0.0847, and an Adaptive Calibration Error of 0.0880. The largest gap between confidence and accuracy occurred in the 0.8–0.9 confidence interval, while the 0.9–1.0 interval contributed the most to the Expected Calibration Error because it contains a larger proportion of the samples. These findings show that although Vanilla BERT distinguishes stressed from non-stressed text reasonably well, its confidence estimates are not fully reliable. This study therefore positions Vanilla BERT as a diagnostic reliability baseline and emphasizes the importance of evaluating stress detection models on both classification performance and probabilistic calibration criteria.
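The calibration metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact definitions used in the study, so the following are assumptions: a binary setting with a 0.5 decision threshold, 10 bins, equal-width bins for the Expected Calibration Error, equal-mass bins for the Adaptive Calibration Error, and bin gaps weighted by bin size. The function names and the toy inputs are hypothetical and are not the study's data.

```python
import numpy as np


def brier_score(p_pos, y):
    """Mean squared error between P(stress) and the 0/1 label."""
    p_pos = np.asarray(p_pos, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p_pos - y) ** 2))


def _confidence_and_correct(p_pos, y):
    """Confidence of the predicted class and whether the prediction was right."""
    p_pos = np.asarray(p_pos, dtype=float)
    y = np.asarray(y, dtype=int)
    pred = (p_pos >= 0.5).astype(int)            # assumed 0.5 threshold
    conf = np.where(pred == 1, p_pos, 1.0 - p_pos)
    correct = (pred == y).astype(float)
    return conf, correct


def expected_calibration_error(p_pos, y, n_bins=10):
    """ECE with equal-width confidence bins (lo, hi]."""
    conf, correct = _confidence_and_correct(p_pos, y)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so bin indices run from 0 to n_bins - 1.
    idx = np.digitize(conf, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap             # weight by share of samples
    return float(ece)


def adaptive_calibration_error(p_pos, y, n_bins=10):
    """ACE variant with equal-mass bins (samples sorted by confidence)."""
    conf, correct = _confidence_and_correct(p_pos, y)
    order = np.argsort(conf)
    n = len(conf)
    ace = 0.0
    for chunk in np.array_split(order, n_bins):
        if chunk.size:                           # skip empty bins on tiny inputs
            gap = abs(correct[chunk].mean() - conf[chunk].mean())
            ace += (chunk.size / n) * gap
    return float(ace)


# Toy example: five predictions at confidence 0.85, four of them correct.
p_pos = [0.85, 0.85, 0.85, 0.85, 0.15]
y = [1, 1, 1, 0, 0]
brier = brier_score(p_pos, y)
ece = expected_calibration_error(p_pos, y)
ace_one_bin = adaptive_calibration_error(p_pos, y, n_bins=1)
```

In the toy example all five predictions fall into the 0.8–0.9 bin with 80% accuracy, so the bin gap is 0.05. Because each bin's gap is weighted by its share of samples, a densely populated bin such as 0.9–1.0 can dominate the Expected Calibration Error even when another bin has the largest individual gap, which matches the pattern the abstract reports.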
License
Copyright (c) 2026 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors Declaration
- The Authors certify that they have read, understood, and agreed to the Journal of Information Systems and Informatics (JournalISI) submission guidelines, policies, and submission declaration. The submission has been prepared using the provided template.
- The Authors certify that all authors have approved the publication of this manuscript and that there is no conflict of interest.
- The Authors confirm that the manuscript is their original work, has not received prior publication, is not under consideration for publication elsewhere, and has not been previously published.
- The Authors confirm that all authors listed on the title page have contributed significantly to the work, have read the manuscript, attest to the validity and legitimacy of the data and its interpretation, and agree to its submission.
- The Authors confirm that the manuscript is not copied from or plagiarized from any other published work.
- The Authors declare that the manuscript will not be submitted for publication in any other journal or magazine until a decision is made by the journal editors.
- If the manuscript is finally accepted for publication, the Authors confirm that they will either proceed with publication immediately or withdraw the manuscript in accordance with the journal’s withdrawal policies.
- The Authors agree that, upon publication of the manuscript in this journal, they transfer copyright or assign exclusive rights to the publisher, including commercial rights.