Integration of Hash Encoding Technique with Machine Learning for Employee Turnover Prediction
Abstract
Employee turnover refers to the replacement of employees within an organization, which can lead to losses such as recruitment costs and decreased productivity. Predicting turnover is crucial because it lets companies anticipate departures and take appropriate action to retain employees. This study aims to optimize an employee turnover prediction model by integrating hash encoding with machine learning. The dataset used is an open-source dataset obtained from Kaggle. It consists of 14,994 rows and 10 columns (features) representing employee-related information such as satisfaction level, evaluation score, number of projects, average monthly hours, and whether the employee left the company. Some of these features are of object (non-numeric) data type. Since machine learning algorithms generally cannot work directly with object-type features, hash encoding is proposed to convert them into numerical data. The technique is applied during preprocessing and aims to reduce memory usage, speed up data preprocessing, and improve model performance. After preprocessing, a Random Forest model is trained to predict employee turnover. The model is evaluated using accuracy, recall, precision, and F1-score, which yielded 0.988, 0.961, 0.988, and 0.974, respectively. These results indicate that integrating hash encoding with machine learning can produce a well-performing model for predicting employee turnover.
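The hash encoding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the column names (`department`, `salary`), the choice of MD5 as the hash function, and the width of 8 components per feature are all assumptions made for illustration. The last lines check that the reported F1-score is consistent with the reported precision and recall.

```python
import hashlib


def hash_encode(value, n_components=8):
    """Map one categorical string to a fixed-length 0/1 vector (hashing trick)."""
    # A stable hash (MD5 here, an assumption) keeps the mapping reproducible across runs.
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    index = int(digest, 16) % n_components
    vector = [0] * n_components
    vector[index] = 1
    return vector


def encode_row(row, categorical, n_components=8):
    """Replace each object-type column with its hashed vector; keep numeric columns."""
    encoded = []
    for column, value in row.items():
        if column in categorical:
            # Prefix with the column name so equal strings in different columns differ.
            encoded.extend(hash_encode(f"{column}={value}", n_components))
        else:
            encoded.append(float(value))
    return encoded


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical employee record; column names are illustrative only.
row = {"satisfaction_level": 0.38, "number_project": 2,
       "department": "sales", "salary": "low"}
features = encode_row(row, categorical=["department", "salary"])

print(len(features))                       # 2 numeric + 2 * 8 hashed = 18
print(round(f1_score(0.988, 0.961), 3))    # 0.974, matching the reported F1-score
```

Because the output width is fixed in advance, memory use does not grow with the number of distinct categories, but two different categories can collide in the same bucket. The resulting numeric vectors are what a classifier such as Random Forest would then be trained on.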
Copyright (c) 2025 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.