Integration of Hash Encoding Technique with Machine Learning for Employee Turnover Prediction

Ahya Radiatul Kamila; Johanes Fernandes Andry; Francka Sakti Lee; Felliks F. Tampinongkol

doi:10.51519/journalisi.v7i2.1129

Ahya Radiatul Kamila, Bunda Mulia University, Indonesia
Johanes Fernandes Andry, Bunda Mulia University, Indonesia
Francka Sakti Lee, Bunda Mulia University, Indonesia
Felliks F. Tampinongkol, Bunda Mulia University, Indonesia

Keywords: Hash Encoding, Machine Learning, Turnover Prediction, Random Forest

Abstract

Employee turnover refers to the replacement of employees within an organization, which can lead to losses such as recruitment costs and decreased productivity. Predicting turnover is crucial for companies to anticipate departures and take appropriate action to retain employees. This study aims to optimize an employee turnover prediction model by integrating hash encoding with machine learning. The data come from an open-source dataset obtained from Kaggle. It consists of 14,994 rows and 10 columns (features) representing employee-related information such as satisfaction level, evaluation score, number of projects, average monthly hours, and whether the employee left the company. Some of these features are of object data type. Because machine learning algorithms generally cannot work directly with object-type features, hash encoding is proposed to convert them into numerical data. This step is part of the preprocessing stage and aims to reduce memory usage, speed up preprocessing, and improve model performance. After preprocessing, a Random Forest model is trained to predict employee turnover. The model is evaluated using accuracy, recall, precision, and F1-score, which yielded 0.988, 0.961, 0.988, and 0.974, respectively. These results indicate that integrating hash encoding with machine learning can produce a well-performing model for predicting employee turnover.
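
The idea behind hash encoding can be illustrated with a minimal sketch using only the Python standard library. It is not the authors' implementation: the hash function (MD5), the vector length of 8, the helper names `hash_encode` and `f1_from`, and the sample category values are all assumptions chosen for illustration. The sketch also checks that the reported F1-score is consistent with the reported precision and recall, using the standard definition F1 = 2PR / (P + R).

```python
import hashlib


def hash_encode(value, n_components=8):
    """Map a categorical string to a fixed-length indicator vector.

    Assumption: MD5 and 8 components are illustrative choices only.
    Distinct categories can collide in the same slot; that is the usual
    trade-off of hashing into a fixed number of columns.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    index = int(digest, 16) % n_components
    vector = [0] * n_components
    vector[index] = 1
    return vector


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical object-type values, not taken from the study's dataset.
departments = ["sales", "technical", "support", "sales"]
encoded = [hash_encode(d) for d in departments]

# Precision 0.988 and recall 0.961 are the values reported in the abstract.
reported_f1 = round(f1_from(0.988, 0.961), 3)
```

Each category always maps to the same slot, so the encoding needs no stored vocabulary and the output width stays fixed however many distinct values appear. The resulting numeric columns could then be passed, together with the other features, to a classifier such as Random Forest.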
