Real-Time Sign Language Recognition and Translation in Humanoid Robots Using Transformer-Based Model with a Knowledge Graph
Abstract
For millions of deaf-mute individuals, sign language is the only means of communication, which creates barriers in daily interactions with non-signers and excludes them from many areas of everyday life. To address this, we propose a real-time sign language translation system built on a Transformer model enhanced with a knowledge graph and designed for Human-Robot Interaction (HRI) with NAO robots. The system bridges the communication gap by translating gestures into natural language (text). We trained the model on the RWTH-PHOENIX-Weather 2014T dataset, achieving a BLEU score of 29.1 and a Word Error Rate (WER) of 18.2%, surpassing the baseline model. Because of the domain shift between human gestures and NAO robot gestures, we created a NAO-specific dataset and fine-tuned the model via transfer learning to account for the robot's kinematic constraints and deployment environment. This reduced the WER to 17.6% and raised the BLEU score to 29.9. We evaluated the model in dynamic, practical HRI scenarios through comparative experiments in the Webots simulator. Integrating the knowledge graph improved contextual disambiguation, significantly enhancing translation accuracy for ambiguous gestures. By effectively translating gestures into natural language, our system demonstrates strong potential for practical robotic applications that promote social accessibility.
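For context on the headline numbers: Word Error Rate is the word-level edit distance (substitutions, insertions, and deletions) between a system's output and the reference translation, normalized by the reference length. The sketch below shows the standard definition in Python; it is illustrative only and is not the authors' evaluation code.

```python
# Minimal sketch of the standard Word Error Rate (WER) metric
# (Levenshtein edit distance over words, normalized by reference length).
# Textbook definition for illustration, not the paper's evaluation pipeline.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substituted word in a five-word reference -> WER = 0.20
print(wer("the weather will be sunny", "the weather will be rainy"))
```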
References
P. Markellou, M. Rigou, and S. Sirmakessis, "A Web Adaptive Educational System for People with Hearing Difficulties," Educ. Inf. Technol., vol. 5, pp. 189–200, 2000, doi: 10.1023/A:1009606818900.
D. Avola, M. Bernardi, L. Cinque, G. L. Foresti, and C. Massaroni, "Exploiting Recurrent Neural Networks and Leap Motion Controller for the Recognition of Sign Language and Semaphoric Hand Gestures," IEEE Trans. Multimedia, vol. 21, no. 1, pp. 234–245, Jan. 2019, doi: 10.1109/TMM.2018.2856094.
J. Li, J. Zhong, and N. Wang, "A Multimodal Human-Robot Sign Language Interaction Framework Applied in Social Robots," Front. Neurosci., vol. 17, 2023, doi: 10.3389/fnins.2023.1168888.
O. Koller, N. C. Camgoz, H. Ney, and R. Bowden, "Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 9, pp. 2306–2320, Sep. 2020, doi: 10.1109/TPAMI.2019.2911077.
N. C. Camgoz, S. Hadfield, O. Koller, H. Ney, and R. Bowden, "Neural Sign Language Translation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Salt Lake City, UT, USA, 2018, pp. 7784–7793, doi: 10.1109/CVPR.2018.00812.
J. Forster, C. Schmidt, and O. Koller, "Extensions of the Sign Language Recognition and Translation Corpus RWTH-PHOENIX-Weather," in Proc. Int. Conf. Lang. Resour. Eval. (LREC), 2014, pp. 1911–1916.
S. Tamura and S. Kawasaki, "Recognition of Sign Language Motion Images," Pattern Recognit., vol. 21, no. 4, pp. 343–353, Jan. 1988, doi: 10.1016/0031-3203(88)90048-9.
T. Starner, J. Weaver, and A. Pentland, "Real-Time American Sign Language Recognition Using Desk and Wearable Computer-Based Video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 12, pp. 1371–1375, Dec. 1998, doi: 10.1109/34.735811.
T. W. Chong and B. G. Lee, "American Sign Language Recognition Using Leap Motion Controller with Machine Learning Approach," Sensors, vol. 18, no. 10, Oct. 2018, doi: 10.3390/s18103554.
W. Qi, S. E. Ovur, Z. Li, A. Marzullo, and R. Song, "Multi-Sensor Guided Hand Gesture Recognition for a Teleoperated Robot Using a Recurrent Neural Network," IEEE Robot. Autom. Lett., vol. 6, no. 3, pp. 6039–6045, Jul. 2021, doi: 10.1109/LRA.2021.3089999.
P. Kumar, H. Gauba, P. P. Roy, and D. P. Dogra, "A Multimodal Framework for Sensor-Based Sign Language Recognition," Neurocomputing, vol. 259, pp. 21–38, Oct. 2017, doi: 10.1016/j.neucom.2016.08.132.
J. J. Bird, A. Ekárt, and D. R. Faria, "British Sign Language Recognition via Late Fusion of Computer Vision and Leap Motion with Transfer Learning to American Sign Language," Sensors, vol. 20, no. 18, Sep. 2020, doi: 10.3390/s20185151.
D. Wu et al., "Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 8, pp. 1583–1597, Aug. 2016, doi: 10.1109/TPAMI.2016.2537340.
Y. Wu and T. S. Huang, "Vision-Based Gesture Recognition: A Review," in Proc. Int. Gesture Workshop (GW), 1999, pp. 103–115, doi: 10.1007/3-540-46616-9_10.
J. F. Lichtenauer, E. A. Hendriks, and M. J. T. Reinders, "Sign Language Recognition by Combining Statistical DTW and Independent Classification," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 2040–2046, 2008, doi: 10.1109/TPAMI.2008.123.
R. Kaluri and C. H. P. Reddy, "An Enhanced Framework for Sign Gesture Recognition Using Hidden Markov Model and Adaptive Histogram Technique," Int. J. Intell. Eng. Syst., vol. 10, no. 3, pp. 11–19, Jun. 2017, doi: 10.22266/ijies2017.0630.02.
A. Tharwat, T. Gaber, A. E. Hassanien, M. K. Shahin, and B. Refaat, "SIFT-Based Arabic Sign Language Recognition System," Adv. Intell. Syst. Comput., vol. 334, pp. 359–370, 2015, doi: 10.1007/978-3-319-13572-4_30.
R. Cui, H. Liu, and C. Zhang, "A Deep Neural Framework for Continuous Sign Language Recognition by Iterative Training," IEEE Trans. Multimedia, vol. 21, no. 7, pp. 1880–1891, Jul. 2019, doi: 10.1109/TMM.2018.2889563.
W. Jintanachaiwat et al., "Using LSTM to Translate Thai Sign Language to Text in Real Time," Discover Artif. Intell., vol. 4, no. 1, Dec. 2024, doi: 10.1007/s44163-024-00113-8.
B. Saunders, N. C. Camgoz, and R. Bowden, "Progressive Transformers for End-to-End Sign Language Production," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2020.
X. Hei, C. Yu, H. Zhang, and A. Tapus, "A Bilingual Social Robot with Sign Language and Natural Language," in Proc. ACM/IEEE Int. Conf. Human-Robot Interact., Mar. 2024, pp. 526–529, doi: 10.1145/3610978.3640549.
S. Wang, X. Zuo, R. Wang, and R. Yang, "A Generative Human-Robot Motion Retargeting Approach Using a Single RGBD Sensor," IEEE Access, vol. 7, pp. 51499–51512, 2019, doi: 10.1109/ACCESS.2019.2911883.
B. Zhang, M. Müller, and R. Sennrich, "SLTUNET: A Simple Unified Model for Sign Language Translation," in Proc. 11th Int. Conf. Learn. Represent. (ICLR), May 2023.
P. Xie, T. Peng, Y. Du, and Q. Zhang, "Sign Language Production with Latent Motion Transformer," in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis., 2024, pp. 3024–3034.
K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
Y. Hamidullah, J. van Genabith, and C. España-Bonet, "Sign Language Translation with Sentence Embedding Supervision," in Proc. 62nd Annu. Meeting Assoc. Comput. Linguistics (ACL), Bangkok, Thailand, Aug. 2024, pp. 425–434, doi: 10.18653/v1/2024.acl-short.40.
T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, and G. Bouchard, "Complex Embeddings for Simple Link Prediction," in Proc. 33rd Int. Conf. Mach. Learn., 2016, vol. 48, pp. 2071–2080.
M. Gochoo et al., "Fine-Tuning Vision Transformer for Arabic Sign Language Video Recognition on Augmented Small-Scale Dataset," in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), 2023, pp. 2880–2885, doi: 10.1109/SMC53992.2023.10394501.
M. Q. Li, B. C. M. Fung, and S.-C. Huang, "On the Effectiveness of Incremental Training of Large Language Models," in Proc. 12th Int. Conf. Large-Scale AI Systems (LSAIS), Nov. 2024, pp. 456–468, doi: 10.1145/lsais.2024.00113.
D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), 2015, pp. 1–15, doi: 10.48550/arXiv.1412.6980.
T. Sellam, D. Das, and A. P. Parikh, "BLEURT: Learning Robust Metrics for Text Generation," in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics (ACL), Jul. 2020, pp. 7881–7893, doi: 10.18653/v1/2020.acl-main.704.
C. Camargo, J. Gonçalves, M. Conde, F. J. Rodríguez-Sedano, P. Costa, and F. J. García-Peñalvo, "Systematic Literature Review of Realistic Simulators Applied in Educational Robotics Context," Sensors, vol. 21, no. 12, Art. no. 4031, Jun. 2021, doi: 10.3390/s21124031.
L. H. Juang, "The Cooperation Modes for Two Humanoid Robots," Int. J. Soc. Robot., vol. 13, no. 7, pp. 1613–1623, Nov. 2021, doi: 10.1007/s12369-021-00753-1.


Copyright (c) 2025 Journal of Information Systems and Informatics

This work is licensed under a Creative Commons Attribution 4.0 International License.