Real-Time Sign Language Recognition and Translation in Humanoid Robots Using Transformer-Based Model with a Knowledge Graph

  • Erick Busuulwa, Nanjing University of Information Science and Technology, China
  • Li-Hong Juang, Nanjing University of Information Science and Technology, China
Keywords: Sign Language Translation, Human-Robot Interaction, NAO Robot, Transformer Model, Gesture Recognition, Knowledge Graph, Webots Simulation

Abstract

For millions of deaf-mute individuals, sign language is the only means of communication; this creates barriers in daily interactions with non-signers and excludes them from many areas of daily life. To address this, we propose a real-time sign language translation system based on a Transformer model enhanced with a knowledge graph, designed for Human-Robot Interaction (HRI) with NAO robots. Our system bridges the communication gap by translating gestures into natural-language text. We trained the model on the RWTH-PHOENIX-Weather 2014T dataset, achieving a BLEU score of 29.1 and a Word Error Rate (WER) of 18.2%, surpassing the baseline model. Because of the domain shift between human gestures and NAO robot gestures, we created a NAO-specific dataset and fine-tuned the model using transfer learning to account for the robot's kinematic constraints and deployment environment, which differ from those of the original training data. This reduced the WER to 17.6% and increased the BLEU score to 29.9. We evaluated the model in dynamic, practical HRI scenarios through comparative experiments in Webots. Integrating a knowledge graph improved contextual disambiguation, significantly enhancing translation accuracy for ambiguous gestures. By effectively translating gestures into natural language, our system demonstrates strong potential for practical robotic applications that promote social accessibility.
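
As context for the reported numbers, Word Error Rate measures the fraction of word-level insertions, deletions, and substitutions needed to turn the model's output into the reference translation. The Python sketch below is a generic, minimal illustration of that computation; it is not the authors' evaluation code, and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level edit distance between reference and hypothesis,
    normalized by the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical weather-domain example (the PHOENIX corpus covers weather forecasts):
ref = "tomorrow it will rain in the north"
hyp = "tomorrow it rains in the north"
print(f"WER = {word_error_rate(ref, hyp):.2%}")  # ~28.57%
```

A corpus-level BLEU score, such as the 29.9 reported above, is typically computed with a standard toolkit (e.g., sacreBLEU) over the full test set rather than per sentence.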

Published
2025-03-18
How to Cite
Busuulwa, E., & Juang, L.-H. (2025). Real-Time Sign Language Recognition and Translation in Humanoid Robots Using Transformer-Based Model with a Knowledge Graph. Journal of Information Systems and Informatics, 7(1), 178-201. https://doi.org/10.51519/journalisi.v7i1.992