Recurrent Neural Network-Gated Recurrent Unit for Indonesia-Sentani Papua Machine Translation

  • Rizkial Achmad Universitas Sains dan Teknologi, Indonesia
  • Yokelin Tokoro Universitas Cenderawasih, Indonesia
  • Jusuf Haurissa Universitas Sains dan Teknologi, Indonesia
  • Andik Wijanarko Universitas Amikom Purwokerto, Indonesia
Keywords: Machine Translation, RNN-GRU, Papua, Indonesia

Abstract

The Sentani language of Papua is spoken in the city of Jayapura, Papua. Indonesian law mandates the preservation of regional languages, and one way to do so is to build an Indonesian-Sentani Papua machine translation system. The problem is how to build such a system and which model to choose. The model chosen is the Recurrent Neural Network with Gated Recurrent Units (RNN-GRU), which has been widely used to build machine translation systems for regional languages of Indonesia. The method is experimental: first a parallel corpus is created, then the corpus is used to train the RNN-GRU model, and finally the system is evaluated with the Bilingual Evaluation Understudy (BLEU) metric. The parallel corpus contains 281 sentence pairs with an average sentence length of 8 words. Training took 3 hours without a GPU. The research obtained a fairly good BLEU score of 35.3, meaning that the RNN-GRU model and the parallel corpus produce adequate translation quality that can still be improved.
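To illustrate the evaluation step, the sketch below computes a sentence-level BLEU score in plain Python. This is an assumption-laden illustration, not the paper's actual evaluation pipeline (corpus-level BLEU as reported here would normally be computed with a standard toolkit over all 281 sentence pairs); the add-one smoothing and uniform n-gram weights are choices made for this example.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU (0-100) against a single reference.

    Uses clipped n-gram precision up to max_n, uniform weights,
    add-one smoothing, and the standard brevity penalty.
    """
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Add-one smoothing so one empty n-gram order does not zero the score
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(log_prec)
```

A perfect match scores 100; partial n-gram overlap, as in the study's 35.3 average, reflects translations that share much of the reference wording but diverge elsewhere.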



Published
2023-12-02
How to Cite
Achmad, R., Tokoro, Y., Haurissa, J., & Wijanarko, A. (2023). Recurrent Neural Network-Gated Recurrent Unit for Indonesia-Sentani Papua Machine Translation. Journal of Information Systems and Informatics, 5(4), 1449-1460. https://doi.org/10.51519/journalisi.v5i4.597