A dynamic Malaysian sign language dataset for sign language recognition and translation

Chong Yuan Ting; Yessane Shrrie Nagendhra Rao; Rehman Ullah Khan; Teh Chee Siong; Mohamad Hardyman Barawi; Mohd Shahrizal Sunar; Joan Sim Jo Jo

dc.citation.epage	11
dc.citation.spage	1
dc.citation.volume	65
dc.contributor.author	Chong Yuan Ting
dc.contributor.author	Yessane Shrrie Nagendhra Rao
dc.contributor.author	Rehman Ullah Khan
dc.contributor.author	Teh Chee Siong
dc.contributor.author	Mohamad Hardyman Barawi
dc.contributor.author	Mohd Shahrizal Sunar
dc.contributor.author	Joan Sim Jo Jo
dc.contributor.department	Faculty of Social Sciences and Humanities
dc.date.accessioned	2026-03-09T04:30:13Z
dc.date.issued	2026
dc.description.abstract	Sign languages around the world are unique and diverse. Each sign language reflects the cultural nuances of its place of origin, which gives it its distinctive nature. Thus, despite the positive outcomes of sign language recognition and translation research conducted worldwide, each system still has notable limitations, caused mainly by limited data. Sign language recognition and translation research in Malaysia in particular has been held back by the limited size and nature of the available datasets, which have not kept pace with current technological developments. The datasets currently available for Malaysian Sign Language (BIM – Bahasa Isyarat Malaysia) are small and limited to fingerspelling of alphanumeric characters and a few dynamic words and short phrases. However, given the continuous nature of sign language communication, these data are not sufficient to properly train machine learning models to recognize and translate continuous real-world signing. To address this issue, we introduce a dynamic BIM dataset comprising video, gloss, and translation data that cover alphanumeric characters, dynamic words and short phrases, and continuous sentences. The dataset is released in two versions. The first version, BIM-SSD-V1, comprises 4,858 parallel video (RGB frames), gloss, and translation samples. The second version, BIM-SSD-V2, comprises 3,143 parallel video (RGB frames), keypoint, and gloss samples for recognition, and 4,900 parallel gloss and translation pairs for translation. The raw videos are also included in the dataset. The dataset was developed and compiled with the help of the Deaf and Hard-of-Hearing community. This process also included developing a Sign Language Module (translations for the video and gloss data) to support construction of the dataset.
The image and video data were collected using smartphones, and the corresponding gloss annotations were prepared with the help of a BIM expert. The data collection process was designed to reflect everyday communication scenarios. It incorporated varied sentence constructions, repeated signing instances, and recordings under different backgrounds and contextual conditions, introducing data-level variability relevant to real-world use. Four participants took part in data collection, and there are four samples for every character, word, phrase, or sentence in the Sign Language Module. The dataset is primarily intended for researchers conducting sign language recognition and translation research with the Sign-to-Gloss-to-Text framework. However, it is not limited to this framework and can be adapted to other sign language recognition and translation frameworks.
dc.description.references	Uncontrolled Keywords: Deaf and hard-of-hearing, Sign language dataset, MediaPipe keypoints, Gloss, Sign-to-gloss-to-text framework.
dc.description.status	Published
dc.identifier.citation	Chong, Y. T., Rao, Y. S. N., Khan, R. U., Teh, C. S., Barawi, M. H., Sunar, M. S., & Sim, J. J. J. (2026). A dynamic Malaysian sign language dataset for sign language recognition and translation. Data in Brief, 65, 1-11. https://doi.org/10.1016/j.dib.2026.112511
dc.identifier.doi	https://doi.org/10.1016/j.dib.2026.112511
dc.identifier.email	krullah@unimas.my
dc.identifier.email	csteh@unimas.my
dc.identifier.email	bmhardyman@unimas.my
dc.identifier.email	sjjoan@unimas.my
dc.identifier.issn	2352-3409
dc.identifier.uri	https://www.sciencedirect.com/science/article/pii/S2352340926000648
dc.identifier.uri	https://scholarhub.unimas.my/handle/123456789/157
dc.publisher	Elsevier Inc.
dc.relation.ispartof	Data in Brief
dc.title	A dynamic Malaysian sign language dataset for sign language recognition and translation
dc.type	Articles
dc.type.status	Yes