A dynamic Malaysian sign language dataset for sign language recognition and translation

dc.citation.epage: 11
dc.citation.spage: 1
dc.citation.volume: 65
dc.contributor.author: Chong Yuan Ting
dc.contributor.author: Yessane Shrrie Nagendhra Rao
dc.contributor.author: Rehman Ullah Khan
dc.contributor.author: Teh Chee Siong
dc.contributor.author: Mohamad Hardyman Barawi
dc.contributor.author: Mohd Shahrizal Sunar
dc.contributor.author: Joan Sim Jo Jo
dc.contributor.department: Faculty of Social Sciences and Humanities
dc.date.accessioned: 2026-03-09T04:30:13Z
dc.date.issued: 2026
dc.description.abstract: Sign languages around the world are unique and diverse. Each sign language reflects the cultural nuances of its locale of origin, giving it its distinctive nature. Thus, despite the positive outcomes of the sign language recognition and translation research conducted worldwide, each system still has notable limitations, caused mainly by limited data. Sign language recognition and translation research in Malaysia in particular has been set back by the limited size and nature of the available datasets, which have not kept pace with current technological developments. The datasets currently available for Malaysian Sign Language (BIM, Bahasa Isyarat Malaysia) are small and limited to fingerspelling of alphanumeric characters and several dynamic words and short phrases. However, given the continuous nature of sign language communication, these data are not enough to properly train machine learning models to recognize and translate continuous real-world signs. To address this issue, we introduce a dynamic BIM dataset comprising video, gloss, and translation data that cover alphanumeric characters, dynamic words and short phrases, and continuous sentences. The dataset is split into two versions. The first version, the BIM-SSD-V1 dataset, comprises 4,858 parallel video (RGB frames), gloss, and translation data, while the second version, the BIM-SSD-V2 dataset, comprises 3,143 parallel video (RGB frames), keypoints, and gloss data for recognition purposes, and 4,900 parallel gloss and translation data for translation purposes. The raw videos are also available in the dataset. The dataset was developed and compiled with the help of the Deaf and Hard-of-Hearing community. This process also included the development of a Sign Language Module (translations for the video and gloss data) to assist in the development of the dataset.
The image and video data were collected using smartphones, and the corresponding gloss annotations were prepared with the help of a BIM expert. The data collection process was designed to reflect everyday communication scenarios by incorporating varied sentence constructions, repeated signing instances, and recordings under different backgrounds and contextual conditions to introduce data-level variability relevant to real-world use. Four participants were involved in the data collection process. There are also four samples for every character, word, phrase, or sentence in the Sign Language Module. The dataset can mainly be reused by researchers who would like to conduct sign language recognition and translation research using the Sign-to-Gloss-to-Text framework. However, the dataset is not limited to one framework and can be used with other sign language recognition and translation research frameworks.
dc.description.references: Uncontrolled Keywords: Deaf and hard-of-hearing, Sign language dataset, MediaPipe keypoints, Gloss, Sign-to-gloss-to-text framework.
dc.description.status: Published
dc.identifier.citation: Chong, Y. T., Rao, Y. S. N., Khan, R. U., Teh, C. S., Barawi, M. H., Sunar, M. S., & Sim, J. J. J. (2026). A dynamic Malaysian sign language dataset for sign language recognition and translation. Data in Brief, 65, 1-11. https://doi.org/10.1016/j.dib.2026.112511
dc.identifier.doi: https://doi.org/10.1016/j.dib.2026.112511
dc.identifier.email: krullah@unimas.my
dc.identifier.email: csteh@unimas.my
dc.identifier.email: bmhardyman@unimas.my
dc.identifier.email: sjjoan@unimas.my
dc.identifier.issn: 2352-3409
dc.identifier.uri: https://www.sciencedirect.com/science/article/pii/S2352340926000648
dc.identifier.uri: https://scholarhub.unimas.my/handle/123456789/157
dc.publisher: Elsevier Inc.
dc.relation.ispartof: Data in Brief
dc.title: A dynamic Malaysian sign language dataset for sign language recognition and translation
dc.type: Articles
dc.type.status: Yes

Files

Original bundle

Name:
A dynamic Malaysian.pdf
Size:
1.14 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed to upon submission