Emotion Recognition from Adult Speech Using Machine Learning: A Study Based on the CREMA-D Dataset

dc.contributor.author: Eric Anak Naweam
dc.date.accessioned: 2026-04-20T07:19:16Z
dc.date.issued: 2024
dc.description: Speech Emotion Recognition (SER) is the ability of a machine to recognise and interpret human emotions from speech. This ability has become an important component of human-computer interaction (HCI), enabling systems to understand and respond to human emotions effectively. This research addresses several challenges in SER: variability in emotional expression caused by factors such as background noise, accents, and linguistic diversity; the difficulty of selecting suitable deep learning architectures; and the fact that existing datasets used to train and test emotion recognition models often do not reflect the variety of emotional expression found in real-world speech. This study analyses deep learning techniques commonly used for SER, including CNNs, LSTMs, Transformers, and hybrid models. A hybrid model integrating CNN and LSTM architectures is proposed and trained and tested on the CREMA-D dataset. Mel spectrograms are used for feature extraction, and model performance is evaluated using accuracy, precision, recall, and F1-score. The models are implemented in Python with TensorFlow and Keras, and development was carried out in Visual Studio Code. The proposed 2D CNN-LSTM model achieved an accuracy of 59.1%, outperforming the baseline 2D CNN and recognising emotional states more reliably. Despite this improvement, its overall accuracy remains relatively low compared with previous SER studies. A key challenge is the difficulty of detecting subtle emotions, which is attributed to dataset imbalance. Future work could investigate transformer-based architectures, advanced data augmentation techniques, and cross-validation to improve the model's generalization and performance.
Keywords: Speech Emotion Recognition, Deep Learning, Convolutional Neural Networks, Long Short-Term Memory (LSTM) Networks, CREMA-D, Feature Extraction, Cross Validation.
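The abstract names Mel spectrogram features and accuracy, precision, recall, and F1-score as the evaluation metrics, but gives no parameters. The following is only a minimal NumPy sketch of those two ends of such a pipeline, not the project's code: the 16 kHz sample rate, 512-point FFT, 160-sample hop, 64 Mel bands, HTK-style Mel formula, and the helper names (`log_mel_spectrogram`, `per_class_metrics`, etc.) are all hypothetical choices. The 2D CNN-LSTM model itself is not reproduced; a (frames, Mel bands) matrix like the one returned here would typically be the 2D input to its convolutional layers. The label codes ANG, DIS, FEA, HAP, NEU, and SAD follow CREMA-D's file-naming convention; the example predictions are toy data.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (an assumed choice): m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)  # frequency of each rfft bin
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    """Hypothetical helper: (n_frames, n_mels) log-Mel spectrogram of a mono signal."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T      # (n_frames, n_mels)
    return np.log(mel + 1e-10)                             # small floor avoids log(0)


def per_class_metrics(y_true, y_pred, labels):
    """Accuracy, per-class (precision, recall, F1), and their macro averages.

    Classes with no predictions or no true samples get 0 for the undefined ratio.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    report = {}
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        report[c] = (float(p), float(r), float(f1))
    # Macro averaging weights every class equally, so rare classes count as much as common ones
    macro = tuple(float(np.mean([report[c][k] for c in labels])) for k in range(3))
    accuracy = float(np.mean(y_true == y_pred))
    return accuracy, report, macro


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr                                  # one second of a 440 Hz tone
    features = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
    print(features.shape)                                   # (frames, Mel bands)

    labels = ["ANG", "DIS", "FEA", "HAP", "NEU", "SAD"]     # CREMA-D emotion codes
    y_true = ["ANG", "ANG", "SAD", "NEU", "HAP", "FEA"]     # toy ground truth
    y_pred = ["ANG", "SAD", "SAD", "NEU", "NEU", "FEA"]     # toy predictions
    acc, per_class, macro = per_class_metrics(y_true, y_pred, labels)
    print(acc, macro)
```

Reporting macro-averaged precision, recall, and F1 alongside accuracy is one way to make the effect of class imbalance visible, because a model can reach reasonable accuracy while scoring poorly on under-represented emotions.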
dc.identifier.uri: https://scholarhub.unimas.my/handle/123456789/385
dc.language.iso: English
dc.publisher: Universiti Malaysia Sarawak (UNIMAS)
dc.relation.ispartofseries: Faculty of Computer Science and Information Technology
dc.subject: Adult Speech Using Machine, CREMA-D Dataset
dc.title: Emotion Recognition from Adult Speech Using Machine Learning: A Study Based on the CREMA-D Dataset
dc.type: Final Year Project

Files

Original bundle

Name: 79325.pdf
Size: 2.02 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission