Emotion Recognition from Adult Speech Using Machine Learning: A Study Based on the CREMA-D Dataset

Eric Anak Naweam

dc.contributor.author	Eric Anak Naweam
dc.date.accessioned	2026-04-20T07:19:16Z
dc.date.issued	2024
dc.description	Speech Emotion Recognition (SER) is the ability of a machine to understand and interpret human emotions from speech. It has become an important component of human-computer interaction (HCI), enabling systems to respond effectively to human emotions. This research addresses three challenges in SER: variability in emotional expression arising from factors such as background noise, accents, and linguistic diversity; the difficulty of selecting a suitable deep learning architecture; and the fact that existing datasets used to train and test emotion recognition models often do not reflect the variety of emotional expression found in real-world speech. The study analyses deep learning techniques commonly used for SER, including CNNs, LSTMs, Transformers, and hybrid models, and proposes a hybrid model that integrates CNN and LSTM architectures, trained and tested on the CREMA-D dataset. Feature extraction techniques such as Mel spectrograms are employed, and model performance is evaluated using accuracy, precision, recall, and F1-score. The models are implemented in Python with TensorFlow and Keras, with development conducted in environments such as Visual Studio Code. The proposed 2D CNN-LSTM model achieved an accuracy of 59.1%, surpassing the baseline 2D CNN and showing improved recognition of emotional states. Despite this improvement, the overall accuracy remains low relative to previous SER studies. A key challenge is the difficulty of detecting subtle emotions, which the study attributes to dataset imbalance. Future work should investigate transformer-based architectures, advanced data augmentation techniques, and cross-validation methods to improve model generalization and performance.
Keywords: Speech Emotion Recognition, Deep Learning, Convolutional Neural Networks, Long Short-Term Memory (LSTM) Networks, CREMA-D, Feature Extraction, Cross Validation.
dc.identifier.uri	https://scholarhub.unimas.my/handle/123456789/385
dc.language.iso	English
dc.publisher	Universiti Malaysia Sarawak (UNIMAS)
dc.relation.ispartofseries	Faculty of Computer Science and Information Technology
dc.subject	Adult Speech Using Machine, CREMA-D Dataset
dc.title	Emotion Recognition from Adult Speech Using Machine Learning: A Study Based on the CREMA-D Dataset
dc.type	Final Year Project

Files

Original bundle

Name: 79325.pdf
Size: 2.02 MB
Format: Adobe Portable Document Format

Collections

Final Year Project Report/IMRaD