Multimodal Emotion Recognition using Deep Learning in Audio and Text
Authors: Arjun S, Soorya Prakash S, Karthikeyan A and Vimal M

Abstract: Emotion recognition plays a vital role in human-computer interaction and affective computing. Multimodal approaches that combine audio and text data have demonstrated significant advances in recognizing emotions with higher accuracy. This paper explores the use of deep learning models to integrate audio and textual modalities for emotion recognition. By leveraging advanced architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for audio, and transformer-based language models for text, this study highlights the benefits of multimodal data fusion. The proposed method achieves improved performance compared to unimodal systems, paving the way for robust emotion-aware applications.
Keywords—Emotion recognition, multimodal deep learning, audio-text fusion, convolutional neural networks, transformers, affective computing.
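The architecture described in the abstract, a CNN-based audio branch, a transformer-based text branch, and a fusion stage, can be sketched as follows. This is a minimal illustrative PyTorch example, not the authors' implementation: the layer sizes, the mel-spectrogram input, the late-fusion-by-concatenation strategy, and the four-class output are all assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class MultimodalEmotionNet(nn.Module):
    """Illustrative sketch: CNN audio encoder + Transformer text encoder,
    fused by concatenation before a linear classifier (late fusion)."""

    def __init__(self, n_mels=64, vocab_size=5000, d_model=128, n_classes=4):
        super().__init__()
        # Audio branch: 1D CNN over mel-spectrogram frames, pooled over time
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis to one vector
        )
        # Text branch: token embedding + a single Transformer encoder layer
        self.embed = nn.Embedding(vocab_size, d_model)
        self.text_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=1,
        )
        # Fusion head: concatenate both modality embeddings, then classify
        self.classifier = nn.Linear(128 + d_model, n_classes)

    def forward(self, mel, tokens):
        a = self.audio_cnn(mel).squeeze(-1)            # (batch, 128)
        t = self.text_enc(self.embed(tokens)).mean(1)  # (batch, d_model)
        return self.classifier(torch.cat([a, t], dim=-1))

model = MultimodalEmotionNet()
mel = torch.randn(2, 64, 300)              # 2 utterances, 64 mel bins, 300 frames
tokens = torch.randint(0, 5000, (2, 20))   # 2 transcripts, 20 tokens each
logits = model(mel, tokens)                # (2, n_classes) emotion logits
```

Late fusion as shown keeps each modality's encoder independent; alternatives such as cross-attention between the branches are also common in audio-text emotion recognition.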
Published on: 2024-12-13