
Cross-modal gated feature enhancement for multimodal emotion recognition in conversations.

Zhao, Shiyun; Ren, Jinchang; Zhou, Xiaojuan

Abstract

Emotion recognition in conversations (ERC), which involves identifying the emotional state of each utterance within a dialogue, plays a vital role in developing empathetic artificial intelligence systems. In practical applications such as video-based recruitment interviews, customer service, health monitoring, intelligent personal assistants, and online education, ERC can facilitate the analysis of emotional cues, improve decision-making processes, and enhance user interaction and satisfaction. Current multimodal emotion recognition research faces several challenges, including ineffective extraction of emotional information from single modalities, underuse of complementary features across modalities, and inter-modal redundancy. To tackle these issues, this paper introduces a cross-modal gated attention mechanism for emotion recognition. The method extracts and fuses visual, textual, and auditory features to enhance accuracy and stability. A cross-modal guided gating mechanism is designed to strengthen single-modality features and to use a third modality to improve bimodal feature fusion, boosting multimodal feature representation. Furthermore, a cross-modal distillation loss function is proposed to reduce redundancy and improve feature discrimination. This loss employs a dual-supervision mechanism with teacher and student models, ensuring consistency across single-modal, bimodal, and trimodal feature representations. Experimental results on the IEMOCAP and MELD datasets indicate that the proposed method achieves higher accuracy than existing approaches, with comparable F1 scores, highlighting its effectiveness in capturing multimodal dependencies and balancing modality contributions.
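
The abstract describes two components: a cross-modal guided gate that re-weights one modality's features using another, and a distillation loss that keeps single-modal, bimodal, and trimodal representations consistent. The snippet below is a minimal illustrative sketch in PyTorch of what such a gate and a consistency-style distillation objective can look like; it is a reading of the abstract, not the authors' implementation, and every name in it (CrossModalGate, consistency_distillation_loss, the toy tensors) is hypothetical.

```python
# Hedged sketch of a cross-modal guided gate and a consistency-style
# distillation loss. Illustrative only; names and shapes are assumptions,
# not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalGate(nn.Module):
    """Gate one modality's features using a second (guiding) modality.

    The concatenated pair is projected to a sigmoid gate that re-weights
    the target features element-wise; a residual connection keeps the
    original signal so the gate modulates rather than erases it.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, guide: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.proj(torch.cat([x, guide], dim=-1)))
        return x + gate * x  # residual gated enhancement

def consistency_distillation_loss(student: torch.Tensor,
                                  teacher: torch.Tensor,
                                  tau: float = 2.0) -> torch.Tensor:
    """Temperature-scaled KL divergence between student and (detached)
    teacher logits: a standard stand-in for a cross-modal distillation
    objective that encourages consistent predictions across views."""
    s = F.log_softmax(student / tau, dim=-1)
    t = F.softmax(teacher.detach() / tau, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * tau * tau

# Toy usage: enhance audio features with text guidance, then penalize
# disagreement between the two views' emotion logits.
if __name__ == "__main__":
    dim, num_classes = 128, 6
    audio = torch.randn(4, dim)
    text = torch.randn(4, dim)
    gate = CrossModalGate(dim)
    enhanced_audio = gate(audio, text)
    head = nn.Linear(dim, num_classes)
    loss = consistency_distillation_loss(head(enhanced_audio), head(text))
    print(enhanced_audio.shape, loss.item())
```

The residual form of the gate and the temperature-scaled KL term are conventional choices for gated fusion and distillation respectively; the paper's actual gating, teacher/student pairing, and loss weighting may differ.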

Citation

ZHAO, S., REN, J. and ZHOU, X. 2025. Cross-modal gated feature enhancement for multimodal emotion recognition in conversations. Scientific reports [online], 15, article number 30004. Available from: https://doi.org/10.1038/s41598-025-11989-6

Journal Article Type Article
Acceptance Date Jul 14, 2025
Online Publication Date Aug 16, 2025
Publication Date Dec 31, 2025
Deposit Date Aug 29, 2025
Publicly Available Date Aug 29, 2025
Journal Scientific Reports
Electronic ISSN 2045-2322
Publisher Springer
Peer Reviewed Peer Reviewed
Volume 15
Article Number 30004
DOI https://doi.org/10.1038/s41598-025-11989-6
Keywords Cross-modal; Emotion recognition in conversations; Transformer; Deep learning; Gated attention
Public URL https://rgu-repository.worktribe.com/output/2988398
