Multi-head attention-based long short-term memory for depression detection from speech.
Zhao, Yan; Liang, Zhenlin; Du, Jing; Zhang, Li; Liu, Chengyu; Zhao, Li
Authors
Yan Zhao
Zhenlin Liang
Jing Du
Li Zhang
Chengyu Liu
Li Zhao
Abstract
Depression is a mental disorder that threatens people's health and normal life, so an effective way to detect it is essential. However, research on depression detection has mainly focused on combining parallel features from audio, video, and text to enhance performance, rather than making full use of the information inherent in speech. To focus on the more emotionally salient regions of depressed speech, we propose a multi-head time-dimension attention-based long short-term memory (LSTM) model. We first extract frame-level features, which preserve the original temporal relationships of a speech sequence, and analyze how they differ between speech from depressed and healthy speakers. We then study the performance of various features and use a modified feature set as the input to the LSTM layer. Instead of using the output of a traditional LSTM directly, we apply multi-head time-dimension attention, which projects the output into different subspaces to capture more of the key temporal information relevant to depression detection. The experimental results show that the proposed model improves on the LSTM model by 2.3% and 10.3% on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) and the Multi-modal Open Dataset for Mental-disorder Analysis (MODMA) corpus, respectively.
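To make the idea of attention over the time dimension concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. It assumes one simple form of attention pooling: the per-frame LSTM output vectors are split into equal slices (one subspace per head), each head scores every time step against its own query vector, and the softmax-weighted averages are concatenated. The function names, the slicing used as the "projection", and the `queries` parameters are all hypothetical choices made for illustration; in a trained model the projections and queries would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_time_attention(outputs, queries):
    """Pool a sequence of LSTM outputs over time with several heads.

    outputs: list of T frames, each a list of D floats (hypothetical
             stand-in for the per-frame LSTM outputs).
    queries: list of H query vectors, each of length D // H
             (hypothetical parameters, learned in a real model).
    Returns (context, weights): a D-dimensional pooled vector and the
    per-head attention weights over the T time steps.
    """
    num_heads = len(queries)
    dim = len(outputs[0])
    if dim % num_heads != 0:
        raise ValueError("feature size must be divisible by number of heads")
    head_dim = dim // num_heads

    context, all_weights = [], []
    for h, query in enumerate(queries):
        # Assumption: each head sees one slice of the feature vector.
        chunks = [frame[h * head_dim:(h + 1) * head_dim] for frame in outputs]
        # Scaled dot-product score for every time step.
        scores = [
            sum(x * q for x, q in zip(chunk, query)) / math.sqrt(head_dim)
            for chunk in chunks
        ]
        weights = softmax(scores)  # normalised over the time dimension
        # Weighted sum over time gives this head's summary vector.
        head_context = [
            sum(w * chunk[j] for w, chunk in zip(weights, chunks))
            for j in range(head_dim)
        ]
        context.extend(head_context)
        all_weights.append(weights)
    return context, all_weights


if __name__ == "__main__":
    # Three frames of four-dimensional (made-up) LSTM outputs, two heads.
    frames = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 2.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
    pooled, attn = multi_head_time_attention(frames, [[10.0, 0.0], [0.0, 0.0]])
    print(pooled)
    print(attn)
```

In this sketch a head with an all-zero query weights every frame equally, so it reduces to mean pooling over time, while a head with a strongly directed query concentrates its weight on the frames that match it. That is the sense in which different heads can attend to different time regions of the same utterance.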
Citation
ZHAO, Y., LIANG, Z., DU, J., ZHANG, L., LIU, C. and ZHAO, L. 2021. Multi-head attention-based long short-term memory for depression detection from speech. Frontiers in neurorobotics [online], 15, article 684037. Available from: https://doi.org/10.3389/fnbot.2021.684037
Journal Article Type | Article |
---|---|
Acceptance Date | Jul 19, 2021 |
Online Publication Date | Aug 26, 2021 |
Publication Date | Dec 31, 2021 |
Deposit Date | Sep 21, 2021 |
Publicly Available Date | Sep 21, 2021 |
Journal | Frontiers in Neurorobotics |
Electronic ISSN | 1662-5218 |
Publisher | Frontiers Media |
Peer Reviewed | Peer Reviewed |
Volume | 15 |
Article Number | 684037 |
DOI | https://doi.org/10.3389/fnbot.2021.684037 |
Keywords | Depression; LSTM; Multi-head attention; Frame-level feature; Deep learning |
Public URL | https://rgu-repository.worktribe.com/output/1465132 |
Files
ZHANG 2021 Multi-head (VOR) (3.7 Mb, PDF)
Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/
Copyright Statement
© 2021 Zhao, Liang, Du, Zhang, Liu and Zhao.