Voice spoofing countermeasure for voice replay attacks using deep learning.

Zhou, Jincheng; Hai, Tao; Jawawi, Dayang N.A.; Wang, Dan; Ibeke, Ebuka; Biamba, Cresantus

doi:10.1186/s13677-022-00306-5

Authors

Jincheng Zhou

Tao Hai

Dayang N.A. Jawawi

Dan Wang

Dr Ebuka Ibeke e.ibeke@rgu.ac.uk
Senior Lecturer

Cresantus Biamba

Abstract

Communication is central to human life, and we use many means and channels to communicate with one another every day. Listening and speaking are the primary forms of communication, and both depend on the human voice, which makes voice communication the simplest form of communication. Automatic Speaker Verification (ASV) systems verify users by their voices. These systems are susceptible to voice spoofing attacks, namely logical access and physical access attacks. Recently, there has been notable progress in the detection of these attacks. Attackers use enhanced devices to record users’ voices and replay them to the ASV system in order to gain access for harmful purposes. In this work, we propose a secure voice spoofing countermeasure to detect voice replay attacks. We enhanced ASV system security by building a spoofing countermeasure based on decomposed signals that carry prominent information. We used two main features, Gammatone Cepstral Coefficients and Mel-Frequency Cepstral Coefficients, for the audio representation. To classify the features, we used a Bi-directional Long Short-Term Memory network, a deep learning classifier, in the cloud. We investigated numerous audio features and examined each feature’s capability to capture the most vital details from the audio so that it can be labelled as genuine or spoofed speech. Furthermore, we used various machine learning algorithms to illustrate the superiority of our system over traditional classifiers. The experimental results were evaluated in terms of accuracy, precision, recall, F1-score, and Equal Error Rate (EER), which were 97%, 100%, 90.19%, 94.84%, and 2.95%, respectively.
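As an illustration of how the reported metrics relate to one another, the following minimal Python sketch recomputes the F1-score from the precision (100%) and recall (90.19%) stated in the abstract, and shows one common way an Equal Error Rate can be approximated from classifier scores. It is not the authors' code: the function names `f1_score` and `equal_error_rate` are hypothetical, the threshold sweep is only one of several ways to estimate EER, and the score lists in the usage note are synthetic values invented for the example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: a higher score means "more likely genuine", and a trial is
    accepted when its score is greater than or equal to the threshold.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine_scores) | set(spoof_scores)):
        # False acceptance rate: spoofed trials that are accepted.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: genuine trials that are rejected.
        frr = sum(g < threshold for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Precision and recall as reported in the abstract.
reported_f1 = f1_score(1.00, 0.9019)
print(f"F1 from reported precision and recall: {reported_f1:.2%}")
```

With the abstract's precision and recall, the harmonic mean comes to roughly 94.84%, which agrees with the reported F1-score. For the EER helper, a call such as `equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])` on these made-up scores finds the threshold at which one of four genuine trials is rejected and one of four spoofed trials is accepted.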

Citation

ZHOU, J., HAI, T., JAWAWI, D.N.A., WANG, D., IBEKE, E. and BIAMBA, C. 2022. Voice spoofing countermeasure for voice replay attacks using deep learning. Journal of cloud computing: advances, systems and applications [online], 11, article number 51. Available from: https://doi.org/10.1186/s13677-022-00306-5

Journal Article Type	Article
Acceptance Date	Jul 29, 2022
Online Publication Date	Sep 24, 2022
Publication Date	Dec 31, 2022
Deposit Date	Aug 1, 2022
Publicly Available Date	Aug 1, 2022
Journal	Journal of cloud computing
Electronic ISSN	2192-113X
Publisher	Springer
Peer Reviewed	Peer Reviewed
Volume	11
Article Number	51
DOI	https://doi.org/10.1186/s13677-022-00306-5
Keywords	Automatic Speaker Verification (ASV); Spoofing voice biometrics; Deep learning neural network; Machine learning
Public URL	https://rgu-repository.worktribe.com/output/1724410

Files

ZHOU 2022 Voice spoofing countermeasure (VOR) (2.3 Mb)
PDF

Publisher Licence URL
https://creativecommons.org/licenses/by/4.0/

Copyright Statement
© The Author(s) 2022. The version of record of this article, first published in Journal of Cloud Computing, is available online at Publisher’s website: https://doi.org/10.1186/s13677-022-00306-5