Speaker model adaptation in automatic speech recognition.

Chan, Carlos Chun Ming

Authors

Carlos Chun Ming Chan



Contributors

Jon Tyler
Supervisor

Iain McKenzie
Supervisor

Stephen Cox
Supervisor

Abstract

One of the main obstacles in automatic speech recognition is achieving speaker independence. The main difficulty is generally believed to be inter-speaker variability: the acoustic characteristics of different speakers are not the same. There are three main approaches to overcoming this problem: extracting invariant information, multiple-template representation, and speaker adaptation. This thesis describes a study of different speaker adaptation techniques applied in an attempt to improve the performance of a continuous-density HMM-based speaker-independent speech recogniser. Two alternative approaches that compensate for speaker differences are described. Speaker normalisation, in which different speakers are transformed into a common parameter space, is briefly outlined. Speaker adaptation, in which a speech recogniser is tuned to the new speaker's acoustic characteristics, is examined in detail, and some previously proposed techniques are outlined. Several speaker adaptation techniques are then investigated in detail. Three of these are based on transforming the spectral parameters of the recogniser into the new speaker's domain: the first uses a spectral probabilistic mapping, the second uses a neural network architecture as a non-linear transform, and the third is based on statistical regression analysis. Apart from transform adaptation, two further techniques based on statistical estimation are also described in detail. The first is based on Bayesian inference estimation. The second, not previously applied to speaker adaptation, is based on statistical time-series analysis via a set of recursive Kalman filtering equations. The speaker adaptation techniques are evaluated on two large-population speech databases. Each technique is compared with a speaker-independent recogniser, and the techniques are also compared with one another. The experimental results show that recognition performance improves even when only a small amount of adaptation data is available from the new speaker.
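Regression-based transform adaptation typically estimates a mapping from speaker-independent model parameters to the new speaker's parameters using whatever adaptation data is available, then applies that mapping to every model, including models the adaptation data never reached. The abstract does not give the thesis's formulation, so the following is only a minimal sketch under stated assumptions: paired mean vectors (speaker-independent mean, estimate from adaptation data) are assumed to be already available, and the transform is simplified to an independent least-squares line per feature dimension (a diagonal transform rather than a full matrix). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_diagonal_regression(
    si_means: Sequence[Sequence[float]],
    speaker_means: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """Hypothetical helper: fit y_d ~ a_d * x_d + b_d by least squares, per dimension d.

    Row i of si_means pairs a speaker-independent mean vector with row i of
    speaker_means, its (assumed) estimate from the new speaker's adaptation data.
    Returns one (a_d, b_d) pair per feature dimension.
    """
    n = len(si_means)
    dim = len(si_means[0])
    params = []
    for d in range(dim):
        xs = [row[d] for row in si_means]
        ys = [row[d] for row in speaker_means]
        x_bar = sum(xs) / n
        y_bar = sum(ys) / n
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        # With no spread in x the slope is undefined; fall back to a pure shift.
        a = sxy / sxx if sxx > 0 else 1.0
        b = y_bar - a * x_bar
        params.append((a, b))
    return params


def apply_regression(
    means: Sequence[Sequence[float]], params: Sequence[Tuple[float, float]]
) -> List[List[float]]:
    """Hypothetical helper: map every speaker-independent mean, seen or unseen."""
    return [[a * x + b for x, (a, b) in zip(row, params)] for row in means]
```

Because the transform is shared across all models, a few adaptation frames can move means that were never observed, which is the usual argument for transform-based adaptation when adaptation data is scarce.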
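Bayesian inference estimation for speaker adaptation is commonly formulated as maximum a posteriori (MAP) re-estimation of Gaussian means, with the speaker-independent mean acting as the centre of a conjugate prior. The sketch below shows that textbook form for a single Gaussian; it is not taken from the thesis, and it assumes hard assignment of adaptation frames to the Gaussian and a hypothetical prior weight `tau`.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    tau: float = 10.0,
) -> List[float]:
    """Hypothetical helper: MAP estimate of a Gaussian mean.

    prior_mean: speaker-independent mean vector (centre of the prior).
    frames: adaptation feature vectors assigned to this Gaussian (hard assignment).
    tau: assumed prior weight; larger values keep the estimate closer to the SI mean.
    """
    n = len(frames)
    if n == 0:
        # No adaptation data for this Gaussian: keep the speaker-independent mean.
        return list(prior_mean)
    dim = len(prior_mean)
    sums = [sum(f[d] for f in frames) for d in range(dim)]
    # Weighted combination of prior mean and data: (tau * mu_0 + sum_t x_t) / (tau + n).
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]
```

With little data the estimate stays near the speaker-independent mean; as the number of frames grows it approaches the new speaker's sample mean.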
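One way to cast mean adaptation as a time-series problem is to treat each Gaussian mean as a hidden state and each adaptation frame as a noisy observation of it, so that the mean can be updated recursively with Kalman filter equations. The sketch below assumes a random-walk state model, diagonal covariances and known noise variances; the function name and parameters are hypothetical, and the thesis's actual state-space formulation may differ.

```python
from typing import List, Sequence, Tuple


def kalman_adapt_mean(
    prior_mean: Sequence[float],
    prior_var: Sequence[float],
    frames: Sequence[Sequence[float]],
    obs_var: Sequence[float],
    process_var: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: recursively update a diagonal Gaussian mean frame by frame.

    Assumed state model:       mean_t  = mean_{t-1} + w_t,  w_t ~ N(0, process_var)
    Assumed observation model: frame_t = mean_t + v_t,      v_t ~ N(0, obs_var[d])
    Each feature dimension is filtered independently.
    """
    mean = list(prior_mean)
    var = list(prior_var)
    for frame in frames:
        for d, x in enumerate(frame):
            p_pred = var[d] + process_var          # predict: uncertainty grows
            k = p_pred / (p_pred + obs_var[d])     # Kalman gain
            mean[d] = mean[d] + k * (x - mean[d])  # correct towards the observation
            var[d] = (1.0 - k) * p_pred            # posterior variance shrinks
    return mean, var
```

With `process_var` set to 0, each step gives exactly the batch conjugate-prior estimate with prior weight `obs_var / prior_var`; a positive `process_var` keeps the gain from decaying to zero, so the estimate can continue to track a drifting speaker.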

Citation

CHAN, C.C.M. 1993. Speaker model adaptation in automatic speech recognition. Robert Gordon University, PhD thesis. Hosted on OpenAIR [online]. Available from: https://doi.org/10.48526/rgu-wt-2807323

Thesis Type: Thesis
Deposit Date: Jun 23, 2025
Publicly Available Date: Jun 23, 2025
DOI: https://doi.org/10.48526/rgu-wt-2807323
Public URL: https://rgu-repository.worktribe.com/output/2807323
Award Date: Sep 29, 1993
