A comparative study of models for automatic speech recognition.

Dai, Jianing

Authors

Jianing Dai

Contributors

Jon Tyler
Supervisor

Iain MacKenzie
Supervisor

Abstract

This thesis describes a study of the most popular techniques for speech modelling at present, namely the Dynamic Programming approach, the Hidden Markov Model (HMM) and the Neural Network, all of which are also evaluated experimentally. The reason why the HMM outperforms the other techniques is examined rigorously in the light of decision theory. The study first concludes that the success of the HMM approach is due to its probabilistic representation of the acoustic variability of speech signals, and to the transient property of this representation, which makes it possible to model the evolution of speech spectra over time. However, the HMM has a limitation when applied to speech: it assumes that the output observations of the model, which correspond to the spectral vectors of a spoken word, depend only on the state, implying that observations generated by the same state are independent of each other. One effect of this assumption is that the time-ordering information of the spectral vectors within a state is disregarded, and the loss of this information limits the performance of HMM-based recognition systems. To account for this time-ordering information, this thesis presents two alternative approaches, namely the Markov model (MM) and the HMM-MM hybrid. Both approaches use the Markov property, that the present observation of the Markov process depends on the immediately preceding observation, to model the time-ordering information of speech vectors. Both methods, along with the HMM, have been tested extensively on an isolated word recognition task using a widely distributed English Alphabet database.
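The independence assumption can be made concrete by writing out the likelihood each model assigns to an observation sequence $O = o_1, \dots, o_T$. This is a sketch in standard textbook notation, assuming discrete observations (for example, vector-quantised spectral symbols); the symbols $\lambda$, $\mu$, $\pi$, $a_{ij}$, $b_j(\cdot)$ and $p(\cdot)$ are introduced here for illustration and are not taken from the thesis, whose exact formulation may differ.

```latex
% HMM: sum over all state sequences Q = q_1, ..., q_T
P(O \mid \lambda) = \sum_{q_1, \dots, q_T} \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)

% Given the state sequence, the observations are conditionally independent
P(O \mid Q, \lambda) = \prod_{t=1}^{T} b_{q_t}(o_t)

% First-order Markov model over the observations themselves
P(O \mid \mu) = p(o_1) \prod_{t=2}^{T} p(o_t \mid o_{t-1})
```

In the second line, permuting the observations assigned to the same state leaves the product unchanged, which is the loss of within-state time-ordering described above; the third line instead conditions each observation on its immediate predecessor.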
The results suggest that: (1) the time-ordering of the spectral vectors of a spoken word is important information which may be used to improve the performance of HMM recognition systems; (2) the Markov model attains performance almost identical to, or better than, that of the HMM (depending on the data used in the experiment), and offers a substantial saving in computation time compared with the HMM; and (3) the HMM-MM hybrid successfully addresses the problem of modelling temporal information posed by the HMM, and the improvement made by this approach is statistically significant. The software developed in this study is included as an appendix.
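The contrast between the two models can be sketched in code. The following Python is a minimal illustration, not the software from the thesis appendix: it assumes discrete observation symbols (for example, codebook indices from vector quantisation), and the names `hmm_log_likelihood`, `markov_log_likelihood` and `recognise` are hypothetical. A single-state HMM scores every reordering of the same symbols identically, whereas a Markov chain over the symbols does not. Scoring also costs O(T·N²) for the HMM forward pass against O(T) for the Markov chain, which is one plausible source of the computational saving reported above.

```python
import math
from functools import partial
from typing import Callable, Dict, List, Sequence


def _log(p: float) -> float:
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def hmm_log_likelihood(obs: Sequence[int], pi: List[float],
                       A: List[List[float]], B: List[List[float]]) -> float:
    """Forward algorithm for a discrete HMM, O(T * N^2) for N states.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    Each symbol depends only on the current state, not on the previous symbol.
    """
    n = len(pi)
    alpha = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [logsumexp([alpha[i] + _log(A[i][j]) for i in range(n)])
                 + _log(B[j][o]) for j in range(n)]
    return logsumexp(alpha)


def markov_log_likelihood(obs: Sequence[int], p0: List[float],
                          P: List[List[float]]) -> float:
    """First-order Markov chain over the symbols themselves, O(T).

    p0[k]   : probability that the first symbol is k
    P[k][l] : probability of symbol l given that the previous symbol was k
    """
    ll = _log(p0[obs[0]])
    for prev, cur in zip(obs, obs[1:]):
        ll += _log(P[prev][cur])
    return ll


def recognise(obs: Sequence[int],
              models: Dict[str, Callable[[Sequence[int]], float]]) -> str:
    """Pick the word whose model gives the highest log-likelihood (equal priors assumed)."""
    return max(models, key=lambda word: models[word](obs))


if __name__ == "__main__":
    smooth, alternating = [0, 0, 1, 1], [0, 1, 0, 1]
    # One-state HMM: every symbol comes from the same distribution,
    # so both orderings of two 0s and two 1s receive the same score.
    hmm = partial(hmm_log_likelihood, pi=[1.0], A=[[1.0]], B=[[0.5, 0.5]])
    print(hmm(smooth), hmm(alternating))
    # Markov chain that favours repeating the previous symbol.
    mm = partial(markov_log_likelihood, p0=[0.5, 0.5], P=[[0.9, 0.1], [0.1, 0.9]])
    print(mm(smooth), mm(alternating))
```

The decision rule in `recognise` is plain maximum likelihood over word models; a full recogniser would estimate the model parameters from training utterances, which this sketch leaves out.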

Citation

DAI, J. 1992. A comparative study of models for automatic speech recognition. Robert Gordon's Institute of Technology, PhD thesis. Hosted on OpenAIR [online]. Available from: https://doi.org/10.48526/rgu-wt-2807351

Thesis Type: Thesis
Deposit Date: Aug 11, 2025
Publicly Available Date: Aug 11, 2025
DOI: https://doi.org/10.48526/rgu-wt-2807351
Public URL: https://rgu-repository.worktribe.com/output/2807351
Award Date: Mar 31, 1992
