A comparative study of models for automatic speech recognition
Authors
Jianing Dai
Contributors
Jon Tyler (Supervisor)
Iain MacKenzie (Supervisor)
Abstract
This thesis studies the most popular current techniques for speech modelling: the Dynamic Programming approach, the Hidden Markov Model (HMM), and the Neural Network. All of these are evaluated experimentally, and the reason why the HMM outperforms the other techniques is examined rigorously in the light of decision theory. The study first concludes that the success of the HMM approach is due to its probabilistic representation of the acoustic variability of speech signals, and to the transient property of this representation, which makes it possible to model the evolution of speech spectra over time. However, the HMM has a limitation when applied to speech: it assumes that the output observations of the model, which correspond to the spectral vectors of a spoken word, depend only on the state, implying that observations generated by the same state are independent of each other. One effect of this assumption is that the time-ordering information of the spectral vectors within a state is disregarded, and the loss of this information limits the performance of HMM-based recognition systems. To account for this time-ordering information, this thesis presents two alternative approaches, namely the Markov model (MM) and the HMM-MM hybrid. Both approaches use the Markov property, that the present observation of a Markov process depends on the immediately preceding observation, to model the time-ordering information of speech vectors. Both methods, along with the HMM, have been tested extensively on an isolated word recognition task using a widely distributed English Alphabet database.
The results suggest that: (1) the time-ordering of the spectral vectors of a spoken word is important information that may be used to improve the performance of HMM recognition systems; (2) the Markov model attains performance almost identical to, or better than, that of the HMM (depending on the data used in the experiment), and offers a substantial saving in computation time compared with the HMM; and (3) the HMM-MM hybrid successfully addresses the problem of temporal information modelling posed by the HMM, and the improvement made by this approach is statistically significant. The software developed in this study is included as an appendix.
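The time-ordering point in the abstract can be illustrated with a toy example. The sketch below is not the thesis software and makes several assumptions that do not come from the abstract: spectral vectors are replaced by discrete symbols (as if vector-quantised), only a single HMM state is considered, and the function names, training data, and smoothing constant are hypothetical. It compares an order-free emission model (observations independent within a state) with a first-order Markov model on a sequence and a reordered copy of it.

```python
import math
from collections import Counter

def train_order_free(sequences, n_symbols, alpha=1.0):
    """Estimate P(symbol) with add-alpha smoothing; ordering is ignored."""
    counts = Counter(s for seq in sequences for s in seq)
    total = sum(counts.values()) + alpha * n_symbols
    return [(counts[k] + alpha) / total for k in range(n_symbols)]

def train_markov(sequences, n_symbols, alpha=1.0):
    """Estimate initial and first-order transition probabilities."""
    init = [alpha] * n_symbols
    trans = [[alpha] * n_symbols for _ in range(n_symbols)]
    for seq in sequences:
        init[seq[0]] += 1
        for prev, cur in zip(seq, seq[1:]):
            trans[prev][cur] += 1
    init_total = sum(init)
    init = [c / init_total for c in init]
    trans = [[c / sum(row) for c in row] for row in trans]
    return init, trans

def loglik_order_free(seq, probs):
    """Log-likelihood when each observation is independent of the others."""
    return sum(math.log(probs[s]) for s in seq)

def loglik_markov(seq, init, trans):
    """Log-likelihood when each observation depends on the preceding one."""
    ll = math.log(init[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(trans[prev][cur])
    return ll

# Hypothetical training data: symbols tend to progress 0 -> 1 -> 2.
training = [[0, 0, 1, 1, 2, 2], [0, 1, 1, 2, 2, 2], [0, 0, 0, 1, 2, 2]]
ordered = [0, 0, 1, 1, 2, 2]
shuffled = [2, 1, 0, 2, 0, 1]  # same symbols as `ordered`, different order

probs = train_order_free(training, 3)
init, trans = train_markov(training, 3)

# Gap between the ordered and shuffled sequences under each model.
order_free_gap = loglik_order_free(ordered, probs) - loglik_order_free(shuffled, probs)
markov_gap = loglik_markov(ordered, init, trans) - loglik_markov(shuffled, init, trans)

print(f"order-free gap: {order_free_gap:.6f}")
print(f"Markov gap:     {markov_gap:.6f}")
```

Under these assumptions the order-free model cannot tell the two sequences apart, because it only sees which symbols occur and how often, whereas the Markov model scores the sequence that follows the training order more highly. This is only the within-state effect in miniature; a full recogniser would work on continuous spectral vectors and multiple states.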
Citation
DAI, J. 1992. A comparative study of models for automatic speech recognition. Robert Gordon's Institute of Technology, PhD thesis. Hosted on OpenAIR [online]. Available from: https://doi.org/10.48526/rgu-wt-2807351
Field | Value |
---|---|
Thesis Type | Thesis |
Deposit Date | Aug 11, 2025 |
Publicly Available Date | Aug 11, 2025 |
DOI | https://doi.org/10.48526/rgu-wt-2807351 |
Public URL | https://rgu-repository.worktribe.com/output/2807351 |
Award Date | Mar 31, 1992 |
Files
DAI 1992 A comparative study of models (PDF, 70 Mb)
Licence
https://creativecommons.org/licenses/by-nc/4.0/
Copyright Statement
© The Author.