
Xu Xianghua, Zhu Jie, Guo Qiang. Speaker-independent speech recognition based on HMM state-restructuring method [J]. Journal of Southeast University (English Edition), 2004, 20(4): 427-430. [doi:10.3969/j.issn.1003-7985.2004.04.007]

Speaker-independent speech recognition based on HMM state-restructuring method

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
20
Issue:
4
Pages:
427-430
Research Field:
Information and Communication Engineering
Publication date:
2004-12-30

Info

Author(s):
Xu Xianghua, Zhu Jie, Guo Qiang
Department of Electronic Engineering, Shanghai Jiaotong University, Shanghai 200030, China
Keywords:
speech recognition; hidden Markov model; expectation maximization algorithm; HMM Toolkit (HTK)
CLC number:
TN912.34; TP391.42
DOI:
10.3969/j.issn.1003-7985.2004.04.007
Abstract:
Using the confusability between hidden Markov model (HMM) states, a state-restructuring method is proposed. In this method, each HMM state is restructured by sharing Gaussian components with its related states, and re-estimation formulas for the added parameters, i.e., the inter-state weights, are derived within the expectation maximization (EM) framework. Experiments are performed on speaker-independent, large-vocabulary, continuous Mandarin speech recognition. The results show that the state-restructured systems outperform the baseline and achieve a significant improvement in recognition accuracy over the conventional parameter-increasing method, which confirms the effectiveness of the state-restructuring method.
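The abstract describes the inter-state weights only at a high level. The sketch below illustrates one plausible reading under assumptions that are ours, not the paper's: a restructured state j scores a frame o with a convex combination b'_j(o) = sum_k lambda_jk * b_k(o) of the original Gaussian-mixture densities b_k of its related states (its own state included); the Gaussians stay fixed; the related states are already chosen (e.g. from measured confusions); and the state occupancies gamma_t(j) come from a prior forward-backward pass. Under those assumptions the EM update for lambda_jk is the familiar mixture-weight re-estimation: the occupancy-weighted average posterior of each related state. Features are 1-D for brevity, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical 1-D Gaussian mixture: list of (weight, mean, variance) triples.
GMM = List[Tuple[float, float, float]]


def gauss_pdf(x: float, mean: float, var: float) -> float:
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_pdf(x: float, gmm: GMM) -> float:
    """Output density b_k(x) of one original HMM state."""
    return sum(w * gauss_pdf(x, m, v) for w, m, v in gmm)


def restructured_pdf(x: float, lam: Sequence[float], related: Sequence[GMM]) -> float:
    """Assumed restructured density: b'_j(x) = sum_k lam[k] * b_k(x)."""
    return sum(l * gmm_pdf(x, g) for l, g in zip(lam, related))


def log_likelihood(frames: Sequence[float], occupancy: Sequence[float],
                   lam: Sequence[float], related: Sequence[GMM]) -> float:
    """Occupancy-weighted log-likelihood of the frames under state j."""
    return sum(g * math.log(restructured_pdf(x, lam, related))
               for x, g in zip(frames, occupancy))


def reestimate_inter_state_weights(frames: Sequence[float], occupancy: Sequence[float],
                                   lam: Sequence[float], related: Sequence[GMM],
                                   n_iter: int = 20) -> List[float]:
    """EM re-estimation of the inter-state weights; the Gaussians stay fixed."""
    lam = list(lam)
    for _ in range(n_iter):
        acc = [0.0] * len(lam)
        total = 0.0
        for x, gamma in zip(frames, occupancy):
            # E-step: posterior that frame x was generated by related state k.
            parts = [l * gmm_pdf(x, g) for l, g in zip(lam, related)]
            denom = sum(parts)
            if gamma <= 0.0 or denom <= 0.0:
                continue
            for k, p in enumerate(parts):
                acc[k] += gamma * p / denom
            total += gamma
        if total == 0.0:
            break
        # M-step: occupancy-weighted average posterior per related state.
        lam = [a / total for a in acc]
    return lam


if __name__ == "__main__":
    own = [(0.6, 0.0, 0.25), (0.4, 0.5, 0.25)]  # state j's own mixture
    confusable = [(1.0, 3.0, 0.25)]             # one related (confusable) state
    frames = [-0.2, 0.1, 0.0, 0.3, -0.1, 2.9, 3.1, 0.2]
    occ = [1.0] * len(frames)                   # hard alignment, for simplicity
    print(reestimate_inter_state_weights(frames, occ, [0.5, 0.5], [own, confusable]))
```

With six of the eight toy frames near the state's own mixture and two near the confusable state's, the weights move from [0.5, 0.5] to roughly [0.75, 0.25], and the weighted log-likelihood does not decrease, as EM guarantees. Holding the Gaussians fixed keeps the number of added parameters to one weight per related state.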

Memo:
Biographies: Xu Xianghua (born 1977), female, graduate student; Zhu Jie (corresponding author), male, PhD, professor, zhujie@sjtu.edu.cn.
Last Update: 2004-12-20