
[1] Xu Xianghua, Zhu Jie, Guo Qiang. Speaker-independent speech recognition based on HMM state-restructuring method [J]. Journal of Southeast University (English Edition), 2004, 20 (4): 427-430. [doi:10.3969/j.issn.1003-7985.2004.04.007]

Speaker-independent speech recognition based on HMM state-restructuring method

基于HMM状态结构调整的非特定人语音识别


Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:: 20
Issue:: 4 (2004)

Page:: 427-430

Research Field:: Information and Communication Engineering

Publishing date:: 2004-12-30

Info

Title:: Speaker-independent speech recognition based on HMM state-restructuring method

Title (Chinese):: 基于HMM状态结构调整的非特定人语音识别

Author(s):: Xu Xianghua; Zhu Jie; Guo Qiang (Department of Electronic Engineering, Shanghai Jiaotong University, Shanghai 200030, China)

Author(s) (Chinese):: 徐向华; 朱杰; 郭强 (上海交通大学电子工程系, 上海 200030)

Keywords:: speech recognition; hidden Markov model; expectation maximization algorithm; HMM Toolkit (HTK)

Keywords (Chinese):: 语音识别; HMM; EM算法; HTK

PACS:: TN912.34;TP391.42

DOI:: 10.3969/j.issn.1003-7985.2004.04.007

Abstract:: Based on confusions between hidden Markov model (HMM) states, a state-restructuring method is proposed. In this method, HMM states are restructured by sharing Gaussian components with their related states, and the re-estimation formula for the added parameters, i.e., the inter-state weights, is derived under the expectation maximization (EM) framework. Experiments are performed on speaker-independent, large-vocabulary, continuous Mandarin speech recognition. The results show that the state-restructured systems outperform the baseline and achieve a significant improvement in recognition accuracy over the conventional parameter-increasing method, which confirms that the state-restructuring method is effective.

Abstract (Chinese):: 利用HMM模型状态间的混淆度, 提出了一种新的状态结构调整算法, 使不同的状态可以共享相同的高斯混合函数, 并在EM算法的框架下推导出对状态结构调整后的增加参数, 即状态间权值的重估公式. 并对非特定人进行大词汇量汉语连续语音识别实验, 实验结果表明状态结构调整后的系统不仅优于基线系统, 还获得了比传统的参数增加方法更高的识别率, 由此证明了状态结构调整方法的有效性.
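
The abstract names the idea (states share Gaussian components with related states, and the added inter-state weights are re-estimated under EM) but does not give the formulas. The following is only a minimal sketch of one plausible reading, not the authors' derivation. It assumes that a restructured state's output density is a weighted sum of the original mixtures of its related states (itself included), that the state occupancies are already available from a forward-backward pass, and that only the inter-state weights are updated. All function names, the 1-D Gaussians, and the data layout are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def state_density(x, mixture):
    """Original (pre-restructuring) mixture density of one state.

    mixture is a list of (component_weight, mean, var) tuples.
    """
    return sum(c * gaussian_pdf(x, m, v) for c, m, v in mixture)


def restructured_density(x, inter_weights, mixtures):
    """Assumed restructured density: weighted sum of the related states' mixtures."""
    return sum(w * state_density(x, mix) for w, mix in zip(inter_weights, mixtures))


def reestimate_inter_weights(obs, occupancy, inter_weights, mixtures):
    """One EM step for the inter-state weights of a single restructured state.

    occupancy[t] stands in for the state occupancy at frame t, assumed given.
    The Gaussian parameters inside `mixtures` are held fixed.
    """
    numer = [0.0] * len(inter_weights)
    denom = 0.0
    for x, gamma in zip(obs, occupancy):
        parts = [w * state_density(x, mix) for w, mix in zip(inter_weights, mixtures)]
        total = sum(parts)
        if total == 0.0:
            continue  # frame carries no information about the weights
        for k, part in enumerate(parts):
            # Occupancy-weighted posterior that frame t came from related state k.
            numer[k] += gamma * part / total
        denom += gamma
    return [n / denom for n in numer]


def weighted_log_likelihood(obs, occupancy, inter_weights, mixtures):
    """Occupancy-weighted log-likelihood, used here only to monitor the EM steps."""
    return sum(
        g * math.log(restructured_density(x, inter_weights, mixtures))
        for x, g in zip(obs, occupancy)
    )
```

Under these assumptions the update has the same form as ordinary mixture-weight re-estimation, so the new weights stay non-negative and sum to one, and the occupancy-weighted log-likelihood does not decrease from one step to the next.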

References:

[1] Young S, Jansen J, Odell J, et al. The HTK book [EB/OL]. http://htk.eng.cam.ac.uk/. 2003-10-03/2004-02-16.
[2] Luo X O, Jelinek F. Probabilistic classification of HMM states for large vocabulary continuous speech recognition [A]. In: Proc of ICASSP [C]. Phoenix, Arizona, 1999, 1: 353-356.
[3] Rabiner L, Juang B H. Fundamentals of speech recognition [M]. New Jersey: Prentice Hall, 1993. 339-342.
[4] Moon T K. The expectation-maximization algorithm [J]. IEEE Signal Processing Magazine, 1996, 13(1): 47-60.
[5] Chang E, Shi Y, Zhou J L, et al. Speech lab in a box: a Mandarin speech toolbox to jumpstart speech related research [A]. In: Proc of Eurospeech [C]. Aalborg, Denmark, 2001, 3: 2779-2782.
[6] Reichl W, Chou W. Robust decision tree state tying for continuous speech recognition [J]. IEEE Trans Speech and Audio Processing, 2000, 8(5): 555-566.

Memo

Memo:: Biographies: Xu Xianghua (1977-), female, graduate; Zhu Jie (corresponding author), male, doctor, professor, zhujie@sjtu.edu.cn.

Last Update: 2004-12-20
