
[1] Huang Hao, Zhu Jie. Discriminative tone model training and optimal integration for Mandarin speech recognition [J]. Journal of Southeast University (English Edition), 2007, 23 (2): 174-178. [doi:10.3969/j.issn.1003-7985.2007.02.005]

Discriminative tone model training and optimal integration for Mandarin speech recognition

汉语语音识别中区分性声调模型及最优集成方法


Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:: 23
Issue:: 2 (2007)

Page:: 174-178

Research Field:: Information and Communication Engineering

Publishing date:: 2007-06-30

Info

Title:: Discriminative tone model training and optimal integration for Mandarin speech recognition

Title (Chinese):: 汉语语音识别中区分性声调模型及最优集成方法

Author(s):: Huang Hao, Zhu Jie; Department of Electronic Engineering, Shanghai Jiaotong University, Shanghai 200240, China

Author(s) (Chinese):: 黄浩, 朱杰; 上海交通大学电子工程系, 上海 200240

Keywords:: discriminative training; minimum phone error; tone modeling; Mandarin speech recognition

Keywords (Chinese):: 区分性训练; 最小音子错误; 声调模型; 汉语语音识别

CLC number:: TN912

DOI:: 10.3969/j.issn.1003-7985.2007.02.005

Abstract:: Two discriminative methods for solving tone problems in Mandarin speech recognition are presented. First, discriminative training of hidden Markov model (HMM) based tone models is proposed. Then a technique for integrating the tone models into a large vocabulary continuous speech recognition system is presented. Discriminative model weight training based on the minimum phone error criterion is adopted to integrate the tone models optimally. The extended Baum-Welch algorithm is applied to find the model-dependent weights that scale the acoustic scores and tone scores. Experimental results show that the discriminatively trained tone models improve both tone recognition rates and continuous speech recognition accuracy. The performance of a large vocabulary continuous Mandarin speech recognition system is further enhanced by the discriminatively trained weight combinations, owing to a better interpolation of the given models.

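Abstract (translated from Chinese):: Two methods are proposed for addressing the tone problem in Mandarin speech recognition. First, hidden Markov model (HMM) based tone models are trained discriminatively. Second, an optimal method is proposed for integrating the discriminatively trained tone models into a large vocabulary continuous speech recognition system: following the minimum phone error training criterion, the extended Baum-Welch algorithm is used to discriminatively train model-dependent probability weights, which are applied to the acoustic model and tone model probabilities. Experimental results show that the discriminatively trained tone models significantly improve both the tone recognition rate for continuous speech and the recognition rate of the large vocabulary speech recognition system, and that discriminative model weight training further improves recognition performance once the discriminative tone models have been integrated into the continuous speech recognition system.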

Memo

Memo:: Biographies: Huang Hao (1976-), male, graduate; Zhu Jie (corresponding author), male, doctor, professor, zhujie@sjtu.edu.cn.

Last Update: 2007-06-20
