[1] Huang Hao, Zhu Jie. Discriminative tone model training and optimal integration for Mandarin speech recognition [J]. Journal of Southeast University (English Edition), 2007, 23 (2): 174-178. [doi:10.3969/j.issn.1003-7985.2007.02.005]
Discriminative tone model training and optimal integration for Mandarin speech recognition

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:: 23
Issue:: 2 (2007)

Page:: 174-178

Research Field:: Information and Communication Engineering

Publishing date:: 2007-06-30

Info

Title:: Discriminative tone model training and optimal integration for Mandarin speech recognition

Author(s):: Huang Hao; Zhu Jie; Department of Electronic Engineering, Shanghai Jiaotong University, Shanghai 200240, China

Keywords:: discriminative training; minimum phone error; tone modeling; Mandarin speech recognition

PACS:: TN912

DOI:: 10.3969/j.issn.1003-7985.2007.02.005

Abstract:: Two discriminative methods for solving tone problems in Mandarin speech recognition are presented. First, discriminative training of hidden Markov model (HMM) based tone models is proposed. Then a technique for integrating the tone models into a large vocabulary continuous speech recognition system is presented. Discriminative model weight training based on the minimum phone error criterion is adopted, aiming at optimal integration of the tone models. The extended Baum-Welch algorithm is applied to find the model-dependent weights that scale the acoustic scores and tone scores. Experimental results show that the discriminatively trained tone models improve both tone recognition rates and continuous speech recognition accuracy. The performance of a large vocabulary continuous Mandarin speech recognition system can be further enhanced by the discriminatively trained weight combinations, owing to a better interpolation of the given models.
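
The idea of scaling acoustic and tone scores with model-dependent weights can be illustrated with a minimal sketch. It is not the authors' implementation: the weight tables, model names, and score values below are hypothetical, and the weights are fixed by hand here, whereas the abstract describes training them with the extended Baum-Welch algorithm under the minimum phone error criterion. The sketch only shows how such weights would enter a log-linear combination when ranking hypotheses.

```python
from typing import Dict, List, Tuple

# Hypothetical model-dependent weights (assumed values, not from the paper).
# In the paper these would be trained discriminatively; here they are fixed.
ACOUSTIC_WEIGHTS: Dict[str, float] = {"ma": 1.0}
TONE_WEIGHTS: Dict[str, float] = {"tone1": 0.5, "tone3": 0.8}

# One segment: (acoustic model, acoustic log-score, tone model, tone log-score)
Segment = Tuple[str, float, str, float]


def combined_score(segments: List[Segment]) -> float:
    """Sum the weighted acoustic and tone log-scores over all segments."""
    total = 0.0
    for ac_model, ac_logp, tone_model, tone_logp in segments:
        total += ACOUSTIC_WEIGHTS[ac_model] * ac_logp
        total += TONE_WEIGHTS[tone_model] * tone_logp
    return total


def best_hypothesis(hyps: Dict[str, List[Segment]]) -> str:
    """Return the name of the hypothesis with the highest combined score."""
    return max(hyps, key=lambda name: combined_score(hyps[name]))


# Two made-up competing hypotheses for the same syllable.
hyps = {
    "A": [("ma", -10.0, "tone1", -2.0)],
    "B": [("ma", -10.5, "tone3", -0.5)],
}
winner = best_hypothesis(hyps)
```

With these assumed numbers, hypothesis B has the lower acoustic score but wins once its tone score is weighted in, which is the kind of trade-off the trained weights are meant to balance.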

Memo

Memo:: Biographies: Huang Hao (1976-), male, graduate; Zhu Jie (corresponding author), male, doctor, professor, zhujie@sjtu.edu.cn.

Last Update: 2007-06-20
