
[1] Huang Hao, Zhu Jie. Discriminative tone model training and optimal integration for Mandarin speech recognition [J]. Journal of Southeast University (English Edition), 2007, 23(2): 174-178. [doi:10.3969/j.issn.1003-7985.2007.02.005]

Discriminative tone model training and optimal integration for Mandarin speech recognition

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

2007, Issue 2
Research Field: Information and Communication Engineering


Discriminative tone model training and optimal integration for Mandarin speech recognition
Huang Hao, Zhu Jie
Department of Electronic Engineering, Shanghai Jiaotong University, Shanghai 200240, China
Keywords: discriminative training; minimum phone error; tone modeling; Mandarin speech recognition
Two discriminative methods for solving tone problems in Mandarin speech recognition are presented. First, discriminative training of hidden Markov model (HMM) based tone models is proposed. Then a technique for integrating the tone models into a large vocabulary continuous speech recognition system is presented. Discriminative model weight training based on the minimum phone error criterion is adopted to integrate the tone models optimally. The extended Baum-Welch algorithm is applied to find the model-dependent weights that scale the acoustic scores and tone scores. Experimental results show that the discriminatively trained tone models improve both tone recognition rates and continuous speech recognition accuracy. The performance of a large vocabulary continuous Mandarin speech recognition system is further enhanced by the discriminatively trained weight combinations, owing to a better interpolation of the given models.
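
The idea of scaling acoustic and tone scores with model-dependent weights can be illustrated with a minimal sketch. It assumes the combination is a weighted sum of log-likelihoods, which the abstract does not spell out. The weights here are set by hand, whereas the paper trains them with the extended Baum-Welch algorithm under the minimum phone error criterion. All names (`Segment`, `combined_score`, `best_hypothesis`) and all numbers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One segment of a hypothesis (illustrative layout, not taken from the paper):
# (phone model name, acoustic log-likelihood, tone model name or None, tone log-likelihood)
Segment = Tuple[str, float, Optional[str], float]


def combined_score(segments: List[Segment],
                   acoustic_weight: Dict[str, float],
                   tone_weight: Dict[str, float]) -> float:
    """Sum the acoustic and tone log-scores, each scaled by the weight of its model."""
    total = 0.0
    for phone, acoustic_ll, tone, tone_ll in segments:
        # Models without an explicit weight fall back to 1.0 (an assumption).
        total += acoustic_weight.get(phone, 1.0) * acoustic_ll
        if tone is not None:
            total += tone_weight.get(tone, 1.0) * tone_ll
    return total


def best_hypothesis(hypotheses: Dict[str, List[Segment]],
                    acoustic_weight: Dict[str, float],
                    tone_weight: Dict[str, float]) -> str:
    """Return the name of the hypothesis with the highest combined score."""
    return max(hypotheses,
               key=lambda name: combined_score(hypotheses[name],
                                               acoustic_weight, tone_weight))


# Two competing hypotheses for one syllable that differ only in tone.
hypotheses = {
    "ma1": [("m", -10.0, None, 0.0), ("a", -20.0, "tone1", -4.0)],
    "ma3": [("m", -10.0, None, 0.0), ("a", -19.0, "tone3", -8.0)],
}

# With tone weights of zero only the acoustic scores decide.
no_tone = best_hypothesis(hypotheses, {}, {"tone1": 0.0, "tone3": 0.0})
# With tone weights of one the tone scores change the ranking.
with_tone = best_hypothesis(hypotheses, {}, {"tone1": 1.0, "tone3": 1.0})
```

In this toy case the acoustic scores alone favor "ma3", while adding the tone scores favors "ma1". Discriminative weight training would choose the weights automatically so that such rankings reduce recognition errors on training data.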




Biographies: Huang Hao (1976-), male, graduate; Zhu Jie (corresponding author), male, doctor, professor, zhujie@sjtu.edu.cn.
Last Update: 2007-06-20