
[1] Zhou Jian, Zhao Li, Liang Ruiyu, et al. Whisper intelligibility enhancement based on noise robust feature and SVM [J]. Journal of Southeast University (English Edition), 2012, 28 (3): 261-265. [doi:10.3969/j.issn.1003-7985.2012.03.001]

Whisper intelligibility enhancement based on noise robust feature and SVM

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
28
Issue:
3 (2012)
Page:
261-265
Research Field:
Information and Communication Engineering
Publishing date:
2012-09-30

Info

Title:
Whisper intelligibility enhancement based on noise robust feature and SVM
Author(s):
Zhou Jian(1,2), Zhao Li(1), Liang Ruiyu(1), Fang Xianyong(2)
(1) Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, China
(2) Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Anhui Unive
Keywords:
whispered speech; intelligibility enhancement; noise robust feature; machine learning
PACS:
TN912.35
DOI:
10.3969/j.issn.1003-7985.2012.03.001
Abstract:
A machine-learning-based speech enhancement method is proposed to improve the intelligibility of whispered speech. A binary mask estimated by a two-class support vector machine (SVM) classifier is used to synthesize the enhanced whisper. A novel noise robust feature called Gammatone feature cosine coefficients (GFCCs), extracted by an auditory periphery model, is derived and used for the binary mask estimation. The intelligibility performance of the proposed method is evaluated and compared with that of traditional speech enhancement methods. Objective and subjective evaluation results indicate that the proposed method can effectively improve the intelligibility of whispered speech contaminated by noise. Compared with the power subtract algorithm and the log-MMSE algorithm, neither of which improves intelligibility in lower signal-to-noise ratio (SNR) environments, the proposed method performs well in improving the intelligibility of noisy whisper. Additionally, whispered speech enhanced by the proposed method is more intelligible than the corresponding unprocessed noisy whispered speech.
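
The binary-mask idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: instead of an SVM driven by GFCC features, it computes an ideal binary mask from known speech and noise energies, which is the kind of two-class label (speech-dominated vs. noise-dominated) a classifier could be trained to predict. The function names, the 0 dB local criterion, and the toy energy values are all assumptions made for illustration.

```python
import math

def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label each time-frequency unit 1 if its local SNR exceeds lc_db, else 0.

    speech_energy, noise_energy: lists of rows (channels) of per-frame energies.
    lc_db: local SNR criterion in dB (0 dB is an assumed value, not from the paper).
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            # Small floor avoids log(0) and division by zero.
            snr_db = 10.0 * math.log10((s + 1e-12) / (n + 1e-12))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask

def apply_mask(mixture_energy, mask):
    """Keep units labelled 1 and zero out units labelled 0."""
    return [[x * m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(mixture_energy, mask)]

# Toy example (hypothetical values): 2 frequency channels x 3 frames.
speech = [[4.0, 0.5, 2.0], [0.1, 3.0, 0.2]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
mixture = [[s + n for s, n in zip(s_row, n_row)]
           for s_row, n_row in zip(speech, noise)]

mask = ideal_binary_mask(speech, noise)
enhanced = apply_mask(mixture, mask)
print(mask)
print(enhanced)
```

In a classifier-based system such as the one the abstract describes, the mask would be predicted from features of the noisy signal alone, since the clean speech and noise energies are not available at test time.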

Memo

Memo:
Biographies: Zhou Jian(1981—), male, graduate, lecturer; Zhao Li(corresponding author), male, doctor, professor, zhaoli@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China(No.61231002, 61273266, 51075068, 60872073, 60975017, 61003131), the Ph.D. Programs Foundation of the Ministry of Education of China(No.20110092130004), the Science Foundation for Young Talents in the Educational Committee of Anhui Province(No.2010SQRL018), the 211 Project of Anhui University(No.2009QN027B).
Citation: Zhou Jian, Zhao Li, Liang Ruiyu, et al. Whisper intelligibility enhancement based on noise robust feature and SVM [J]. Journal of Southeast University (English Edition), 2012, 28 (3): 261-265. [doi:10.3969/j.issn.1003-7985.2012.03.001]
Last Update: 2012-09-20