
Emotional speaker recognition based on prosody transformation

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
27
Issue:
4 (2011)
Page:
357-360
Research Field:
Information and Communication Engineering
Publishing date:
2011-12-31

Info

Title:
Emotional speaker recognition based on prosody transformation
Author(s):
Song Peng 1, Zhao Li 1, Zou Cairong 1,2
1 Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, China
2 Foshan University, Foshan 528000, China
Keywords:
emotion recognition; speaker recognition; F0 transformation; duration modification
CLC number:
TN912.3
DOI:
10.3969/j.issn.1003-7985.2011.04.002
Abstract:
A novel emotional speaker recognition system (ESRS) is proposed to compensate for emotion variability. First, emotion recognition is adopted as a pre-processing stage to classify neutral and emotional speech. Then, the recognized emotional speech is adjusted by prosody modification. Different methods, including Gaussian normalization, the Gaussian mixture model (GMM) and support vector regression (SVR), are adopted to define the mapping rules of F0s between emotional and neutral speech, and the average linear ratio is used for duration modification. Finally, the modified emotional speech is employed for speaker recognition. The experimental results show that the proposed ESRS significantly improves the performance of emotional speaker recognition, and its identification rate (IR) is higher than that of the traditional recognition system. The emotional speech with F0 and duration modifications is closer to the neutral one.
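
The prosody transformation described above amounts to two simple operations: mapping the F0 contour of emotional speech toward the speaker's neutral F0 statistics, and rescaling segment durations by an average linear ratio. The sketch below illustrates only the Gaussian-normalization variant of the F0 mapping and the duration ratio; it assumes frame-level F0 contours (0 Hz marking unvoiced frames) and segment durations have already been extracted as NumPy arrays, and the function names and numeric values are illustrative rather than taken from the paper (the GMM- and SVR-based mappings are omitted).

import numpy as np


def gaussian_normalize_f0(f0_emotional, neutral_log_mean, neutral_log_std, eps=1e-8):
    # Map voiced F0 values of an emotional utterance toward neutral statistics.
    # Gaussian normalization in the log-F0 domain: z-score with the emotional
    # utterance's own statistics, then rescale with the neutral mean/std.
    # Unvoiced frames (F0 == 0) are left untouched.
    f0_out = np.asarray(f0_emotional, dtype=float).copy()
    voiced = f0_out > 0
    log_f0 = np.log(f0_out[voiced])
    src_mean, src_std = log_f0.mean(), log_f0.std() + eps
    f0_out[voiced] = np.exp((log_f0 - src_mean) / src_std * neutral_log_std + neutral_log_mean)
    return f0_out


def average_linear_duration_ratio(neutral_durations, emotional_durations):
    # Single global ratio used to stretch or compress emotional segment durations.
    return float(np.mean(neutral_durations) / np.mean(emotional_durations))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical frame-level F0 contour of an emotional utterance (Hz); 0 = unvoiced.
    f0_emotional = np.where(rng.random(200) > 0.3,
                            rng.normal(220.0, 40.0, 200).clip(80.0, 400.0), 0.0)
    # Hypothetical neutral log-F0 statistics estimated from the same speaker's neutral speech.
    neutral_mu, neutral_sigma = np.log(140.0), 0.15
    f0_mapped = gaussian_normalize_f0(f0_emotional, neutral_mu, neutral_sigma)

    # Hypothetical syllable durations (seconds) for the average linear ratio.
    ratio = average_linear_duration_ratio([0.22, 0.25, 0.21], [0.17, 0.19, 0.16])
    print("mapped voiced F0 mean (Hz):", round(float(f0_mapped[f0_mapped > 0].mean()), 1))
    print("duration scaling ratio:", round(ratio, 3))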

References:

[1] Scherer K L, Johnstone T, Klasmeyer G, et al. Can automatic speaker verification be improved by training the algorithms on emotional speech? [C]//International Conference on Spoken Language Processing. Beijing, China, 2000: 807-810.
[2] Shan Z Y, Yang Y C, Ye R Z. Natural-emotion GMM transformation algorithm for emotional speaker recognition [C]//8th Annual Conference of the International Speech Communication Association. Antwerp, Belgium, 2007: 782-785.
[3] Tao J H, Kang Y G, Li A J. Prosody conversion from neutral speech to emotional speech [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(4): 1145-1154.
[4] Wu Z H, Li D D, Yang Y C. Rules based feature modification for affective speaker recognition [C]//International Conference on Acoustics, Speech, and Signal Processing. Toulouse, France, 2006: 661-664.
[5] Reynolds D A, Quatieri T F, Dunn R B. Speaker verification using adapted Gaussian mixture models [J]. Digital Signal Processing, 2000, 10(1): 19-41.
[6] Campbell W M, Sturim D E, Reynolds D A, et al. SVM based speaker verification using a GMM supervector kernel and NAP variability compensation [C]//International Conference on Acoustics, Speech, and Signal Processing. Toulouse, France, 2006: 97-100.
[7] Hu H, Xu M M, Wu W. GMM supervector based SVM with spectral features for speech emotion recognition [C]//International Conference on Acoustics, Speech, and Signal Processing. Honolulu, Hawaii, USA, 2007: 413-416.
[8] Sinha R, Ghai S. On the use of pitch normalization for improving children’s speech recognition [C]//10th Annual Conference of the International Speech Communication Association. Brighton, UK, 2009: 568-571.
[9] Wu Z Z, Kinnunen T, Chng E S, et al. Text-independent F0 transformation with non-parallel data for voice conversion [C]//11th Annual Conference of the International Speech Communication Association. Makuhari, Japan, 2010: 1732-1735.
[10] Basak D, Pal S, Patranabis D C. Support vector regression [J]. Neural Information Processing—Letters and Reviews, 2007, 11(10): 203-224.

Memo

Biographies: Song Peng (1983—), male, graduate; Zhao Li (corresponding author), male, doctor, professor, zhaoli@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (No. 60872073, 60975017, 51075068), the Natural Science Foundation of Guangdong Province (No. 10252800001000001), and the Natural Science Foundation of Jiangsu Province (No. BK2010546).
Citation: Song Peng, Zhao Li, Zou Cairong. Emotional speaker recognition based on prosody transformation [J]. Journal of Southeast University (English Edition), 2011, 27(4): 357-360. [doi:10.3969/j.issn.1003-7985.2011.04.002]
Last Update: 2011-12-20